Implement multiround benchmarking & enhance LLMService interfaces #54

Merged (37 commits) on Jan 29, 2025

GlebSolovev
Collaborator

Core Changes ⚡️

  • Multiround Benchmarking Support 🥊

    Multiround proof generation can now be benchmarked, with comprehensive error handling and logging, as is our tradition.

  • LLMService Interface Rework 🕊

    • Simplified error-handling invariants without reducing functionality.
    • Extended proof-generation methods with the ProofGenerationMetadataHolder object, enabling the collection of additional metadata.
    • Improved LLMService constructors for cleaner implementations.
    • Updated GeneratedProof to store raw proof metadata (e.g., statistics returned by the LLMService generation method), facilitating detailed proof analysis—especially useful for benchmarks.
    • Enhanced getter methods for better usability.
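To illustrate the metadata-holder idea from the list above, here is a minimal sketch. Only the names `ProofGenerationMetadataHolder` and `GeneratedProof` come from this PR; their fields, the `RawProofMetadata` shape, and the `generateProof` function are assumptions made for illustration, not the project's actual API. The real generation methods are presumably asynchronous; the sketch is synchronous for brevity.

```typescript
// Hypothetical shape of the raw statistics a generation call might return.
interface RawProofMetadata {
    tokensUsed: number;
    elapsedMillis: number;
}

// Assumed design: a mutable holder the caller passes in and inspects later.
class ProofGenerationMetadataHolder {
    private readonly records: RawProofMetadata[] = [];

    record(metadata: RawProofMetadata): void {
        this.records.push(metadata);
    }

    getAll(): readonly RawProofMetadata[] {
        return this.records;
    }
}

// Assumed design: each generated proof keeps the raw metadata it came with.
interface GeneratedProof {
    proof: string;
    rawMetadata: RawProofMetadata;
}

// Stand-in for an LLMService generation method (hypothetical signature).
// The holder is optional, so callers that do not need metadata are unaffected.
function generateProof(
    goal: string,
    holder?: ProofGenerationMetadataHolder
): GeneratedProof {
    // Placeholder statistics; a real service would report actual values.
    const rawMetadata: RawProofMetadata = {
        tokensUsed: goal.length,
        elapsedMillis: 0,
    };
    holder?.record(rawMetadata);
    return { proof: "auto.", rawMetadata };
}

const holder = new ProofGenerationMetadataHolder();
const generated = generateProof("forall n, n + 0 = n", holder);
```

Under these assumptions, a benchmark can read the statistics either from the holder (across calls) or from each proof's `rawMetadata` (per proof).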

Changes to the llm module are thoroughly covered by tests, while the new benchmarking features have been extensively tested manually.


Additional Value 🌟

  • Enhanced benchmarking result interfaces: redesigned with improved typing for safer and more convenient usage.

  • Custom errors: introduced specific error types (IllegalStateError, InvariantFailedError, BenchmarkingError) with wrappers, replacing generic throw Error cases. Refactored the benchmarking module accordingly and ensured custom error declarations are reliable.

  • Stricter and clearer error handling: strengthened and documented error handling in the benchmarking framework.

  • Utility code improvements: removed duplicated utilities and introduced better typings (e.g., for theorem rankers, colorization, LLM iterator).

  • Bug Fixes

    • Resolved unawaited promises in affected tests.
    • Fixed error repacking in the OpenAIService.
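The custom error types mentioned above could be declared along the following lines. This is a sketch, not the PR's code: the three error names come from the description, while the `NamedError` base class, the `invariantFailed` wrapper, and `describeFailure` are hypothetical. It shows one common way to keep `instanceof` and `name` reliable for `Error` subclasses in TypeScript.

```typescript
// Hypothetical base class (not from the PR).
abstract class NamedError extends Error {
    constructor(message: string) {
        super(message);
        // Restore the prototype chain, which can be lost for Error subclasses
        // when compiling to older JavaScript targets.
        Object.setPrototypeOf(this, new.target.prototype);
        // Report the concrete class name instead of the generic "Error".
        this.name = new.target.name;
    }
}

class IllegalStateError extends NamedError {}
class InvariantFailedError extends NamedError {}
class BenchmarkingError extends NamedError {}

// Hypothetical wrapper that replaces an ad hoc `throw Error(...)`.
function invariantFailed(invariant: string, details: string): never {
    throw new InvariantFailedError(
        `invariant "${invariant}" failed: ${details}`
    );
}

// Callers can branch on the specific error type.
function describeFailure(error: unknown): string {
    if (error instanceof InvariantFailedError) {
        return `bug: ${error.message}`;
    }
    if (error instanceof BenchmarkingError) {
        return `benchmark: ${error.message}`;
    }
    return `unexpected: ${String(error)}`;
}
```

The point of the typed errors is that a caller can tell a framework bug (a failed invariant) apart from an expected benchmarking failure without parsing message strings.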

Also fix custom error class declarations
Note: this made it possible to correctly extract proof generation metadata
Also: improve test code, fix the bug where `expect(...).toBeRejected()` was not awaited
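The unawaited-assertion bug noted above can be reproduced without any test library. In the sketch below, `expectRejected` is a hypothetical stand-in for an assertion such as `expect(...).toBeRejected()`, assuming that assertion returns a promise; `runTest` is a hypothetical miniature test runner. Neither is the project's real test setup.

```typescript
// Hypothetical async assertion: fails if `promise` resolves.
async function expectRejected(promise: Promise<unknown>): Promise<void> {
    let rejected = false;
    try {
        await promise;
    } catch {
        rejected = true;
    }
    if (!rejected) {
        throw new Error("expected promise to be rejected, but it resolved");
    }
}

// Hypothetical runner: a test passes unless its body rejects.
async function runTest(
    body: () => Promise<void>
): Promise<"passed" | "failed"> {
    try {
        await body();
        return "passed";
    } catch {
        return "failed";
    }
}

// Failures that surface only after the test body has already returned.
const lateFailures: string[] = [];

// Buggy: the assertion is not awaited, so the body returns before it fails.
// The `.catch` exists only so this demo can observe the late failure
// instead of leaving an unhandled rejection.
async function buggyTest(): Promise<void> {
    expectRejected(Promise.resolve("ok")).catch((e: Error) =>
        lateFailures.push(e.message)
    );
}

// Fixed: awaiting the assertion lets the runner see the failure.
async function fixedTest(): Promise<void> {
    await expectRejected(Promise.resolve("ok"));
}
```

Here `runTest(buggyTest)` resolves to `"passed"` even though the assertion fails, while `runTest(fixedTest)` resolves to `"failed"`, which is the correct verdict because the checked promise resolved instead of rejecting.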
@GlebSolovev GlebSolovev self-assigned this Jan 17, 2025
@GlebSolovev GlebSolovev changed the base branch from main to v2.5.0-dev January 17, 2025 04:13
@GlebSolovev GlebSolovev merged commit 2556ab3 into v2.5.0-dev Jan 29, 2025
3 checks passed
@GlebSolovev GlebSolovev deleted the benchmarking-multiround branch January 29, 2025 11:58