Implement multiround benchmarking & enhance LLMService interfaces #54

Merged 37 commits on Jan 29, 2025
e81cce1
Implement multiround benchmarking base
GlebSolovev Jan 13, 2025
420563e
Redesign & support comprehensive benchmarking results interfaces
GlebSolovev Jan 13, 2025
e4738c8
Typify theorem rankers & support all implemented
GlebSolovev Jan 13, 2025
d282524
Implement proper JSON serialization for `BenchmarkedItem`
GlebSolovev Jan 13, 2025
e8cfa45
Fix JSON serialization of results tree
GlebSolovev Jan 13, 2025
7811802
Save interim multiround benchmarking results
GlebSolovev Jan 14, 2025
eb832d7
Refactor JSON printers
GlebSolovev Jan 14, 2025
aca01b3
Improve benchmarking item logging
GlebSolovev Jan 14, 2025
9e3b60b
Improve `ValidatedProof` typing
GlebSolovev Jan 14, 2025
2b0160b
Handle nullability in multiround benchmarking
GlebSolovev Jan 14, 2025
aef3cef
Introduce & use throw-error wrappers in benchmarking framework
GlebSolovev Jan 14, 2025
b9aa874
Fix errors after refactor
GlebSolovev Jan 14, 2025
04dbc60
Document & fix round number
GlebSolovev Jan 14, 2025
848a80f
Document `executeBenchmarkingTask` properly
GlebSolovev Jan 14, 2025
d480616
Log multiround benchmarking properly
GlebSolovev Jan 14, 2025
9603cd2
Extend root benchmarking result serialization
GlebSolovev Jan 14, 2025
e6f3fa5
Handle duplicate generated proofs carefully
GlebSolovev Jan 14, 2025
3a89be3
Simplify `LLMService` interface, modes are reworked
GlebSolovev Jan 15, 2025
94d9c0f
Design & support metadata object for `LLMService` calls
GlebSolovev Jan 15, 2025
0e8bd03
Generalize `GeneratedProof` to store generation metadata
GlebSolovev Jan 16, 2025
bdcd6cf
Refactor `GeneratedProof` with getters
GlebSolovev Jan 16, 2025
8b64459
Updated benchmarks with new `LLMService` interface
GlebSolovev Jan 16, 2025
e203800
Fix `OpenAiService` api key error repacking
GlebSolovev Jan 16, 2025
83ec560
Fix declaration of custom `LLMServiceError`-s
GlebSolovev Jan 16, 2025
125881d
Improve LLM iterator typing
GlebSolovev Jan 16, 2025
82cb332
Minor `LLMService` defaults update, mark TODOs
GlebSolovev Jan 16, 2025
c6140db
Rewrite tests according to new `LLMService` interface
GlebSolovev Jan 16, 2025
3a82ef4
Add `plus_comm` test example
GlebSolovev Jan 16, 2025
653ea6a
Fix round number bug in benchmarks
GlebSolovev Jan 16, 2025
4ec4d45
Make logging color defined, support "default"
GlebSolovev Jan 16, 2025
d740ee9
Improve multiround benchmarking logs
GlebSolovev Jan 16, 2025
80f944f
Unify color logging in project scope
GlebSolovev Jan 16, 2025
1e8a67b
Make LLM service identifier human-readable
GlebSolovev Jan 16, 2025
2406462
Improve & document error handling in benchmarks
GlebSolovev Jan 16, 2025
ea8c94c
Fix & prettify error handling in benchmarks
GlebSolovev Jan 16, 2025
f8d821f
Fix multiround benchmarking bug: successful completion did not stop e…
GlebSolovev Jan 16, 2025
c62d09b
Make multiround benchmarking logs clearer
GlebSolovev Jan 17, 2025
Minor LLMService defaults update, mark TODOs
GlebSolovev committed Jan 17, 2025
commit 82cb3328a75dc6f3dabb46aa650ff15e92ff22d6
@@ -33,6 +33,7 @@ interface FailureMetadata {
     llmServiceError: LLMServiceError;
 }

+// TODO: document, especially its invariant to be used only once
 export class ProofGenerationMetadataHolder {
     private _configuration: ConfigurationMetadata | undefined = undefined;
     private _success: SuccessMetadata | undefined = undefined;
@@ -121,10 +122,11 @@ export class ProofGenerationMetadataHolder {
         currentValue: any | undefined,
         propertyName: string
     ) {
-        if (currentValue === undefined) {
+        if (currentValue !== undefined) {
             illegalState(
-                `\`ProofGenerationMetadata\` should be updated with ${propertyName} `,
-                "only once by the `LLMService` internals"
+                `\`ProofGenerationMetadata\` is updated with ${propertyName} more than once;\n`,
+                "Possible reason: the same `ProofGenerationMetadata` should not be used ",
+                "more than for one proof generation"
             );
         }
     }
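The corrected condition above enforces a "set at most once" invariant: the check must fire when a value is already present, not when it is still `undefined`. A minimal sketch of that pattern follows. It is a simplified stand-in, not the project's actual class: `SetOnceMetadataHolder`, `IllegalStateError`, `setConfiguration`, the `modelName` field, and this local `illegalState` helper are all hypothetical names chosen for illustration.

```typescript
// Hypothetical error type; the real project may report illegal states differently.
class IllegalStateError extends Error {
    constructor(message: string) {
        super(message);
        this.name = "IllegalStateError";
    }
}

// Hypothetical helper mirroring the multi-part message call seen in the diff.
function illegalState(...messageParts: string[]): never {
    throw new IllegalStateError(messageParts.join(""));
}

// Hypothetical, heavily reduced metadata shape.
interface ConfigurationMetadata {
    modelName: string;
}

class SetOnceMetadataHolder {
    private _configuration: ConfigurationMetadata | undefined = undefined;

    get configuration(): ConfigurationMetadata | undefined {
        return this._configuration;
    }

    setConfiguration(value: ConfigurationMetadata): void {
        this.checkNotSetYet(this._configuration, "configuration");
        this._configuration = value;
    }

    // Throws if the property already holds a value, i.e. on the second update.
    private checkNotSetYet(currentValue: unknown, propertyName: string): void {
        if (currentValue !== undefined) {
            illegalState(
                `metadata is updated with ${propertyName} more than once; `,
                "use a fresh holder for each proof generation"
            );
        }
    }
}
```

With the pre-fix condition (`=== undefined`), the very first `setConfiguration` call would have thrown and a repeated call would have passed silently, which is the opposite of the intended invariant.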
2 changes: 1 addition & 1 deletion src/llm/llmServices/llmService.ts
@@ -128,7 +128,7 @@ export abstract class LLMServiceImpl<
      * @param generationLogsFilePath if it is not specified, a temporary file will be used.
      */
     constructor(
-        eventLogger: EventLogger | undefined,
+        eventLogger: EventLogger | undefined = undefined,
         errorsHandlingMode: ErrorsHandlingMode = ErrorsHandlingMode.RETHROW_ERRORS,
         generationLogsFilePath: string | undefined = undefined,
         debugLogs: boolean = false
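In TypeScript, a parameter typed `T | undefined` without a default is still required at the call site, so before this change every caller had to pass `undefined` explicitly for `eventLogger`. Adding `= undefined` makes the argument optional. The sketch below illustrates the effect with a reduced, non-generic, non-abstract stand-in: `SketchService`, the empty `EventLogger` class, and the second enum member are hypothetical, and only the parameter list mirrors the diff.

```typescript
// Hypothetical minimal stand-in for the project's event logger.
class EventLogger {}

enum ErrorsHandlingMode {
    RETHROW_ERRORS = "RETHROW_ERRORS",
    // Hypothetical second member, present only to make the enum non-trivial.
    SWALLOW_ERRORS = "SWALLOW_ERRORS",
}

class SketchService {
    constructor(
        readonly eventLogger: EventLogger | undefined = undefined,
        readonly errorsHandlingMode: ErrorsHandlingMode = ErrorsHandlingMode.RETHROW_ERRORS,
        readonly generationLogsFilePath: string | undefined = undefined,
        readonly debugLogs: boolean = false
    ) {}
}

// All four parameters now have defaults, so a zero-argument call type-checks.
const defaultService = new SketchService();

// Passing `undefined` for a leading parameter still selects its default,
// which keeps older call sites working unchanged.
const debugService = new SketchService(
    undefined,
    ErrorsHandlingMode.RETHROW_ERRORS,
    undefined,
    true
);
```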
@@ -21,6 +21,7 @@ export class ParsingError extends Error {
     }
 }

+// TODO: support `proofGenerationType` if ever needed
 export class LoggerRecord {
     /**
      * Even though this value is in millis, effectively it represents seconds.