feat: Add progress logging during eval_set resume scanning #3339
QuantumLove wants to merge 1 commit into UKGovernmentBEIS:main
When eval_set() resumes and scans S3 for existing logs, this can take 0-90 minutes for large eval sets with no visibility. This adds progress logging using the existing ReadEvalLogsProgress infrastructure.

Changes:
- Add EvalSetScanProgress class implementing ReadEvalLogsProgress
- Wire progress logging into try_eval() function
- Log at reasonable intervals (every 100 files or 10%, whichever is larger)
- Add @override decorators for type safety (consistent with codebase)
- Add comprehensive unit tests

Log output example:
INFO: Found 1,234 eval log files in s3://..., reading headers...
INFO: Reading eval logs: 100/1234
...
INFO: Resume scan complete: 1,234 logs read in 45.3s (1,100 completed, 134 pending)

Co-Authored-By: Claude <[email protected]>
There are two things that are potentially slow.
The progress here covers only step 1. @QuantumLove, are you still seeing that part be slow enough that progress logging is needed? In my testing it was down to ~12s for a 600-log data set, and the output above shows 45s for about 1,200 log files. If you are still seeing this take many minutes, I'd be interested in understanding the scenario better. I would expect #2 to still be the slower part of the process, though reusing partial logs may not happen that often. If the eval set setup is still slow enough that we generally need progress, I think we should probably update the eval set textual UI to show it.
Hey @jjallaire and @ransomr, thank you for the quick reply, and happy to hear about the performance enhancements. We will upgrade to the latest version and report back what we see. Stay tuned!
Note that eval set has a phase with no textual UI (the UI appears only during actual task execution), so it would be completely fine to show this progress at the terminal if we do in fact need it.
Summary
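When eval_set() resumes and scans S3 for existing logs, this can take 0-90 minutes for large eval sets. Currently there's no visibility into this process.

This PR adds progress logging using the ReadEvalLogsProgress infrastructure, which already exists but wasn't wired into eval_set().

Changes
- Add EvalSetScanProgress class that implements ReadEvalLogsProgress
- Wire progress logging into the try_eval() function in eval_set()

Log Output Example
INFO: Found 1,234 eval log files in s3://..., reading headers...
INFO: Reading eval logs: 100/1234
...
INFO: Resume scan complete: 1,234 logs read in 45.3s (1,100 completed, 134 pending)

Testing
- Unit tests for the EvalSetScanProgress class (7 tests)
- test_eval_set.py tests pass

🤖 Generated with Claude Code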
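For readers who want a concrete picture of the interval rule described in the commit message (every 100 files or 10%, whichever is larger), here is a minimal, self-contained Python sketch. It is not the code from this PR. `ScanProgressSketch`, `log_interval`, the `complete()` method, and the example bucket path are hypothetical names, the hook names `before_reading_logs` and `after_read_log` are assumed rather than taken from the diff, and rounding 10% down is also an assumption.

```python
import logging
import time

logger = logging.getLogger("eval_set_scan_sketch")


def log_interval(total_files: int) -> int:
    """Every 100 files or 10% of the total, whichever is larger.

    Assumption: the 10% figure is rounded down to a whole file count.
    """
    return max(100, total_files // 10)


class ScanProgressSketch:
    """Hypothetical stand-in for a progress reporter; not the PR's class."""

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir
        self.total = 0
        self.read = 0
        self.interval = 100
        self.started = 0.0

    def before_reading_logs(self, total_files: int) -> None:
        # Assumed hook: called once with the number of log files found.
        self.total = total_files
        self.read = 0
        self.interval = log_interval(total_files)
        self.started = time.monotonic()
        logger.info(
            "Found %s eval log files in %s, reading headers...",
            f"{total_files:,}",
            self.log_dir,
        )

    def after_read_log(self, log_file: str) -> None:
        # Assumed hook: called once after each log header has been read.
        self.read += 1
        # Log only on interval boundaries, and skip the final file because
        # complete() reports the total.
        if self.read % self.interval == 0 and self.read < self.total:
            logger.info("Reading eval logs: %d/%d", self.read, self.total)

    def complete(self) -> None:
        # Hypothetical helper that reports the total and the elapsed time.
        elapsed = time.monotonic() - self.started
        logger.info(
            "Resume scan complete: %s logs read in %.1fs",
            f"{self.read:,}",
            elapsed,
        )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    progress = ScanProgressSketch("s3://example-bucket/logs")  # hypothetical path
    progress.before_reading_logs(1234)
    for i in range(1234):
        progress.after_read_log(f"log-{i}.eval")
    progress.complete()
```

Computing the interval once up front keeps the per-file hook cheap, which matters when the hook runs for every log in a large eval set.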