
perf: batched validate/map #3329

Open
rasmusfaber wants to merge 1 commit into UKGovernmentBEIS:main from rasmusfaber:faber/s3-perf-improvements


@rasmusfaber
Contributor

@rasmusfaber rasmusfaber commented Feb 25, 2026

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior? (You can also link to an open issue here)

The log-headers FastAPI endpoint first validates all files concurrently in a task group, then maps them one at a time by awaiting each call inside a list comprehension.

What is the new behavior?

Batches _validate_read and _map_file into a single concurrent task group, so validation and mapping happen together per file rather than in two separate passes.
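
As a rough illustration of the difference (a sketch, not the actual diff): the functions `validate_read` and `map_file` below are hypothetical stand-ins for `_validate_read` and `_map_file`, their bodies are invented, and `asyncio.gather` is swapped in for the task group so the example runs on its own.

```python
import asyncio


async def validate_read(path: str) -> None:
    """Hypothetical stand-in for _validate_read (simulated I/O-bound check)."""
    await asyncio.sleep(0.01)
    if not path:
        raise ValueError("empty path")


async def map_file(path: str) -> str:
    """Hypothetical stand-in for _map_file (simulated I/O-bound mapping)."""
    await asyncio.sleep(0.01)
    return path.upper()


async def two_pass(paths: list[str]) -> list[str]:
    # Old shape: validate every file concurrently...
    await asyncio.gather(*(validate_read(p) for p in paths))
    # ...then map the files one at a time, awaiting each call in turn.
    return [await map_file(p) for p in paths]


async def validate_and_map(path: str) -> str:
    # Per-file unit of work: validate first, then map the same file.
    await validate_read(path)
    return await map_file(path)


async def batched(paths: list[str]) -> list[str]:
    # New shape: one concurrent task per file; gather keeps input order.
    return list(await asyncio.gather(*(validate_and_map(p) for p in paths)))


if __name__ == "__main__":
    files = ["a.eval", "b.eval", "c.eval"]
    print(asyncio.run(two_pass(files)))
    print(asyncio.run(batched(files)))
```

Both shapes return the same results in the same order. The batched version just avoids waiting for each mapping call to finish before starting the next one.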

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No.

Other information:

Extracted from #3213.

@rasmusfaber rasmusfaber marked this pull request as ready for review February 25, 2026 15:52
@rasmusfaber rasmusfaber force-pushed the faber/s3-perf-improvements branch 2 times, most recently from d65cec8 to e32a9a3 Compare February 27, 2026 09:18
@rasmusfaber rasmusfaber changed the title from "perf: S3 readahead caching and batched validate/map" to "perf: batched validate/map" Feb 27, 2026
Batch _validate_read and _map_file calls concurrently in the
log-headers endpoint instead of validating then mapping sequentially.

The readahead caching for S3 ZipFile reads was dropped as
read_log_sample now uses AsyncZipReader instead of synchronous ZipFile.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@rasmusfaber rasmusfaber force-pushed the faber/s3-perf-improvements branch from e32a9a3 to 36a13bb Compare February 27, 2026 09:26
@rasmusfaber
Contributor Author

#3189 introduced a better fix for the zip file readahead with the AsyncZipReader, so this now only includes the parallel validation.

