Presigned URL support for S3 log files#3324
Presigned URL support for S3 log files#3324rasmusfaber wants to merge 17 commits intoUKGovernmentBEIS:mainfrom
Conversation
00fb032 to
1d8c4d0
Compare
ceb958a to
19e2c66
Compare
8599bf8 to
bda4b12
Compare
c93d201 to
97a388c
Compare
3734d92 to
8039504
Compare
| return !!loading || sampleStatus === "loading"; | ||
| } else if (showActivity === "log") { | ||
| return !!loading; | ||
| return !!loading || sampleStatus === "loading"; |
There was a problem hiding this comment.
This seems odd (specifically showing 'log' activity when samples are loading) - was this truly needed?
There was a problem hiding this comment.
The hasActivity flag gates the ActivityBar rendering entirely. If it's false, neither the determinate progress bar nor the indeterminate spinner renders, even if sampleProgress has a valid value.
When a user is in the log view (showActivity="log") and clicks a sample to load, loading is already false (the log finished loading), so only sampleStatus === "loading" keeps the navbar progress bar visible during the download. Removing it would hide the progress bar for sample downloads in the log view.
There's also an edge case: when a log has exactly 1 sample, it's displayed inline via InlineSampleDisplay (which has its own ActivityBar), but the navbar indicator would still disappear.
But perhaps there is something that needs renaming or a comment here?
There was a problem hiding this comment.
The original model would've been that the 'header' region of the sample would show immediately, since it is renderable using only a sample summary which should be present before the sample is loaded. This view as a sample responsive activity bar between the header region and the sample contents - the intent is that this conveys the loading of the sample.
This seems to be working as intended for inline samples, but non-inline sample display appears to be not behaving this way... I'm looking into this separately....
There was a problem hiding this comment.
I've fixed this issue here:
#3444
The behavior changed when I switched us to using the SampleDetailView (and it was incorrect). This correctly shows the sample header immediately and shows progress beneath that header when samples load.
There was a problem hiding this comment.
Thanks, that is cleaner.
I have removed the changes to ApplicationNavBar.tsx.
345fc08 to
f717fe3
Compare
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace get_log_size with get_log_info across view-server, static-http, and vscode API clients. The new LogInfo type includes an optional direct_url field for S3 presigned URLs. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When log-info returns a direct_url (S3 presigned URL), byte-range requests go directly to S3. Falls back to server proxy on failure. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replaced by log-info which returns size + optional direct_url. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
S3 buckets with SSE-KMS require SigV4. Add config_kwargs with signature_version s3v4 to the default S3 filesystem options. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Eliminate the 30-byte header pre-fetch in readFile by estimating the total fetch size from the central directory's filenameLength + 256 bytes of extra field padding, then fetching once. Re-fetches only if the actual extra field exceeds the estimate (rare). Also cache the opened ZIP in get_log_details so subsequent get_log_sample calls reuse it. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add asyncJsonParseBytes() that transfers raw Uint8Array directly to the Web Worker pool, eliminating ~800ms of main-thread string allocation for large eval log files (~20MB). Co-Authored-By: Claude Opus 4.6 <[email protected]>
Thread onProgress callback from parallel chunk fetcher through the remote ZIP/log file readers to the UI, showing download progress for samples over 20MB instead of only an indeterminate animation. Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Wrap presigned URL generation in try/except so failures degrade gracefully instead of crashing the /log-info/ endpoint - Move ProgressCallback type to client/api/types.ts to fix inverted dependency (api layer was importing from remote/remoteZipFile) - Move fetchRange to utils/http.ts canonical location, re-export from remoteZipFile for backward compat - Simplify fetchBytes wrapper in remoteLogFile and add console.warn on presigned URL fallback for CORS debugging - Deduplicate JSON/JSON5 parse fallback in asyncJsonParseBytes - Fix progress double-counting on ZIP re-fetch path - Guard against division by zero in progress display - Add truncated ZIP header length check - Add VSCode backward compat: fall back to eval_log_size if extension doesn't support eval_log_info yet Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Lower parallel chunk threshold from 20MB to 8MB and increase max concurrent chunks from 6 to 10. Benchmarking across 20-200MB files showed 8MB chunks with 10 concurrency achieves ~4x speedup over single requests for 200MB files (cross-region). Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Restore /api/log-size endpoint for VS Code extension compatibility - Rename DownloadProgress to Progress with complete/total fields - Pass content length directly to openRemoteZipFile instead of wrapper fn - Add clarifying comment for fsspec async _url convention Co-Authored-By: Claude Opus 4.6 <[email protected]>
f717fe3 to
2369c8e
Compare
Sample download progress is handled by InlineSampleDisplay's own ActivityBar, so the navbar bar should only show log-level loading. Co-Authored-By: Claude Opus 4.6 <[email protected]>
4132bb4 to
ddce505
Compare
This PR contains:
What is the current behavior? (You can also link to an open issue here)
Large S3 hosted
.eval-files can be very slow to display, and there is no progress bar.What is the new behavior?
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
The/log-size/API endpoint is replaced by/log-info/with a different response shape (JSON object instead of bare integer). The VSCode extension JSON-RPC method is renamed fromeval_log_sizetoeval_log_info, but the frontend includes a backward-compatible fallback.Other information: