
feat!(backend): file sharing - prevent overwrites of uploaded file #6381

Open
anna-parker wants to merge 8 commits into main from file_sharing_nooverwrite

Conversation

@anna-parker

@anna-parker anna-parker commented May 8, 2026

Contributor

resolves #4056

The backend now adds "If-None-Match": "*" as a header when requesting presigned URLs on behalf of a user. This makes S3 reject a write if an object already exists at the target key, which prevents accidental overwrites.

Suggested by @tombch, see details in https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-writes.html https://security.stackexchange.com/a/286617
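
To make the conditional-write behaviour concrete, here is a minimal sketch that models it in memory. `InMemoryBucket` is a hypothetical toy class written for this illustration, not a real S3 client or anything from this PR; it only mimics how a PUT with `If-None-Match: *` is expected to behave.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryBucket:
    """Toy stand-in for an S3 bucket (hypothetical, for illustration only)."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, body: bytes, if_none_match: str | None = None) -> int:
        """Return the HTTP status a conditional PUT would be expected to produce."""
        # "If-None-Match: *" means: only write if nothing exists under this key.
        if if_none_match == "*" and key in self.objects:
            return 412  # Precondition Failed; the stored object is left untouched
        self.objects[key] = body
        return 200


bucket = InMemoryBucket()
first = bucket.put("files/example", b"original", if_none_match="*")
second = bucket.put("files/example", b"replacement", if_none_match="*")
```

In this model the first write is accepted and the second is rejected, so the stored bytes stay `b"original"`.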

Note that this header is not required for multipart S3 uploads, because the complete-multipart-upload request already prevents further modification of the object through the presigned URL.

Breaking change

Clients using presigned URLs (i.e. those requested via the /files/request-upload endpoint) now need to add "If-None-Match": "*" to the headers when submitting data using the presigned URL. This is because AWS and other S3 providers reject uploads whose headers do not match the headers that were signed into the presigned URL.
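
A minimal client-side sketch of this flow, assuming the response shape seen on the preview deployment (a JSON list of objects with `fileId`, `url` and `headers`). The helper name `build_upload_request` and the shortened example URL are made up for illustration; the point is that the client forwards whatever headers the backend returned instead of hardcoding them.

```python
import json
import urllib.request


def build_upload_request(upload_entry: dict, payload: bytes) -> urllib.request.Request:
    """Build a PUT request for one entry of the /files/request-upload response."""
    # Forward exactly the headers the backend returned; they are part of the
    # signature, so omitting them makes the storage provider reject the upload.
    headers = dict(upload_entry.get("headers", {}))
    return urllib.request.Request(
        upload_entry["url"], data=payload, method="PUT", headers=headers
    )


# Placeholder response body; the real URL carries the full set of X-Amz-* parameters.
response_body = (
    '[{"fileId": "example-id", '
    '"url": "https://s3.example.org/bucket/files/example-id'
    '?X-Amz-SignedHeaders=host%3Bif-none-match", '
    '"headers": {"If-None-Match": "*"}}]'
)
entry = json.loads(response_body)[0]
request = build_upload_request(entry, b"ACGT\n")
# urllib.request.urlopen(request) would send the upload; it is omitted here.
```

Reading the headers from the response keeps clients working if the backend later signs additional headers.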

PR Checklist

  • Document how users should handle the case where an upload to S3 is disconnected: confirmed that when I break off an upload with Ctrl+C I can submit again to the S3 bucket; the new header only takes effect once a submission has succeeded.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: https://file-sharing-nooverwrite.loculus.org

@claude claude Bot added the backend (related to the loculus backend component) and preprocessing (Issues related to the preprocessing component) labels May 8, 2026
@anna-parker anna-parker changed the title from "feat: prevent overwrites of uploaded file" to "feat(backend): file sharing - prevent overwrites of uploaded file" May 8, 2026
@anna-parker anna-parker changed the title from "feat(backend): file sharing - prevent overwrites of uploaded file" to "feat!(backend): file sharing - prevent overwrites of uploaded file" May 8, 2026
@anna-parker anna-parker marked this pull request as ready for review May 9, 2026 09:13
@claude

claude Bot commented May 9, 2026

Contributor

Claude finished @anna-parker's task in 2m 59s


Code Review

  • Gather context and understand the PR
  • Read changed files
  • Review code quality, security, testing, and documentation
  • Post final review

The approach is correct: signing the presigned URL with If-None-Match: * and returning the required headers to clients is exactly the right way to implement this — clients can't independently determine which headers are required since they're embedded in the signature.

The test updates are thorough and consistent: all uploadFile call sites now pass fileIdAndUrl.headers, and the new uploadFile helper on SubmissionConvenienceClient is a clean refactor. The Python FileUploadInfo dataclass correctly gains a headers field.

A few issues noted via inline comments:

preprocessing/nextclade/src/loculus_preprocessing/backend.py:219: upload_embl_file_to_presigned_url hardcodes "If-None-Match": "*" rather than using upload_info.headers from request_upload(). The headers returned by the server are already available at the call site in prepro.py but are discarded. If the required headers change server-side, this function will silently drift.

RequestUploadEndpointTest.kt:88 — The test verifies a single upload succeeds with If-None-Match: *, but there's no test for the overwrite-prevention behaviour (a second upload to the same URL should return 412). Also: the header is hardcoded rather than read from the response JSON, so the test doesn't verify the end-to-end client flow as documented.

RequestUploadEndpointTest.kt — Tests asserting response structure check fileId and url but not the new headers field. Since this is a breaking change for clients, asserting headers contains {"If-None-Match": "*"} would be valuable.

FileTypes.kt:30 — Minor: the headers default is hardcoded independently of what S3Service signs into the URL. A future change to S3Service would need to update both places. Not a blocker for this PR but worth keeping in mind.

Remaining checklist items: The PR checklist still has open items (docs for disconnected-upload case, manual testing documentation). The disconnected-upload scenario (user gets presigned URL but upload never completes) is worth documenting since clients will now get 412 on retry with the same URL — they'd need to call /files/request-upload again to get a fresh URL.

Comment thread preprocessing/nextclade/src/loculus_preprocessing/backend.py
Comment thread backend/src/main/kotlin/org/loculus/backend/api/FileTypes.kt
…m response (#6386)

Addresses [@anna-parker's review on
#6381](#6381 (comment)).

## Summary
- Refactors `GIVEN a request for a URL THEN returns a valid presigned
URL` to read the required upload headers from the `headers` field on the
response JSON instead of hardcoding `If-None-Match: *`. This exercises
the same client-side flow that real callers (e.g. the preprocessing
pipeline) are expected to follow.
- Adds a new test, `GIVEN a presigned URL has been used to upload THEN a
second upload to the same URL fails`, that uses the same presigned URL
twice and asserts the second PUT is rejected with HTTP 412 — the
overwrite-prevention guarantee that motivated #6381.

## Test plan
- [x] `./gradlew test --tests
'org.loculus.backend.controller.files.RequestUploadEndpointTest'` — all
16 tests pass, including the new one (412 returned by MinIO on the
second PUT).
- [x] `./gradlew ktlintFormat` — no changes.

This PR is targeted at the `file_sharing_nooverwrite` branch so it can
land alongside #6381.

🚀 Preview: Add `preview` label to enable

Co-authored-by: theosanderson-agent <theo@theo.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anna-parker anna-parker requested a review from maverbiest June 10, 2026 14:24
@maverbiest maverbiest added the preview (Triggers a deployment to argocd) label Jun 11, 2026
@maverbiest

maverbiest commented Jun 11, 2026

Contributor

I deployed a preview off this branch to see if the {If-None-Match: *} header prevents second write requests as expected. It seems to be working as intended!

Setup

  1. As the testuser, I made a submitting group via the web interface (which got groupId 2)
  2. I figured out the names of the keycloak pod (https://authentication-file-sharing-nooverwrite.loculus.org/) and minio pod for the preview deployment
  3. I got a token from keycloak to make requests from the command line as the testuser

Testing

# requesting a file upload as the testuser
➜  loculus git:(s3-garbage-collection) ✗ curl -X POST "https://backend-file-sharing-nooverwrite.loculus.org/files/request-upload?groupId=2&numberFiles=1" -H "Authorization: Bearer $TOKEN"
[{"fileId":"cad7567e-913f-4fbc-bd2e-a2ad44db294d","url":"https://s3-file-sharing-nooverwrite.loculus.org/loculus-preview-private/files/cad7567e-913f-4fbc-bd2e-a2ad44db294d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260611T122529Z&X-Amz-SignedHeaders=host%3Bif-none-match&X-Amz-Credential=8LRKJBFQ3G38BIJ9KCHS%2F20260611%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=1800&X-Amz-Signature=d9c32cad6f1936de9a95580fb0deed6cf73d8853df1deedbb7ea058485235859","headers":{"If-None-Match":"*"}}]%                                               
                                
# first: attempt to upload without 'If-None-Match' header; this fails
➜  loculus git:(s3-garbage-collection) ✗ curl -X PUT --upload-file ./test_file.txt "https://s3-file-sharing-nooverwrite.loculus.org/loculus-preview-private/files/cad7567e-913f-4fbc-bd2e-a2ad44db294d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260611T122529Z&X-Amz-SignedHeaders=host%3Bif-none-match&X-Amz-Credential=8LRKJBFQ3G38BIJ9KCHS%2F20260611%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=1800&X-Amz-Signature=d9c32cad6f1936de9a95580fb0deed6cf73d8853df1deedbb7ea058485235859"
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>There were headers present in the request which were not signed</Message><Key>files/cad7567e-913f-4fbc-bd2e-a2ad44db294d</Key><BucketName>loculus-preview-private</BucketName><Resource>/loculus-preview-private/files/cad7567e-913f-4fbc-bd2e-a2ad44db294d</Resource><RequestId>18B8067B6686D9BA</RequestId><HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId></Error>%             

# second: upload with 'If-None-Match' header; succeeds
➜  loculus git:(s3-garbage-collection) ✗ curl -X PUT --upload-file ./test_file.txt "https://s3-file-sharing-nooverwrite.loculus.org/loculus-preview-private/files/cad7567e-913f-4fbc-bd2e-a2ad44db294d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260611T122529Z&X-Amz-SignedHeaders=host%3Bif-none-match&X-Amz-Credential=8LRKJBFQ3G38BIJ9KCHS%2F20260611%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=1800&X-Amz-Signature=d9c32cad6f1936de9a95580fb0deed6cf73d8853df1deedbb7ea058485235859" -H "If-None-Match: *"

# (make sure it succeeded)
➜  loculus git:(s3-garbage-collection) ✗ echo $?
0

# third: try another upload; rejected
➜  loculus git:(s3-garbage-collection) ✗ curl -X PUT --upload-file ./test_file.txt "https://s3-file-sharing-nooverwrite.loculus.org/loculus-preview-private/files/cad7567e-913f-4fbc-bd2e-a2ad44db294d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260611T122529Z&X-Amz-SignedHeaders=host%3Bif-none-match&X-Amz-Credential=8LRKJBFQ3G38BIJ9KCHS%2F20260611%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=1800&X-Amz-Signature=d9c32cad6f1936de9a95580fb0deed6cf73d8853df1deedbb7ea058485235859" -H "If-None-Match: *"
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>PreconditionFailed</Code><Message>At least one of the pre-conditions you specified did not hold</Message><Key>files/cad7567e-913f-4fbc-bd2e-a2ad44db294d</Key><BucketName>loculus-preview-private</BucketName><Resource>/loculus-preview-private/files/cad7567e-913f-4fbc-bd2e-a2ad44db294d</Resource><RequestId>18B8068BCAF0A935</RequestId><HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId></Error>%                  
   
# finally: stat the file to double check it exists on S3
➜  loculus git:(s3-garbage-collection) ✗ kubectl exec -n prev-file-sharing-nooverwrite minio-78d9dbd5b-cq776 -- mc stat "local/loculus-preview-private/files/cad7567e-913f-4fbc-bd2e-a2ad44db294d"
Name      : cad7567e-913f-4fbc-bd2e-a2ad44db294d
Date      : 2026-06-11 12:27:12 UTC
Size      : 20 B
ETag      : 4221d002ceb5d3c9e9137e495ceaa647
Type      : file
Metadata  :
  Content-Type: binary/octet-stream
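
The two XML error bodies in the transcript above could be told apart in client code along these lines. This is a sketch only: `explain_s3_error` and the `KNOWN_CODES` table are hypothetical names, and the explanations are inferred from this transcript rather than taken from a complete list of S3 error codes.

```python
import xml.etree.ElementTree as ET

# Meanings inferred from the transcript above; not an exhaustive S3 error list.
KNOWN_CODES = {
    "AccessDenied": "request did not match the signature (e.g. If-None-Match missing)",
    "PreconditionFailed": "an object already exists at this key; nothing was overwritten",
}


def explain_s3_error(xml_body: str) -> str:
    """Extract the <Code> element from an S3-style error body and describe it."""
    root = ET.fromstring(xml_body)
    code = root.findtext("Code", default="Unknown")
    return f"{code}: {KNOWN_CODES.get(code, 'unrecognised error')}"


# Shortened copy of the error returned for the third upload attempt.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Error><Code>PreconditionFailed</Code>"
    "<Message>At least one of the pre-conditions you specified did not hold</Message>"
    "</Error>"
)
message = explain_s3_error(sample)
```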

Comment thread backend/src/main/kotlin/org/loculus/backend/service/files/S3Service.kt Outdated

@maverbiest maverbiest left a comment

Contributor


Looks good overall! It does seem slightly brittle to have to add the headers in a couple of different places; maybe it would be nice to pass them as arguments, as suggested in the other comments?

@anna-parker

Contributor Author

thanks for the thorough testing!

@theosanderson theosanderson removed the preview (Triggers a deployment to argocd) label Jun 12, 2026
@anna-parker anna-parker added the preview (Triggers a deployment to argocd) label Jun 15, 2026
@anna-parker anna-parker requested a review from maverbiest June 15, 2026 12:16
@anna-parker

Contributor Author

Confirmed that when I break off the upload to a presigned URL and stat the file in MinIO, it does not exist and I can upload again; once that upload has completed, the file shows up in the stat:

kubectl exec -n prev-file-sharing-nooverwrite minio-55c59dbf89-8kh77 -- mc stat "local/loculus-preview-private/files/509e7474-a567-445f-aa04-0a9fe96d9e22"
mc: <ERROR> Unable to stat `local/loculus-preview-private/files/509e7474-a567-445f-aa04-0a9fe96d9e22`. Object does not exist.
command terminated with exit code 1
(base) aparker-adm@eve-501 ~ % curl -X PUT --upload-file ./Downloads/mpox_large.fasta "https://s3-file-sharing-nooverwrite.loculus.org/loculus-preview-private/files/509e7474-a567-445f-aa04-0a9fe96d9e22?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260615T130306Z&X-Amz-SignedHeaders=host%3Bif-none-match&X-Amz-Credential=8LRKJBFQ3G38BIJ9KCHS%2F20260615%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=1800&X-Amz-Signature=e4fa8bc80ab6c3faf1ff8269e67633cf94057c77c558350200a02faa5b90696f" -H "If-None-Match: *"  
(base) aparker-adm@eve-501 ~ % kubectl exec -n prev-file-sharing-nooverwrite minio-55c59dbf89-8kh77 -- mc stat "local/loculus-preview-private/files/509e7474-a567-445f-aa04-0a9fe96d9e22"
Name      : 509e7474-a567-445f-aa04-0a9fe96d9e22
Date      : 2026-06-15 13:12:22 UTC 
Size      : 2.2 GiB 
ETag      : 5bd59559b0a8b409d7cbe9a76b43a5b4 
Type      : file 
Metadata  :
  Content-Type: binary/octet-stream 
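
Based on the behaviour observed above (an interrupted upload leaves no object, a completed one blocks further writes), a client-side retry could look roughly like this. It is a sketch under assumptions: `upload_with_retry` and the `send` callable are hypothetical, `send` is assumed to perform one PUT to the presigned URL, return the HTTP status code, and raise `ConnectionError` when the transfer is cut off. Retries only make sense while the presigned URL is still valid (`X-Amz-Expires=1800` in the transcripts).

```python
from typing import Callable


def upload_with_retry(send: Callable[[], int], max_attempts: int = 3) -> str:
    """Call `send` until it reports a final outcome or attempts run out."""
    for _ in range(max_attempts):
        try:
            status = send()
        except ConnectionError:
            continue  # nothing was stored, so the same URL can be tried again
        if status == 200:
            return "uploaded"
        if status == 412:
            return "already-exists"  # an earlier upload completed; do not overwrite
        return f"failed-{status}"
    return "gave-up"


# Simulated sender: the first attempt is cut off, the second one succeeds.
attempts = iter([ConnectionError("cut off"), 200])


def flaky_send() -> int:
    outcome = next(attempts)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = upload_with_retry(flaky_send)
```

Treating 412 as a terminal "already exists" outcome rather than an error to retry matches the overwrite protection this PR introduces.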


Labels

backend (related to the loculus backend component), preprocessing (Issues related to the preprocessing component), preview (Triggers a deployment to argocd)

Development

Successfully merging this pull request may close these issues.

Make files uneditable after submission

3 participants