Support incremental uploads #1430
I don't think we can really do this as:
I think you should look into a client-side workaround for this.
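One possible client-side workaround (a sketch, not anything shipped in `huggingface_hub`; the helper name is hypothetical) is to spool the incoming stream to a temporary file first, so the size and sha256 are known before the normal upload path runs:

```python
import hashlib
import tempfile


def spool_to_tempfile(stream, chunk_size=8 * 1024 * 1024):
    """Buffer an arbitrary stream to disk so its size and sha256 are
    known before the upload starts (hypothetical helper)."""
    sha = hashlib.sha256()
    size = 0
    tmp = tempfile.NamedTemporaryFile(delete=False)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        sha.update(chunk)
        size += len(chunk)
        tmp.write(chunk)
    tmp.close()
    return tmp.name, size, sha.hexdigest()
```

The obvious downside is the temporary file itself, which is exactly what this issue asks to avoid.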
tagging @XciD @coyotte508 @Kakulukian just in case (but I think it's a long-term subject)
We'd need to create a new cloud backend service that the user can stream arbitrary content to, including files over 5GB, and that can then compute the size / sha of the files. IMO it'd be a new service separate from moon-landing, and it needs dev time.
With the S3 upload URL, is it not possible to push in multipart directly from the client?
Ah... I thought you needed to know the file size beforehand and have a minimum file size, but apparently not. I guess you can use the old multipart endpoint @mariosasko. It's deprecated and will get interrupted every time the Hub reloads, but maybe it's possible to put it in a separate kube pod @XciD. Edit: see the old code, you can maybe reuse it: `huggingface_hub/src/huggingface_hub/hf_api.py`, line 1728 at a8b6f14.
The current problem is that before getting an S3 upload URL, we need to send the server the size and sha of the file so it knows whether the client should upload it as a regular or an LFS file.
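To make that dependency concrete, here is a minimal sketch of the client-side step: both values the server needs must be computed from the complete contents before any upload URL can be requested (the threshold constant and function names are illustrative assumptions; the real regular-vs-LFS decision is made server-side):

```python
import hashlib

# Illustrative cutoff only; the actual decision lives on the server.
LFS_THRESHOLD = 10 * 1024 * 1024


def preupload_metadata(data: bytes):
    """Compute the (size, sha256) pair the server needs before it can
    return an upload mode and an S3 upload URL."""
    return len(data), hashlib.sha256(data).hexdigest()


def upload_mode(size: int) -> str:
    """Toy stand-in for the server's regular-vs-LFS routing."""
    return "lfs" if size >= LFS_THRESHOLD else "regular"
```

With a streamed `HfFileSystemFile`, neither value is available until the stream is exhausted, which is the crux of the issue.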
If I'm not mistaken, this also means we cannot address huggingface/datasets#5045 (uploading Parquet shards iteratively in
For the record, what we've been thinking a little bit about recently would be to move away from git hosting a little bit, and potentially either:
In all cases, this is all very long term.
It would be great to support incremental uploads to avoid creating a temporary file in `HfFileSystemFile._initiate_upload`, and to be more aligned with `fsspec`'s philosophy (see huggingface/hffs#1 (comment)).

When uploading a `HfFileSystemFile`, the file contents are not known in advance, meaning we can't compute the file's `sha` and `size`, which are needed to fetch the upload mode or compute the number of parts in the multi-part upload mode on the `moon-landing` side, etc.

Fixing this would probably require a new endpoint that accepts file contents in chunks, computes their Git metadata, and writes them to a repo (as a regular or an LFS file).
(cc @julien-c @coyotte508)
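One detail worth spelling out: the Git metadata itself rules out purely incremental hashing for regular files, because a blob's object ID hashes a header that embeds the byte count, so hashing cannot even start until the full size is known. A small sketch of Git's blob-hash scheme:

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Git's blob OID is sha1 over a header containing the size,
    followed by the content, so the total size must be known up front."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()
```

This is why a chunk-accepting endpoint would have to buffer (or two-pass) the stream server-side before it can produce the final object ID.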