Cannot get a repo info which just created in a same thread. #1945

Open
LittleApple-fp16 opened this issue Dec 30, 2023 · 7 comments
Labels
bug Something isn't working

Comments


LittleApple-fp16 commented Dec 30, 2023

Describe the bug

When I upload models to a new repository and then immediately get its info, or use glob to extract paths, the repository is completely invisible as long as these steps run in the same thread. It only becomes visible from a new thread.

Another workaround is to create a new fs instance without passing a token.
image

Reproduction

No response

Logs

No response

System info

- huggingface_hub version: 0.19.4
- Platform: Windows-10-10.0.19044-SP0
- Python version: 3.10.6
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: E:\HuggingfaceHome\token
- Has saved token ?: True
- Who am I ?: LittleApple-fp16
- Configured git credential helpers: manager-core
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.1.2+cu118
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: 9.5.0
- hf_transfer: N/A
- gradio: 4.7.1
- tensorboard: N/A
- numpy: 1.26.1
- pydantic: 2.4.2
- aiohttp: 3.8.6
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: E:\HuggingfaceHome\hub
- HF_ASSETS_CACHE: E:\HuggingfaceHome\assets
- HF_TOKEN_PATH: E:\HuggingfaceHome\token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
@LittleApple-fp16 LittleApple-fp16 added the bug Something isn't working label Dec 30, 2023
Member

julien-c commented Jan 2, 2024

There might be a race condition and/or stale caching server-side.

In the meantime (to confirm or rule this out), can you check whether adding a delay of, e.g., 1 second in the same thread still exhibits the same issue?
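The delay test suggested above could be sketched roughly as follows. The helper name `visible_after_delay` and its parameters are hypothetical, not part of huggingface_hub; `check` is any zero-argument callable, for example a lambda wrapping `fs.exists(...)`.

```python
import time
from typing import Callable


def visible_after_delay(check: Callable[[], bool], delay_s: float = 1.0, attempts: int = 3) -> bool:
    """Return True as soon as check() succeeds, sleeping delay_s between tries."""
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


# Hypothetical usage (the repo path is a placeholder):
# visible_after_delay(lambda: fs.exists("user/new-repo"), delay_s=1.0)
```

If the check keeps failing even after generous delays, a server-side race becomes less likely and a client-side cache becomes the more plausible suspect.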

Contributor

Wauplin commented Jan 3, 2024

Could you post a reproducible example for this issue, @LittleApple-fp16? I'm not sure whether it's 1. a server-side caching issue, because the repo creation and "fs.exists" calls are made very close together, or 2. a client-side caching issue, because HfFileSystem cached the information that the file doesn't exist before the file was created.

To test whether the problem comes from 1., a 1s delay as suggested by @julien-c would be a good option.
To test whether the problem comes from 2., a good idea would be to reproduce the error but use HfApi.file_exists(...) instead of HfFileSystem.exists. For the record, HfFileSystem is an HF-compatible implementation of fsspec, built to behave roughly like a file system. Internally it uses a cache to avoid requesting the existence of the same files multiple times. If you create a file using an HfFileSystem object, its internal cache is invalidated. But if you create the file separately without invalidating the fs cache, you get inconsistencies. Please be aware of this limitation when using it. An HfApi object, on the other hand, never caches its results, meaning you have full control over which requests are made.
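The client-side caching scenario described here can be illustrated with a toy model. This is only a sketch of the general pattern, not the actual HfFileSystem implementation: `ToyRemote` and `ToyCachingFS` are hypothetical classes, and the path is a placeholder.

```python
class ToyRemote:
    """Stands in for the Hub: just a set of file paths."""

    def __init__(self):
        self.files = set()


class ToyCachingFS:
    """Remembers the result of every existence check until invalidated."""

    def __init__(self, remote):
        self.remote = remote
        self._cache = {}

    def exists(self, path):
        # Only ask the remote the first time; later calls reuse the cached answer.
        if path not in self._cache:
            self._cache[path] = path in self.remote.files
        return self._cache[path]

    def write(self, path):
        # Writes made through this object invalidate its own cache entry.
        self.remote.files.add(path)
        self._cache.pop(path, None)

    def invalidate_cache(self):
        self._cache.clear()


remote = ToyRemote()
fs = ToyCachingFS(remote)

before = fs.exists("user/repo/model.bin")  # negative answer gets cached
remote.files.add("user/repo/model.bin")    # file created without going through fs
stale = fs.exists("user/repo/model.bin")   # cached negative answer is returned
fs.invalidate_cache()
fresh = fs.exists("user/repo/model.bin")   # remote is queried again
```

In this toy model, `stale` stays `False` even though the file exists remotely, and only `fresh` reflects the real state.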


(small comment: the HF_TOKEN environment variable is automatically read from your environment. This means that HfFileSystem(token=os.environ.get("HF_TOKEN")) and HfFileSystem() are strictly equivalent. In both cases, your requests will be authenticated.)

Author

LittleApple-fp16 commented Jan 8, 2024

Thanks for your reply. I've tried many possible approaches to this error, including long delays and breakpoint debugging, but none of them worked. This also confirms what I said: the problem seems to occur only under certain conditions in the same thread.

Contributor

Wauplin commented Jan 8, 2024

@LittleApple-fp16 Can you provide a reproducible example then? If delays and breakpoints do not solve this problem, I am quite confident it comes from the cache inside the HfFileSystem object. But to confirm that, I would need to reproduce it myself. The screenshot shared in the issue description doesn't show how the file is created.

Author

LittleApple-fp16 commented Jan 9, 2024

@Wauplin Yeah, I've tried to reproduce it here. You can see the difference between the Run Test and Run Post Test In New Thread steps. I'm just running the same operation in a new Python file.

Author

LittleApple-fp16 commented Jan 9, 2024

I also tried HfApi.file_exists() here.
Using HfApi solves the problem, but so does HfFileSystem without the token parameter.
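A minimal sketch of the comparison discussed in this thread, assuming `huggingface_hub` is installed and the repo is a model repo. The helper names `hf_fs_path` and `compare_views` are hypothetical; `HfApi.file_exists`, `HfFileSystem.exists` and `HfFileSystem.invalidate_cache` are existing calls, but whether invalidating the cache resolves this particular report is not confirmed here.

```python
def hf_fs_path(repo_id: str, filename: str, repo_type: str = "model") -> str:
    """Build the path HfFileSystem expects (datasets and spaces carry a prefix)."""
    prefix = {"model": "", "dataset": "datasets/", "space": "spaces/"}[repo_type]
    return f"{prefix}{repo_id}/{filename}"


def compare_views(repo_id: str, filename: str) -> dict:
    """Compare what HfApi and HfFileSystem report for the same file."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi, HfFileSystem

    api = HfApi()
    fs = HfFileSystem()
    path = hf_fs_path(repo_id, filename)

    result = {
        "api": api.file_exists(repo_id, filename),  # uncached request
        "fs_cached": fs.exists(path),               # may be served from the fs cache
    }
    fs.invalidate_cache()                           # drop cached listings
    result["fs_after_invalidate"] = fs.exists(path)
    return result


# Hypothetical usage (repo and file names are placeholders):
# print(compare_views("user/new-repo", "model.safetensors"))
```

If `fs_cached` and `fs_after_invalidate` differ, that would point to the client-side cache rather than the server.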

Contributor

Wauplin commented Jan 12, 2024

Thanks for the reproducible example @LittleApple-fp16. Something is off here, I'll investigate it.
