Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSError: Consistency check failed: file should be of size 28754810880 but has size 26303841200 #2464

Closed
anguoyuan opened this issue Aug 20, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@anguoyuan
Copy link

Describe the bug

I am downloading a dataset through:
from datasets import load_dataset, DatasetDict
dataset = load_dataset("nyu-visionx/Cambrian-Alignment",cache_dir="~\bucket-llm-1\datasets", download_mode='force_redownload')
.

The error message is:
Downloading data: 91%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 26.3G/28.8G [21:12<10:59:50, 61.9kB/s]Traceback (most recent call last):
File "", line 1, in
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/load.py", line 2628, in load_dataset
builder_instance.download_and_prepare(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/builder.py", line 1029, in download_and_prepare
self._download_and_prepare(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/builder.py", line 1791, in _download_and_prepare
super()._download_and_prepare(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/builder.py", line 1102, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/packaged_modules/webdataset/webdataset.py", line 61, in _split_generators
data_files = dl_manager.download(self.config.data_files)
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/download/download_manager.py", line 257, in download
downloaded_path_or_paths = map_nested(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 494, in map_nested
mapped = [
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 495, in
map_nested(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 511, in map_nested
mapped = [
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 512, in
_single_map_nested((function, obj, batched, batch_size, types, None, True, None))
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 380, in _single_map_nested
return [mapped_item for batch in iter_batched(data_struct, batch_size) for mapped_item in function(batch)]
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 380, in
return [mapped_item for batch in iter_batched(data_struct, batch_size) for mapped_item in function(batch)]
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/download/download_manager.py", line 313, in _download_batched
return [
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/download/download_manager.py", line 314, in
self._download_single(url_or_filename, download_config=download_config)
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/download/download_manager.py", line 323, in _download_single
out = cached_path(url_or_filename, download_config=download_config)
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 211, in cached_path
output_path = get_from_cache(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 689, in get_from_cache
fsspec_get(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 395, in fsspec_get
fs.get_file(path, temp_file.name, callback=callback)
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/huggingface_hub/hf_file_system.py", line 648, in get_file
http_get(
File "/home/anguoyuan111/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 578, in http_get
raise EnvironmentError(
OSError: Consistency check failed: file should be of size 28754810880 but has size 26303841200 ((…)be6153f2e0126fa516d57cc791b3/sbu558k.tar).
We are sorry for the inconvenience. Please retry with force_download=True.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

Reproduction

No response

Logs

No response

System info

- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.8.2
- aiohttp: 3.10.3
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/anguoyuan111/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/anguoyuan111/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/anguoyuan111/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
{'huggingface_hub version': '0.24.5', 'Platform': 'Linux-5.19.0-1022-gcp-x86_64-with-glibc2.35', 'Python version': '3.10.14', 'Running in iPython ?': 'No', 'Running in notebook ?': 'No', 'Running in Google Colab ?': 'No', 'Token path ?': '/home/anguoyuan111/.cache/huggingface/token', 'Has saved token ?': False, 'Configured git credential helpers': '', 'FastAI': 'N/A', 'Tensorflow': 'N/A', 'Torch': '2.4.0', 'Jinja2': '3.1.4', 'Graphviz': 'N/A', 'keras': 'N/A', 'Pydot': 'N/A', 'Pillow': '10.4.0', 'hf_transfer': 'N/A', 'gradio': 'N/A', 'tensorboard': 'N/A', 'numpy': '1.26.4', 'pydantic': '2.8.2', 'aiohttp': '3.10.3', 'ENDPOINT': 'https://huggingface.co', 'HF_HUB_CACHE': '/home/anguoyuan111/.cache/huggingface/hub', 'HF_ASSETS_CACHE': '/home/anguoyuan111/.cache/huggingface/assets', 'HF_TOKEN_PATH': '/home/anguoyuan111/.cache/huggingface/token', 'HF_HUB_OFFLINE': False, 'HF_HUB_DISABLE_TELEMETRY': False, 'HF_HUB_DISABLE_PROGRESS_BARS': None, 'HF_HUB_DISABLE_SYMLINKS_WARNING': False, 'HF_HUB_DISABLE_EXPERIMENTAL_WARNING': False, 'HF_HUB_DISABLE_IMPLICIT_TOKEN': False, 'HF_HUB_ENABLE_HF_TRANSFER': False, 'HF_HUB_ETAG_TIMEOUT': 10, 'HF_HUB_DOWNLOAD_TIMEOUT': 10}
@anguoyuan anguoyuan added the bug Something isn't working label Aug 20, 2024
@hanouticelina
Copy link
Contributor

hanouticelina commented Oct 31, 2024

closed as duplicate of #2372

@hanouticelina hanouticelina closed this as not planned Won't fix, can't repro, duplicate, stale Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants