meta-llama/Llama-3.3-70B-Instruct: OSError: Consistency check failed #2779

### Describe the bug

When trying to deploy the model with vLLM, I get the following error: `OSError: Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).`

I have tried setting `FORCE_DOWNLOAD` and `HF_HUB_ENABLE_HF_TRANSFER` to true, and there is enough disk space for the model (it uses only about 25% of the available space).
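
The error message refers to `force_download=True`, which is a keyword argument of `hf_hub_download` and `snapshot_download`. A minimal sketch of re-downloading only the affected shard that way, assuming `huggingface_hub` is installed, the saved token has access to the gated repo, and the `--download-dir` value (`/data`) is the directory that ends up as `cache_dir`. The helper names `retry_kwargs` and `redownload_shard` are hypothetical:

```python
def retry_kwargs(repo_id, filename, cache_dir):
    """Build keyword arguments for a forced re-download of a single file."""
    return {
        "repo_id": repo_id,
        "filename": filename,
        "cache_dir": cache_dir,    # assumption: same directory as --download-dir
        "force_download": True,    # download even if a cached copy exists
    }


def redownload_shard(repo_id, filename, cache_dir):
    """Re-download one file and return its local path (needs network access)."""
    # Imported lazily so retry_kwargs() works without the library installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**retry_kwargs(repo_id, filename, cache_dir))


# Example call (not executed here because it downloads about 5 GB):
# redownload_shard(
#     "meta-llama/Llama-3.3-70B-Instruct",
#     "model-00014-of-00030.safetensors",
#     "/data",
# )
```

Running this outside vLLM would help show whether the size mismatch is reproducible with `huggingface_hub` alone.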

### Reproduction

```shell
vllm serve meta-llama/Llama-3.3-70B-Instruct
      args:
        - --download-dir
        - /data
        - --max-model-len
        - "65536"
        - --max-logprobs
        - "5"
        - --trust-remote-code
        - --disable-log-requests
        - --use-v2-block-manager
        - --enforce-eager
        - --tensor-parallel-size
        - "4"
```

### Logs

```shell
(VllmWorkerProcess pid=358) INFO 01-23 21:46:38 weight_utils.py:243] Using model weights format ['*.safetensors']
ERROR 01-23 21:59:31 engine.py:366] Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).
ERROR 01-23 21:59:31 engine.py:366] We are sorry for the inconvenience. Please retry with `force_download=True`.
ERROR 01-23 21:59:31 engine.py:366] If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
ERROR 01-23 21:59:31 engine.py:366] Traceback (most recent call last):
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 01-23 21:59:31 engine.py:366]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 01-23 21:59:31 engine.py:366]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 01-23 21:59:31 engine.py:366]     return cls(ipc_path=ipc_path,
ERROR 01-23 21:59:31 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 01-23 21:59:31 engine.py:366]     self.engine = LLMEngine(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 347, in __init__
ERROR 01-23 21:59:31 engine.py:366]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 01-23 21:59:31 engine.py:366]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
ERROR 01-23 21:59:31 engine.py:366]     super().__init__(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 36, in __init__
ERROR 01-23 21:59:31 engine.py:366]     self._init_executor()
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 114, in _init_executor
ERROR 01-23 21:59:31 engine.py:366]     self._run_workers("load_model",
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 195, in _run_workers
ERROR 01-23 21:59:31 engine.py:366]     driver_worker_output = driver_worker_method(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 152, in load_model
ERROR 01-23 21:59:31 engine.py:366]     self.model_runner.load_model()
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1074, in load_model
ERROR 01-23 21:59:31 engine.py:366]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 01-23 21:59:31 engine.py:366]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
ERROR 01-23 21:59:31 engine.py:366]     return loader.load_model(vllm_config=vllm_config)
ERROR 01-23 21:59:31 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 334, in load_model
ERROR 01-23 21:59:31 engine.py:366]     model.load_weights(self._get_all_weights(model_config, model))
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 586, in load_weights
ERROR 01-23 21:59:31 engine.py:366]     loader.load_weights(
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 229, in load_weights
ERROR 01-23 21:59:31 engine.py:366]     autoloaded_weights = list(self._load_module("", self.module, weights))
ERROR 01-23 21:59:31 engine.py:366]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 181, in _load_module
ERROR 01-23 21:59:31 engine.py:366]     for child_prefix, child_weights in self._groupby_prefix(weights):
ERROR 01-23 21:59:31 engine.py:366]                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 103, in _groupby_prefix
ERROR 01-23 21:59:31 engine.py:366]     for prefix, group in itertools.groupby(weights_by_parts,
ERROR 01-23 21:59:31 engine.py:366]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 101, in <genexpr>
ERROR 01-23 21:59:31 engine.py:366]     for weight_name, weight_data in weights)
ERROR 01-23 21:59:31 engine.py:366]                                     ^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 588, in <genexpr>
ERROR 01-23 21:59:31 engine.py:366]     for name, loaded_weight in weights)
ERROR 01-23 21:59:31 engine.py:366]                                ^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 313, in _get_all_weights
ERROR 01-23 21:59:31 engine.py:366]     yield from self._get_weights_iterator(primary_weights)
ERROR 01-23 21:59:31 engine.py:366]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 272, in _get_weights_iterator
ERROR 01-23 21:59:31 engine.py:366]     hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
ERROR 01-23 21:59:31 engine.py:366]                                                    ^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 228, in _prepare_weights
ERROR 01-23 21:59:31 engine.py:366]     hf_folder = download_weights_from_hf(
ERROR 01-23 21:59:31 engine.py:366]                 ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/weight_utils.py", line 247, in download_weights_from_hf
ERROR 01-23 21:59:31 engine.py:366]     hf_folder = snapshot_download(
ERROR 01-23 21:59:31 engine.py:366]                 ^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
ERROR 01-23 21:59:31 engine.py:366]     return fn(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366]            ^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 291, in snapshot_download
ERROR 01-23 21:59:31 engine.py:366]     _inner_hf_hub_download(file)
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 267, in _inner_hf_hub_download
ERROR 01-23 21:59:31 engine.py:366]     return hf_hub_download(
ERROR 01-23 21:59:31 engine.py:366]            ^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
ERROR 01-23 21:59:31 engine.py:366]     return fn(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366]            ^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
ERROR 01-23 21:59:31 engine.py:366]     return _hf_hub_download_to_cache_dir(
ERROR 01-23 21:59:31 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1011, in _hf_hub_download_to_cache_dir
ERROR 01-23 21:59:31 engine.py:366]     _download_to_tmp_and_move(
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1545, in _download_to_tmp_and_move
ERROR 01-23 21:59:31 engine.py:366]     http_get(
ERROR 01-23 21:59:31 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 446, in http_get
ERROR 01-23 21:59:31 engine.py:366]     raise EnvironmentError(
ERROR 01-23 21:59:31 engine.py:366] OSError: Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).
ERROR 01-23 21:59:31 engine.py:366] We are sorry for the inconvenience. Please retry with `force_download=True`.
ERROR 01-23 21:59:31 engine.py:366] If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
ERROR 01-23 21:59:31 multiproc_worker_utils.py:116] Worker VllmWorkerProcess pid 358 died, exit code: -15
INFO 01-23 21:59:31 multiproc_worker_utils.py:120] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
    return cls(ipc_path=ipc_path,
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 347, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 36, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 114, in _init_executor
    self._run_workers("load_model",
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 195, in _run_workers
    driver_worker_output = driver_worker_method(*args, **kwargs)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 152, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1074, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
    return loader.load_model(vllm_config=vllm_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 334, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 586, in load_weights
    loader.load_weights(
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 229, in load_weights
    autoloaded_weights = list(self._load_module("", self.module, weights))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 181, in _load_module
    for child_prefix, child_weights in self._groupby_prefix(weights):
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 103, in _groupby_prefix
    for prefix, group in itertools.groupby(weights_by_parts,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 101, in <genexpr>
    for weight_name, weight_data in weights)
                                    ^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 588, in <genexpr>
    for name, loaded_weight in weights)
                               ^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 313, in _get_all_weights
    yield from self._get_weights_iterator(primary_weights)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 272, in _get_weights_iterator
    hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
                                                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 228, in _prepare_weights
    hf_folder = download_weights_from_hf(
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/weight_utils.py", line 247, in download_weights_from_hf
    hf_folder = snapshot_download(
                ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 291, in snapshot_download
    _inner_hf_hub_download(file)
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 267, in _inner_hf_hub_download
    return hf_hub_download(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1011, in _hf_hub_download_to_cache_dir
    _download_to_tmp_and_move(
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1545, in _download_to_tmp_and_move
    http_get(
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 446, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).
We are sorry for the inconvenience. Please retry with `force_download=True`.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
[rank0]:[W123 21:59:33.617452324 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
```
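
For reference, the two sizes in the message differ by 685312 bytes, and the file on disk is the larger one, so the download was not simply truncated. A minimal sketch that extracts the numbers from the message and compares a local file against an expected size, using only the standard library. The helper names `parse_consistency_error` and `size_matches` are hypothetical, and the regular expression assumes the message format shown in the log above:

```python
import os
import re

# Assumed message format, copied from the log above.
_PATTERN = re.compile(
    r"file should be of size (?P<expected>\d+) "
    r"but has size (?P<actual>\d+) \((?P<name>[^)]+)\)"
)


def parse_consistency_error(message):
    """Extract the file name, both sizes, and their difference from the message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("not a consistency-check message")
    expected = int(match["expected"])
    actual = int(match["actual"])
    return {
        "filename": match["name"],
        "expected": expected,
        "actual": actual,
        "delta": actual - expected,  # positive: file on disk is too large
    }


def size_matches(path, expected_size):
    """Return True if the file at `path` has exactly `expected_size` bytes."""
    return os.path.getsize(path) == expected_size


MESSAGE = (
    "Consistency check failed: file should be of size 4966157056 "
    "but has size 4966842368 (model-00014-of-00030.safetensors)."
)
info = parse_consistency_error(MESSAGE)
print(info["filename"], info["delta"])
```

`size_matches` can be pointed at a shard in the download directory once the expected size is known, for example from the error message itself.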

### System info

```shell
- huggingface_hub version: 0.26.2
- Platform: Linux-3.10.0-1160.25.1.el7.x86_64-x86_64-with-glibc2.35
- Python version: 3.12.7
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /root/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: BalazsHoranyi
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.5.1
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: 0.1.8
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.9.2
- aiohttp: 3.11.2
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /root/.cache/huggingface/hub
- HF_ASSETS_CACHE: /root/.cache/huggingface/assets
- HF_TOKEN_PATH: /root/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /root/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: True
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
```
