Description
Describe the bug
When trying to deploy the model with vLLM, I get the following error: OSError: Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).
I have tried setting FORCE_DOWNLOAD and HF_HUB_ENABLE_HF_TRANSFER to true, and I have enough disk space for the model (it only uses about 25% of the available space).
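To make the mismatch in the error above concrete, the short Python sketch below parses the message and computes the difference between the expected and actual sizes. The helper name `parse_consistency_error` is only for illustration and is not part of huggingface_hub.

```python
import re

# Error text copied verbatim from the report above.
MESSAGE = (
    "OSError: Consistency check failed: file should be of size 4966157056 "
    "but has size 4966842368 (model-00014-of-00030.safetensors)."
)

# Hypothetical pattern (not a huggingface_hub API): captures both sizes and the file name.
PATTERN = re.compile(
    r"file should be of size (?P<expected>\d+) but has size (?P<actual>\d+) "
    r"\((?P<name>[^)]+)\)"
)


def parse_consistency_error(message: str) -> tuple[int, int, str]:
    """Return (expected size, actual size, file name) from a consistency-check message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a consistency-check message")
    return int(match["expected"]), int(match["actual"]), match["name"]


expected, actual, name = parse_consistency_error(MESSAGE)
extra = actual - expected
print(f"{name}: {extra} bytes larger than expected ({extra / 1024:.2f} KiB)")
```

The file on disk is 685312 bytes larger than expected rather than smaller, which suggests this is not a simple truncated download.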
Reproduction
vllm serve meta-llama/Llama-3.3-70B-Instruct
args:
- --download-dir
- /data
- --max-model-len
- "65536"
- --max-logprobs
- "5"
- --trust-remote-code
- --disable-log-requests
- --use-v2-block-manager
- --enforce-eager
- --tensor-parallel-size
- "4"
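For anyone reproducing this outside a container spec, the args list above can be flattened into a single command line. The sketch below does that with the standard library's `shlex`; every value is copied from the reproduction and nothing else is assumed.

```python
import shlex

# Arguments exactly as listed in the reproduction above.
ARGS = [
    "--download-dir", "/data",
    "--max-model-len", "65536",
    "--max-logprobs", "5",
    "--trust-remote-code",
    "--disable-log-requests",
    "--use-v2-block-manager",
    "--enforce-eager",
    "--tensor-parallel-size", "4",
]

# shlex.join quotes any argument that needs it, so the result is safe to paste into a shell.
command = shlex.join(["vllm", "serve", "meta-llama/Llama-3.3-70B-Instruct", *ARGS])
print(command)
```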
Logs
(VllmWorkerProcess pid=358) INFO 01-23 21:46:38 weight_utils.py:243] Using model weights format ['*.safetensors']
ERROR 01-23 21:59:31 engine.py:366] Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).
ERROR 01-23 21:59:31 engine.py:366] We are sorry for the inconvenience. Please retry with `force_download=True`.
ERROR 01-23 21:59:31 engine.py:366] If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
ERROR 01-23 21:59:31 engine.py:366] Traceback (most recent call last):
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 01-23 21:59:31 engine.py:366] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 01-23 21:59:31 engine.py:366] return cls(ipc_path=ipc_path,
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 01-23 21:59:31 engine.py:366] self.engine = LLMEngine(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 347, in __init__
ERROR 01-23 21:59:31 engine.py:366] self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
ERROR 01-23 21:59:31 engine.py:366] super().__init__(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 36, in __init__
ERROR 01-23 21:59:31 engine.py:366] self._init_executor()
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 114, in _init_executor
ERROR 01-23 21:59:31 engine.py:366] self._run_workers("load_model",
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 195, in _run_workers
ERROR 01-23 21:59:31 engine.py:366] driver_worker_output = driver_worker_method(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 152, in load_model
ERROR 01-23 21:59:31 engine.py:366] self.model_runner.load_model()
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1074, in load_model
ERROR 01-23 21:59:31 engine.py:366] self.model = get_model(vllm_config=self.vllm_config)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
ERROR 01-23 21:59:31 engine.py:366] return loader.load_model(vllm_config=vllm_config)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 334, in load_model
ERROR 01-23 21:59:31 engine.py:366] model.load_weights(self._get_all_weights(model_config, model))
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 586, in load_weights
ERROR 01-23 21:59:31 engine.py:366] loader.load_weights(
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 229, in load_weights
ERROR 01-23 21:59:31 engine.py:366] autoloaded_weights = list(self._load_module("", self.module, weights))
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 181, in _load_module
ERROR 01-23 21:59:31 engine.py:366] for child_prefix, child_weights in self._groupby_prefix(weights):
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 103, in _groupby_prefix
ERROR 01-23 21:59:31 engine.py:366] for prefix, group in itertools.groupby(weights_by_parts,
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 101, in <genexpr>
ERROR 01-23 21:59:31 engine.py:366] for weight_name, weight_data in weights)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 588, in <genexpr>
ERROR 01-23 21:59:31 engine.py:366] for name, loaded_weight in weights)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 313, in _get_all_weights
ERROR 01-23 21:59:31 engine.py:366] yield from self._get_weights_iterator(primary_weights)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 272, in _get_weights_iterator
ERROR 01-23 21:59:31 engine.py:366] hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 228, in _prepare_weights
ERROR 01-23 21:59:31 engine.py:366] hf_folder = download_weights_from_hf(
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/weight_utils.py", line 247, in download_weights_from_hf
ERROR 01-23 21:59:31 engine.py:366] hf_folder = snapshot_download(
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
ERROR 01-23 21:59:31 engine.py:366] return fn(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 291, in snapshot_download
ERROR 01-23 21:59:31 engine.py:366] _inner_hf_hub_download(file)
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 267, in _inner_hf_hub_download
ERROR 01-23 21:59:31 engine.py:366] return hf_hub_download(
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
ERROR 01-23 21:59:31 engine.py:366] return fn(*args, **kwargs)
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
ERROR 01-23 21:59:31 engine.py:366] return _hf_hub_download_to_cache_dir(
ERROR 01-23 21:59:31 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1011, in _hf_hub_download_to_cache_dir
ERROR 01-23 21:59:31 engine.py:366] _download_to_tmp_and_move(
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1545, in _download_to_tmp_and_move
ERROR 01-23 21:59:31 engine.py:366] http_get(
ERROR 01-23 21:59:31 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 446, in http_get
ERROR 01-23 21:59:31 engine.py:366] raise EnvironmentError(
ERROR 01-23 21:59:31 engine.py:366] OSError: Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).
ERROR 01-23 21:59:31 engine.py:366] We are sorry for the inconvenience. Please retry with `force_download=True`.
ERROR 01-23 21:59:31 engine.py:366] If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
ERROR 01-23 21:59:31 multiproc_worker_utils.py:116] Worker VllmWorkerProcess pid 358 died, exit code: -15
INFO 01-23 21:59:31 multiproc_worker_utils.py:120] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
raise e
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
return cls(ipc_path=ipc_path,
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
self.engine = LLMEngine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 347, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 36, in __init__
self._init_executor()
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 114, in _init_executor
self._run_workers("load_model",
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 195, in _run_workers
driver_worker_output = driver_worker_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 152, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1074, in load_model
self.model = get_model(vllm_config=self.vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
return loader.load_model(vllm_config=vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 334, in load_model
model.load_weights(self._get_all_weights(model_config, model))
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 586, in load_weights
loader.load_weights(
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 229, in load_weights
autoloaded_weights = list(self._load_module("", self.module, weights))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 181, in _load_module
for child_prefix, child_weights in self._groupby_prefix(weights):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 103, in _groupby_prefix
for prefix, group in itertools.groupby(weights_by_parts,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 101, in <genexpr>
for weight_name, weight_data in weights)
^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 588, in <genexpr>
for name, loaded_weight in weights)
^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 313, in _get_all_weights
yield from self._get_weights_iterator(primary_weights)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 272, in _get_weights_iterator
hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 228, in _prepare_weights
hf_folder = download_weights_from_hf(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/weight_utils.py", line 247, in download_weights_from_hf
hf_folder = snapshot_download(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 291, in snapshot_download
_inner_hf_hub_download(file)
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/_snapshot_download.py", line 267, in _inner_hf_hub_download
return hf_hub_download(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1011, in _hf_hub_download_to_cache_dir
_download_to_tmp_and_move(
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1545, in _download_to_tmp_and_move
http_get(
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 446, in http_get
raise EnvironmentError(
OSError: Consistency check failed: file should be of size 4966157056 but has size 4966842368 (model-00014-of-00030.safetensors).
We are sorry for the inconvenience. Please retry with `force_download=True`.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
[rank0]:[W123 21:59:33.617452324 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
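One thing worth checking after a failure like the one in these logs is whether partial downloads were left behind in the download directory. The sketch below assumes the usual Hub cache layout, in which an unfinished download is stored with an `.incomplete` suffix. The helper name `find_incomplete_blobs` is hypothetical, and `/data` is the `--download-dir` from the reproduction.

```python
from pathlib import Path


def find_incomplete_blobs(cache_dir: str) -> list[tuple[Path, int]]:
    """Return (path, size in bytes) for every '*.incomplete' file under cache_dir."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    return sorted(
        (path, path.stat().st_size)
        for path in root.rglob("*.incomplete")
        if path.is_file()
    )


if __name__ == "__main__":
    # "/data" is the download directory used in the reproduction.
    for path, size in find_incomplete_blobs("/data"):
        print(f"{size:>12d}  {path}")
```

Comparing the reported sizes with the expected size from the error message shows whether a leftover partial file matches the shard that failed.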
System info
- huggingface_hub version: 0.26.2
- Platform: Linux-3.10.0-1160.25.1.el7.x86_64-x86_64-with-glibc2.35
- Python version: 3.12.7
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /root/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: BalazsHoranyi
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.5.1
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: 0.1.8
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.9.2
- aiohttp: 3.11.2
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /root/.cache/huggingface/hub
- HF_ASSETS_CACHE: /root/.cache/huggingface/assets
- HF_TOKEN_PATH: /root/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /root/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: True
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
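The settings above show hf_transfer enabled and a 10-second download timeout. One experiment, offered as a suggestion rather than a known fix, is to retry with hf_transfer turned off and a longer timeout. The sketch below only builds the environment for such a run. `HF_HUB_ENABLE_HF_TRANSFER` and `HF_HUB_DOWNLOAD_TIMEOUT` are the variables listed above, while the helper name `retry_environment` and the 60-second value are assumptions. Note also that the error message spells `force_download=True` as a Python keyword argument; this report does not establish whether a FORCE_DOWNLOAD environment variable has any effect.

```python
import os


def retry_environment(base=None, timeout_seconds=60):
    """Copy an environment mapping and override the two download-related settings."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_ENABLE_HF_TRANSFER"] = "0"  # use the default downloader instead of hf_transfer
    env["HF_HUB_DOWNLOAD_TIMEOUT"] = str(timeout_seconds)  # the reported value was 10
    return env


env = retry_environment()
print(env["HF_HUB_ENABLE_HF_TRANSFER"], env["HF_HUB_DOWNLOAD_TIMEOUT"])
```

The resulting mapping can be passed as `env=` to `subprocess.run` together with the `vllm serve` command from the reproduction.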