
[Bug]: Latest NVIDIA driver 555.85, vllm fails to run #5035

Closed
gaye746560359 opened this issue May 24, 2024 · 9 comments
Labels
bug (Something isn't working), stale

Comments

@gaye746560359

gaye746560359 commented May 24, 2024

2024-05-24 23:49:38 WARNING 05-24 15:49:38 utils.py:327] Not found nvcc in /usr/local/cuda. Skip cuda version check!
2024-05-24 23:49:38 INFO 05-24 15:49:38 config.py:379] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. But it may cause slight accuracy drop without scaling factors. FP8_E5M2 (without scaling) is only supported on cuda version greater than 11.8. On ROCm (AMD GPU), FP8_E4M3 is instead supported for common inference criteria.
2024-05-24 23:49:38 WARNING 05-24 15:49:38 config.py:405] Possibly too large swap space. 4.00 GiB out of the 9.71 GiB total CPU memory is allocated for the swap space.
2024-05-24 23:49:38 INFO 05-24 15:49:38 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='shenzhi-wang/Llama3-8B-Chinese-Chat', speculative_config=None, tokenizer='shenzhi-wang/Llama3-8B-Chinese-Chat', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=fp8, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=gpt-3.5-turbo)
2024-05-24 23:49:39 Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-05-24 23:49:39 INFO 05-24 15:49:39 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
2024-05-24 23:49:39 WARNING 05-24 15:49:39 utils.py:465] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
2024-05-24 23:49:39 Traceback (most recent call last):
2024-05-24 23:49:39   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-05-24 23:49:39     return _run_code(code, main_globals, None,
2024-05-24 23:49:39   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2024-05-24 23:49:39     exec(code, run_globals)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 168, in <module>
2024-05-24 23:49:39     engine = AsyncLLMEngine.from_engine_args(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
2024-05-24 23:49:39     engine = cls(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
2024-05-24 23:49:39     self.engine = self._init_engine(*args, **kwargs)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
2024-05-24 23:49:39     return engine_class(*args, **kwargs)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 160, in __init__
2024-05-24 23:49:39     self.model_executor = executor_class(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
2024-05-24 23:49:39     self._init_executor()
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
2024-05-24 23:49:39     self._init_non_spec_worker()
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 67, in _init_non_spec_worker
2024-05-24 23:49:39     self.driver_worker = self._create_worker()
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 59, in _create_worker
2024-05-24 23:49:39     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 131, in init_worker
2024-05-24 23:49:39     self.worker = worker_class(*args, **kwargs)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 73, in __init__
2024-05-24 23:49:39     self.model_runner = ModelRunner(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 145, in __init__
2024-05-24 23:49:39     self.attn_backend = get_attn_backend(
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 25, in get_attn_backend
2024-05-24 23:49:39     backend = _which_attn_to_use(dtype)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/selector.py", line 67, in _which_attn_to_use
2024-05-24 23:49:39     if torch.cuda.get_device_capability()[0] < 8:
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 430, in get_device_capability
2024-05-24 23:49:39     prop = get_device_properties(device)
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 444, in get_device_properties
2024-05-24 23:49:39     _lazy_init()  # will define _get_device_properties
2024-05-24 23:49:39   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 293, in _lazy_init
2024-05-24 23:49:39     torch._C._cuda_init()
2024-05-24 23:49:39 RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found
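For anyone trying to narrow this down: the traceback bottoms out in torch's lazy CUDA initialization, so the failure can be reproduced without vLLM at all. A minimal sketch (a hypothetical check script, assuming the same container image with PyTorch installed) that exercises the same call vLLM's attention selector makes:

```python
# cuda_init_check.py -- hypothetical helper, not part of vLLM.
# Run inside the same container to see whether CUDA initialization itself
# fails, independent of vLLM.
import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
try:
    # Same call made by vllm/attention/selector.py; it triggers
    # torch.cuda._lazy_init() and, on the broken driver/toolkit combination,
    # raises "Error 500: named symbol not found".
    major, minor = torch.cuda.get_device_capability()
    print(f"devices: {torch.cuda.device_count()}, capability: {major}.{minor}")
except RuntimeError as exc:
    print("CUDA init failed:", exc)
```

If this script fails the same way, the problem is the driver / container toolkit combination rather than anything in vLLM itself.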

@gaye746560359 added the bug label May 24, 2024
@HelloCard

WSL plus a cudaGetDeviceCount error… I've hit a similar failure before. The cause was the Sunlogin (向日葵) remote-desktop tool installed outside WSL; the virtual display adapter it creates interfered with vLLM inside WSL.

@gaye746560359
Author

WSL plus a cudaGetDeviceCount error… I've hit a similar failure before. The cause was the Sunlogin (向日葵) remote-desktop tool installed outside WSL; the virtual display adapter it creates interfered with vLLM inside WSL.

I uninstalled Sunlogin (向日葵), but it still fails with the same error when I run it.

@MarioLiebisch

Stumbled over this issue while looking around to see if there have been any fixes.

I just checked the Nvidia driver feedback thread and it's actually a listed known issue:

PyTorch-CUDA Docker not compatible with CUDA 12.5/GRD 555.85 [4668302]

@cliffwoolley

Please see NVIDIA/nvidia-container-toolkit#520.

@gaye746560359
Author

Please see NVIDIA/nvidia-container-toolkit#520

So if this symptom is hit with Docker Desktop, it means a fix is in progress (upgrading the bundled nvidia-container-toolkit); when is the fix expected to be released?

@cliffwoolley

Please see NVIDIA/nvidia-container-toolkit#520

So if this symptom is hit with Docker Desktop, it means a fix is in progress (upgrading the bundled nvidia-container-toolkit); when is the fix expected to be released?

We are moving on it as quickly as we can, but I don't have an ETA yet, which is why I didn't list one in the other issue.

@cliffwoolley

Docker Desktop 4.31 was released yesterday and includes NVIDIA Container Toolkit 1.15.0, which resolves this issue.
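After updating, a quick way to confirm the server actually comes up again is to query the OpenAI-compatible /v1/models endpoint. A small sketch, assuming the server is started as in the log above and listens on the default port 8000 (adjust the address if yours differs):

```python
# smoke_test.py -- hypothetical check, assumes the default api_server port 8000.
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default vLLM api_server address

with urllib.request.urlopen(f"{BASE_URL}/v1/models", timeout=10) as resp:
    models = json.load(resp)

# The log above shows served_model_name=gpt-3.5-turbo, so that id should be listed.
print([m["id"] for m in models.get("data", [])])
```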


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 26, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

@github-actions github-actions bot closed this as not planned Nov 25, 2024