[Bug]: The API crashes after sending a chat completion request that includes seed
#773
Comments
Hi @ZoneTwelve, thanks for the issue. Please try building from the fork: `$ git clone https://github.com/HabanaAI/vllm-fork.git`
Yes, of course, I will do that for you, but that might take some time.
@PatrykWo I'm sorry about the bad news. I'm getting an error while running the branch you provided; here is the error.
Error log:
issue-773 | Traceback (most recent call last):
issue-773 | File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
issue-773 | mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
issue-773 | File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
issue-773 | __import__(pkg_name)
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/__init__.py", line 7, in <module>
issue-773 | from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/engine/arg_utils.py", line 11, in <module>
issue-773 | from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/config.py", line 16, in <module>
issue-773 | from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/model_executor/layers/quantization/__init__.py", line 5, in <module>
issue-773 | from vllm.model_executor.layers.quantization.awq_marlin import AWQMarlinConfig
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/model_executor/layers/quantization/awq_marlin.py", line 6, in <module>
issue-773 | import vllm.model_executor.layers.fused_moe # noqa
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/model_executor/layers/fused_moe/__init__.py", line 34, in <module>
issue-773 | import vllm.model_executor.layers.fused_moe.fused_marlin_moe # noqa
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/model_executor/layers/fused_moe/fused_marlin_moe.py", line 8, in <module>
issue-773 | from vllm.model_executor.layers.fused_moe.fused_moe import (
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.4.post2+gaudi000-py3.10.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 18, in <module>
issue-773 | from vllm_hpu_extension.ops import scaled_fp8_quant
issue-773 | File "/usr/local/lib/python3.10/dist-packages/vllm_hpu_extension/ops.py", line 9, in <module>
issue-773 | import habana_frameworks.torch as htorch
issue-773 | File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/__init__.py", line 54, in <module>
issue-773 | import habana_frameworks.torch.core
issue-773 | File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/__init__.py", line 114, in <module>
issue-773 | import_compilers()
issue-773 | File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/backends.py", line 39, in import_compilers
issue-773 | from .compilers import hpu_inference_compiler, hpu_training_compiler_bw, hpu_training_compiler_fw
issue-773 | File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/compilers.py", line 27, in <module>
issue-773 | from .freezing_passes import freeze
issue-773 | File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/dynamo/compile_backend/freezing_passes.py", line 28, in <module>
issue-773 | from torch._inductor.freezing import discard_traced_gm_params, invalidate_eager_modules, replace_params_with_constants
issue-773 | File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/freezing.py", line 15, in <module>
issue-773 | from torch._inductor.fx_passes.freezing_patterns import freezing_passes
issue-773 | File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/fx_passes/freezing_patterns.py", line 5, in <module>
issue-773 | from torch._inductor.compile_fx import fake_tensor_prop
issue-773 | File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 49, in <module>
issue-773 | from torch._inductor.debug import save_args_for_compile_fx_inner
issue-773 | File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/debug.py", line 26, in <module>
issue-773 | from . import config, ir # noqa: F811, this is needed
issue-773 | File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/ir.py", line 77, in <module>
issue-773 | from .runtime.hints import ReductionHint
issue-773 | File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/runtime/hints.py", line 36, in <module>
issue-773 | attr_desc_fields = {f.name for f in fields(AttrsDescriptor)}
issue-773 | File "/usr/lib/python3.10/dataclasses.py", line 1198, in fields
issue-773 | raise TypeError('must be called with a dataclass type or instance') from None
issue-773 | TypeError: must be called with a dataclass type or instance
issue-773 exited with code 1
@ZoneTwelve it's not bad news, rather a confirmation that it's not related to the build. Thanks. We are on it.
Your current environment
Crash after sending a chat completion request that contains `seed`.
Executed using the Docker image vllm-fork:9af82c, with the base Gaudi image vault.habana.ai/gaudi-docker/1.19.1/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest.
After deploying a Phi-4 model on the Gaudi 2 development environment, I encountered an issue with my data generation code, resulting in the following error.
After sending the request without `seed`, the API server functions properly.
MITMProxy flow record:
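For illustration only (this is not the captured MITMProxy flow), a minimal request of the shape that triggers the crash might look like the sketch below; the endpoint URL and served model name are assumptions, not values taken from the capture:

```python
# Illustrative reproduction sketch -- NOT the captured MITMProxy flow.
# Assumptions: the vLLM OpenAI-compatible server listens on localhost:8000
# and serves a model registered under the placeholder name "microsoft/phi-4".
import requests

payload = {
    "model": "microsoft/phi-4",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "seed": 12345,  # including this field is what triggers the crash
}

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json=payload,
    timeout=60,
)
print(resp.status_code)
print(resp.text)
```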
Model Input Dumps
No response
🐛 Describe the bug
Explanation:
The `seed` value in the payload may not be accepted by the server, leading to the error. You can try removing the `"seed": 12345` field to resolve this issue. The error occurs because the `seed` parameter might conflict with the server's internal handling of random number generation or other processes. Removing the `seed` parameter should allow the request to be processed correctly.

Before submitting a new issue...
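As a rough sketch of that workaround, under the same assumptions as the snippet above (placeholder endpoint and model name, OpenAI-compatible response shape), the client can simply drop the `seed` field before sending:

```python
# Workaround sketch: strip the "seed" field from the payload before sending.
# Endpoint URL and model name are placeholders; adjust them to your deployment.
import requests

payload = {
    "model": "microsoft/phi-4",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "seed": 12345,
}

payload.pop("seed", None)  # remove the problematic field

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```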