Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Structured output (`response_format`) cannot be used in the cu118 environment with the latest docker images.
Reproduction
- Create the docker container and start it.
```shell
docker create --name zhulintest --label zhulintest --workdir /__w/lmdeploy/lmdeploy --gpus=all --ipc=host --user root -e PIP_CACHE_DIR=/root/.cache/pip -e NVIDIA_DISABLE_REQUIRE=1 -e "NO_PROXY=localhost,127.0.0.1" -e "no_proxy=localhost,127.0.0.1" -v "/nvme":"/nvme" -v "/mnt":"/mnt" --entrypoint "tail" openmmlab/lmdeploy:latest-cu11 "-f" "/dev/null"
docker start XXX
docker exec -it XXX bash
```
- Start the api server.
```shell
lmdeploy serve api_server internlm/internlm2_5-7b-chat --server-port 23333 --backend pytorch --tp 1 --session-len 128000
```
- Run the script below.
```python
from openai import OpenAI

client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url="http://10.140.54.48:23333/v1"
)

prompt = "Make a self-introduction please."
guide = {
    'type': 'object',
    'properties': {
        'name': {
            'type': 'string'
        },
        'skills': {
            'type': 'array',
            'items': {
                'type': 'string',
                'maxLength': 10
            },
            'minItems': 3
        },
        'work history': {
            'type': 'array',
            'items': {
                'type': 'object',
                'properties': {
                    'company': {
                        'type': 'string'
                    },
                    'duration': {
                        'type': 'string'
                    }
                },
                'required': ['company']
            }
        }
    },
    'required': ['name', 'skills', 'work history']
}

response_format = dict(type='json_schema', json_schema=dict(name='test', schema=guide))

model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": prompt},
    ],
    response_format=response_format
)
print(response)
```
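As a local sanity check (no server needed, and independent of the bug), the key constraints of the `guide` schema can be exercised against a hand-written sample. The sample values and the helper name are illustrative only, not real model output:

```python
# Minimal sketch: verify a sample dict against the main constraints of the
# `guide` schema above, without pulling in a jsonschema dependency.
required = ['name', 'skills', 'work history']

def satisfies_guide(sample: dict) -> bool:
    if not all(key in sample for key in required):
        return False
    skills = sample.get('skills', [])
    # 'skills' needs at least 3 items, each a string of length <= 10
    if len(skills) < 3 or any(not isinstance(s, str) or len(s) > 10 for s in skills):
        return False
    # every 'work history' entry must at least name a company
    return all('company' in job for job in sample.get('work history', []))

sample = {
    'name': 'InternLM',
    'skills': ['coding', 'writing', 'reasoning'],
    'work history': [{'company': 'a lab', 'duration': '2 years'}],
}
print(satisfies_guide(sample))  # True
```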
Environment
sys.platform: linux
Python: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.5.1+cu118
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.8
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.20.1+cu118
LMDeploy: 0.7.0.post2+
transformers: 4.48.1
gradio: 5.13.1
fastapi: 0.115.7
pydantic: 2.10.6
triton: 3.1.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 mlx5_2 mlx5_3 CPU Affinity NUMA Affinity
GPU0 X NV12 NV12 NV12 NV12 NV12 NV12 NV12 PXB NODE 0-31,64-95 0
GPU1 NV12 X NV12 NV12 NV12 NV12 NV12 NV12 PXB NODE 0-31,64-95 0
GPU2 NV12 NV12 X NV12 NV12 NV12 NV12 NV12 NODE PXB 0-31,64-95 0
GPU3 NV12 NV12 NV12 X NV12 NV12 NV12 NV12 NODE PXB 0-31,64-95 0
GPU4 NV12 NV12 NV12 NV12 X NV12 NV12 NV12 SYS SYS 32-63,96-127 1
GPU5 NV12 NV12 NV12 NV12 NV12 X NV12 NV12 SYS SYS 32-63,96-127 1
GPU6 NV12 NV12 NV12 NV12 NV12 NV12 X NV12 SYS SYS 32-63,96-127 1
GPU7 NV12 NV12 NV12 NV12 NV12 NV12 NV12 X SYS SYS 32-63,96-127 1
mlx5_2 PXB PXB NODE NODE SYS SYS SYS SYS X NODE
mlx5_3 NODE NODE PXB PXB SYS SYS SYS SYS NODE X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
Error traceback
2025-02-08 04:49:35,937 - lmdeploy - ERROR - engine.py:904 - Task <MainLoopBackground> failed
Traceback (most recent call last):
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 899, in __task_callback
task.result()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 857, in _async_loop_background
await self._async_step_background(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 735, in _async_step_background
next_token_ids = await self.async_sampling_logits(logits, all_ids, guided_input_ids, sampling_inputs,
File "/opt/lmdeploy/lmdeploy/utils.py", line 234, in __tmp
return (await func(*args, **kwargs))
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 532, in async_sampling_logits
logits = await logits_processor(all_ids, guided_input_ids, split_logits)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/logits_process.py", line 338, in __call__
scores = _guided_sampling(sampling_inputs.response_formats, scores, guided_input_ids, self.tokenizer)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/logits_process.py", line 106, in _guided_sampling
from .guided_process import _get_guided_logits_processor
File "/opt/lmdeploy/lmdeploy/pytorch/engine/guided_process.py", line 23, in <module>
from outlines.fsm.guide import CFGGuide, Generate, RegexGuide, Write
File "/opt/py3/lib/python3.10/site-packages/outlines/__init__.py", line 5, in <module>
import outlines.types
File "/opt/py3/lib/python3.10/site-packages/outlines/types/__init__.py", line 1, in <module>
from . import airports, countries
File "/opt/py3/lib/python3.10/site-packages/outlines/types/airports.py", line 4, in <module>
from pyairports.airports import AIRPORT_LIST
File "/opt/py3/lib/python3.10/site-packages/pyairports/airports.py", line 1, in <module>
from pkg_resources import resource_string
File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3292, in <module>
def _initialize_master_working_set():
File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3266, in _call_aside
f(*args, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3304, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 600, in _build_master
ws.require(__requires__)
File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 937, in require
needed = self.resolve(parse_requirements(requirements))
File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 798, in resolve
dist = self._resolve_dist(
File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 839, in _resolve_dist
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'nvidia-nccl-cu11==2.21.5; platform_system == "Linux" and platform_machine == "x86_64"' distribution was not found and is required by torch
2025-02-08 04:49:35,938 - lmdeploy - ERROR - async_engine.py:777 - session 1 finished, reason "error"
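For what it's worth, the traceback suggests the failure is not specific to `outlines`: it imports `pkg_resources`, which re-resolves every installed distribution's declared requirements at import time and trips over torch's unmet `nvidia-nccl-cu11==2.21.5` pin. A stdlib-only sketch to confirm what torch declares (the helper name here is my own, not part of any library):

```python
# Sketch: inspect the dependency that pkg_resources fails to resolve.
# importlib.metadata reads the same package metadata but does not raise
# at import time the way pkg_resources does.
from importlib.metadata import requires, PackageNotFoundError

def declared_nccl_deps(dist: str) -> list:
    """Return the nvidia-nccl requirement strings declared by `dist`."""
    try:
        return [r for r in (requires(dist) or []) if 'nvidia-nccl' in r]
    except PackageNotFoundError:
        return []

# On the cu11 image, torch 2.5.1+cu118 declares nvidia-nccl-cu11==2.21.5
# in its metadata even though that wheel is absent from the environment.
print(declared_nccl_deps('torch'))
```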