[Bug] structured_output cannot be used in cu118 with the latest docker images #3120

@zhulinJulia24

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

structured_output cannot be used in cu118 with the latest docker images.

Reproduction

  1. Create the docker container and start it.
docker create --name zhulintest --label zhulintest --workdir /__w/lmdeploy/lmdeploy --gpus=all --ipc=host --user root -e PIP_CACHE_DIR=/root/.cache/pip -e NVIDIA_DISABLE_REQUIRE=1 -e "NO_PROXY=localhost,127.0.0.1" -e "no_proxy=localhost,127.0.0.1"  -v "/nvme":"/nvme" -v "/mnt":"/mnt" --entrypoint "tail" openmmlab/lmdeploy:latest-cu11 "-f" "/dev/null"
docker start XXX
docker exec -it XXX bash
  2. Start the api server.
    lmdeploy serve api_server internlm/internlm2_5-7b-chat --server-port 23333 --backend pytorch --tp 1 --session-len 128000
  3. Run the script below.
from openai import OpenAI

client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://10.140.54.48:23333/v1',
)

prompt = 'Make a self-introduction please.'

guide = {
    'type': 'object',
    'properties': {
        'name': {
            'type': 'string'
        },
        'skills': {
            'type': 'array',
            'items': {
                'type': 'string',
                'maxLength': 10
            },
            'minItems': 3
        },
        'work history': {
            'type': 'array',
            'items': {
                'type': 'object',
                'properties': {
                    'company': {
                        'type': 'string'
                    },
                    'duration': {
                        'type': 'string'
                    }
                },
                'required': ['company']
            }
        }
    },
    'required': ['name', 'skills', 'work history']
}

response_format = dict(type='json_schema', json_schema=dict(name='test', schema=guide))

model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": prompt},
    ],
    response_format=response_format
)
print(response)
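For reference, if the server did respond, the returned content could be sanity-checked against the `required` fields of the `guide` schema above. A minimal sketch with a hypothetical payload, using only the standard-library `json` module:

```python
import json

# Hypothetical payload shaped like the `guide` schema above.
raw = ('{"name": "Alice", "skills": ["Python", "CUDA", "Docker"], '
       '"work history": [{"company": "Acme", "duration": "2 years"}]}')
payload = json.loads(raw)

# Fields listed under `required` in the schema must be present.
for field in ('name', 'skills', 'work history'):
    assert field in payload, f'missing required field: {field}'

# `skills`: at least 3 items, each a string of at most 10 characters.
assert len(payload['skills']) >= 3
assert all(isinstance(s, str) and len(s) <= 10 for s in payload['skills'])
```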



Environment

sys.platform: linux
Python: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.5.1+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.20.1+cu118
LMDeploy: 0.7.0.post2+
transformers: 4.48.1
gradio: 5.13.1
fastapi: 0.115.7
pydantic: 2.10.6
triton: 3.1.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_2  mlx5_3  CPU Affinity    NUMA Affinity
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    PXB     NODE    0-31,64-95      0
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    PXB     NODE    0-31,64-95      0
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    NODE    PXB     0-31,64-95      0
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    NODE    PXB     0-31,64-95      0
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    SYS     SYS     32-63,96-127    1
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    SYS     SYS     32-63,96-127    1
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    SYS     SYS     32-63,96-127    1
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      SYS     SYS     32-63,96-127    1
mlx5_2  PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS      X      NODE
mlx5_3  NODE    NODE    PXB     PXB     SYS     SYS     SYS     SYS     NODE     X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

2025-02-08 04:49:35,937 - lmdeploy - ERROR - engine.py:904 - Task <MainLoopBackground> failed
Traceback (most recent call last):
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 899, in __task_callback
    task.result()
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 857, in _async_loop_background
    await self._async_step_background(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 735, in _async_step_background
    next_token_ids = await self.async_sampling_logits(logits, all_ids, guided_input_ids, sampling_inputs,
  File "/opt/lmdeploy/lmdeploy/utils.py", line 234, in __tmp
    return (await func(*args, **kwargs))
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 532, in async_sampling_logits
    logits = await logits_processor(all_ids, guided_input_ids, split_logits)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/logits_process.py", line 338, in __call__
    scores = _guided_sampling(sampling_inputs.response_formats, scores, guided_input_ids, self.tokenizer)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/logits_process.py", line 106, in _guided_sampling
    from .guided_process import _get_guided_logits_processor
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/guided_process.py", line 23, in <module>
    from outlines.fsm.guide import CFGGuide, Generate, RegexGuide, Write
  File "/opt/py3/lib/python3.10/site-packages/outlines/__init__.py", line 5, in <module>
    import outlines.types
  File "/opt/py3/lib/python3.10/site-packages/outlines/types/__init__.py", line 1, in <module>
    from . import airports, countries
  File "/opt/py3/lib/python3.10/site-packages/outlines/types/airports.py", line 4, in <module>
    from pyairports.airports import AIRPORT_LIST
  File "/opt/py3/lib/python3.10/site-packages/pyairports/airports.py", line 1, in <module>
    from pkg_resources import resource_string
  File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3292, in <module>
    def _initialize_master_working_set():
  File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3266, in _call_aside
    f(*args, **kwargs)
  File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 3304, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 600, in _build_master
    ws.require(__requires__)
  File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 937, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 798, in resolve
    dist = self._resolve_dist(
  File "/opt/py3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 839, in _resolve_dist
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'nvidia-nccl-cu11==2.21.5; platform_system == "Linux" and platform_machine == "x86_64"' distribution was not found and is required by torch
2025-02-08 04:49:35,938 - lmdeploy - ERROR - async_engine.py:777 - session 1 finished, reason "error"
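The traceback bottoms out in `pkg_resources`: importing `outlines` pulls in `pyairports`, whose `from pkg_resources import resource_string` triggers a working-set build, and that build fails because the `nvidia-nccl-cu11` distribution metadata declared by torch's requirements is not present in the cu11 image. The failing check can be probed in isolation with `importlib.metadata` (a minimal sketch; the helper name is hypothetical):

```python
from importlib import metadata

def dist_installed(name: str) -> bool:
    """Return True if distribution metadata for `name` is present."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False

# In the failing container this returns False for 'nvidia-nccl-cu11',
# which is why pkg_resources raises DistributionNotFound while
# resolving torch's declared requirements.
print(dist_installed('nvidia-nccl-cu11'))
```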
