enable_chunked_prefill reverts to none after every launch even when set to false; the default parameter enable_prefix_cache is misspelled and should be enable_prefix_caching
Regarding enable_prefix_caching:
#2998 (comment)
A PR to fix this is welcome.
System Info
enable_chunked_prefill: every time it is set to false and the model finishes starting, reopening the launch page resets it to none.
enable_prefix_cache: the default parameter name is misspelled; it should be enable_prefix_caching.
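For context on why the spelling matters: vLLM exposes this option as `enable_prefix_caching` in its engine arguments, so a misspelled `enable_prefix_cache` forwarded to the engine would be rejected rather than enabling the feature. A minimal sketch against vLLM's offline `LLM` entry point (the model path and values are illustrative, not taken from this issue):

```python
from vllm import LLM

# vLLM's EngineArgs defines `enable_prefix_caching` (a boolean); there is
# no `enable_prefix_cache` argument, so the misspelled keyword is rejected
# with a TypeError instead of turning prefix caching on.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",  # illustrative path
    enable_prefix_caching=True,   # correct spelling, accepted by vLLM
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)
# llm = LLM(model="...", enable_prefix_cache=True)  # TypeError: unexpected keyword
```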
Running Xinference with Docker?
Version info
Latest.
The command used to start Xinference
xinference launch --model-name custom-DeepSeek-R1-Distill-Qwen-14B --model-type LLM --model-engine vLLM --model-format pytorch --size-in-billions 14 --quantization none --n-gpu auto --replica 1 --n-worker 1 --gpu-idx 0,1,2,3 --gpu_memory_utilization 0.95 --enforce_eager true
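The same launch can also be issued through Xinference's Python client, which makes the passed-through engine keyword arguments explicit. A minimal sketch, assuming a local endpoint on the default port and that `launch_model` forwards unrecognized keyword arguments (such as `gpu_memory_utilization` and `enforce_eager`) to the vLLM engine; the parameter names mirror the CLI flags above and should be checked against the installed client version:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local endpoint
model_uid = client.launch_model(
    model_name="custom-DeepSeek-R1-Distill-Qwen-14B",
    model_type="LLM",
    model_engine="vLLM",
    model_format="pytorch",
    model_size_in_billions=14,
    quantization="none",
    n_gpu="auto",
    replica=1,
    gpu_idx=[0, 1, 2, 3],
    # Extra kwargs are passed through to the vLLM engine:
    gpu_memory_utilization=0.95,
    enforce_eager=True,
)
```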
Reproduction
The issue is visible when launching the model.
Expected behavior
The parameter configuration should match what was set.