enable_chunked_prefill reverts to none after every launch even when set to false; the default parameter enable_prefix_cache is misspelled and should be enable_prefix_caching
Regarding enable_prefix_caching:
#2998 (comment)
A PR to fix this is welcome.
System Info
enable_chunked_prefill: every time it is set to false and the model finishes starting, reopening the launch page resets it to none.
enable_prefix_cache: the default parameter name is misspelled; it should be enable_prefix_caching.
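For context on why the spelling matters: vLLM exposes this option as `enable_prefix_caching` in its engine arguments, so a misspelled `enable_prefix_cache` forwarded to the engine would be rejected rather than enabling the feature. A minimal sketch against vLLM's offline `LLM` entry point (the model path and values are illustrative, not taken from this issue):

```python
from vllm import LLM

# vLLM's EngineArgs defines `enable_prefix_caching` (a boolean); there is
# no `enable_prefix_cache` argument, so the misspelled keyword is rejected
# with a TypeError instead of turning prefix caching on.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",  # illustrative path
    enable_prefix_caching=True,   # correct spelling, accepted by vLLM
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)
# llm = LLM(model="...", enable_prefix_cache=True)  # TypeError: unexpected keyword
```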
Running Xinference with Docker?
Version info
Latest.
The command used to start Xinference
xinference launch --model-name custom-DeepSeek-R1-Distill-Qwen-14B --model-type LLM --model-engine vLLM --model-format pytorch --size-in-billions 14 --quantization none --n-gpu auto --replica 1 --n-worker 1 --gpu-idx 0,1,2,3 --gpu_memory_utilization 0.95 --enforce_eager true
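The same launch can also be issued through Xinference's Python client, which makes the passed-through engine keyword arguments explicit. A minimal sketch, assuming a local endpoint on the default port and that `launch_model` forwards unrecognized keyword arguments (such as `gpu_memory_utilization` and `enforce_eager`) to the vLLM engine; the parameter names mirror the CLI flags above and should be checked against the installed client version:

```python
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local endpoint
model_uid = client.launch_model(
    model_name="custom-DeepSeek-R1-Distill-Qwen-14B",
    model_type="LLM",
    model_engine="vLLM",
    model_format="pytorch",
    model_size_in_billions=14,
    quantization="none",
    n_gpu="auto",
    replica=1,
    gpu_idx=[0, 1, 2, 3],
    # Extra kwargs are passed through to the vLLM engine:
    gpu_memory_utilization=0.95,
    enforce_eager=True,
)
```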
Reproduction
The issue is visible when launching the model.
Expected behavior
The parameter configuration should match what was set.