W8A8 quantization of a VLM model writes an inconsistent torch_dtype #3219
zhuchen1109 started this conversation in General
I quantized a VLM model with W8A8 using the following command:
lmdeploy lite smooth_quant /opt/hostVolumes/zhuchen/models/Qwen2.5-3B-Instruct --work-dir /opt/hostVolumes/zhuchen/models/Qwen2.5-3B-Instruct-w8 --quant-dtype int8 --dtype bfloat16
The original model's config.json defines torch_dtype as bfloat16. After running the command above, the torch_dtype written into the quantized model's config.json is float16, so the dtype no longer matches the original precision. At inference time, when temperature is set to an extremely small value such as 1e-6, the logits-to-scores computation overflows. Plain LLM models do not show this problem.
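A minimal sketch of the overflow, using hypothetical logits values (not taken from the actual model): dividing by a temperature of 1e-6 pushes values past the float16 maximum (~65504) and produces inf, while bfloat16's wider exponent range keeps them finite.

```python
import torch

# Hypothetical logits, just to illustrate the dtype difference.
logits = torch.tensor([10.0, 5.0, 1.0])
temperature = 1e-6

# float16: 10 / 1e-6 = 1e7 exceeds the float16 max (~65504) -> inf,
# and the subsequent softmax over all-inf scores becomes NaN.
print(logits.to(torch.float16) / temperature)

# bfloat16: same values stay finite thanks to the wider exponent range.
print(logits.to(torch.bfloat16) / temperature)
```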
How can this be resolved?
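As a possible interim workaround (my own assumption, not a confirmed lmdeploy fix), the torch_dtype field could be patched back by hand after quantization, along these lines:

```python
import json

# Hypothetical post-processing step: restore torch_dtype in the quantized
# model's config.json so it matches the original bfloat16 setting.
config_path = "/opt/hostVolumes/zhuchen/models/Qwen2.5-3B-Instruct-w8/config.json"

with open(config_path) as f:
    cfg = json.load(f)

cfg["torch_dtype"] = "bfloat16"

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```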