Enabling prefix cache for multimodal model inference #2823
Unanswered
zhuchen1109 asked this question in Q&A
Replies: 1 comment 2 replies
-
Prefix caching is not yet supported for VLMs.
-
I'm using the internvl-8b model. Because my system prompt is very long, I want to enable prefix cache to speed up inference. Enabling it now causes problems: the image tokens are just padding placeholders, so they are very likely to be matched by the cache. My question is: if I modify the code to ensure the image portion is never matched, would prefix cache be effective for my use case?
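The failure mode described above can be illustrated with a toy sketch. This is a hypothetical model of block-level prefix caching, not the project's actual implementation: blocks of tokens are chained through a hash, and two prompts share cached KV blocks as long as their block hashes agree. Because every image expands to the same `IMG_PAD` placeholder id (an assumed value here), prompts containing *different* images still hash identically over the image span and falsely match; excluding blocks that contain image tokens, as the question proposes, keeps the cache win for the shared system prompt only.

```python
import hashlib

BLOCK = 4          # tokens per cache block (toy size)
IMG_PAD = -1       # hypothetical placeholder id for image tokens

def block_hashes(tokens, skip_image_blocks=False):
    """Chain of block hashes used for prefix matching (toy model)."""
    hashes, parent = [], b""
    for i in range(0, len(tokens) - len(tokens) % BLOCK, BLOCK):
        block = tokens[i:i + BLOCK]
        if skip_image_blocks and IMG_PAD in block:
            break  # stop matching once an image span begins
        h = hashlib.sha256(parent + str(block).encode()).hexdigest()
        hashes.append(h)
        parent = h.encode()
    return hashes

def shared_prefix_blocks(a, b, **kw):
    """Count leading blocks whose hashes agree between two prompts."""
    n = 0
    for x, y in zip(block_hashes(a, **kw), block_hashes(b, **kw)):
        if x != y:
            break
        n += 1
    return n

system = [1, 2, 3, 4, 5, 6, 7, 8]        # long shared system prompt
img_a = [IMG_PAD] * 4                    # image A -> padding tokens
img_b = [IMG_PAD] * 4                    # image B -> identical padding!
prompt_a = system + img_a + [9, 10, 11, 12]
prompt_b = system + img_b + [13, 14, 15, 16]

# Naive matching: the identical padding makes the image blocks "match"
# even though the underlying images differ -> stale KV reuse (3 blocks).
print(shared_prefix_blocks(prompt_a, prompt_b))                          # 3
# Skipping image blocks limits reuse to the system prompt (2 blocks).
print(shared_prefix_blocks(prompt_a, prompt_b, skip_image_blocks=True))  # 2
```

So in this model, guarding the image span does make prefix cache safe and still useful when the system prompt dominates the prompt length; production engines solve the same problem by hashing actual image content (or an image hash) into the block key rather than the padding ids.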