[Frontend][Bugfix] support prefill decode disaggregation on deepseek #14824
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
@billishyahao The code style check failed; please re-format your code following the guide here.
Signed-off-by: billishyahao <[email protected]>
Co-authored-by: Zhai Feiyue <[email protected]> Signed-off-by: billishyahao <[email protected]>
    layer.self_attn.attn._k_scale,
    layer.self_attn.attn._v_scale,
)
if hasattr(model_config, "kv_lora_rank"):
Ideally we should base this on the use_mla flag instead of the presence of kv_lora_rank, so that it respects the VLLM_MLA_DISABLE env var.
Hi @LucasWilkinson, thanks for the comment. Following this suggestion, I introduced these flags to make sure we are safe with both VLLM_MLA_DISABLE=0 and VLLM_MLA_DISABLE=1:
self.is_deepseek_mla = config.model_config.is_deepseek_mla
self.use_mla_opt = not envs.VLLM_MLA_DISABLE
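The two flags above combine to gate the MLA-specific KV-transfer path. A minimal sketch of that gating logic, with the flags passed as plain arguments for illustration (the real code reads them from the vLLM config and `envs.VLLM_MLA_DISABLE`):

```python
# Hypothetical sketch; names mirror the PR snippet, but the surrounding
# class and config plumbing are simplified away.

def should_use_mla_kv_layout(is_deepseek_mla: bool, mla_disabled: bool) -> bool:
    """MLA KV layout applies only to DeepSeek MLA models when MLA is not disabled."""
    use_mla_opt = not mla_disabled
    return is_deepseek_mla and use_mla_opt

# DeepSeek MLA model, VLLM_MLA_DISABLE unset -> MLA KV layout
print(should_use_mla_kv_layout(True, False))   # True
# VLLM_MLA_DISABLE=1 falls back to the standard layout even on DeepSeek
print(should_use_mla_kv_layout(True, True))    # False
# Non-MLA models never take the MLA path
print(should_use_mla_kv_layout(False, False))  # False
```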
Good point! Fixed this. @ZhaiFeiyue
LGTM from an MLA perspective, but I'm not spun up on prefill decode disaggregation (cc @KuntaiDu).
Hi @KuntaiDu, could you please take a look at the code change? Thanks!
The logic LGTM; some small suggestions on improving the code readability.
    layer.self_attn.attn._k_scale,
    layer.self_attn.attn._v_scale,
)
if self.is_deepseek_mla and self.use_mla_opt:
It would be great if we could move this logic into a separate utility file and keep the model-specific logic there, say in a file called model_aware_kv_ops.py, with something like
put_kv_to_vllm(kv_cache, keys, values, **kwargs)
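As a rough illustration of the suggestion, one way such a utility could look is a small registry that dispatches to model-specific KV write routines. The function name `put_kv_to_vllm` comes from the review comment; everything else here (the registry, `model_kind`, the handler bodies) is a hypothetical sketch, not vLLM code:

```python
# Hypothetical model_aware_kv_ops.py sketch: connectors call one entry point
# and the model-specific handling lives behind a registry.
from typing import Callable, Dict

_KV_PUT_HANDLERS: Dict[str, Callable[..., None]] = {}

def register_kv_put(model_kind: str):
    """Register a model-specific KV write routine under a key."""
    def decorator(fn: Callable[..., None]) -> Callable[..., None]:
        _KV_PUT_HANDLERS[model_kind] = fn
        return fn
    return decorator

@register_kv_put("mla")
def _put_kv_mla(kv_cache, keys, values, **kwargs):
    # DeepSeek MLA stores a compressed latent KV; here we just tag the entry.
    kv_cache.append(("mla", keys, values))

@register_kv_put("standard")
def _put_kv_standard(kv_cache, keys, values, **kwargs):
    kv_cache.append(("standard", keys, values))

def put_kv_to_vllm(kv_cache, keys, values, *, model_kind: str = "standard", **kwargs):
    """Single entry point: dispatch to the handler for this model kind."""
    _KV_PUT_HANDLERS[model_kind](kv_cache, keys, values, **kwargs)

cache = []
put_kv_to_vllm(cache, "k0", "v0", model_kind="mla")
put_kv_to_vllm(cache, "k1", "v1")
print(cache)  # [('mla', 'k0', 'v0'), ('standard', 'k1', 'v1')]
```

A registry like this would let both the existing connector and a future MooncakeStore connector share one call site while models register their own layouts.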
I talked to @billishyahao, who will follow up with a PR on this, so we're good to go!
Thank you, @KuntaiDu, for the comments from the PD disaggregation design perspective. As we discussed offline, I am going to keep the function of the current patch. There is also another xPyD PR, [Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore by ShangmingCai · Pull Request #12957 · vllm-project/vllm, which introduces a new MooncakeStore connector that also needs this dispatcher for different models. I plan to make a follow-up code change to add a unified function.
Sure thing. If this PR gets merged, you can support it for MooncakeStoreConnector too, and I will review it.
Sounds good! Thank you @ShangmingCai |
This patch aims to provide the following:
After applying the PR, we get the correct output for the 1P1D case on DeepSeek-V2: