
[Frontend][Bugfix] support prefill decode disaggregation on deepseek #14824

Merged
merged 7 commits into from
Mar 20, 2025

Conversation

billishyahao
Contributor

@billishyahao billishyahao commented Mar 14, 2025

This patch provides the following:

  1. fix for issue [Bug]: 'DeepseekV2Model' object has no attribute 'config' when enabling P/D Disaggregation #14324
  2. support prefill decode disaggregation on deepseek

After applying the PR, we get correct output for the 1P1D case on DeepSeek-V2:

curl -X POST -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-ai/DeepSeek-V2-Lite",
"prompt": "San Jose is a",
"max_tokens": 50,
"temperature": 0
}'

{"id":"cmpl-e85d4162e19942d5a5f0a31304b31188","object":"text_completion","created":1741965377,"model":"deepseek-ai/DeepSeek-V2-Lite","choices":[{"index":0,"text":" city in the Bay Area of California. It is the largest city in Northern California and the third largest in the state. It is the county seat of Santa Clara County.\nSan Jose is a major center for high-tech industry, with many major","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":55,"completion_tokens":50,"prompt_tokens_details":null}}
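The same request can be issued from Python using only the standard library. This is a sketch assuming a vLLM OpenAI-compatible server is running on localhost:8000, as in the curl example above; the helper name build_completion_request is illustrative.

```python
import json
import urllib.request

# Payload mirrors the curl example above.
payload = {
    "model": "deepseek-ai/DeepSeek-V2-Lite",
    "prompt": "San Jose is a",
    "max_tokens": 50,
    "temperature": 0,
}

def build_completion_request(url="http://localhost:8000/v1/completions"):
    # Build the POST request without sending it, so it can be inspected.
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})

# To actually send it (requires a running server):
# with urllib.request.urlopen(build_completion_request()) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```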


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@ZhaiFeiyue
Contributor

@billishyahao code style check failed, please re-format your code following the guide here

@billishyahao billishyahao changed the title [ds][feat] support prefill decode disaggregation on deepseek [Frontend][Bugfix] support prefill decode disaggregation on deepseek Mar 15, 2025
@billishyahao billishyahao force-pushed the billhe/dspd_upstream branch from 16b2028 to 0ed934c Compare March 15, 2025 05:07
billishyahao and others added 5 commits March 15, 2025 05:12
Co-authored-by: Zhai Feiyue <[email protected]>
Signed-off-by: billishyahao <[email protected]>
Co-authored-by: Zhai Feiyue <[email protected]>
Signed-off-by: billishyahao <[email protected]>
Signed-off-by: billishyahao <[email protected]>
Signed-off-by: billishyahao <[email protected]>
layer.self_attn.attn._k_scale,
layer.self_attn.attn._v_scale,
)
if hasattr(model_config, "kv_lora_rank"):
@LucasWilkinson
Collaborator

Ideally we should base this on the use_mla flag instead of the presence of kv_lora_rank so that it respects the VLLM_MLA_DISABLE env var

Contributor Author

@billishyahao billishyahao Mar 15, 2025


Hi @LucasWilkinson, thanks for the comment. Following this, I introduced these flags to make sure we are safe under both VLLM_MLA_DISABLE=0 and VLLM_MLA_DISABLE=1.

self.is_deepseek_mla = config.model_config.is_deepseek_mla
self.use_mla_opt = not envs.VLLM_MLA_DISABLE
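The combined effect of the two flags can be sketched as a small predicate. This is a hypothetical helper, not vLLM code; the environment variable's value is passed in as a parameter for testability. The MLA-specific KV-cache path is taken only for a DeepSeek MLA model, and only when VLLM_MLA_DISABLE is not set.

```python
def use_mla_kv_path(is_deepseek_mla: bool, vllm_mla_disable: str = "0") -> bool:
    """Hypothetical sketch of the PR's flag logic: take the MLA-specific
    KV-cache path only for DeepSeek MLA models, and only when the
    VLLM_MLA_DISABLE environment variable is not set to "1"."""
    use_mla_opt = vllm_mla_disable != "1"
    return is_deepseek_mla and use_mla_opt
```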

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 15, 2025
@billishyahao
Contributor Author

@billishyahao code style check failed, please re-format your code following the guide here

Good point! Fixed this. @ZhaiFeiyue

Signed-off-by: billishyahao <[email protected]>
@LucasWilkinson LucasWilkinson requested review from KuntaiDu and removed request for ZhaiFeiyue March 16, 2025 20:35
Collaborator

@LucasWilkinson LucasWilkinson left a comment


LGTM from an MLA perspective, but I'm not spun up on prefill decode disaggregation (cc @KuntaiDu).

@billishyahao
Contributor Author

Hi @KuntaiDu , Could you please take a look at the code change? Thanks!

Collaborator

@KuntaiDu KuntaiDu left a comment


The logic LGTM, some small suggestions on improving the code readability.

layer.self_attn.attn._k_scale,
layer.self_attn.attn._v_scale,
)
if self.is_deepseek_mla and self.use_mla_opt:
@KuntaiDu
Collaborator

It would be great if we could move this logic to a separate utility file and keep model-specific logic there, say by creating a file called model_aware_kv_ops.py with something like
put_kv_to_vllm(kv_cache, keys, values, **kwargs)
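A minimal sketch of what such a dispatcher could look like, under loud assumptions: the function name comes from the suggestion above, but the cache layout (a dict of lists) and keyword arguments are illustrative, not the vLLM API.

```python
def put_kv_to_vllm(kv_cache, keys, values, *,
                   is_deepseek_mla=False, use_mla_opt=True):
    """Illustrative model-aware KV insertion dispatcher.

    `kv_cache` is modeled here as a dict of lists; the real vLLM cache
    layout differs. MLA models store a single compressed latent tensor
    instead of separate K/V tensors, so only `keys` is written for them.
    """
    if is_deepseek_mla and use_mla_opt:
        # MLA path: one compressed latent entry per token.
        kv_cache.setdefault("latent", []).extend(keys)
    else:
        # Standard attention path: separate K and V entries.
        kv_cache.setdefault("k", []).extend(keys)
        kv_cache.setdefault("v", []).extend(values)
    return kv_cache
```

Keeping the branch inside one entry point means each connector calls a single function instead of repeating the model check.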

@KuntaiDu
Collaborator

I talked to @billishyahao and they will follow up with a PR on this, so we're good to go!

@KuntaiDu KuntaiDu enabled auto-merge (squash) March 20, 2025 02:53
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 20, 2025
@billishyahao
Contributor Author

Thank you @KuntaiDu for the comments from the PD disaggregation design perspective. As we discussed offline, I am going to keep the current patch's functionality as-is. There is another xPyD PR ([Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore by ShangmingCai · Pull Request #12957 · vllm-project/vllm) that introduces a new MooncakeStore connector, which also needs this dispatcher for different models. I plan to make a follow-up code change to add a unified function put_kv_to_vllm.

@ShangmingCai
Contributor


Sure thing. Once this PR gets merged, you can add support for MooncakeStoreConnector too; I will review it.

@vllm-bot vllm-bot merged commit 742369d into vllm-project:main Mar 20, 2025
44 of 48 checks passed
@billishyahao
Contributor Author


Sounds good! Thank you @ShangmingCai

cjsdurj pushed a commit to cjsdurj/vllm that referenced this pull request Mar 21, 2025