[Frontend][Bugfix] support prefill decode disaggregation on deepseek #14824
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
@billishyahao The code style check failed; please re-format your code following the guide here.
Signed-off-by: billishyahao <[email protected]>
Co-authored-by: Zhai Feiyue <[email protected]> Signed-off-by: billishyahao <[email protected]>
    layer.self_attn.attn._k_scale,
    layer.self_attn.attn._v_scale,
)
if hasattr(model_config, "kv_lora_rank"):
Ideally we should base this on the use_mla flag instead of the presence of kv_lora_rank, so that it respects the VLLM_MLA_DISABLE env var.
Hi @LucasWilkinson, thanks for the comment. Following this suggestion, I introduced these flags to make sure we are safe with both VLLM_MLA_DISABLE=0 and VLLM_MLA_DISABLE=1:
self.is_deepseek_mla = config.model_config.is_deepseek_mla
self.use_mla_opt = not envs.VLLM_MLA_DISABLE
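The two flags above combine to gate the MLA-specific KV-transfer path. A minimal sketch of that gating logic, with the flags passed as plain arguments for illustration (the real code reads them from the vLLM config and `envs.VLLM_MLA_DISABLE`):

```python
# Hypothetical sketch; names mirror the PR snippet, but the surrounding
# class and config plumbing are simplified away.

def should_use_mla_kv_layout(is_deepseek_mla: bool, mla_disabled: bool) -> bool:
    """MLA KV layout applies only to DeepSeek MLA models when MLA is not disabled."""
    use_mla_opt = not mla_disabled
    return is_deepseek_mla and use_mla_opt

# DeepSeek MLA model, VLLM_MLA_DISABLE unset -> MLA KV layout
print(should_use_mla_kv_layout(True, False))   # True
# VLLM_MLA_DISABLE=1 falls back to the standard layout even on DeepSeek
print(should_use_mla_kv_layout(True, True))    # False
# Non-MLA models never take the MLA path
print(should_use_mla_kv_layout(False, False))  # False
```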
Good point! Fixed this. @ZhaiFeiyue
LGTM from an MLA perspective, but I'm not spun up on prefill decode disaggregation (cc @KuntaiDu).
Hi @KuntaiDu, could you please take a look at the code change? Thanks!
The logic LGTM; some small suggestions on improving the code readability.
    layer.self_attn.attn._k_scale,
    layer.self_attn.attn._v_scale,
)
if self.is_deepseek_mla and self.use_mla_opt:
It would be great if we could move this logic into a separate utility file and keep the model-specific logic there, say in a file called model_aware_kv_ops.py, with something like
put_kv_to_vllm(kv_cache, keys, values, **kwargs)
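As a rough illustration of the suggestion, one way such a utility could look is a small registry that dispatches to model-specific KV write routines. The function name `put_kv_to_vllm` comes from the review comment; everything else here (the registry, `model_kind`, the handler bodies) is a hypothetical sketch, not vLLM code:

```python
# Hypothetical model_aware_kv_ops.py sketch: connectors call one entry point
# and the model-specific handling lives behind a registry.
from typing import Callable, Dict

_KV_PUT_HANDLERS: Dict[str, Callable[..., None]] = {}

def register_kv_put(model_kind: str):
    """Register a model-specific KV write routine under a key."""
    def decorator(fn: Callable[..., None]) -> Callable[..., None]:
        _KV_PUT_HANDLERS[model_kind] = fn
        return fn
    return decorator

@register_kv_put("mla")
def _put_kv_mla(kv_cache, keys, values, **kwargs):
    # DeepSeek MLA stores a compressed latent KV; here we just tag the entry.
    kv_cache.append(("mla", keys, values))

@register_kv_put("standard")
def _put_kv_standard(kv_cache, keys, values, **kwargs):
    kv_cache.append(("standard", keys, values))

def put_kv_to_vllm(kv_cache, keys, values, *, model_kind: str = "standard", **kwargs):
    """Single entry point: dispatch to the handler for this model kind."""
    _KV_PUT_HANDLERS[model_kind](kv_cache, keys, values, **kwargs)

cache = []
put_kv_to_vllm(cache, "k0", "v0", model_kind="mla")
put_kv_to_vllm(cache, "k1", "v1")
print(cache)  # [('mla', 'k0', 'v0'), ('standard', 'k1', 'v1')]
```

A registry like this would let both the existing connector and a future MooncakeStore connector share one call site while models register their own layouts.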
I talked to @billishyahao, who will follow up with a PR on this, so we're good to go!
Thank you, @KuntaiDu, for the comments from the PD disaggregation design perspective. As we discussed offline, I am going to keep the function of the current patch. There is also another xPyD PR, [Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore by ShangmingCai · Pull Request #12957 · vllm-project/vllm, which introduces a new MooncakeStore connector that also needs this dispatcher for different models. I plan to make a follow-up code change to add a unified function.
Sure thing. If this PR gets merged, you can support it for MooncakeStoreConnector too, and I will review it.
Sounds good! Thank you @ShangmingCai |
This patch aims to provide the following:
After applying the PR, we get the correct output for the 1P1D case on DeepSeek-V2: