-
-
Notifications
You must be signed in to change notification settings - Fork 11k
[bugfix] fix wrong dcp_local_seq_lens calc
#27518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Qiu <[email protected]>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a critical bug in the calculation of local sequence lengths for decode requests under distributed context parallelism (dcp_local_seq_lens). The previous logic failed when the sequence length was perfectly divisible by the world size, incorrectly assigning an extra token to every rank. The fix implements the standard and correct method for distributing remainder tokens, ensuring the calculation is accurate in all scenarios. This change is essential for the correctness of distributed computations.
| dcp_local_seq_lens[:num_decodes] = seq_lens[ | ||
| :num_decodes | ||
| ] // self.dcp_world_size + ( | ||
| self.dcp_rank <= (seq_lens[:num_decodes] - 1) % self.dcp_world_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous logic for distributing remainder tokens, self.dcp_rank <= (seq_lens[:num_decodes] - 1) % self.dcp_world_size, was flawed. When a sequence length L is perfectly divisible by the world size W, (L - 1) % W evaluates to W - 1. This caused the condition to be true for all ranks, incorrectly incrementing each rank's local sequence length by one.
The new logic, self.dcp_rank < seq_lens[:num_decodes] % self.dcp_world_size, correctly handles this. When L % W == 0, no rank gets an extra token. When there is a remainder, it is correctly distributed among the first L % W ranks. This change fixes the bug and ensures correct behavior for all cases.
| self.dcp_rank <= (seq_lens[:num_decodes] - 1) % self.dcp_world_size | |
| self.dcp_rank < seq_lens[:num_decodes] % self.dcp_world_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!
When the
seq_lensis exactly divisible by thedcp_world_size, it causes thedcp_local_seq_lenson all ranks to be incremented by one. Fix this calculation logic.CC @youzhedian @minosfuture @youkaichao