@pisceskkk pisceskkk commented Oct 26, 2025

When seq_lens is exactly divisible by dcp_world_size, the current calculation increments dcp_local_seq_lens by one on every rank. This PR fixes the calculation.

CC @youzhedian @minosfuture @youkaichao

@mergify mergify bot added the v1 label Oct 26, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical bug in the calculation of local sequence lengths for decode requests under distributed context parallelism (dcp_local_seq_lens). The previous logic failed when the sequence length was perfectly divisible by the world size, incorrectly assigning an extra token to every rank. The fix implements the standard and correct method for distributing remainder tokens, ensuring the calculation is accurate in all scenarios. This change is essential for the correctness of distributed computations.
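The remainder-distribution rule mentioned in this summary can be sketched in plain Python. This is a minimal illustration, not the vLLM implementation: `local_seq_len` and `brute_force_local_len` are hypothetical helper names, and the sketch assumes tokens are dealt to ranks round-robin (token `i` lands on rank `i % world_size`).

```python
def local_seq_len(seq_len: int, world_size: int, rank: int) -> int:
    """Closed-form local length: the first `seq_len % world_size` ranks get one extra token."""
    base, remainder = divmod(seq_len, world_size)
    return base + (1 if rank < remainder else 0)


def brute_force_local_len(seq_len: int, world_size: int, rank: int) -> int:
    """Count tokens owned by `rank`, assuming a round-robin layout (an assumption of this sketch)."""
    return sum(1 for token in range(seq_len) if token % world_size == rank)


def check(max_len: int = 64) -> bool:
    """Verify the closed form against the brute-force count and the sum invariant."""
    for world_size in (1, 2, 4, 8):
        for seq_len in range(max_len + 1):
            lens = [local_seq_len(seq_len, world_size, r) for r in range(world_size)]
            expected = [brute_force_local_len(seq_len, world_size, r) for r in range(world_size)]
            # Local lengths must match the per-rank count and add up to the full length.
            if lens != expected or sum(lens) != seq_len:
                return False
    return True


print(check())
```

The useful invariant here is that the per-rank lengths always sum to the full sequence length, including when the length is an exact multiple of the world size.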

dcp_local_seq_lens[:num_decodes] = seq_lens[
    :num_decodes
] // self.dcp_world_size + (
    self.dcp_rank <= (seq_lens[:num_decodes] - 1) % self.dcp_world_size

critical

The previous logic for distributing remainder tokens, self.dcp_rank <= (seq_lens[:num_decodes] - 1) % self.dcp_world_size, was flawed. When a sequence length L is perfectly divisible by the world size W, (L - 1) % W evaluates to W - 1. This caused the condition to be true for all ranks, incorrectly incrementing each rank's local sequence length by one.

The new logic, self.dcp_rank < seq_lens[:num_decodes] % self.dcp_world_size, correctly handles this. When L % W == 0, no rank gets an extra token. When there is a remainder, it is correctly distributed among the first L % W ranks. This change fixes the bug and ensures correct behavior for all cases.

Suggested change
- self.dcp_rank <= (seq_lens[:num_decodes] - 1) % self.dcp_world_size
+ self.dcp_rank < seq_lens[:num_decodes] % self.dcp_world_size
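To make the difference between the two conditions in this comment concrete, here is a small scalar sketch that evaluates both for every rank. The function names `extra_old`, `extra_new`, and `mismatches` are hypothetical, and plain integers stand in for the tensors used in the actual code.

```python
def extra_old(rank: int, seq_len: int, world_size: int) -> int:
    """Extra-token term as written in the line under review."""
    return int(rank <= (seq_len - 1) % world_size)


def extra_new(rank: int, seq_len: int, world_size: int) -> int:
    """Extra-token term from the suggested change."""
    return int(rank < seq_len % world_size)


def mismatches(max_len: int, world_size: int) -> list[int]:
    """Sequence lengths (1..max_len) for which the two terms disagree on any rank."""
    return [
        seq_len
        for seq_len in range(1, max_len + 1)
        if any(
            extra_old(r, seq_len, world_size) != extra_new(r, seq_len, world_size)
            for r in range(world_size)
        )
    ]


print(mismatches(12, 4))  # [4, 8, 12]

# With seq_len = 8 and world_size = 4, adding the old term to 8 // 4 yields
# 3 tokens per rank (12 in total), while the new term yields 2 per rank (8 in total).
old_total = sum(8 // 4 + extra_old(r, 8, 4) for r in range(4))
new_total = sum(8 // 4 + extra_new(r, 8, 4) for r in range(4))
print(old_total, new_total)  # 12 8
```

In this sketch the two conditions agree for every length that leaves a remainder and disagree only at exact multiples of the world size, which matches the case described in the comment.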

@minosfuture minosfuture left a comment

Thanks for the fix!

@youzhedian youzhedian mentioned this pull request Oct 31, 2025