[Bugfix] Fix bug with establishing the flashcomm2 and pp communication domains. #4458
Code Review
This pull request correctly fixes a bug in the flashcomm2 communication domain setup by incorporating pipeline parallelism (pp). The logic for calculating global ranks and forming communication groups has been updated to account for the pipeline parallel size. My review includes one suggestion to improve the readability and maintainability of the complex rank calculation logic, which I consider important for this critical part of the code.
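The kind of rank calculation the review refers to can be illustrated with a minimal sketch. This is not the PR's code: the function name `build_subgroups`, the parameter `sub_size`, and the rank layout (tensor-parallel ranks innermost, no data parallelism, so `global_rank = pp_rank * tp_size + tp_rank`) are all assumptions made for illustration.

```python
def build_subgroups(world_size: int, pp_size: int, tp_size: int, sub_size: int) -> list[list[int]]:
    """Return one list of global ranks per communication sub-group.

    Hypothetical helper, not taken from the PR. Assumed layout:
    global_rank = pp_rank * tp_size + tp_rank (TP innermost, no DP).
    """
    if world_size != pp_size * tp_size:
        raise ValueError("world_size must equal pp_size * tp_size in this sketch")
    if tp_size % sub_size != 0:
        raise ValueError("sub_size must evenly divide tp_size")

    groups = []
    for pp_rank in range(pp_size):
        # Offset by the pipeline stage; dropping this offset would only
        # ever produce groups for the ranks of the first stage.
        base = pp_rank * tp_size
        for start in range(0, tp_size, sub_size):
            groups.append([base + start + i for i in range(sub_size)])
    return groups


groups = build_subgroups(world_size=8, pp_size=2, tp_size=4, sub_size=2)
covered = sorted(rank for group in groups for rank in group)
```

With two pipeline stages, the per-stage offset is what places ranks 4 to 7 into groups of their own; without it, only the first stage's ranks would be assigned to a group.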
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: zzhx1 <[email protected]>
…ifications. Co-authored-by: Levi-JQ <[email protected]> Signed-off-by: zzhx1 <[email protected]>
@ApsarasX Please check this PR and help to merge.
@wangxiyuan Please help merge this PR. It is a bugfix for the flashcomm2 communication domain.
…n domains. (vllm-project#4458)

### What this PR does / why we need it?

The previous implementation of the flashcomm2 communication domain did not consider pp(pipeline parallel), which caused problems when enabling pp and flashcomm2. This PR fixes this issue.

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: zzhx1 <[email protected]>
Co-authored-by: Levi-JQ <[email protected]>
What this PR does / why we need it?
The previous implementation of the flashcomm2 communication domain did not account for pipeline parallelism (pp), which caused problems when pp and flashcomm2 were enabled together. This PR fixes the issue.