[P/D] add layerwise connector CI #4468
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Force-pushed the branch from 6419299 to 8b166ed
Code Review
This pull request adds CI configurations for multi-node tests with a layerwise connector for DeepSeek-V3 and Qwen3 models. My review focuses on the correctness and consistency of these new YAML configuration files.
I've found a few issues:
- The Qwen3-235B-W8A8-layerwise.yaml file seems to be an incomplete copy of the DeepSeek-V3 configuration. It's missing flags for setting execution modes (--enforce-eager for the producer and the graph mode config for the consumer) and has an empty benchmarks section. These omissions will likely lead to incorrect or incomplete CI test runs.
- The DeepSeek-V3-layerwise.yaml file contains a misleading comment at the top, describing a 4-node setup while the configuration is for 2 nodes.
I've provided suggestions to fix these inconsistencies and complete the configuration.
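For orientation before the line comments: piecing together the hunks quoted below, each of these connector CI configs defines a node topology, per-role deployment arguments, and a benchmarks section. A rough sketch of that layout (num_nodes, the two host-index keys, and the benchmarks block are attested in the review below; the deployment key names and overall nesting are assumptions, not the repository's actual schema):

# Hypothetical overall layout of a layerwise-connector CI config (sketch)
num_nodes: 2                  # 1 prefiller node + 1 decoder node
prefiller_host_index: [0]     # node0 runs the producer (prefill)
decoder_host_index: [1]       # node1 runs the consumer (decode)
prefiller_deployment:         # key name assumed; holds the producer's CLI flags
  ...
decoder_deployment:           # key name assumed; holds the consumer's CLI flags
  ...
benchmarks:                   # accuracy test configuration (see suggestion below)
  ...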
# Suppose we have **4 nodes** running a 2P1D setup (2 Prefillers + 1 Decoder):
# ┌───────────────┬───────────────┬───────────────┬───────────────┐
# │ node0 │ node1 │ node2 │ node3 │
# │ Prefiller #1 │ Prefiller #2 │ Decoder │ Decoder │
# └───────────────┴───────────────┴───────────────┴───────────────┘
# For the prefiller nodes. the hosts should be node0 and node1
# For the decoder nodes. we only have 1 decoder node(dp+tp+ep across node2 and node3. Where node3 is running with headless mode)
# So the prefiller_host_index is [0, 1], and the decoder_host_index is [2]
The comment describes a 4-node setup, but the configuration below uses num_nodes: 2. This is misleading. Please update the comment to reflect the actual 2-node (1 prefiller, 1 decoder) setup being configured.
# Suppose we have **2 nodes** running a 1P1D setup (1 Prefiller + 1 Decoder):
# ┌───────────────┬───────────────┐
# │ node0 │ node1 │
# │ Prefiller │ Decoder │
# └───────────────┴───────────────┘
# For the prefiller node, the host is node0.
# For the decoder node, the host is node1.
# So the prefiller_host_index is [0], and the decoder_host_index is [1].

--data-parallel-size 2
--data-parallel-size-local 2
--tensor-parallel-size 8
--seed 1024
The producer (prefill node) deployment is missing the --enforce-eager flag. In disaggregated prefill setups, it's common to run the prefill stage in eager mode for flexibility, while the decode stage runs in graph mode for performance. The DeepSeek-V3-layerwise.yaml configuration in this PR follows this pattern. For consistency and to ensure correct behavior, please add this flag.
--seed 1024
--enforce-eager

--enable-expert-parallel
--trust-remote-code
--no-enable-prefix-caching
--gpu-memory-utilization 0.9
The consumer (decode node) deployment is missing the configuration to enable graph mode (--additional-config '{\"torchair_graph_config\":{\"enabled\":true}}'). This is inconsistent with the DeepSeek-V3-layerwise.yaml config, where the decoder is explicitly configured to run in graph mode for better performance. Please add this configuration for consistency and to leverage performance optimizations.
--gpu-memory-utilization 0.9
--additional-config '{\"torchair_graph_config\":{\"enabled\":true}}'

}
}
}'
benchmarks:
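Taken together, this suggestion and the --enforce-eager one above implement the standard disaggregated-prefill split: eager mode on the producer, graph mode on the consumer. A minimal sketch pairing the two (the producer_args/consumer_args key names are hypothetical; only the two flags come from this review):

# Producer (prefill node): eager mode for flexibility
producer_args: >-
  --enforce-eager

# Consumer (decode node): torchair graph mode for performance
consumer_args: >-
  --additional-config '{"torchair_graph_config":{"enabled":true}}'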
The benchmarks section is empty. For the CI to run a meaningful test, it needs benchmark configuration, such as an accuracy test. This seems to be an oversight, as the DeepSeek-V3-layerwise.yaml file includes a benchmark configuration. Please add the appropriate benchmark configuration, similar to the other test file.
benchmarks:
  acc:
    case_type: accuracy
    dataset_path: vllm-ascend/gsm8k-lite
    request_conf: vllm_api_general_chat
    dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_chat_prompt
    max_out_len: 4096
    batch_size: 512
    baseline: 95
    threshold: 5

Signed-off-by: Fager10086 <[email protected]>
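(Reading the suggested numbers together: with baseline: 95 and threshold: 5, the accuracy check presumably passes while the measured gsm8k score stays within 5 points of the baseline, i.e. pass iff |measured − 95| ≤ 5, making 90 the effective floor. This reading of the two fields is an assumption about the benchmark harness, not something stated in this PR.)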
Force-pushed the branch from 8b166ed to 6562cb9
Signed-off-by: Fager10086 <[email protected]>
Signed-off-by: Fager10086 <[email protected]>
Force-pushed the branch from 4621ac2 to 408677e
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?