@Fager10086 Fager10086 commented Nov 26, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds CI configurations for multi-node tests with a layerwise connector for DeepSeek-V3 and Qwen3 models. My review focuses on the correctness and consistency of these new YAML configuration files.

I've found a few issues:

  • The Qwen3-235B-W8A8-layerwise.yaml file seems to be an incomplete copy of the DeepSeek-V3 configuration. It's missing flags for setting execution modes (--enforce-eager for the producer and graph mode config for the consumer) and has an empty benchmarks section. These omissions will likely lead to incorrect or incomplete CI test runs.
  • The DeepSeek-V3-layerwise.yaml file contains a misleading comment at the top, describing a 4-node setup while the configuration is for 2 nodes.

I've provided suggestions to fix these inconsistencies and complete the configuration.
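Putting the suggestions below together, here is a rough sketch of how the corrected Qwen3-235B-W8A8-layerwise.yaml server arguments could look. This only recombines flags already quoted in this review, assuming the file mirrors DeepSeek-V3-layerwise.yaml; the surrounding YAML keys and any other flags in the actual file are omitted and may differ.

Producer (prefill node), eager mode:

        --data-parallel-size 2
        --data-parallel-size-local 2
        --tensor-parallel-size 8
        --seed 1024
        --enforce-eager

Consumer (decode node), torchair graph mode:

        --enable-expert-parallel
        --trust-remote-code
        --no-enable-prefix-caching
        --gpu-memory-utilization 0.9
        --additional-config '{\"torchair_graph_config\":{\"enabled\":true}}'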

Comment on lines +4 to +11
# Suppose we have **4 nodes** running a 2P1D setup (2 Prefillers + 1 Decoder):
# ┌───────────────┬───────────────┬───────────────┬───────────────┐
# │    node0      │    node1      │    node2      │    node3      │
# │  Prefiller #1 │  Prefiller #2 │    Decoder    │    Decoder    │
# └───────────────┴───────────────┴───────────────┴───────────────┘
# For the prefiller nodes. the hosts should be node0 and node1
# For the decoder nodes. we only have 1 decoder node(dp+tp+ep across node2 and node3. Where node3 is running with headless mode)
# So the prefiller_host_index is [0, 1], and the decoder_host_index is [2]
Contributor

high

The comment describes a 4-node setup, but the configuration below uses num_nodes: 2. This is misleading. Please update the comment to reflect the actual 2-node (1 prefiller, 1 decoder) setup being configured.

# Suppose we have **2 nodes** running a 1P1D setup (1 Prefiller + 1 Decoder):
#   ┌───────────────┬───────────────┐
#   │   node0       │   node1       │
#   │   Prefiller   │   Decoder     │
#   └───────────────┴───────────────┘
# For the prefiller node, the host is node0.
# For the decoder node, the host is node1.
# So the prefiller_host_index is [0], and the decoder_host_index is [1].
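As a rough illustration only, the 2-node layout above would presumably translate to the following config fields. The key names are taken from the comment itself; the exact schema of these CI config files is an assumption here and may differ.

# illustrative only — actual key names/structure may differ
num_nodes: 2
prefiller_host_index: [0]
decoder_host_index: [1]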

--data-parallel-size 2
--data-parallel-size-local 2
--tensor-parallel-size 8
--seed 1024
Contributor

high

The producer (prefill node) deployment is missing the --enforce-eager flag. In disaggregated prefill setups, it's common to run the prefill stage in eager mode for flexibility, while the decode stage runs in graph mode for performance. The DeepSeek-V3-layerwise.yaml configuration in this PR follows this pattern. For consistency and to ensure correct behavior, please add this flag.

        --seed 1024
        --enforce-eager

--enable-expert-parallel
--trust-remote-code
--no-enable-prefix-caching
--gpu-memory-utilization 0.9
Contributor

high

The consumer (decode node) deployment is missing the configuration to enable graph mode (--additional-config '{\"torchair_graph_config\":{\"enabled\":true}}'). This is inconsistent with the DeepSeek-V3-layerwise.yaml config, where the decoder is explicitly configured to run in graph mode for better performance. Please add this configuration for consistency and to leverage performance optimizations.

        --gpu-memory-utilization 0.9
        --additional-config '{\"torchair_graph_config\":{\"enabled\":true}}'

}
}
}'
benchmarks:
Contributor

high

The benchmarks section is empty. For the CI to run a meaningful test, it needs benchmark configuration, such as an accuracy test. This seems to be an oversight, as the DeepSeek-V3-layerwise.yaml file includes a benchmark configuration. Please add the appropriate benchmark configuration, similar to the other test file.

benchmarks:
  acc:
    case_type: accuracy
    dataset_path: vllm-ascend/gsm8k-lite
    request_conf: vllm_api_general_chat
    dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_chat_prompt
    max_out_len: 4096
    batch_size: 512
    baseline: 95
    threshold: 5
