Skip to content

Conversation

@Toneymiller
Copy link

@Toneymiller Toneymiller commented Nov 26, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: zxwang <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Triton kernel for RMS Normalization to support fused bias addition, which is a good performance optimization. My review has identified a few critical issues in the implementation of the new Triton kernel and its wrapper function: a hardcoded device property which should be dynamic, passing None to the kernel which will likely cause a crash, and a leftover debug print statement. Additionally, the new code path involving the Triton kernel does not appear to be covered by unit tests. It would be beneficial to add tests for this new functionality to ensure its correctness and prevent future regressions. Please address the comments to improve the robustness and correctness of the implementation.

Comment on lines +134 to +135
# _, num_vectorcore = get_device_properties()
num_vectorcore = 40
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The num_vectorcore is hardcoded to 40. The code should use the get_device_properties() function to dynamically fetch this value, as intended by the commented-out line. Hardcoding device properties makes the code less portable and may lead to suboptimal performance on different hardware.

Suggested change
# _, num_vectorcore = get_device_properties()
num_vectorcore = 40
_, num_vectorcore = get_device_properties()

Comment on lines +147 to +148
residual.stride(0) if residual is not None else None,
residual_out.stride(0) if residual is not None else None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Passing None for stride arguments to a Triton kernel can lead to a TypeError at runtime. When residual is None, None is passed for stride_z_row and stride_z_out_row. A dummy integer value, such as 0, should be passed instead.

Suggested change
residual.stride(0) if residual is not None else None,
residual_out.stride(0) if residual is not None else None,
residual.stride(0) if residual is not None else 0,
residual_out.stride(0) if residual_out is not None else 0,


x_hat = x * rstd
x_hat = x_hat.to(original_dtype)
tl.device_print("[Row %d]xxxdtype: %s", row_idx, x_hat.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

A debug print statement tl.device_print is present in the Triton kernel. This should be removed from production code as it can generate a lot of output and may impact performance.

Suggested change
tl.device_print("[Row %d]xxxdtype: %s", row_idx, x_hat.dtype)
# tl.device_print("[Row %d]xxxdtype: %s", row_idx, x_hat.dtype)

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

zxwang added 2 commits November 26, 2025 17:20
Signed-off-by: zxwang <[email protected]>
Signed-off-by: zxwang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant