
Conversation

@iKunHvv commented Nov 26, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to address potential precision issues with torch.cumsum by casting the input to float32. While the intention is correct, the implementation introduces a subtle bug by modifying the probs_sort tensor's data type. This affects subsequent operations that rely on the original data type. My review includes a suggestion to fix this while keeping the benefit of higher precision for the cumulative sum calculation.
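To make the failure mode concrete, here is a minimal sketch (not from the PR) of how a running sum kept in float16 drifts. It simulates a backend whose cumsum accumulates in the input dtype; the size and values are illustrative:

```python
import torch

# 4096 equal probabilities whose exact sum is 1.0.
probs = torch.full((4096,), 1.0 / 4096, dtype=torch.float16)

# Running sum kept in float16: each partial sum is rounded back to fp16,
# and once the accumulator's ulp exceeds the addend, increments are lost.
acc16 = torch.tensor(0.0, dtype=torch.float16)
for p in probs:
    acc16 = acc16 + p

# The same accumulation in float32 recovers the exact total.
acc32 = probs.to(torch.float32).sum()

print(acc16.item())  # stalls well below 1.0 under round-to-nearest-even
print(acc32.item())  # 1.0
```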

Comment on lines 54 to 57

```python
old_dtype = probs_sort.dtype
probs_sort = probs_sort.to(torch.float32)
cumprob = torch.cumsum(probs_sort, dim=-1)
cumprob = cumprob.to(old_dtype)
```
Severity: high

While casting to float32 for cumsum is a good idea to prevent precision loss, reassigning probs_sort to a float32 tensor creates an issue. The modified probs_sort is used later in probs_sort.gather on line 62, so top_p_cutoff ends up with a float32 dtype and is then compared to probs (likely float16 or bfloat16) on line 63. This can lead to unexpected behavior and potential performance degradation from implicit type promotion.

It's better to perform the type conversion for the cumsum operation without reassigning probs_sort.

Suggested change

```diff
- old_dtype = probs_sort.dtype
- probs_sort = probs_sort.to(torch.float32)
- cumprob = torch.cumsum(probs_sort, dim=-1)
- cumprob = cumprob.to(old_dtype)
+ cumprob = torch.cumsum(probs_sort.to(torch.float32), dim=-1).to(probs_sort.dtype)
```
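For context, here is a self-contained sketch of how the suggested one-liner slots into a top-p filter. The function body is a hypothetical reconstruction (the actual sampler code is not shown in this comment); probs, probs_sort, and top_p_cutoff come from the review, while the sorting direction and cutoff rule are assumptions:

```python
import torch

def top_p_sketch(probs: torch.Tensor, top_p: torch.Tensor) -> torch.Tensor:
    # Hypothetical reconstruction of the surrounding logic, not the PR's code.
    probs_sort, _ = probs.sort(dim=-1, descending=True)
    # Cast only inside the cumsum expression: probs_sort keeps its original
    # dtype, so the gather/compare below involve no implicit promotion.
    cumprob = torch.cumsum(probs_sort.to(torch.float32),
                           dim=-1).to(probs_sort.dtype)
    # Count tokens inside the nucleus, keeping at least one per row.
    top_p_count = (cumprob <= top_p.unsqueeze(-1)).sum(-1, keepdim=True)
    top_p_count.clamp_(min=1)
    top_p_cutoff = probs_sort.gather(-1, top_p_count - 1)  # same dtype as probs
    return probs.masked_fill(probs < top_p_cutoff, 0.0)

probs = torch.softmax(torch.randn(2, 8), dim=-1).to(torch.bfloat16)
out = top_p_sketch(probs, torch.tensor([0.9, 0.5]))
print(out.dtype)  # torch.bfloat16 -- no float32 leaks past the cumsum
```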
