[BugFix] Adapted Qwen3-Next to v0.11.2 #4477
base: main
Conversation
Signed-off-by: drslark <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adapts the Qwen3-Next model to vLLM v0.11.2. The changes enable a previously skipped test and fix a variable-shadowing bug in the model implementation. The shadowing fix is correct and prevents a runtime error. I have also identified a potential critical issue: an IndexError can occur if batch_size is zero, which would crash the server. I've left a comment with a code suggestion to make the code more robust against this edge case.
  tar_dtype = temp_core_attn_out[0].dtype
  tar_device = temp_core_attn_out[0].device
  tar_shape = list(temp_core_attn_out[0].shape)
  tar_shape[1] = non_spec_query_start_loc[-1]
  core_attn_out_non_spec = torch.empty(tar_shape,
                                       dtype=tar_dtype,
                                       device=tar_device)
  for b_idx in range(batch_size):
-     cur_core_attn_out = core_attn_out[b_idx]
+     cur_core_attn_out = temp_core_attn_out[b_idx]
      start, end = non_spec_query_start_loc[
          b_idx], non_spec_query_start_loc[b_idx + 1]
      core_attn_out_non_spec[:, start:end, ...] = cur_core_attn_out
There is a potential IndexError here. If batch_size is 0, temp_core_attn_out will be an empty list, and accessing temp_core_attn_out[0] at line 708 will raise an exception. While it seems unlikely for batch_size to be 0 when num_prefills > 0, it's safer to guard against this to prevent a server crash.
Additionally, torch.cat(last_recurrent_state, dim=0) at line 720 will also fail if last_recurrent_state is an empty list (when batch_size is 0).
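For reference, here is a minimal standalone repro of both failure modes (a hypothetical snippet for illustration, not code from this PR):

import torch

temp_core_attn_out: list[torch.Tensor] = []  # what batch_size == 0 produces

try:
    _ = temp_core_attn_out[0].dtype  # the pattern at line 708
except IndexError as exc:
    print("IndexError:", exc)  # list index out of range

try:
    torch.cat(temp_core_attn_out, dim=0)  # the pattern at line 720
except RuntimeError as exc:
    print("RuntimeError:", exc)  # expected a non-empty list of Tensors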
I suggest wrapping this block and line 720 in a check for batch_size > 0 and handling the batch_size == 0 case separately by creating empty tensors for core_attn_out_non_spec and last_recurrent_state.
Here is a suggested implementation for lines 708-719. Please note that line 720 should also be moved inside the if batch_size > 0: block.
if batch_size > 0:
    tar_dtype = temp_core_attn_out[0].dtype
    tar_device = temp_core_attn_out[0].device
    tar_shape = list(temp_core_attn_out[0].shape)
    # Second dim becomes the total number of non-speculative tokens.
    tar_shape[1] = non_spec_query_start_loc[-1]
    core_attn_out_non_spec = torch.empty(tar_shape,
                                         dtype=tar_dtype,
                                         device=tar_device)
    # Scatter each request's attention output into its token slice.
    for b_idx in range(batch_size):
        cur_core_attn_out = temp_core_attn_out[b_idx]
        start, end = non_spec_query_start_loc[
            b_idx], non_spec_query_start_loc[b_idx + 1]
        core_attn_out_non_spec[:, start:end, ...] = cur_core_attn_out
else:
    # Empty batch: build a correctly shaped zero-token output instead of
    # indexing into an empty list.
    num_v_heads = self.num_v_heads // self.tp_size
    core_attn_out_non_spec = torch.empty(
        (1, 0, num_v_heads, self.head_v_dim),
        dtype=ssm_state.dtype,
        device=ssm_state.device)
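The suggestion above only constructs core_attn_out_non_spec; per the note about line 720, last_recurrent_state also needs an empty-batch value once the torch.cat call moves inside the if batch_size > 0: branch. One possible sketch for the else branch, assuming the concatenated state stacks along dim 0 and shares ssm_state's trailing dimensions (an assumption, not verified against this PR):

    # Hypothetical: stands in for torch.cat(last_recurrent_state, dim=0)
    # when batch_size == 0, assuming ssm_state's trailing shape.
    last_recurrent_state = torch.empty((0, *ssm_state.shape[1:]),
                                       dtype=ssm_state.dtype,
                                       device=ssm_state.device)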
What this PR does / why we need it?
Adapted Qwen3-Next to v0.11.2.

For the program:
The output was:
The output now is:
Does this PR introduce any user-facing change?
N/A
How was this patch tested?