Fix Qwen3.5 MTP sanitize norm shift #1320

Closed
xxxkkw wants to merge 1 commit into ml-explore:main from xxxkkw:qwen35-mtp-sanitize-norm

@xxxkkw xxxkkw commented May 28, 2026

Summary

  • Stop using stripped mtp.* weights as a signal to shift Qwen3.5 norm weights by +1 during sanitize.
  • Keep the raw-checkpoint norm shift tied to unsanitized Conv1d layout conversion.
  • Cover both qwen3_5 and qwen3_5_moe wrapper paths so already-converted checkpoints with MTP weights can be loaded without shifting norms twice.
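
The behavior described in the bullets above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this PR: nested lists stand in for real arrays, the key names (`linear_attn.conv1d.weight`, `mtp.fc.weight`) and the `NORM_SUFFIXES` list are hypothetical, and the raw-layout check assumes a converted depthwise conv weight has a trailing dimension of 1.

```python
from typing import Any, Dict, Tuple

# Illustrative suffixes only; the real model files define their own list.
NORM_SUFFIXES = (".input_layernorm.weight", ".q_norm.weight", "model.norm.weight")


def shape(t: Any) -> Tuple[int, ...]:
    """Shape of a nested-list 'tensor' (a stand-in for a real array type)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def is_raw_conv1d(key: str, value: Any) -> bool:
    # Assumption: a converted depthwise conv weight is (channels, kernel, 1),
    # so a trailing dimension other than 1 marks the raw checkpoint layout.
    return key.endswith("conv1d.weight") and shape(value)[-1] != 1


def sanitize(weights: Dict[str, Any]) -> Dict[str, Any]:
    # The shift decision looks at the conv layout only; "mtp." keys are ignored.
    raw_checkpoint = any(is_raw_conv1d(k, v) for k, v in weights.items())
    out = {}
    for key, value in weights.items():
        if "mtp." in key:
            continue  # drop MTP weights; they are not loaded
        if is_raw_conv1d(key, value):
            # (channels, 1, kernel) -> (channels, kernel, 1)
            value = [[list(col) for col in zip(*mat)] for mat in value]
        elif raw_checkpoint and key.endswith(NORM_SUFFIXES) and len(shape(value)) == 1:
            value = [x + 1.0 for x in value]  # shift 1-D norm weights once
        out[key] = value
    return out


# Hypothetical raw checkpoint: raw conv layout, unshifted norm, one MTP key.
raw = {
    "model.layers.0.linear_attn.conv1d.weight": [[[0.1, 0.2, 0.3]], [[0.4, 0.5, 0.6]]],
    "model.layers.0.input_layernorm.weight": [0.0, 0.5],
    "mtp.fc.weight": [[1.0]],
}
converted = sanitize(raw)

# Mimic an already-converted checkpoint that still carries an MTP key.
reloaded = sanitize({**converted, "mtp.fc.weight": [[1.0]]})
```

In this sketch the second `sanitize` call leaves the norm weights untouched, because the conv weight is already in the converted layout and the MTP key no longer influences the decision.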

Environment

  • OS: macOS Darwin 25.4.0 arm64
  • Hardware: Apple M1 Max, 32 GB unified memory
  • Python: local project virtual environment

Testing

  • python -m unittest discover -s tests -p test_models.py -k qwen3_5_family_convert_then_load_norm_not_shift_twice
    • Result: 1 passed

Benchmark / profiling notes

  • This is a sanitize/load correctness fix and does not change inference kernels.
  • The regression test reproduces the load-sanitize path with already-converted norm weights plus stripped MTP weights; before this change the MTP key alone triggered an extra norm shift.
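
The scenario the regression test exercises can be reduced to the shift predicate alone. The following is a rough sketch, assuming the pre-fix rule was "MTP key present or raw conv layout" as the note above implies; the function names, the key names, and the shape-only representation are hypothetical and are not taken from the repository's test suite.

```python
from typing import Dict, Tuple

Shapes = Dict[str, Tuple[int, ...]]


def has_raw_conv1d(shapes: Shapes) -> bool:
    # Assumption: converted conv weights end in a dimension of size 1.
    return any(k.endswith("conv1d.weight") and s[-1] != 1 for k, s in shapes.items())


def should_shift_before(shapes: Shapes) -> bool:
    # Assumed pre-fix rule: an MTP key alone was enough to trigger the shift.
    return any("mtp." in k for k in shapes) or has_raw_conv1d(shapes)


def should_shift_after(shapes: Shapes) -> bool:
    # Post-fix rule: only the raw conv layout counts.
    return has_raw_conv1d(shapes)


# Already-converted checkpoint that still carries an MTP key.
already_converted = {
    "model.layers.0.linear_attn.conv1d.weight": (2, 3, 1),
    "model.layers.0.input_layernorm.weight": (2,),
    "mtp.fc.weight": (1, 1),
}

# Same keys, but with the conv weight in the raw (channels, 1, kernel) layout.
raw_checkpoint = {
    **already_converted,
    "model.layers.0.linear_attn.conv1d.weight": (2, 1, 3),
}
```

Under these assumptions the two rules agree on the raw checkpoint and differ only on the already-converted one, which is the case where a second shift would corrupt the norm weights.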

Context

Drop stripped MTP weights without treating their presence as a raw-checkpoint signal that shifts norm weights during load sanitize.

@nastya236 nastya236 added the bug label (Something isn't working) Jun 4, 2026

@nastya236 (Collaborator)

Thanks for flagging this! This one can be closed when we merge #1198.
Closing this PR for now.

@nastya236 nastya236 closed this Jun 4, 2026