fix: preserve Qwen3.5 broadcast weight names #2690

Open: samsja wants to merge 9 commits into main from exp/qwen35-kl-wordle

samsja commented Jun 2, 2026

What changed

  • Preserves Qwen3.5 HF hub weight names during live broadcast and checkpoint export.
  • Routes NCCL/filesystem broadcast and weight checkpoint export through a shared helper.
  • Updates the training monitor skill with the Qwen3.5 skipped-weight failure mode.

Why

Qwen3.5 VLM checkpoints already use the HF hub naming scheme (model.language_model...). Calling transformers.core_model_loading.revert_weight_conversion on those keys rewrites them into a namespace that vLLM live reload does not recognize. In the failing runs, vLLM logged skipped keys like language_model.language_model..., so inference never received the updated LM/linear-attention weights after trainer updates.

With the bypass, raw HF keys are sent for model_type = "qwen3_5"; vLLM maps them to its internal language_model.model... keys and the weight updates load.
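The bypass described above could be sketched roughly as follows. This is a minimal illustration, not the helper from this PR: the names `prepare_export_state_dict` and `RAW_HF_KEY_MODEL_TYPES` are hypothetical, and the Transformers conversion is passed in as a plain callable (`revert_fn`) so the sketch makes no assumption about the exact signature of `revert_weight_conversion`.

```python
from typing import Any, Callable, Dict

# Assumption: model types whose trainer-side keys already match HF hub naming.
# Only "qwen3_5" is named in this PR; the set itself is hypothetical.
RAW_HF_KEY_MODEL_TYPES = frozenset({"qwen3_5"})


def prepare_export_state_dict(
    state_dict: Dict[str, Any],
    model_type: str,
    revert_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the state dict to broadcast or save.

    Hypothetical shared helper: skips key conversion for model types whose
    keys are already in HF hub form, and applies `revert_fn` otherwise.
    """
    if model_type in RAW_HF_KEY_MODEL_TYPES:
        # Send raw HF keys unchanged (shallow copy, tensors are not cloned).
        return dict(state_dict)
    # All other models keep the existing conversion path.
    return revert_fn(state_dict)
```

Routing the NCCL broadcast, the filesystem broadcast, and the checkpoint export through one function like this keeps the model-type check in a single place, so the three paths cannot drift apart.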

Validation

  • uv sync --all-extras
  • uv run ruff check src/prime_rl/trainer/weights.py src/prime_rl/trainer/ckpt.py src/prime_rl/trainer/rl/broadcast/nccl.py src/prime_rl/trainer/rl/broadcast/filesystem.py src/prime_rl/trainer/models/qwen3_5_moe/modeling_qwen3_5_moe.py
  • Slurm run 23166 reached trainer step 9 with Mismatch KL between 0.0005 and 0.0010; old skipped-weight logs were gone.

W&B: https://wandb.ai/primeintellect/wordle/runs/6f6cfafdf0274166ad038e7e79375f29


Note

Medium Risk
Transformers version and weight-export/broadcast behavior affect Qwen3.5 inference sync; dependency pin changes the whole training stack’s HF behavior.

Overview
Pins Transformers to 5.6.2 on PyPI (replacing the git pin and >=5.1.0.dev0 override) so training, checkpoints, and vLLM live reload share a single release aligned with Qwen3.5 fixes.

In the diff, broadcast/checkpoint paths still call revert_weight_conversion for non–PrimeRL models; only redundant inline comments were removed in ckpt.py and filesystem.py. Qwen3.5 workaround docstrings in model.py now say they can drop once an official Transformers release includes the fixes (not a specific git commit).

The start-run skill adds steps to verify verifier env packages import under uv run and how to wire missing local envs into pyproject.toml before rl launches.
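One way to perform the import check mentioned above is a small script executed with `uv run python`, so it sees the same environment as the launch. This is a hedged sketch: the function name `missing_modules` is hypothetical, and the module names passed to it are placeholders, since the actual import names of the verifier env packages are not given here.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in module_names:
        try:
            # find_spec locates a top-level module without executing it.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent, or a broken __spec__.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder names; replace with the import names of the envs you need.
    print(missing_modules(["json", "some_env_that_is_not_installed"]))
```

Any name reported as missing would then need to be added to pyproject.toml before launching.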

Reviewed by Cursor Bugbot for commit 1143117.

samsja force-pushed the exp/qwen35-kl-wordle branch from 3a3a31f to 13ee170 on June 2, 2026 18:15
samsja force-pushed the exp/qwen35-kl-wordle branch from 13ee170 to 099f5a5 on June 2, 2026 20:42
samsja changed the title from "exp: add qwen35 kl debug configs" to "fix: preserve Qwen3.5 broadcast weight names" on Jun 2, 2026
samsja marked this pull request as ready for review on June 3, 2026 01:57
Comment thread pyproject.toml Outdated
"science-env",
"simpleqa-verified",
"tau2-bench",
"wordle",
Collaborator

is this intended?

S1ro1 (Collaborator) left a comment

lgtm boss
