[Weight Transfer]: Support layerwise reloading for online quantization #2464
Open
S1ro1 wants to merge 5 commits into
Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 0683bba.

Note: Medium Risk
Touches inference weight hot-reload and NCCL/filesystem broadcast flows; incorrect flag propagation or vLLM layerwise reload behavior could break online weight updates or degrade inference stability, especially under FP8 quantization.
Overview
Adds support for vLLM layerwise weight reloading during online inference weight updates, primarily to enable online FP8 quantization (`inference.quantization=fp8_per_block`).

Configuration is extended with a `quantization` option and a `weight_broadcast.layerwise` flag, propagated and validated across trainer, orchestrator, and inference.
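For concreteness, here is a minimal sketch of what the extended configuration could look like. Only the `quantization`, `weight_broadcast.layerwise`, and `quantize_in_weight_transfer` names come from this PR; the dataclass layout, defaults, and field placement are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WeightBroadcastConfig:
    # New flag: stream weights into vLLM one layer at a time instead of
    # loading a full checkpoint in one shot.
    layerwise: bool = False
    # Option the PR names as incompatible with layerwise reload; its actual
    # location in the config tree is an assumption here.
    quantize_in_weight_transfer: bool = False


@dataclass
class InferenceConfig:
    # e.g. "fp8_per_block" to enable online FP8 quantization; None disables it.
    quantization: str | None = None
    weight_broadcast: WeightBroadcastConfig = field(
        default_factory=WeightBroadcastConfig
    )
```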
Layerwise reload is auto-enabled when quantization is set, and incompatible combinations are forbidden (e.g., `layerwise` vs. `quantize_in_weight_transfer`).
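A sketch of those auto-enable and compatibility rules, continuing the assumed config shape above (the function name and mutate-in-place behavior are illustrative, not the PR's actual code):

```python
def validate_weight_transfer_config(cfg: InferenceConfig) -> None:
    # Online quantization only works with layerwise reload, so switch it on
    # automatically whenever a quantization scheme is configured.
    if cfg.quantization is not None:
        cfg.weight_broadcast.layerwise = True

    # Layerwise reload and quantize-in-transfer are mutually exclusive.
    if (
        cfg.weight_broadcast.layerwise
        and cfg.weight_broadcast.quantize_in_weight_transfer
    ):
        raise ValueError(
            "weight_broadcast.layerwise cannot be combined with "
            "quantize_in_weight_transfer"
        )
```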
Inference weight update RPCs now carry the `layerwise` flag end-to-end (server → worker, and orchestrator → `/init_broadcaster`), and weight loading is refactored into a shared `load_weights_checkpoint_or_layerwise` helper that uses vLLM's `initialize_layerwise_reload`/`finalize_layerwise_reload`, plus a workaround to run FP8 online conversion without `torch.compile` when needed.
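And a sketch of the shared loading helper. `load_weights_checkpoint_or_layerwise`, `initialize_layerwise_reload`, and `finalize_layerwise_reload` are named in the PR, but their signatures, the `model`/`named_weights` parameters, and the placement of the hooks on the model object are assumptions here; the `torch.compile` workaround is only indicated by a comment.

```python
from typing import Iterable

import torch


def load_weights_checkpoint_or_layerwise(
    model: torch.nn.Module,
    named_weights: Iterable[tuple[str, torch.Tensor]],
    layerwise: bool,
) -> None:
    if not layerwise:
        # Checkpoint path: hand vLLM the full weight set in one call.
        model.load_weights(named_weights)
        return

    # Layerwise path: tell vLLM a partial, layer-by-layer reload is starting
    # so it can re-run online quantization (e.g. per-block FP8 conversion)
    # per layer as weights arrive. The PR notes an additional workaround to
    # run that FP8 conversion without torch.compile when needed; it is
    # omitted from this sketch.
    model.initialize_layerwise_reload()
    try:
        for name, tensor in named_weights:
            model.load_weights([(name, tensor)])
    finally:
        model.finalize_layerwise_reload()
```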