
Support routed experts replay for vLLM P/D #2474

Open

S1ro1 wants to merge 5 commits into main from r3-v2

Conversation

@S1ro1 (Collaborator) commented May 11, 2026

Summary

Adds Prime-RL support for the compact split routed-experts path used by the patched vLLM/router/verifiers stack:

  • consumes split prompt/completion routed experts as compact (shape, bytes) payloads only (see the decoding sketch after this list)
  • removes support for the old nested-list/base85 routed-experts paths
  • aligns and pads routed experts with fail-fast validation, allowing only the expected missing final-token entry
  • replays experts through trainer MoE layers using the sparse MoE-layer index emitted by vLLM
  • pads/truncates routed experts correctly during sample and micro-batch preparation
  • exposes inference.routed_experts_replay_max_blocks for vLLM routed-experts replay cache sizing
  • retries the W&B shared-mode non-primary init race surfaced by GPU integration CI
  • records the required vLLM fork state and wheel build in src/prime_rl/inference/vllm_state.md
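
For reference, the consumed payload can be decoded along these lines; this is a minimal sketch, assuming a base64 string of int16 expert ids plus an explicit (num_tokens, top_k) shape (the function and field names here are hypothetical, not the PR's API):

```python
import base64

import numpy as np


def decode_routed_experts(shape: list[int], data_b64: str) -> np.ndarray:
    """Decode a compact (shape, bytes) routed-experts payload into an array."""
    buf = base64.b64decode(data_b64)
    arr = np.frombuffer(buf, dtype=np.int16)
    assert arr.size == shape[0] * shape[1], "payload size must match declared shape"
    return arr.reshape(shape)  # (num_tokens, top_k) expert ids
```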

Cross-repo PRs:

Pinned stack:

Local validation:

  • uv sync --all-extras
  • uv run ruff check .
  • uv run ruff format --check .
  • PYTEST_OUTPUT_DIR=/tmp/outputs uv run pytest tests/unit -m "not gpu" (342 passed, 65 deselected)
  • uv run ruff check src/prime_rl/utils/monitor/wandb.py
  • uv run ruff format --check src/prime_rl/utils/monitor/wandb.py

Note

Medium Risk
Medium risk: this PR changes the routed-experts HTTP/transport payload format and replay indexing across multiple trainer models, and pins a custom vLLM wheel/verifiers revision that can affect inference behavior.

Overview
Adds end-to-end support for split prompt vs completion routed-experts in the vLLM P/D path by emitting prompt_routed_experts plus per-choice routed_experts from /inference/v1/generate using a compact base64 int16 payload, and removing the chat endpoint’s custom routed-experts capture override.
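
A response carrying the split payloads might look like the fragment below; the prompt_routed_experts and per-choice routed_experts field names come from the overview, while the shape/data keys and all values are illustrative assumptions:

```python
# Hypothetical /inference/v1/generate response fragment (values illustrative):
response = {
    "prompt_routed_experts": {"shape": [12, 8], "data": "<base64 int16 bytes>"},
    "choices": [
        {"routed_experts": {"shape": [5, 8], "data": "<base64 int16 bytes>"}},
    ],
}
```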

Switches trainer/orchestrator transport from nested lists to a new RoutedExperts bytes+shape struct (transport/routed_experts.py), with strict alignment/concat/slice/padding helpers applied during trajectory interleaving, batch packing, and tensor materialization.
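
The struct itself can stay very small; below is a minimal sketch of a bytes+shape container with one strict helper, assuming int16 ids and a (num_tokens, top_k) layout (only the RoutedExperts name appears in the PR; the fields and helper are stand-ins):

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class RoutedExperts:
    """Compact transport form: raw int16 expert ids plus their shape."""

    shape: tuple[int, int]  # (num_tokens, top_k)
    data: bytes

    def to_array(self) -> np.ndarray:
        return np.frombuffer(self.data, dtype=np.int16).reshape(self.shape)


def concat_routed_experts(a: RoutedExperts, b: RoutedExperts) -> RoutedExperts:
    """Strict concat helper: top_k must match, token rows are stacked."""
    assert a.shape[1] == b.shape[1], "top_k mismatch between payloads"
    return RoutedExperts((a.shape[0] + b.shape[0], a.shape[1]), a.data + b.data)
```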

Updates MoE model router replay to index routed experts by sparse MoE-layer order (not decoder layer index), asserts layer-count consistency, and exposes inference.routed_experts_replay_max_blocks for sizing the vLLM replay cache.
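
The distinction matters because models can interleave dense and MoE decoder layers; a sketch of the indexing follows, where is_moe_layer and the per-layer set_replay hook are hypothetical stand-ins for the trainer model's actual hooks:

```python
import torch


def apply_router_replay(model, routed_experts: torch.Tensor, is_moe_layer) -> None:
    """Replay routed experts using sparse MoE-layer order, not decoder index.

    routed_experts is assumed laid out [moe_layer, token, top_k], matching
    what vLLM emits for the MoE layers only.
    """
    moe_layer_indices = [i for i, layer in enumerate(model.layers) if is_moe_layer(layer)]
    assert routed_experts.shape[0] == len(moe_layer_indices), (
        "routed-experts payload must cover exactly the model's MoE layers"
    )
    for moe_idx, decoder_idx in enumerate(moe_layer_indices):
        model.layers[decoder_idx].mlp.set_replay(routed_experts[moe_idx])
```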

Pins the patched stack by updating verifiers and the x86_64 vllm wheel source, and documents the required vLLM fork/build contract in inference/vllm_state.md; unit tests are updated to match the new payload/validation behavior.

Reviewed by Cursor Bugbot for commit 7c8adcf.

@S1ro1 S1ro1 marked this pull request as ready for review May 11, 2026 22:16
@S1ro1 S1ro1 force-pushed the r3-v2 branch 3 times, most recently from f74100b to e650b11 Compare May 11, 2026 22:34
@S1ro1 S1ro1 force-pushed the r3-v2 branch 4 times, most recently from 0ee126e to 76959ac Compare May 11, 2026 23:06

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit cd4ec22.

```diff
-if prefix_len > 0 and prefix_len <= len(step_routed):
-    sample.routed_experts[prefix_len - 1] = step_routed[prefix_len - 1]
-    sample.routed_experts.extend(step_routed[prefix_len:])
+sample.routed_experts = extend_routed_experts(sample.routed_experts, step_routed, prefix_len)
```

Missing null check crashes extend_routed_experts on None

High Severity

The condition guarding the extend_routed_experts call was narrowed from checking both tokens.get("routed_experts") is not None and sample.routed_experts is not None to only sample.routed_experts is not None. If the first step has routed experts (making sample.routed_experts non-None) but a subsequent step's tokens["routed_experts"] is None, then step_routed will be None and extend_routed_experts(sample.routed_experts, None, prefix_len) will crash inside validate_routed_experts when accessing None.dtype.
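
A guard along the following lines would restore the original behavior, reusing the names from the diff above; this is a sketch only, and whether a None step should be skipped silently or rejected outright is a design decision for the PR:

```python
step_routed = tokens.get("routed_experts")
# Only extend when both sides are present; a None step previously short-circuited here.
if step_routed is not None and sample.routed_experts is not None:
    sample.routed_experts = extend_routed_experts(sample.routed_experts, step_routed, prefix_len)
```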


