
Support routed experts replay for vLLM P/D #2474

Open

S1ro1 wants to merge 5 commits into main from r3-v2

Conversation

@S1ro1 (Collaborator) commented May 11, 2026

Summary

Adds Prime-RL support for the compact split routed-experts path used by the patched vLLM/router/verifiers stack:

  • consumes split prompt/completion routed experts as compact (shape, bytes) payloads only (see the decoding sketch after this list)
  • removes support for the old nested-list/base85 routed-experts paths
  • aligns and pads routed experts with fail-fast validation, allowing only the expected missing final-token entry
  • replays experts through trainer MoE layers using the sparse MoE-layer index emitted by vLLM
  • pads/truncates routed experts correctly during sample and micro-batch preparation
  • exposes inference.routed_experts_replay_max_blocks for vLLM routed-experts replay cache sizing
  • retries the W&B shared-mode non-primary init race surfaced by GPU integration CI
  • records the required vLLM fork state and wheel build in src/prime_rl/inference/vllm_state.md
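
For reference, the consumed payload can be decoded along these lines; this is a minimal sketch, assuming a base64 string of int16 expert ids plus an explicit (num_tokens, top_k) shape (the function and field names here are hypothetical, not the PR's API):

```python
import base64

import numpy as np


def decode_routed_experts(shape: list[int], data_b64: str) -> np.ndarray:
    """Decode a compact (shape, bytes) routed-experts payload into an array."""
    buf = base64.b64decode(data_b64)
    arr = np.frombuffer(buf, dtype=np.int16)
    assert arr.size == shape[0] * shape[1], "payload size must match declared shape"
    return arr.reshape(shape)  # (num_tokens, top_k) expert ids
```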

Cross-repo PRs:

Pinned stack:

Local validation:

  • uv sync --all-extras
  • uv run ruff check .
  • uv run ruff format --check .
  • PYTEST_OUTPUT_DIR=/tmp/outputs uv run pytest tests/unit -m "not gpu" (342 passed, 65 deselected)
  • uv run ruff check src/prime_rl/utils/monitor/wandb.py
  • uv run ruff format --check src/prime_rl/utils/monitor/wandb.py

Note

Medium Risk
Medium risk: this PR changes the routed-experts HTTP/transport payload format and replay indexing across multiple trainer models, and pins a custom vLLM wheel/verifiers revision that can affect inference behavior.

Overview
Adds end-to-end support for split prompt vs completion routed-experts in the vLLM P/D path by emitting prompt_routed_experts plus per-choice routed_experts from /inference/v1/generate using a compact base64 int16 payload, and removing the chat endpoint’s custom routed-experts capture override.
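
A response carrying the split payloads might look like the fragment below; the prompt_routed_experts and per-choice routed_experts field names come from the overview, while the shape/data keys and all values are illustrative assumptions:

```python
# Hypothetical /inference/v1/generate response fragment (values illustrative):
response = {
    "prompt_routed_experts": {"shape": [12, 8], "data": "<base64 int16 bytes>"},
    "choices": [
        {"routed_experts": {"shape": [5, 8], "data": "<base64 int16 bytes>"}},
    ],
}
```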

Switches trainer/orchestrator transport from nested lists to a new RoutedExperts bytes+shape struct (transport/routed_experts.py), with strict alignment/concat/slice/padding helpers applied during trajectory interleaving, batch packing, and tensor materialization.
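
The struct itself can stay very small; below is a minimal sketch of a bytes+shape container with one strict helper, assuming int16 ids and a (num_tokens, top_k) layout (only the RoutedExperts name appears in the PR; the fields and helper are stand-ins):

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class RoutedExperts:
    """Compact transport form: raw int16 expert ids plus their shape."""

    shape: tuple[int, int]  # (num_tokens, top_k)
    data: bytes

    def to_array(self) -> np.ndarray:
        return np.frombuffer(self.data, dtype=np.int16).reshape(self.shape)


def concat_routed_experts(a: RoutedExperts, b: RoutedExperts) -> RoutedExperts:
    """Strict concat helper: top_k must match, token rows are stacked."""
    assert a.shape[1] == b.shape[1], "top_k mismatch between payloads"
    return RoutedExperts((a.shape[0] + b.shape[0], a.shape[1]), a.data + b.data)
```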

Updates MoE model router replay to index routed experts by sparse MoE-layer order (not decoder layer index), asserts layer-count consistency, and exposes inference.routed_experts_replay_max_blocks for sizing the vLLM replay cache.
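
The distinction matters because models can interleave dense and MoE decoder layers; a sketch of the indexing follows, where is_moe_layer and the per-layer set_replay hook are hypothetical stand-ins for the trainer model's actual hooks:

```python
import torch


def apply_router_replay(model, routed_experts: torch.Tensor, is_moe_layer) -> None:
    """Replay routed experts using sparse MoE-layer order, not decoder index.

    routed_experts is assumed laid out [moe_layer, token, top_k], matching
    what vLLM emits for the MoE layers only.
    """
    moe_layer_indices = [i for i, layer in enumerate(model.layers) if is_moe_layer(layer)]
    assert routed_experts.shape[0] == len(moe_layer_indices), (
        "routed-experts payload must cover exactly the model's MoE layers"
    )
    for moe_idx, decoder_idx in enumerate(moe_layer_indices):
        model.layers[decoder_idx].mlp.set_replay(routed_experts[moe_idx])
```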

Pins the patched stack by updating verifiers and the x86_64 vllm wheel source, and documents the required vLLM fork/build contract in inference/vllm_state.md; unit tests are updated to match the new payload/validation behavior.

Reviewed by Cursor Bugbot for commit 7c8adcf.

@S1ro1 S1ro1 marked this pull request as ready for review May 11, 2026 22:16
@S1ro1 S1ro1 force-pushed the r3-v2 branch 3 times, most recently from f74100b to e650b11 Compare May 11, 2026 22:34
@S1ro1 S1ro1 force-pushed the r3-v2 branch 4 times, most recently from 0ee126e to 76959ac Compare May 11, 2026 23:06

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit cd4ec22.

```diff
-if prefix_len > 0 and prefix_len <= len(step_routed):
-    sample.routed_experts[prefix_len - 1] = step_routed[prefix_len - 1]
-    sample.routed_experts.extend(step_routed[prefix_len:])
+sample.routed_experts = extend_routed_experts(sample.routed_experts, step_routed, prefix_len)
```

Missing null check crashes extend_routed_experts on None

High Severity

The condition guarding the extend_routed_experts call was narrowed from checking both tokens.get("routed_experts") is not None and sample.routed_experts is not None to only sample.routed_experts is not None. If the first step has routed experts (making sample.routed_experts non-None) but a subsequent step's tokens["routed_experts"] is None, then step_routed will be None and extend_routed_experts(sample.routed_experts, None, prefix_len) will crash inside validate_routed_experts when accessing None.dtype.
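
A guard along the following lines would restore the original behavior, reusing the names from the diff above; this is a sketch only, and whether a None step should be skipped silently or rejected outright is a design decision for the PR:

```python
step_routed = tokens.get("routed_experts")
# Only extend when both sides are present; a None step previously short-circuited here.
if step_routed is not None and sample.routed_experts is not None:
    sample.routed_experts = extend_routed_experts(sample.routed_experts, step_routed, prefix_len)
```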


