feat: wire r3 v3 routed experts #2487

Open
S1ro1 wants to merge 14 commits into main from feat/r3-v3-routed-experts

Conversation

@S1ro1 (Collaborator) commented May 13, 2026

Summary

  • expose choices[i].routed_experts as compact base64 NumPy payloads from the prime-rl vLLM token/chat wrappers
  • keep routed experts as a first-class TrainingSample.routed_experts field via a RoutedExperts transport struct; tolist() is too expensive, so the struct carries raw bytes plus shape/dtype (a sketch follows this list)
  • stitch multi-turn routed experts by mutating the existing sample only, then load the packed bytes in the trainer with torch.frombuffer
  • reject trainer.enable_router_replay with inference.kv_cache_offload; CPU KV offload/router-cache recovery is intentionally not supported in this version
  • pin the upstream vLLM nightly wheel 0.20.2rc1.dev354+g24337fb86.cu129 mirrored to the prime-rl v0.5.0 release, and keep the prime-rl vLLM plugin patches for upstream compatibility
  • patch vLLM config validation in prime-rl to allow routed-experts capture with the NIXL connector; P/D routed experts are stitched by the router, while CPU KV offload remains rejected by prime-rl validation
  • pin verifiers to upstream main 7fdf522
  • pin vllm-router to release 0.1.24, which includes P/D routed-experts stitching
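
A minimal sketch of the transport pieces named above. RoutedExperts, serialize_routed_experts, and the bytes + shape + dtype layout come from this PR; the exact field names, the compact-dtype rule, and the payload dict shape are assumptions for illustration:

```python
import base64
from dataclasses import dataclass

import numpy as np


@dataclass
class RoutedExperts:
    """Transport struct: raw packed bytes plus the metadata needed to
    reinterpret them later, avoiding a costly tolist() round-trip."""

    data: bytes
    shape: tuple[int, ...]
    dtype: str  # e.g. "uint8", "int16", "int32"


def _compact_dtype(max_expert_id: int) -> np.dtype:
    # Assumed rule: smallest integer type that fits the largest expert ID
    # seen in this step (so per-step dtypes can legitimately differ).
    if max_expert_id <= np.iinfo(np.uint8).max:
        return np.dtype(np.uint8)
    if max_expert_id <= np.iinfo(np.int16).max:
        return np.dtype(np.int16)
    return np.dtype(np.int32)


def serialize_routed_experts(routed: np.ndarray) -> dict:
    """Pack per-token expert IDs into the compact base64 payload attached
    to choices[i].routed_experts by the serving wrappers."""
    compact = routed.astype(_compact_dtype(int(routed.max())))
    return {
        "data": base64.b64encode(compact.tobytes()).decode("ascii"),
        "shape": list(compact.shape),
        "dtype": str(compact.dtype),
    }


def deserialize_routed_experts(payload: dict) -> RoutedExperts:
    """Rebuild the transport struct on the orchestrator side."""
    return RoutedExperts(
        data=base64.b64decode(payload["data"]),
        shape=tuple(payload["shape"]),
        dtype=payload["dtype"],
    )
```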

Related PRs

Verification

  • uv lock --check
  • uv run ruff check --config=pyproject.toml
  • uv run ruff format --check --config=pyproject.toml
  • uv run ruff check src/prime_rl/transport/types.py src/prime_rl/orchestrator/trajectories.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/inference/vllm/routed_experts.py src/prime_rl/inference/vllm/serving_tokens.py src/prime_rl/inference/vllm/serving_chat_with_tokens.py src/prime_rl/inference/patches.py tests/unit/inference/test_serving_tokens.py tests/unit/orchestrator/test_batch.py tests/unit/orchestrator/test_trajectories.py
  • git diff --check
  • uv run python - <<'PY' ... transformers_v5_compat() ... PY to verify the vLLM plugin patches DPEngineCoreProc on the nightly wheel
  • uv run pytest tests/unit/inference/test_serving_tokens.py tests/unit/orchestrator/test_batch.py tests/unit/orchestrator/test_trajectories.py (59 passed)

Note

Medium Risk
Changes the inference→orchestrator→trainer data contract for routed_experts (new packed-bytes struct and base64 NumPy payloads) and updates batch assembly/tensorization logic, which could break router-replay or training if shape/dtype handling is off. Also pins to a custom vLLM wheel and adjusts vLLM monkey patches, increasing integration risk across upstream versions.

Overview
Enables router replay to consume compact routed-expert decisions end-to-end by exporting choices[i].routed_experts as a base64-encoded NumPy payload (new serialize_routed_experts/RoutedExpertsCapture) and updating both the chat and tokens vLLM serving wrappers to attach this field.

Refactors the training data path to avoid expensive tolist() conversions by introducing a RoutedExperts transport struct (raw bytes + shape + dtype) and updating trajectory stitching, batch packing/padding, and trainer tensorization (torch.frombuffer) to operate on the packed representation.
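
A sketch of that trainer-side tensorization, continuing the RoutedExperts struct sketched under Summary. torch.frombuffer plus the reshape/to/unsqueeze chain appear in the diff below; the dtype map and the bytearray copy (see the Bugbot note further down) are illustrative:

```python
import torch

# Hypothetical dtype-string -> torch dtype map for the compact payloads.
_TORCH_DTYPES = {"uint8": torch.uint8, "int16": torch.int16, "int32": torch.int32}


def routed_experts_to_tensor(packed: "RoutedExperts") -> torch.Tensor:
    # bytearray(...) hands frombuffer a writable buffer; without it the
    # resulting tensor can alias the immutable bytes object.
    return (
        torch.frombuffer(bytearray(packed.data), dtype=_TORCH_DTYPES[packed.dtype])
        .reshape(packed.shape)
        .to(torch.int32)
        .unsqueeze(0)
    )
```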

Adds a config validation that forbids trainer.enable_router_replay with inference.kv_cache_offload, tweaks vLLM DP pause/resume monkey patches to bypass upstream two-phase pause behavior, and updates dependencies/pins (custom vllm wheel, verifiers rev, uv lock updates including tokenspeed-mla).
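
The validation rule itself is simple; here is a sketch of the shape it might take. Only the two flag names and the rejection come from the PR — whether prime-rl wires this through a pydantic model, and where the flags live, is assumed:

```python
from pydantic import BaseModel, model_validator


class RLConfig(BaseModel):
    enable_router_replay: bool = False  # trainer.enable_router_replay
    kv_cache_offload: bool = False      # inference.kv_cache_offload

    @model_validator(mode="after")
    def _reject_replay_with_kv_offload(self) -> "RLConfig":
        if self.enable_router_replay and self.kv_cache_offload:
            raise ValueError(
                "trainer.enable_router_replay cannot be combined with "
                "inference.kv_cache_offload: CPU KV offload cannot recover "
                "router decisions in this version"
            )
        return self
```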

Reviewed by Cursor Bugbot for commit 9438623.

@S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bf79561 to 721a874 on May 13, 2026 at 12:13
@S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bc91c30 to e55328f on May 14, 2026 at 14:09
@S1ro1 force-pushed the feat/r3-v3-routed-experts branch from e55328f to 1fea38e on May 14, 2026 at 14:13
@S1ro1 marked this pull request as ready for review on May 14, 2026 at 15:52
```diff
- sample.routed_experts.extend(step_routed[prefix_len:])
+ if prefix_len > 0 and prefix_len <= step_routed.shape[0]:
+     sample_routed_experts[prefix_len - 1] = step_routed[prefix_len - 1]
+ sample_routed_experts = np.concatenate((sample_routed_experts, step_routed[prefix_len:]), axis=0)
```
Mixed compact dtypes cause silent truncation during stitching

Medium Severity

When stitching multi-turn routed experts, _decode_routed_experts preserves each step's independently-chosen compact dtype. If step 1 serializes as uint8 (all expert IDs ≤ 255) and step 2 as int16 (some IDs > 255), the boundary replacement sample_routed_experts[prefix_len - 1] = step_routed[prefix_len - 1] writes int16 values into a uint8 array, silently truncating expert IDs via numpy overflow. The subsequent np.concatenate upcasts correctly, but the corrupted boundary value persists. This affects models with more than 255 experts where per-step value ranges happen to differ.
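
A minimal standalone repro of the wraparound described above (array values are illustrative):

```python
import numpy as np

# Step 1 fit in uint8 (all expert IDs <= 255); step 2 needed int16.
sample_routed_experts = np.array([[3, 17], [200, 255]], dtype=np.uint8)
step_routed = np.array([[300, 12], [301, 5]], dtype=np.int16)
prefix_len = 1

# Boundary replacement casts int16 into the uint8 array, wrapping mod 256.
sample_routed_experts[prefix_len - 1] = step_routed[prefix_len - 1]
print(sample_routed_experts[0])  # [44 12] -- expert 300 silently became 44

# The concatenate upcasts to int16, but the corrupted boundary row persists.
stitched = np.concatenate((sample_routed_experts, step_routed[prefix_len:]), axis=0)
print(stitched.dtype, stitched[0])  # int16 [44 12]
```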

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 9438623.

@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit d6d06b4.

```python
    .reshape(packed_routed_experts.shape)
    .to(torch.int32)
    .unsqueeze(0)
)
```

Read-only tensor from torch.frombuffer on immutable bytes

Medium Severity

torch.frombuffer is called on packed_routed_experts.data which is bytes (immutable). When the compact dtype is already int32, .to(torch.int32) is a no-op returning self, so the final tensor remains read-only and backed by the immutable buffer. For uint8/int16 sources, .to(torch.int32) creates a writable copy, making the behavior dtype-dependent. The analogous pixel_values conversion at line 228 correctly wraps in bytearray(...) to ensure mutability. Passing bytearray(packed_routed_experts.data) here would make the behavior consistent and safe regardless of source dtype.
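
A quick standalone demonstration of the dtype-dependent behavior (not the trainer code itself):

```python
import torch

payload = bytes(8)  # immutable bytes, like RoutedExperts.data

t32 = torch.frombuffer(payload, dtype=torch.int32)  # UserWarning: buffer not writable
print(t32.to(torch.int32).data_ptr() == t32.data_ptr())  # True: same dtype, .to() returns
                                                         # self, still aliasing the bytes

t16 = torch.frombuffer(payload, dtype=torch.int16)
print(t16.to(torch.int32).data_ptr() == t16.data_ptr())  # False: upcast made a fresh copy

safe = torch.frombuffer(bytearray(payload), dtype=torch.int32)  # writable from the start
safe[0] = 7  # OK, and only the bytearray copy is mutated
```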


Reviewed by Cursor Bugbot for commit d6d06b4.
