Compose split routed experts from vLLM responses #1349

Open
S1ro1 wants to merge 7 commits into main from feat/split-routed-experts

Conversation

@S1ro1 S1ro1 commented May 11, 2026

Summary

  • add a compact RoutedExperts token payload type backed by int16 bytes (see the sketch after this list)
  • compose vLLM's split prompt_routed_experts and completion routed_experts into one sequence-aligned payload
  • decode only the compact base64 object emitted by patched vLLM; the old base85/list routed-experts path is not supported
  • update the chat, completions, and renderer clients to consume the new split routed-experts response shape
  • preserve routed experts when response tokens are truncated
  • merge current main so the PR also carries the renderer multimodal sidecar changes without conflicts
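
The payload type itself can be small. A minimal sketch of the idea, assuming int16 expert ids, a plain base64 wire encoding, and a fixed number of ids per token; the class name matches the PR, but the field and method names here are hypothetical:

```python
import base64
from dataclasses import dataclass

import numpy as np


@dataclass
class RoutedExperts:
    """Compact routed-experts payload: expert ids stored as raw int16 bytes."""

    data: bytes             # flattened int16 expert ids
    experts_per_token: int  # hypothetical: e.g. one id per MoE layer

    @classmethod
    def from_base64(cls, payload: str, experts_per_token: int) -> "RoutedExperts":
        # Patched vLLM is assumed to emit a plain base64 string of int16 bytes;
        # the older base85/list encoding is deliberately not handled.
        return cls(data=base64.b64decode(payload), experts_per_token=experts_per_token)

    def to_array(self) -> np.ndarray:
        # View the raw bytes as (num_tokens, experts_per_token) without copying.
        return np.frombuffer(self.data, dtype=np.int16).reshape(
            -1, self.experts_per_token
        )
```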

Validation

  • uv run pytest tests/test_renderer_client.py tests/test_env_server.py -q
  • uvx ruff@0.15.12 format --isolated --check .
  • uvx ruff@0.15.12 check --isolated .

Note

Medium Risk
Touches token parsing and serialization paths across multiple clients and changes the routed_experts wire format and type, so malformed or partial payloads and shape mismatches could break downstream consumers and truncation behavior.

Overview
Adds a new RoutedExperts bytes-based type plus verifiers/clients/routed_experts.py utilities to decode base64 int16 routed-expert payloads and compose split prompt+completion routing into a single sequence-aligned buffer.

Updates the OpenAI chat, OpenAI completions, and renderer clients to read routed-expert data from model_extra/response fields (prompt_routed_experts plus completion routed_experts), removes the previous inline base85/NumPy decode path, and threads the composed routing into ResponseTokens.
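
Composition is then a matter of decoding both halves and concatenating them in sequence order. A rough sketch, reusing the hypothetical RoutedExperts type above and assuming both fields arrive as base64 strings in the response's model_extra:

```python
import base64


def compose_routed_experts(
    model_extra: dict, experts_per_token: int
) -> RoutedExperts | None:
    # Field names follow the PR description; anything missing means the
    # server was not patched, so return None rather than guess.
    prompt_b64 = model_extra.get("prompt_routed_experts")
    completion_b64 = model_extra.get("routed_experts")
    if prompt_b64 is None or completion_b64 is None:
        return None
    # Prompt tokens precede completion tokens, so byte concatenation keeps
    # the buffer sequence-aligned with prompt_ids + completion_ids.
    data = base64.b64decode(prompt_b64) + base64.b64decode(completion_b64)
    return RoutedExperts(data=data, experts_per_token=experts_per_token)
```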

Adjusts response token truncation to slice the new bytes-based routed-experts buffer correctly so routing metadata remains consistent when prompts/completions are clipped.
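
Because the buffer is raw int16 bytes rather than a per-token list, truncation has to slice in byte units: each token owns experts_per_token ids at 2 bytes apiece. A sketch of the arithmetic under the same assumptions as above:

```python
def truncate_routed_experts(payload: RoutedExperts, max_tokens: int) -> RoutedExperts:
    # 2 bytes per int16 id, experts_per_token ids per token.
    bytes_per_token = payload.experts_per_token * 2
    return RoutedExperts(
        data=payload.data[: max_tokens * bytes_per_token],
        experts_per_token=payload.experts_per_token,
    )
```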

Reviewed by Cursor Bugbot for commit 162cffb.

@S1ro1 S1ro1 force-pushed the feat/split-routed-experts branch from 277ab6e to 8dbd674 on May 11, 2026 at 22:28
@S1ro1 S1ro1 marked this pull request as ready for review May 11, 2026 22:42

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit 8dbd674.

Comment thread verifiers/utils/response_utils.py Outdated
Comment thread verifiers/clients/routed_experts.py
Comment thread verifiers/clients/openai_chat_completions_client.py Outdated

willccbb commented May 12, 2026

Can we put this in a utils file? Trying to keep most folders unified by object type, e.g. clients is full of _client.py files at top level. Could be verifiers.utils.router_utils/client_utils or verifiers.clients.utils.router_utils.

S1ro1 commented May 12, 2026

can we put this in a utils file?

Yeah, can clean up after. For now I'm not 100% sure we can merge; it hits inference speed quite a lot and adds a lot of engineering overhead on prime-rl.
