
feat: add Dynamo vLLM inference backend#2465

Draft
samsja wants to merge 1 commit into feat/sglang-backend from feat/dynamo-backend

feat: add Dynamo vLLM inference backend#2465
samsja wants to merge 1 commit into
feat/sglang-backendfrom
feat/dynamo-backend

Conversation


@samsja samsja commented May 10, 2026

Summary

  • Adds a dynamo inference backend alongside vLLM and SGLang, backed by Dynamo's vLLM worker.
  • Launches a Dynamo frontend, Dynamo vLLM worker, and prime-rl HTTP proxy that preserves the OpenAI-compatible chat completions surface expected by rollout clients.
  • Wires Dynamo admin routes for liveness, NCCL broadcaster init, full-weight updates, pause/resume, and route readiness.
  • Adds config translation, RL auto-setup defaults, dependency pins, focused config coverage, and updates the relevant skills docs.
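
For reference, selecting the new backend from a config file presumably mirrors the `--inference.backend dynamo` CLI override used in the dry runs below. The key layout in this sketch is an assumption inferred from that flag, not taken from the PR's actual schema:

```toml
# Hypothetical sketch: select the Dynamo backend in a prime-rl config.
# Key names are inferred from the `--inference.backend dynamo` CLI flag
# and may not match the schema actually added in this PR.
[inference]
backend = "dynamo"   # other backends mentioned in this PR: "vllm", "sglang"
```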

Validation

  • uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/inference.py packages/prime-rl-configs/src/prime_rl/configs/rl.py packages/prime-rl-configs/src/prime_rl/configs/shared.py src/prime_rl/inference/dynamo src/prime_rl/inference/server.py src/prime_rl/entrypoints/inference.py src/prime_rl/utils/client.py tests/unit/test_configs.py
  • uv run pytest tests/unit/test_configs.py -q (81 passed, 46 warnings)
  • Dry-run config resolution for both backends with the Qwen4B/Hendrycks/AIME25 comparison config:
    • uv run rl @ /tmp/prime-rl-hendrycks-aime25-bs64.toml --dry-run --inference.backend vllm --wandb.name dryrun-vllm-qwen4b-aime25-500-bs64 --output-dir /tmp/prime-rl-vllm-qwen4b-aime25-500-bs64-dryrun
    • uv run rl @ /tmp/prime-rl-hendrycks-aime25-bs64.toml --dry-run --inference.backend dynamo --wandb.name dryrun-dynamo-qwen4b-aime25-500-bs64 --output-dir /tmp/prime-rl-dynamo-qwen4b-aime25-500-bs64-dryrun
  • Online W&B 500-step comparison using the referenced Qwen4B Hendrycks Math config shape, adapted to max_steps=500, batch_size=64, max_inflight_rollouts=64, rollouts_per_example=4, and AIME2025 eval with 30 examples x 4 rollouts:
  • AIME2025 evals (Avg@4 / Pass@4):
    • step 100: vLLM 0.0333 / 0.0667, Dynamo 0.0667 / 0.1000
    • step 200: vLLM 0.0667 / 0.1000, Dynamo 0.0583 / 0.1000
    • step 300: vLLM 0.0833 / 0.1333, Dynamo 0.0833 / 0.1333
    • step 400: vLLM 0.0667 / 0.1333, Dynamo 0.0917 / 0.1000
    • final after step 499 (max_steps=500): vLLM 0.1333 / 0.2333, Dynamo 0.1167 / 0.1667
  • Final eval details:
    • vLLM: Evaluated aime2025 in 12.19s (Avg@4=0.1333, Pass@1=0.1333, Pass@2=0.1889, Pass@4=0.2333, No-response: 0.0%, Completion Length: 982.02, Truncated: 85.0%)
    • Dynamo: Evaluated aime2025 in 13.31s (Avg@4=0.1167, Pass@1=0.1167, Pass@2=0.1444, Pass@4=0.1667, No-response: 0.0%, Completion Length: 968.05, Truncated: 81.7%)
  • Both runs ended with "RL training finished!" and "Orchestrator finished."; no lingering vLLM/Dynamo run processes remained afterward.
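
For context on the metrics reported above: Avg@k and Pass@k are conventionally computed per problem from n sampled rollouts, c of which are correct. The sketch below uses the standard unbiased Pass@k estimator; prime-rl's actual eval implementation is not shown in this PR and may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples drawn without replacement from n rollouts (c correct) is
    correct, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect rollouts: every k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def avg_at_k(n: int, c: int) -> float:
    """Avg@k is the mean per-rollout accuracy, c / n."""
    return c / n


# Example: one correct rollout out of 4 (rollouts_per_example=4).
print(avg_at_k(4, 1))      # 0.25
print(pass_at_k(4, 1, 4))  # 1.0
print(pass_at_k(4, 1, 2))  # 0.5
```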

Note: Dynamo/vLLM logs an EngineDeadError during process termination after the RL job sends SIGTERM at shutdown; the final eval, orchestrator shutdown, trainer exit, and W&B sync had already completed successfully.
