
feat: add Dynamo vLLM inference backend#2465

Draft
samsja wants to merge 1 commit into feat/sglang-backend from feat/dynamo-backend

feat: add Dynamo vLLM inference backend#2465
samsja wants to merge 1 commit into
feat/sglang-backendfrom
feat/dynamo-backend

Conversation


@samsja samsja commented May 10, 2026

Summary

  • Adds a dynamo inference backend alongside vLLM and SGLang, backed by Dynamo's vLLM worker.
  • Launches a Dynamo frontend, Dynamo vLLM worker, and prime-rl HTTP proxy that preserves the OpenAI-compatible chat completions surface expected by rollout clients.
  • Wires Dynamo admin routes for liveness, NCCL broadcaster init, full-weight updates, pause/resume, and route readiness.
  • Adds config translation, RL auto-setup defaults, dependency pins, focused config coverage, and updates the relevant skills docs.
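
For reference, selecting the new backend from a config file presumably mirrors the `--inference.backend dynamo` CLI override used in the dry runs below. The key layout in this sketch is an assumption inferred from that flag, not taken from the PR's actual schema:

```toml
# Hypothetical sketch: select the Dynamo backend in a prime-rl config.
# Key names are inferred from the `--inference.backend dynamo` CLI flag
# and may not match the schema actually added in this PR.
[inference]
backend = "dynamo"   # other backends mentioned in this PR: "vllm", "sglang"
```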

Validation

  • uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/inference.py packages/prime-rl-configs/src/prime_rl/configs/rl.py packages/prime-rl-configs/src/prime_rl/configs/shared.py src/prime_rl/inference/dynamo src/prime_rl/inference/server.py src/prime_rl/entrypoints/inference.py src/prime_rl/utils/client.py tests/unit/test_configs.py
  • uv run pytest tests/unit/test_configs.py -q (81 passed, 46 warnings)
  • Dry-run config resolution for both backends with the Qwen4B/Hendrycks/AIME25 comparison config:
    • uv run rl @ /tmp/prime-rl-hendrycks-aime25-bs64.toml --dry-run --inference.backend vllm --wandb.name dryrun-vllm-qwen4b-aime25-500-bs64 --output-dir /tmp/prime-rl-vllm-qwen4b-aime25-500-bs64-dryrun
    • uv run rl @ /tmp/prime-rl-hendrycks-aime25-bs64.toml --dry-run --inference.backend dynamo --wandb.name dryrun-dynamo-qwen4b-aime25-500-bs64 --output-dir /tmp/prime-rl-dynamo-qwen4b-aime25-500-bs64-dryrun
  • Online W&B 500-step comparison using the referenced Qwen4B Hendrycks Math config shape, adapted to max_steps=500, batch_size=64, max_inflight_rollouts=64, rollouts_per_example=4, and AIME2025 eval with 30 examples x 4 rollouts:
  • AIME2025 evals (Avg@4 / Pass@4):
    • step 100: vLLM 0.0333 / 0.0667, Dynamo 0.0667 / 0.1000
    • step 200: vLLM 0.0667 / 0.1000, Dynamo 0.0583 / 0.1000
    • step 300: vLLM 0.0833 / 0.1333, Dynamo 0.0833 / 0.1333
    • step 400: vLLM 0.0667 / 0.1333, Dynamo 0.0917 / 0.1000
    • final after step 499 (max_steps=500): vLLM 0.1333 / 0.2333, Dynamo 0.1167 / 0.1667
  • Final eval details:
    • vLLM: Evaluated aime2025 in 12.19s (Avg@4=0.1333, Pass@1=0.1333, Pass@2=0.1889, Pass@4=0.2333, No-response: 0.0%, Completion Length: 982.02, Truncated: 85.0%)
    • Dynamo: Evaluated aime2025 in 13.31s (Avg@4=0.1167, Pass@1=0.1167, Pass@2=0.1444, Pass@4=0.1667, No-response: 0.0%, Completion Length: 968.05, Truncated: 81.7%)
  • Both runs ended with "RL training finished!" and "Orchestrator finished."; no lingering vLLM/Dynamo run processes remained afterward.
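
For context on the metrics reported above: Avg@k and Pass@k are conventionally computed per problem from n sampled rollouts, c of which are correct. The sketch below uses the standard unbiased Pass@k estimator; prime-rl's actual eval implementation is not shown in this PR and may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples drawn without replacement from n rollouts (c correct) is
    correct, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect rollouts: every k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def avg_at_k(n: int, c: int) -> float:
    """Avg@k is the mean per-rollout accuracy, c / n."""
    return c / n


# Example: one correct rollout out of 4 (rollouts_per_example=4).
print(avg_at_k(4, 1))      # 0.25
print(pass_at_k(4, 1, 4))  # 1.0
print(pass_at_k(4, 1, 2))  # 0.5
```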

Note: Dynamo/vLLM logs an EngineDeadError during process termination after the RL job sends SIGTERM at shutdown; the final eval, orchestrator shutdown, trainer exit, and W&B sync had already completed successfully.
