
[codex] Add synchronous ES LoRA trainer#2447

Closed
Matthew-agi wants to merge 12 commits into PrimeIntellect-ai:main from Matthew-agi:codex/synchronous-es-trainer


Conversation

@Matthew-agi

Summary

Adds a synchronous ES-LoRA trainer as a peer trainer path under prime_rl.trainer.es, plus an es launcher/config surface.

The trainer:

  • materializes LoRA candidate adapters from a flat ES parameter vector
  • loads candidate adapters into existing vLLM inference pools via Prime-RL admin endpoints
  • evaluates candidates synchronously across configured Verifiers train envs
  • updates the mean LoRA vector with one-sided or mirrored ES estimates
  • writes ES state checkpoints, the current mean adapter, metrics, and a debug config
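The mirrored update mentioned above can be sketched as an antithetic-sampling ES step on the flat parameter vector. This is a minimal illustrative version, not the PR's implementation; the function name, hyperparameters, and `fitness_fn` interface are assumptions for the sketch:

```python
import numpy as np

def mirrored_es_update(theta, fitness_fn, sigma=0.02, lr=0.01, n_pairs=8, rng=None):
    """One synchronous mirrored-ES step on a flat parameter vector.

    theta: flat mean vector (e.g. all LoRA A/B weights concatenated).
    fitness_fn: maps a candidate vector to a scalar reward.
    Each noise draw eps yields two candidates, theta + sigma*eps and
    theta - sigma*eps; their reward difference weights eps in the estimate.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        r_plus = fitness_fn(theta + sigma * eps)   # candidate with +eps perturbation
        r_minus = fitness_fn(theta - sigma * eps)  # mirrored candidate with -eps
        grad += (r_plus - r_minus) * eps
    grad /= 2 * n_pairs * sigma                    # antithetic ES gradient estimate
    return theta + lr * grad                       # ascend the estimated reward gradient
```

A one-sided variant would evaluate only `theta + sigma * eps` against the current mean's reward; mirrored sampling halves the estimator variance at the cost of twice the evaluations per noise draw.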

This intentionally reuses the pieces ES needs from Prime-RL: config parsing, launch/process handling, torch distributed rank setup, env wrappers, client/admin clients, LoRA adapter serialization, logging, checkpoints, and monitors. It does not pull in SFT CE loss, optimizer, FSDP training, gradient scaling, or RL orchestrator async weight-update machinery.
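For context on the "flat ES parameter vector" above: materializing a candidate adapter amounts to slicing the flat vector back into named LoRA tensors before serialization. A rough sketch, with a hypothetical `shapes` mapping not taken from the PR:

```python
import numpy as np

def unflatten_lora(flat, shapes):
    """Split a flat ES vector back into named LoRA tensors.

    shapes: ordered {name: shape} describing each adapter tensor,
    e.g. {"layer0.lora_A": (8, 64), "layer0.lora_B": (64, 8)}.
    """
    tensors, offset = {}, 0
    for name, shape in shapes.items():
        size = int(np.prod(shape))
        tensors[name] = flat[offset:offset + size].reshape(shape)
        offset += size
    assert offset == flat.size, "flat vector length must match shapes"
    return tensors
```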

Validation

Local macOS:

  • uv tool run ruff check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • uv tool run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • git diff --check

Prime Intellect A6000 smoke box:

  • uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • uv run pytest tests/unit/train/es/test_candidates.py tests/unit/test_configs.py -q (79 passed)
  • uv run es @ configs/debug/es/train.toml --dry-run
  • one-step live smoke with temporary Verifiers env + vLLM PrimeIntellect/Qwen3-0.6B + dynamic LoRA loading:
    • wrote /home/ubuntu/es-smoke-output/adapter/adapter_model.safetensors
    • wrote /home/ubuntu/es-smoke-output/checkpoints/step_1/es/es_state.pt
    • metrics showed generation_s=39.07, adapter_write_s=0.22, adapter_load_s=0.19, adapter_unload_s=0.01, update_s=0.08

