
[codex] Add synchronous ES LoRA trainer#2447

Closed
Matthew-agi wants to merge 12 commits into PrimeIntellect-ai:main from Matthew-agi:codex/synchronous-es-trainer


Conversation

@Matthew-agi

Summary

Adds a synchronous ES-LoRA trainer as a peer trainer path under prime_rl.trainer.es, plus an es launcher/config surface.

The trainer:

  • materializes LoRA candidate adapters from a flat ES parameter vector
  • loads candidate adapters into existing vLLM inference pools via Prime-RL admin endpoints
  • evaluates candidates synchronously across configured Verifiers train envs
  • updates the mean LoRA vector with one-sided or mirrored ES estimates
  • writes ES state checkpoints, the current mean adapter, metrics, and a debug config
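The mirrored update mentioned above can be sketched as an antithetic-sampling ES step on the flat parameter vector. This is a minimal illustrative version, not the PR's implementation; the function name, hyperparameters, and `fitness_fn` interface are assumptions for the sketch:

```python
import numpy as np

def mirrored_es_update(theta, fitness_fn, sigma=0.02, lr=0.01, n_pairs=8, rng=None):
    """One synchronous mirrored-ES step on a flat parameter vector.

    theta: flat mean vector (e.g. all LoRA A/B weights concatenated).
    fitness_fn: maps a candidate vector to a scalar reward.
    Each noise draw eps yields two candidates, theta + sigma*eps and
    theta - sigma*eps; their reward difference weights eps in the estimate.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        r_plus = fitness_fn(theta + sigma * eps)   # candidate with +eps perturbation
        r_minus = fitness_fn(theta - sigma * eps)  # mirrored candidate with -eps
        grad += (r_plus - r_minus) * eps
    grad /= 2 * n_pairs * sigma                    # antithetic ES gradient estimate
    return theta + lr * grad                       # ascend the estimated reward gradient
```

A one-sided variant would evaluate only `theta + sigma * eps` against the current mean's reward; mirrored sampling halves the estimator variance at the cost of twice the evaluations per noise draw.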

This intentionally reuses the pieces ES needs from Prime-RL: config parsing, launch/process handling, torch distributed rank setup, env wrappers, client/admin clients, LoRA adapter serialization, logging, checkpoints, and monitors. It does not pull in SFT CE loss, optimizer, FSDP training, gradient scaling, or RL orchestrator async weight-update machinery.
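For context on the "flat ES parameter vector" above: materializing a candidate adapter amounts to slicing the flat vector back into named LoRA tensors before serialization. A rough sketch, with a hypothetical `shapes` mapping not taken from the PR:

```python
import numpy as np

def unflatten_lora(flat, shapes):
    """Split a flat ES vector back into named LoRA tensors.

    shapes: ordered {name: shape} describing each adapter tensor,
    e.g. {"layer0.lora_A": (8, 64), "layer0.lora_B": (64, 8)}.
    """
    tensors, offset = {}, 0
    for name, shape in shapes.items():
        size = int(np.prod(shape))
        tensors[name] = flat[offset:offset + size].reshape(shape)
        offset += size
    assert offset == flat.size, "flat vector length must match shapes"
    return tensors
```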

Validation

Local macOS:

  • uv tool run ruff check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • uv tool run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • git diff --check

Prime Intellect A6000 smoke box:

  • uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/es.py src/prime_rl/entrypoints/es.py src/prime_rl/trainer/es tests/unit/train/es/test_candidates.py tests/unit/test_configs.py
  • uv run pytest tests/unit/train/es/test_candidates.py tests/unit/test_configs.py -q (79 passed)
  • uv run es @ configs/debug/es/train.toml --dry-run
  • one-step live smoke with temporary Verifiers env + vLLM PrimeIntellect/Qwen3-0.6B + dynamic LoRA loading:
    • wrote /home/ubuntu/es-smoke-output/adapter/adapter_model.safetensors
    • wrote /home/ubuntu/es-smoke-output/checkpoints/step_1/es/es_state.pt
    • metrics showed generation_s=39.07, adapter_write_s=0.22, adapter_load_s=0.19, adapter_unload_s=0.01, update_s=0.08

