Add SFT LoRA support by philippnormann · Pull Request #1849 · PrimeIntellect-ai/prime-rl

philippnormann · 2026-02-22T21:38:42Z

Summary

- Add LoRA runtime setup to the SFT trainer path when model.lora is enabled.
- Initialize MultiRunManager state and LoRA scaling (alpha / rank) for SFT LoRA runs.
- Set per-step LoRA token counts so LoRA-wrapped layers receive correct token partitioning metadata.
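The per-step token counts can be pictured with a small, hypothetical sketch. It assumes that LoRA-wrapped layers see one packed token dimension and use per-run token counts to decide which contiguous slice each run's adapter applies to. token_slices is an illustrative helper, not part of the prime-rl API.

```python
from itertools import accumulate


def token_slices(tokens_per_run: list[int]) -> list[tuple[int, int]]:
    """Return the [start, end) slice of the packed token dimension owned by each run.

    Hypothetical helper: assumes runs are packed back to back in run order.
    """
    ends = list(accumulate(tokens_per_run))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# SFT trains a single adapter (run 0), so every token of the step belongs to it.
print(token_slices([4096]))
# With several runs packed together, each adapter gets its own slice;
# a run with zero tokens gets an empty slice.
print(token_slices([1000, 0, 3096]))
```

In the single-run SFT case the metadata is trivial, but it still has to be set each step so that code paths shared with multi-run training see consistent counts.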

Why

SFT LoRA was not fully wired up as a first-class runtime path, so runs could fail at startup unless the required state was set up manually.

Before

SFT with LoRA could fail with:
- RuntimeError: MultiRunManager not initialized. Please call setup_multi_run_manager first.

After

SFT LoRA starts and advances training steps in the default trainer path.

Evidence

Reverse-text SFT loss/mean convergence over 200 steps, full fine-tuning vs. LoRA.

Configs used:

sft_fullft_rtext_200.toml

max_steps = 200

[ckpt]
interval = 20

[model]
name = "PrimeIntellect/Qwen3-0.6B"

[data]
name = "willcb/R1-reverse-wikipedia-paragraphs-v1-1000"
seq_len = 4096
batch_size = 32

[optim]
lr = 2e-5

sft_lora_rtext_200.toml

max_steps = 200

[ckpt]
interval = 20

[ckpt.weights]
save_adapter_separately = true
save_format = "safetensors"

[model]
name = "PrimeIntellect/Qwen3-0.6B"

[model.lora]
rank = 16
alpha = 32
dropout = 0.0
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

[data]
name = "willcb/R1-reverse-wikipedia-paragraphs-v1-1000"
seq_len = 4096
batch_size = 32

[optim]
lr = 5e-4

Validation

SFT LoRA run completes 200 steps without MultiRunManager initialization failures.
Attached reverse-text loss curve confirms stable optimization/convergence behavior.

Scope

This PR covers SFT LoRA runtime support.

Note

Medium Risk
Modifies SFT training initialization and distributed checkpoint saving paths for LoRA, including collectives over FSDP/DTensor; mistakes could break training startup or produce invalid adapter checkpoints.

Overview
Enables SFT runs with model.lora by initializing the MultiRunManager, setting LoRA scaling (alpha / rank), and updating per-step LoRA token counts so LoRA layers receive correct partition metadata.

When save_adapter_separately is enabled, weight checkpointing now saves PEFT-compatible adapter artifacts via save_state_dict and captures the MultiRun LoRA state across all ranks (handling DTensor.full_tensor() collectives under FSDP).

Adds GPU CI integration configs and a new integration test that runs SFT LoRA start and resume, asserts that the loss decreases, and validates the adapter checkpoint structure and keys.
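A "loss decreases" assertion is usually made robust to step-to-step noise by comparing averages over a window instead of single steps. The sketch below shows that idea; loss_decreased and its window parameter are hypothetical and not necessarily how tests/integration/test_sft_lora.py implements the check.

```python
from statistics import mean


def loss_decreased(losses: list[float], window: int = 5) -> bool:
    """Compare the mean of the first and last `window` logged losses."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window logged losses")
    return mean(losses[-window:]) < mean(losses[:window])


logged = [2.0, 1.9, 1.95, 1.7, 1.6, 1.5, 1.45, 1.3, 1.35, 1.1]
print(loss_decreased(logged))
```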

Written by Cursor Bugbot for commit 833afe9.

philippnormann · 2026-02-27T17:55:50Z

Hi! It would be great to get some feedback here and in #1850 when you have time 😌

I rebased on the latest main and addressed all issues raised by the bot checks. If you think the design should be adjusted, I’d be very happy to make further changes.

I also have a follow-up PR prepared for LoRA warm-start support in the RL trainer, enabling end-to-end SFT+RL with LoRA adapters (without merging them into the base weights). I was waiting for feedback here and in #1850 before opening the next ones.

If possible, could someone also trigger CI for both PRs?

Thanks a lot! 🙏🏼

philippnormann · 2026-03-20T01:25:06Z

Just pushed an update that adds integration tests and fixes the adapter checkpoint export so saved adapters are PEFT-compatible / vLLM-loadable:

- Using get_state_dict_for_run(0) in ckpt.py for clean adapter keys, instead of the .0-indexed output from get_adapter_state_dict
- Saving the adapter via save_state_dict so the format respects the config (safetensors by default)
- Integration tests in tests/integration/test_sft_lora.py with start/resume coverage and adapter key format assertions
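A key-format assertion of this kind can be sketched as below. It assumes the exported adapter follows PEFT's saved-adapter naming (a base_model.model. prefix and keys ending in .lora_A.weight / .lora_B.weight, with no adapter or run index); check_adapter_keys and PEFT_KEY are hypothetical names, and the real test may check more or less than this.

```python
import re

TARGETS = ("q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj")
PEFT_KEY = re.compile(
    r"^base_model\.model\..+\.(?P<module>" + "|".join(TARGETS) + r")\.lora_(?P<ab>[AB])\.weight$"
)


def check_adapter_keys(keys: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the keys look PEFT-style."""
    problems: list[str] = []
    a_prefixes: set[str] = set()
    b_prefixes: set[str] = set()
    for key in keys:
        match = PEFT_KEY.match(key)
        if match is None:
            # e.g. a run-indexed key such as "...q_proj.lora_A.0.weight"
            problems.append(f"unexpected key: {key}")
            continue
        prefix = key[: match.end("module")]
        (a_prefixes if match.group("ab") == "A" else b_prefixes).add(prefix)
    # Every adapted module needs both halves of the low-rank update.
    for prefix in sorted(a_prefixes ^ b_prefixes):
        problems.append(f"unpaired lora_A/lora_B under {prefix}")
    return problems


keys = [
    "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight",
]
print(check_adapter_keys(keys))
```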

Tested end-to-end on 1x RTX 4090 and 2x H100; the adapter loads in vLLM.

philippnormann · 2026-03-21T02:11:23Z

Thanks for the feedback @Jackmin801!

I cleaned this up: save_to_path() now just saves the final adapter state dict it receives, and for save_adapter_separately that state dict is assembled earlier, directly from get_multi_run_manager().get_state_dict_for_run(0).

The filtering was from an attempt to also keep modules_to_save, but it can be dropped since current vLLM does not support adapters with non-empty modules_to_save.
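The modules_to_save constraint can be checked on the exported adapter_config.json. The sketch below assumes PEFT's config fields (peft_type, r, lora_alpha, target_modules, modules_to_save). adapter_config_problems is a hypothetical helper covering only the precondition mentioned here plus a basic peft_type sanity check; it is not a full vLLM compatibility test.

```python
import json


def adapter_config_problems(config_text: str) -> list[str]:
    """Flag adapter_config.json settings that would keep this from being a plain LoRA adapter."""
    config = json.loads(config_text)
    problems: list[str] = []
    if config.get("peft_type") != "LORA":
        problems.append(f"expected peft_type LORA, got {config.get('peft_type')!r}")
    # null and [] both mean "no extra fully trained modules"; anything else is rejected.
    if config.get("modules_to_save"):
        problems.append(f"non-empty modules_to_save: {config['modules_to_save']}")
    return problems


example = json.dumps({
    "peft_type": "LORA",
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "modules_to_save": None,
})
print(adapter_config_problems(example))
```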

Jackmin801

Nice! Good work. lgtm

philippnormann force-pushed the feature/sft-lora-support branch from 6700860 to d1cfa48 February 26, 2026 10:47

Jackmin801 force-pushed the feature/sft-lora-support branch from d1cfa48 to 6e30981 March 6, 2026 03:51

philippnormann mentioned this pull request Mar 10, 2026

Add SFT validation eval with val_data #1850

Merged

philippnormann force-pushed the feature/sft-lora-support branch from 6e30981 to 723a6c5 March 10, 2026 12:50

Add SFT LoRA support

29bf608

philippnormann force-pushed the feature/sft-lora-support branch from 723a6c5 to 29bf608 March 20, 2026 01:18

Fix ruff formatting and add missing comment

833afe9

Jackmin801 reviewed Mar 20, 2026


Comment thread src/prime_rl/trainer/ckpt.py Outdated

Jackmin801 reviewed Mar 20, 2026


Comment thread src/prime_rl/trainer/ckpt.py Outdated

Use MultiRunManager for SFT LoRA adapter export

69dc853

Rename SFT LoRA run adapter export helper

fa22226

Jackmin801 approved these changes Mar 21, 2026


Jackmin801 merged commit 4875c10 into PrimeIntellect-ai:main Mar 21, 2026
12 of 16 checks passed

philippnormann deleted the feature/sft-lora-support branch April 1, 2026 01:27

Jackmin801 merged 4 commits into PrimeIntellect-ai:main from philippnormann:feature/sft-lora-support

2 participants
