Record: Alpha=144 LoRA + Warm-start A + WD 1.0 — val_bpb 1.07209 (3-seed mean)#1767

Open
renqianluo wants to merge 1 commit into openai:main from renqianluo:record/alpha-144-wd1-1.07209

Conversation

@renqianluo

Summary

Four composable small-LOC changes to BatchedLinearLoRA on top of @dexhunter's 1.07193 phased-TTT code. Everything outside the LoRA module (VarLen attention, Fused MLP, multi-phase global SGD, trimmed GPTQ, triple depth recurrence) is unchanged.

  1. Alpha/rank output scaling — `forward(x) * (alpha / rank)`. Without this, raising rank directly diverges on some seeds.
  2. Warm-start A across batches — only B resets between batches, A accumulates feature directions over the ~780 phased-TTT batches.
  3. Raised TTT weight decay 0.5 → 1.0 — counteracts the across-batch A overfit enabled by (2).
  4. Alpha lifted 96 → 144 — scale=1.125 on rank 128 gives LoRA more adaptation strength; (3) keeps it stable.
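Changes (1) and (2) are small enough to sketch. Everything below (the class name, init scheme, and shapes) is illustrative and not the actual `BatchedLinearLoRA` code; only the alpha/rank output scaling and the B-only reset mirror the changes described above:

```python
import numpy as np

class LoRALinearSketch:
    """Illustrative LoRA layer showing changes (1) and (2).

    The class name, init scheme, and shapes are assumptions; only the
    alpha/rank output scaling and the B-only reset mirror the PR.
    """

    def __init__(self, W, rank=128, alpha=144.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                   # frozen base weight, (out, in)
        self.scale = alpha / rank                    # change (1): 144 / 128 = 1.125
        self.A = 0.01 * rng.standard_normal((rank, W.shape[1]))  # warm-started
        self.B = np.zeros((W.shape[0], rank))        # re-zeroed every batch

    def forward(self, x):
        # Change (1): scale the low-rank update by alpha/rank so raising
        # the rank does not also raise the update magnitude.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

    def reset_between_batches(self):
        # Change (2): only B is zeroed between phased-TTT batches;
        # A keeps accumulating feature directions across batches.
        self.B[:] = 0.0
```

In the TTT loop, the A/B parameters would then be trained with weight decay 1.0 (change 3) and alpha = 144 (change 4), per the recipe above.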

Results

| Seed | rank-96 baseline | + alpha 96 | + warm+WD=1 | + alpha 144 |
| --- | --- | --- | --- | --- |
| 1337 | 1.07423 | 1.07379 | 1.07298 | 1.07189 |
| 42 | 1.07341 | 1.07320 | 1.07298 | 1.07248 |
| 314 | 1.07214 | 1.07200 | 1.07203 | 1.07189 |
| Mean | 1.07326 | 1.07300 | 1.07266 | 1.07209 |
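The column means can be re-derived from the per-seed numbers (a sanity recomputation of the table above, nothing more):

```python
# Per-seed val_bpb from the table (seeds 1337, 42, 314), per configuration.
runs = {
    "rank-96 baseline": [1.07423, 1.07341, 1.07214],
    "+ alpha 96":       [1.07379, 1.07320, 1.07200],
    "+ warm+WD=1":      [1.07298, 1.07298, 1.07203],
    "+ alpha 144":      [1.07189, 1.07248, 1.07189],
}
# 3-seed means, rounded to 5 decimals as reported.
means = {name: round(sum(v) / 3, 5) for name, v in runs.items()}
```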

The 3-seed mean improves monotonically across every change; per seed, the only exception is a +0.00003 uptick for seed 314 at the warm+WD step.

Compliance

All training runs complete in ≤596 s, evals in 455.7–456.7 s, and artifacts are ≤15.94 MB. Conditions 1–4 of issue #1017 verified.

Attribution

…eed mean)

Four composable novel changes on top of dexhunter's phased-TTT code:
1. Alpha/rank LoRA scaling enables stable higher rank (128 vs 96)
2. Warm-start LoRA A across batches lets feature directions accumulate
3. Raised TTT weight decay (0.5 -> 1.0) prevents warm-A overfit
4. Alpha lifted 96 -> 144 gives LoRA more adaptation strength; WD keeps it stable

3-seed mean 1.07209 BPB (seeds 1337, 42, 314). The mean improves monotonically
across each of the four changes. Closely approaches dexhunter's 1.07193
despite a different seed set.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
TTT_LORA_ALPHA env var (default 96, spec uses 144). Only zero B on reset;
A accumulates feature directions across batches. Output scaled by alpha/rank.
Validated by renqianluo (openai#1767) and bigbag (openai#1771).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@renqianluo renqianluo changed the title Alpha=144 LoRA + Warm-start A + WD 1.0 — val_bpb 1.07209 (3-seed mean) Record: Alpha=144 LoRA + Warm-start A + WD 1.0 — val_bpb 1.07209 (3-seed mean) Apr 22, 2026
