Record: Alpha=144 LoRA + Warm-start A + WD 1.0 — val_bpb 1.07209 (3-seed mean)#1767

Open
renqianluo wants to merge 1 commit into openai:main from renqianluo:record/alpha-144-wd1-1.07209

Conversation

@renqianluo

Summary

Four composable small-LOC changes to BatchedLinearLoRA on top of @dexhunter's 1.07193 phased-TTT code. Everything outside the LoRA module (VarLen attention, Fused MLP, multi-phase global SGD, trimmed GPTQ, triple depth recurrence) is unchanged.

  1. Alpha/rank output scaling — `forward(x) * (alpha / rank)`. Without this, raising rank directly diverges on some seeds.
  2. Warm-start A across batches — only B resets between batches, A accumulates feature directions over the ~780 phased-TTT batches.
  3. Raised TTT weight decay 0.5 → 1.0 — counteracts the across-batch A overfit enabled by (2).
  4. Alpha lifted 96 → 144 — scale=1.125 on rank 128 gives LoRA more adaptation strength; (3) keeps it stable.
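Changes (1) and (2) are small enough to sketch. Everything below (the class name, init scheme, and shapes) is illustrative and not the actual `BatchedLinearLoRA` code; only the alpha/rank output scaling and the B-only reset mirror the changes described above:

```python
import numpy as np

class LoRALinearSketch:
    """Illustrative LoRA layer showing changes (1) and (2).

    The class name, init scheme, and shapes are assumptions; only the
    alpha/rank output scaling and the B-only reset mirror the PR.
    """

    def __init__(self, W, rank=128, alpha=144.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                   # frozen base weight, (out, in)
        self.scale = alpha / rank                    # change (1): 144 / 128 = 1.125
        self.A = 0.01 * rng.standard_normal((rank, W.shape[1]))  # warm-started
        self.B = np.zeros((W.shape[0], rank))        # re-zeroed every batch

    def forward(self, x):
        # Change (1): scale the low-rank update by alpha/rank so raising
        # the rank does not also raise the update magnitude.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

    def reset_between_batches(self):
        # Change (2): only B is zeroed between phased-TTT batches;
        # A keeps accumulating feature directions across batches.
        self.B[:] = 0.0
```

In the TTT loop, the A/B parameters would then be trained with weight decay 1.0 (change 3) and alpha = 144 (change 4), per the recipe above.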

Results

| Seed | rank-96 baseline | + alpha 96 | + warm+WD=1 | + alpha 144 |
| --- | --- | --- | --- | --- |
| 1337 | 1.07423 | 1.07379 | 1.07298 | 1.07189 |
| 42 | 1.07341 | 1.07320 | 1.07298 | 1.07248 |
| 314 | 1.07214 | 1.07200 | 1.07203 | 1.07189 |
| Mean | 1.07326 | 1.07300 | 1.07266 | 1.07209 |
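The column means can be re-derived from the per-seed numbers (a sanity recomputation of the table above, nothing more):

```python
# Per-seed val_bpb from the table (seeds 1337, 42, 314), per configuration.
runs = {
    "rank-96 baseline": [1.07423, 1.07341, 1.07214],
    "+ alpha 96":       [1.07379, 1.07320, 1.07200],
    "+ warm+WD=1":      [1.07298, 1.07298, 1.07203],
    "+ alpha 144":      [1.07189, 1.07248, 1.07189],
}
# 3-seed means, rounded to 5 decimals as reported.
means = {name: round(sum(v) / 3, 5) for name, v in runs.items()}
```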

The 3-seed mean improves monotonically across every change; per seed, the only exception is a +0.00003 uptick for seed 314 at the warm+WD step.

Compliance

All training runs complete in ≤596 s, evals in 455.7–456.7 s, and artifacts are ≤15.94 MB. Conditions 1–4 of issue #1017 verified.

Attribution

…eed mean)

Four composable novel changes on top of dexhunter's phased-TTT code:
1. Alpha/rank LoRA scaling enables stable higher rank (128 vs 96)
2. Warm-start LoRA A across batches lets feature directions accumulate
3. Raised TTT weight decay (0.5 -> 1.0) prevents warm-A overfit
4. Alpha lifted 96 -> 144 gives LoRA more adaptation strength; WD keeps it stable

3-seed mean 1.07209 BPB (seeds 1337, 42, 314). The mean improves monotonically
across each of the four changes. Closely approaches dexhunter's 1.07193
despite a different seed set.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
TTT_LORA_ALPHA env var (default 96, spec uses 144). Only zero B on reset;
A accumulates feature directions across batches. Output scaled by alpha/rank.
Validated by renqianluo (openai#1767) and bigbag (openai#1771).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@renqianluo renqianluo changed the title Alpha=144 LoRA + Warm-start A + WD 1.0 — val_bpb 1.07209 (3-seed mean) Record: Alpha=144 LoRA + Warm-start A + WD 1.0 — val_bpb 1.07209 (3-seed mean) Apr 22, 2026
