
Record: SP8192 CaseOps + V13 Curriculum + SmearGate + LoRA-TTT — val_bpb 1.06513 (3-seed mean) #1771

Open

bigbag wants to merge 1 commit into openai:main from bigbag:submission/v13-l2-lora-ttt

Conversation

@bigbag bigbag commented Apr 22, 2026

Summary

val_bpb = 1.06513 (3-seed mean, std 0.00055) | ~15.98 MB | 8xH100 SXM

3-Seed Results

| Seed | Sliding BPB | TTT BPB | val_loss (nats/tok) | Artifact (bytes) |
|------|-------------|---------|---------------------|------------------|
| 42   | 1.07767     | 1.06449 | 2.32950             | 15,975,592       |
| 314  | 1.07856     | 1.06543 | 2.33156             | 15,976,709       |
| 999  | 1.07866     | 1.06547 | 2.33162             | 15,976,693       |
| Mean | 1.07830     | 1.06513 | 2.33089             | 15,976,331       |
| Std  | 0.00055     | 0.00055 |                     |                  |

Note: val_bpb computed via standard sentencepiece LUT byte counting (consistent with PR #1769 methodology). Train logs report sidecar-based BPB; val_loss is the ground truth.
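For reference, the nats-per-token to bits-per-byte conversion behind this methodology can be sketched as follows. This is a minimal illustration, not this submission's evaluation code; the mean bytes-per-token value is illustrative (chosen to be roughly consistent with the table above), not read from this run's tokenizer LUT.

```python
import math

def nats_per_token_to_bpb(loss_nats: float, mean_bytes_per_token: float) -> float:
    """Convert cross-entropy in nats/token to bits-per-byte.

    bits/token = loss_nats / ln(2); dividing by the corpus's mean
    bytes-per-token (from the tokenizer's byte-count LUT) gives bpb.
    """
    return loss_nats / math.log(2) / mean_bytes_per_token

# Illustrative bytes-per-token of ~3.157 maps val_loss 2.33089 nats/tok
# to roughly 1.065 bpb, matching the scale of the table above.
bpb = nats_per_token_to_bpb(2.33089, 3.157)
```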

Key Techniques

  1. SP8192 CaseOps — Lossless, reversible case normalization (TITLE/ALLCAPS/CAPNEXT/ESC operators). Pending #1604 ("Clarify which text normalizations are allowed for custom tokenizers").
  2. Recurrence Depth Curriculum (PR #1756) — Phased depth 1→3→4 training, eval at depth 4.
  3. SmearGate (modded-nanogpt, @classiclarryd) — Per-layer smoothing gate; novel combination with GatedAttn.
  4. GatedAttn + QuantGate (PR #1736) — Full-dim attention gate with int8 passthrough.
  5. LoRA-TTT Improvements (PR #1767) — Alpha/rank output scaling, warm-start A, WD 1.0, alpha 144.
  6. Phased Score-First TTT — 3-phase AdamW (lr=1e-4, WD=1.0), 2000 prefix docs.
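A minimal sketch of the alpha/rank-scaled LoRA forward pass named in item 5, using numpy as a stand-in for the actual torch modules. Variable names and shapes are illustrative assumptions; only the alpha/rank scaling, warm-started A, and zero-initialized B come from the techniques listed above.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=144.0):
    """LoRA-augmented linear layer: y = W x + (alpha / r) * B (A x).

    A (r x d_in) is warm-started and trained; B (d_out x r) starts at
    zero, so the adapter is initially a no-op, and the adapter output
    scale is controlled by alpha / rank.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 4
W = rng.normal(size=(d_out, d_in))
A = rng.normal(scale=0.01, size=(r, d_in))   # warm-started A
B = np.zeros((d_out, r))                     # B = 0 at init
x = rng.normal(size=d_in)
assert np.allclose(lora_linear(x, W, A, B), W @ x)  # adapter is a no-op at init
```

With alpha fixed and the output scaled by alpha/r, the adapter's effective learning rate stays comparable as the rank changes.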

Rule Compliance

Test plan

  • A reviewer can reproduce any single seed with the provided train_gpt.py and the env vars from the README
  • Verify the artifact size is < 16,000,000 bytes in each seed's log
  • Verify the score-first TTT ordering in the code

🤖 Generated with Claude Code

…bpb 1.06513 (3-seed mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bigbag commented Apr 22, 2026

Thanks to OpenAI's Advanced Competitor grant ($500 compute credit via RunPod) for making this work possible.

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
TTT_LORA_ALPHA env var (default 96, spec uses 144). Only zero B on reset;
A accumulates feature directions across batches. Output scaled by alpha/rank.
Validated by renqianluo (openai#1767) and bigbag (openai#1771).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
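The reset semantics described in the commit message above can be sketched as follows. Only the env-var name, its default of 96, and the zero-B-keep-A behavior come from the message; the function and array names are illustrative.

```python
import os
import numpy as np

# Default 96 per the commit message; the spec referenced above uses 144.
TTT_LORA_ALPHA = float(os.environ.get("TTT_LORA_ALPHA", "96"))

def reset_adapter(A: np.ndarray, B: np.ndarray) -> None:
    """Per-reset TTT behavior: zero only B, keep A.

    Zeroing B makes the adapter a no-op again, while A retains the
    feature directions accumulated across previous batches.
    """
    B[:] = 0.0
```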
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
…iculum + MLPClip12

Frontier: openai#1769 (1.06453) and openai#1771 (1.06513) both below baseline.
New ideas: mlp-clip-sigmas-12, v-gate.
Map updated with openai#1769, openai#1771, openai#1770.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 22, 2026
…13) strongest legal signal; dexhunter PR openai#1769 (1.06453) new best; LoRA-TTT warm-start A+alpha=144+WD=1.0 appears legal; arXiv:2604.15259 looped transformer outer normalization; Day 13 plateau; Session 19

https://claude.ai/code/session_013agP2MtwGU9MaPNtWx2hib
