Record: #1787 + Sparse Gate + Updated Frozen Carry — val_bpb 1.06287 by leon2k2k2k · Pull Request #1800 · openai/parameter-golf

leon2k2k2k · 2026-04-24T01:47:22Z

Summary

3-seed mean val_bpb = 1.06287 (seeds 42, 0, 1234), val_loss = 2.32695 nats/token
−0.00134 vs #1779 (1.06421, our last submission), −0.00048 vs #1787 (1.06335), −0.00262 vs #1736 (1.06549)
Inherits from #1779; adds a sparse attention-output gate and an updated frozen recurrent carry
Stackable with the smear gate and LQER from #1797 (val_bpb 1.06157)
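
The two headline numbers are related by a unit conversion: bits per byte equals nats per token divided by ln 2 and by the average number of bytes per token. The sketch below back-solves the implied bytes-per-token ratio from the reported means. It assumes val_bpb and val_loss are measured over the same tokens; the variable names are illustrative and not taken from train_gpt.py.

```python
import math

val_loss_nats = 2.32695   # reported 3-seed mean, nats/token
val_bpb = 1.06287         # reported 3-seed mean, bits/byte

# nats -> bits: divide by ln 2
bits_per_token = val_loss_nats / math.log(2)

# bits/token divided by bits/byte gives the implied bytes/token
implied_bytes_per_token = bits_per_token / val_bpb

print(f"{bits_per_token:.4f} bits/token")
print(f"{implied_bytes_per_token:.3f} bytes/token (implied)")
```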

Results (8×H100 80GB SXM, phased LoRA-TTT, 10-min train / 10-min eval)

Seed	Steps	Post-EMA (pre-quant)	Quantized	Post-TTT	Artifact (bytes)
42	4989	1.06749	1.07678	1.06366	15,909,254
0	4974	1.06685	1.07608	1.06311	15,904,209
1234	4973	1.06578	1.07509	1.06183	15,909,401
Mean	4979	1.06671	1.07598	1.06287	15,907,621
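
The Mean row and the deltas quoted in the Summary can be re-derived from the per-seed rows. A minimal check, with the numbers copied from the table above and dictionary names chosen only for illustration:

```python
from statistics import mean

# Per-seed Post-TTT val_bpb and artifact sizes, copied from the results table
post_ttt = {42: 1.06366, 0: 1.06311, 1234: 1.06183}
artifact_bytes = {42: 15_909_254, 0: 15_904_209, 1234: 15_909_401}

# Scores of the earlier records quoted in the Summary
prior = {"#1779": 1.06421, "#1787": 1.06335, "#1736": 1.06549}

mean_bpb = round(mean(post_ttt.values()), 5)
mean_bytes = round(mean(artifact_bytes.values()))
deltas = {pr: round(mean_bpb - bpb, 5) for pr, bpb in prior.items()}

print(mean_bpb, mean_bytes)
print(deltas)
```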

Frozen Recurrent Carry

The recurrent α/β carry coefficients (first introduced in #1779) were learned end-to-end on a full training run that never touched the validation set, then rounded to 2 decimal places and frozen before this promotion run:

β = [1.56, 1.85, 2.13]
α = [[0.23, 0.04, 0.03], [0.13, −0.34, 0.01], [0.06, 0.19, −0.02]]

Full-precision learned values: β = [1.5610, 1.8531, 2.1320], α = [[0.2314, 0.0388, 0.0347], [0.1260, −0.3438, 0.0145], [0.0557, 0.1934, −0.0172]].
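
The frozen values follow from the full-precision ones by ordinary rounding. A short check, using plain Python lists rather than whatever tensor type the training script uses (an assumption made for self-containment):

```python
# Full-precision learned values, copied from the text above
beta_full = [1.5610, 1.8531, 2.1320]
alpha_full = [
    [0.2314, 0.0388, 0.0347],
    [0.1260, -0.3438, 0.0145],
    [0.0557, 0.1934, -0.0172],
]

# Round every coefficient to 2 decimal places
beta_frozen = [round(b, 2) for b in beta_full]
alpha_frozen = [[round(a, 2) for a in row] for row in alpha_full]

# Largest change introduced by rounding (at most 0.005 by construction)
max_shift = max(
    [abs(b - f) for b, f in zip(beta_full, beta_frozen)]
    + [abs(a - f) for ar, fr in zip(alpha_full, alpha_frozen) for a, f in zip(ar, fr)]
)

print(beta_frozen)
print(alpha_frozen)
```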

The legality of offline-learned frozen scalars was discussed in #1779 — the data-size budget provides a natural bound on this class of technique.

What this adds over #1779

From #1787 (nprime06):

Polar Express Newton-Schulz coefficients
MIN_LR=0.10 warmdown floor
Fused softcapped CE
GPTQ_RESERVE_SECONDS=0.5, VAL_LOSS_EVERY=0
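
Two of the inherited items can be illustrated in a few lines. This is a sketch under stated assumptions, not the code from #1787: it assumes MIN_LR is a fraction of the peak learning rate reached at the end of a linear warmdown, and it shows softcapped cross-entropy in its plain, unfused form with a hypothetical cap of 30.0. The function names lr_scale and softcapped_ce are illustrative.

```python
import math

def lr_scale(step, total_steps, warmdown_steps, min_lr=0.10):
    """LR multiplier: 1.0 until warmdown starts, then linear decay to min_lr.

    Assumption: min_lr is a fraction of peak LR, not an absolute rate.
    """
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    frac_left = (total_steps - step) / warmdown_steps  # goes 1 -> 0
    return min_lr + (1.0 - min_lr) * frac_left

def softcapped_ce(logits, target, cap=30.0):
    """Cross-entropy on logits squashed into (-cap, cap) via cap * tanh(z / cap).

    Assumption: cap=30.0 is a placeholder value. A fused kernel would compute
    the same quantity without materialising the capped logits.
    """
    capped = [cap * math.tanh(z / cap) for z in logits]
    m = max(capped)  # subtract the max for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(c - m) for c in capped))
    return lse - capped[target]

print(lr_scale(1000, 1000, 200))            # multiplier at the final step
print(softcapped_ce([0.0, 0.0, 0.0, 0.0], 0))  # uniform logits over 4 classes
```

With a floor, the final steps still take non-trivial updates instead of decaying to zero.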

New in this PR:

Sparse attention-output gate — replaces the dense GatedAttn with a narrow-input sparse gate
Updated frozen recurrent carry — α/β re-learned on the sparse-gate stack and frozen to 2 decimal places (values above)
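
One possible reading of a "narrow-input" gate is that each head's gate is computed from only a small slice of the hidden vector instead of the full width. The sketch below shows that reading for a single token, in plain Python. The slicing rule, the sigmoid, the per-head scalar gate, and every name here (sparse_output_gate, gate_width, gate_weights) are assumptions for illustration; the actual gate is defined in train_gpt.py.

```python
import math

def sparse_output_gate(x, head_outputs, gate_weights, gate_width):
    """Scale each head's attention output by a scalar gate in (0, 1).

    x            : hidden vector for one token (list of floats)
    head_outputs : one output vector per head
    gate_weights : one weight vector of length gate_width per head
    gate_width   : number of leading channels of x the gate reads (assumed rule)
    """
    narrow = x[:gate_width]  # the gate sees only this narrow slice
    gated = []
    for w_h, out_h in zip(gate_weights, head_outputs):
        pre = sum(w * v for w, v in zip(w_h, narrow))
        g = 1.0 / (1.0 + math.exp(-pre))  # sigmoid
        gated.append([g * o for o in out_h])
    return gated

x = [0.5, -1.0, 2.0, 0.1, 0.3, -0.7, 0.0, 1.2]
heads = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.0, 0.0], [0.0, 0.0]]  # zero weights give a gate of exactly 0.5
print(sparse_output_gate(x, heads, weights, gate_width=2))
```

A dense gate would use a weight vector as long as x for every head, so the narrow version has far fewer gate parameters, which matters under a byte budget.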

Rule Compliance

Score-first phased TTT (Condition 3), no pre-quant TTT, no n-gram cache
All artifacts under the 16,000,000-byte limit (max 15,909,401 bytes), train ≤ 600s, eval ≤ 600s
CaseOps tokenizer (pending issue #1604, "Clarify which text normalizations are allowed for custom tokenizers"; same status as #1779 and #1787)
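
"Score-first" means every chunk of validation text is scored before the model is allowed to adapt on it, so no reported loss comes from weights that have already seen the tokens being scored. The toy loop below shows only that ordering. It is a sketch, not the phased LoRA-TTT implementation; score_fn and adapt_fn are hypothetical callables standing in for the real evaluation and update steps.

```python
def score_first_ttt(chunks, score_fn, adapt_fn):
    """Score each chunk, then adapt on it; return the summed loss."""
    total = 0.0
    for chunk in chunks:
        total += score_fn(chunk)  # weights have not seen this chunk yet
        adapt_fn(chunk)           # only now may the model train on it
    return total

# Toy stand-ins that record the order of operations
events = []
seen = set()

def toy_score(chunk):
    assert chunk not in seen, "chunk was adapted on before being scored"
    events.append(("score", chunk))
    return 1.0

def toy_adapt(chunk):
    seen.add(chunk)
    events.append(("adapt", chunk))

total = score_first_ttt(["c0", "c1", "c2"], toy_score, toy_adapt)
print(events)
```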

Test Plan

Reviewer reproduces any single seed with the provided train_gpt.py and env vars
Verify artifact size < 16,000,000 bytes in each seed log
Verify score-first TTT ordering in code
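
The size check in the plan can be scripted. A minimal sketch: the helper name artifact_within_limit is hypothetical and the path is whatever file the run produces. The reported sizes are copied from the results table.

```python
import os

SIZE_LIMIT_BYTES = 16_000_000

def artifact_within_limit(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at `path` is strictly smaller than `limit` bytes."""
    return os.path.getsize(path) < limit

# Headroom implied by the sizes reported for each seed
reported_sizes = {42: 15_909_254, 0: 15_904_209, 1234: 15_909_401}
headroom = {seed: SIZE_LIMIT_BYTES - size for seed, size in reported_sizes.items()}

for seed, spare in headroom.items():
    print(f"seed {seed}: {spare} bytes under the limit")
```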

🤖 Generated with Claude Code

leon2k2k2k force-pushed the submission/036-sparse-updated-carry branch 3 times, most recently from 1a7f817 to 356dc2a Compare April 24, 2026 01:52

Record: openai#1787 + Sparse Gate + Updated Frozen Carry — val_bpb 1.…

372c5f1

…06287

leon2k2k2k force-pushed the submission/036-sparse-updated-carry branch from 356dc2a to 372c5f1 Compare April 24, 2026 01:55

leon2k2k2k closed this Apr 24, 2026

leon2k2k2k wants to merge 1 commit into openai:main from leon2k2k2k:submission/036-sparse-updated-carry