
Record: VarLen Attention + Fused MLP + Multi-Phase Global SGD TTT — val_bpb 1.07193 (3-seed mean)#1626

Merged
cocohearts merged 1 commit into openai:main from dexhunter:dexhunter/multiphase-sgd-ttt on Apr 29, 2026

Conversation

@dexhunter
Contributor

Summary

Results

| Seed | Post-TTT BPB | val_loss (nats) | Artifact size (bytes) |
|------|--------------|-----------------|------------------------|
| 42   | 1.07280      | 2.77116         | 15,932,897             |
| 0    | 1.07134      | 2.76739         | 15,939,841             |
| 1234 | 1.07164      | 2.76815         | 15,932,419             |
| Mean | 1.07193      | 2.76890         |                        |
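As a sanity check on the table, BPB and nats are linked through the val set's mean bytes per token; back-solving from any row gives roughly 3.7266 bytes/token (an inferred constant, not something this PR states). A minimal sketch:

```python
import math

# Inferred from the Mean row: bytes/token = nats / (ln 2 * bpb) ≈ 3.7266.
# Assumption: this ratio is a property of the val set, constant across seeds.
BYTES_PER_TOKEN = 2.76890 / (math.log(2) * 1.07193)

def nats_to_bpb(val_loss_nats: float) -> float:
    """Convert mean validation loss (nats/token) to bits per byte."""
    return val_loss_nats / (math.log(2) * BYTES_PER_TOKEN)

# Each seed row reproduces to within ~1e-5 BPB, e.g. seed 42:
assert abs(nats_to_bpb(2.77116) - 1.07280) < 1e-4
```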

Key Innovation

Multi-phase global SGD: instead of a single SGD round on the prefix docs (PR #1610), we split evaluation into 3 phases — score a chunk, run SGD on it, then score the next chunk with the improved model. This progressively adapts the base model while preserving strict score-before-update legality. The 3-phase schedule improves val BPB by 0.0008 over single-phase.
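A minimal sketch of the phase loop, assuming a PyTorch decoder that returns next-token logits; `doc_loss`, the learning rate, and the per-chunk update window are illustrative assumptions, not this PR's exact code:

```python
import torch
import torch.nn.functional as F

def doc_loss(model, tokens):
    """Mean next-token cross-entropy (nats/token) for one tokenized doc."""
    logits = model(tokens[:-1].unsqueeze(0))           # [1, T-1, vocab]
    return F.cross_entropy(logits.squeeze(0), tokens[1:])

def multi_phase_ttt(model, val_docs, n_phases=3, lr=1e-4):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    chunk = (len(val_docs) + n_phases - 1) // n_phases
    total, n = 0.0, 0
    for start in range(0, len(val_docs), chunk):
        docs = val_docs[start:start + chunk]
        # 1) Score first, with frozen weights, so no doc's score reflects
        #    an update that has already seen that doc.
        model.eval()
        with torch.no_grad():
            for t in docs:
                total += doc_loss(model, t).item() * (len(t) - 1)
                n += len(t) - 1
        # 2) Update after: one SGD round over the just-scored chunk, so the
        #    NEXT chunk is scored by a progressively adapted model.
        #    (Updating on all docs scored so far would be an equally legal
        #    variant; the exact window here is an assumption.)
        model.train()
        for t in docs:
            opt.zero_grad()
            doc_loss(model, t).backward()
            opt.step()
    return total / n  # mean val loss in nats/token
```

The legality invariant is simply that step 1 for a chunk completes before step 2 touches it, which is what "score-before-update" ordering means below.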

Test plan

  • Verify 3-seed mean and std
  • Check artifact sizes < 16 MB
  • Verify score-before-update ordering in TTT logs (log-check sketch below)
  • Check code consistency across seeds
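A sketch of the ordering check from the test plan, assuming the TTT logs emit lines like `SCORE doc=<id>` / `UPDATE doc=<id>` (a hypothetical format; adapt the regex to the real log schema):

```python
import re
import sys

def check_score_before_update(log_path: str) -> bool:
    """Fail if any doc receives an UPDATE before (or without) a SCORE.
    Assumes 'SCORE doc=123' / 'UPDATE doc=123' lines (hypothetical format)."""
    scored = set()
    pattern = re.compile(r"\b(SCORE|UPDATE) doc=(\d+)")
    with open(log_path) as f:
        for lineno, line in enumerate(f, 1):
            m = pattern.search(line)
            if not m:
                continue
            event, doc = m.groups()
            if event == "SCORE":
                scored.add(doc)
            elif doc not in scored:
                print(f"{log_path}:{lineno}: doc {doc} updated before scoring")
                return False
    return True

if __name__ == "__main__":
    sys.exit(0 if check_score_before_update(sys.argv[1]) else 1)
```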

Record: VarLen Attention + Fused MLP + Multi-Phase Global SGD TTT — val_bpb 1.07193 (3-seed mean)

Novel multi-phase global SGD during phased TTT evaluation.
Builds on the phased TTT concept from PR openai#1530 (@samacqua) and PR openai#1610 (@romeerp).
3-seed mean: 1.07193 BPB (2.76890 nats), std 0.00063.
Seeds: 42, 0, 1234. All artifacts <16 MB.
@romeerp
Contributor

romeerp commented Apr 15, 2026

Wanted to implement this multi-phased strategy but didn't have compute to run tests for it. Glad you were able to do it and show improvement!

cocohearts merged commit 5c8e045 into openai:main Apr 29, 2026
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request May 1, 2026
Audits every CaseOps-lineage record-track PR (merged + unmerged) since
2026-04-18 for whether val docs are also in the training set.

Working set: 34 PRs (31 from chronological seed list + 3 discovered ancestors:
openai#1908, openai#1923, openai#2007). Boundary nodes openai#1493 / openai#1626 (pre-CaseOps).

Verdicts:
  - CLEAN (8): openai#1729, openai#1851, openai#1868, openai#1908, openai#2019, openai#2027, openai#2031, openai#2068
  - LEAK (25): openai#1736 (our research baseline) → openai#1769 → openai#1787 → openai#1797 → openai#1855 → V21 family (openai#1945, openai#1923, openai#1953, openai#1967) → openai#2018 → openai#2118
    (current claimed frontier 1.04350), plus siblings.
  - INHERIT (1): openai#2050 (eval-only on frozen openai#1915)

Code-level evidence (not README claims; a hash-audit sketch follows this list):
  - Every shipped prepare_caseops_data.py is byte-identical:
    SHARD_TOKENS=10_000_000, default=10_000 for --val-docs
  - NO PR overrides --val-docs (searched all .sh files in all 34 PRs)
  - cached_challenge_fineweb.py downloads from romeerp/parameter-golf-caseops-v1
    HF dataset whose manifest pins docs_val=50000, docs_train=8181945,
    sums match → CLEAN by construction
  - PR openai#2018's DATASET_AUDIT.md is gold-standard explicit leak description
  - PR openai#2118's submission.json admits "--val-docs=10000 train shards + 50k val eval"
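The byte-identity and override claims above are mechanically checkable. A sketch, assuming each audited PR is checked out under prs/<number>/ (a hypothetical layout, not this commit's actual tooling):

```python
import hashlib
import pathlib

def audit(prs_root: str = "prs") -> None:
    """Hash every shipped prepare_caseops_data.py and flag --val-docs
    overrides in shell scripts, per PR checkout directory."""
    hashes = set()
    for pr_dir in sorted(pathlib.Path(prs_root).iterdir()):
        # Byte-identity check: SHA-256 of each shipped prepare script.
        for script in pr_dir.rglob("prepare_caseops_data.py"):
            hashes.add(hashlib.sha256(script.read_bytes()).hexdigest())
        # Override check: any .sh file passing --val-docs explicitly.
        for sh in pr_dir.rglob("*.sh"):
            if "--val-docs" in sh.read_text(errors="ignore"):
                print(f"override found: {sh}")
    print(f"{len(hashes)} distinct prepare_caseops_data.py hash(es)")

if __name__ == "__main__":
    audit()
```

A single reported hash and zero override hits would reproduce the "byte-identical, no overrides" finding.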

Three signposts:
  - Leak introduced: PR openai#1736 by @dexhunter (Apr 19) — first prepare_caseops_data.py
    default invocation
  - Leak fixed: PR openai#1851 by @aquariouseworkman (Apr 27) — switched to HF dataset
  - Leak re-introduced: PR openai#1855 by @codemath3000 (same day) — rebuilt locally

The merged-leaderboard SOTA (openai#1851/openai#1868 at 1.06128/1.06141) is CLEAN.
The unmerged frontier (openai#2118 at 1.04350) is LEAK. The 0.018 bpb gap is
inflated by val memorization; spec 301 was designed to measure how much
remains under clean data.

Files:
  caseops-memory-leakage/README.md       — overview, methodology, takeaways
  caseops-memory-leakage/verdicts.md     — 34-row master table with evidence
  caseops-memory-leakage/family-tree.md  — ASCII trees with [C]/[L] annotations
