Record: Improved Parallel Residuals + Systems Optimization — val_bpb 1.0752 (3-seed mean) by codemath3000 · Pull Request #1584 · openai/parameter-golf

codemath3000 · 2026-04-13T02:49:18Z

Summary

val_bpb: 1.0752 (3-seed mean, std 0.0006) | 8xH100 SXM, 600s | Legal TTT
Systems-level optimizations on the dual-lane parallel residual architecture of PR #1529 (Record: ImprovedParallelResiduals, 1.0758 BPB / 2.7789 nats, -0.0020 BPB / -0.0052 nats vs PR #1523): fused Muon kernel, batched EMA, loader preallocation
Identical ML; faster step time yields ~20 extra training steps in the same 600s budget
Per Record Criterion 1: "For submissions that improve speed through systems optimization without changing the ML, this requirement [0.005 nats] is waived." This submission changes only systems-level code (kernel fusion, batched ops, memory preallocation) without altering model architecture, optimizer logic, loss function, or any hyperparameter, so the 0.005 nats threshold is waived. Additionally, relative to the official leaderboard SOTA at the time of writing, PR #1493 (Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT, val_bpb 1.0810 (3-seed mean)), this submission wins by 0.0150 nats, well above the threshold.
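The 0.0150 nats margin can be sanity-checked from the figures quoted in this summary. The sketch below is not part of the submission code. It assumes nats scale linearly with BPB on a fixed validation set, so the (BPB, nats) pair quoted for PR #1529 fixes the conversion factor; the names `NATS_PER_BPB` and `bpb_to_nats` are illustrative only.

```python
# Figures quoted in the summary above.
BASE_BPB, BASE_NATS = 1.0758, 2.7789  # PR #1529, as quoted in its title
SOTA_BPB = 1.0810                     # PR #1493, leaderboard SOTA at time of writing
THIS_BPB = 1.0752                     # this submission, 3-seed mean
THRESHOLD_NATS = 0.005                # Record Criterion 1 threshold

# Assumption: nats are proportional to BPB on a fixed validation set,
# so a single quoted (BPB, nats) pair determines the conversion factor.
NATS_PER_BPB = BASE_NATS / BASE_BPB


def bpb_to_nats(bpb: float) -> float:
    """Convert a BPB value (or a BPB difference) to nats under the assumption above."""
    return bpb * NATS_PER_BPB


margin_bpb = SOTA_BPB - THIS_BPB
margin_nats = bpb_to_nats(margin_bpb)
print(f"margin: {margin_bpb:.4f} BPB ~ {margin_nats:.4f} nats")
print("clears threshold:", margin_nats > THRESHOLD_NATS)
```

Under that assumption the margin over PR #1493 comes out at roughly 0.0150 nats, about three times the 0.005 nats threshold.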

Submission series: This PR is one of three related submissions applying the same systems optimizations to different base stacks (PR #1493, PR #1529, PR #1578). We submit against multiple bases so that a ready-to-merge option exists regardless of how the pending PRs are resolved. Judges should feel free to evaluate whichever base(s) they consider valid and disregard the rest.

Results

Seed	TTT BPB	Artifact (bytes)
1337	1.0745	15,983,819
2024	1.0755	15,982,374
42	1.0755	15,979,637
Mean	1.0752	15,981,943
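The summary row can be recomputed from the per-seed values. The sketch below is a minimal check, not the submission's evaluation code. It assumes the artifact column is in bytes and that the 16MB cap means 16,000,000 bytes (the stricter reading), and it uses the sample (n-1) standard deviation, which is the variant that reproduces the reported 0.0006.

```python
from statistics import mean, stdev

# Per-seed results copied from the table above: seed -> (TTT BPB, artifact size).
runs = {
    1337: (1.0745, 15_983_819),
    2024: (1.0755, 15_982_374),
    42: (1.0755, 15_979_637),
}

# Assumption: sizes are bytes and the 16MB cap is decimal (16,000,000 bytes).
ARTIFACT_LIMIT_BYTES = 16_000_000

bpbs = [bpb for bpb, _ in runs.values()]
sizes = [size for _, size in runs.values()]

mean_bpb = mean(bpbs)
std_bpb = stdev(bpbs)            # sample standard deviation (n - 1 in the denominator)
mean_size = round(mean(sizes))
all_under_limit = all(size < ARTIFACT_LIMIT_BYTES for size in sizes)

print(f"mean BPB {mean_bpb:.4f}, std {std_bpb:.4f}")
print(f"mean artifact {mean_size:,} bytes, all under limit: {all_under_limit}")
```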

CUTLASS EVT Build

Required for full throughput. Source included in cutlass_evt_fusion/:

git clone https://github.com/NVIDIA/cutlass.git /opt/cutlass
cd /opt/cutlass && git checkout 08185b9c3e90510ee2b656662ed0d53b06d28157
pip install --no-build-isolation ./cutlass_evt_fusion

Test plan

3-seed training on 8xH100 SXM (seeds 1337, 2024, 42)
All artifacts under 16MB
All runs under 600s training + 600s eval
Round-trip quantization + TTT verified
Judges verify reproducibility

🤖 Generated with Claude Code

…1.0752

Systems-level optimizations (fused Muon, EMA foreach, loader prealloc) on PR openai#1529's dual-lane parallel residual architecture. Identical ML; faster step time yields extra training steps. 3-seed mean: 1.0752 BPB / 2.7773 nats.

Co-Authored-By: Claude Opus 4.6 (1M context)

…ai#1586 per-layer GPTQ highest-EV
- PR openai#758 n-gram effectively dead: MatoTeziTanka (Apr 12) flagged XOR hash includes target token, same illegality as openai#727/openai#741
- GDN-Hybrid BPB bug confirmed: PR openai#1576 space-token double-count inflates denominator ~14%; actual score ~1.16-1.18, not 1.01671
- PR openai#1586 (dexhunter, 1.07493): Per-Layer Adaptive GPTQ MLP=12σ/Attn=13σ + int7 Emb (saves 530KB) + MLR=0.026; -0.0127 nats vs SOTA; implement now
- PR openai#1584: systems-only (fused Muon, batched EMA, loader prealloc) ~+20 steps
- Casefold Tokenizer (openai#1578/openai#1585): legality debated; await organizer ruling
- New paper: arXiv:2604.06169 In-Place TTT (Apr 7) NTP-aligned score-first TTT
- Merged SOTA 1.0810 unchanged (4-day stable streak); target ≤1.0760; 17 days

https://claude.ai/code/session_01BE8wc8zxvZAo52QBXSNiL8

codemath3000 mentioned this pull request Apr 13, 2026

New record submissions for review (#1583, #1584, #1585) #1587

Open

This was referenced Apr 28, 2026

Update Parameter Golf leaderboard #1900

Open

Update Parameter Golf leaderboard with BOS fix #1902

Merged

codemath3000 wants to merge 1 commit into openai:main from codemath3000:submission/systems-opt-parallel-residuals
