
Record: PR #1854 neural stack — budget-compliant 1.06777 (3-seed mean) #1883

Open

robbiebusinessacc wants to merge 2 commits into openai:main from robbiebusinessacc:submission/multibin-lambda

Conversation

@robbiebusinessacc

Summary

3-seed validated reproduction of PR #1854's neural stack with PHASED_TTT_PREFIX_DOCS reduced from 2000 → 1500 to fit cleanly under the 600s evaluation budget. val_bpb 1.06777 (3-seed mean, std 0.00106) on 8×H100 SXM.

vs merged-leaderboard SOTA PR #1493 (@bigbag, 1.0810): −0.01323 BPB at ~13σ statistical significance, p ≪ 0.0001 against the 0.005-nat threshold.

| Seed | val_bpb | Total bytes | Eval time |
|------|---------|-------------|-----------|
| 42   | 1.06686 | 15,952,086  | 374.6s    |
| 1337 | 1.06893 | 15,949,941  | 371.0s    |
| 314  | 1.06752 | 15,951,195  | 327.7s    |
| Mean | 1.06777 | 15,951,074  | 357.8s    |
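As a sanity check, the headline statistics can be recomputed from the per-seed numbers above. A minimal sketch, assuming the ~13σ figure is the BPB delta vs PR #1493 divided by the per-seed sample std (the leaderboard harness may use a different test):

```python
# Recompute the 3-seed mean/std and the margin vs merged SOTA PR #1493 (1.0810).
# Assumption: "~13 sigma" is delta / per-seed sample std; the exact significance
# test used by the leaderboard harness may differ.
import statistics

per_seed_bpb = {42: 1.06686, 1337: 1.06893, 314: 1.06752}
vals = list(per_seed_bpb.values())

mean_bpb = statistics.mean(vals)    # ~1.06777
std_bpb = statistics.stdev(vals)    # ~0.00106 (n-1 sample std)
delta = 1.0810 - mean_bpb           # ~0.01323 BPB below PR #1493
print(f"mean={mean_bpb:.5f} std={std_bpb:.5f} delta={delta:.5f} sigma~{delta / std_bpb:.1f}")
```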

Compliance

  • All 3 artifacts under 16,000,000 bytes (max 15,952,086, margin 47,914)
  • All 3 eval times under 600s (max 374.6s, margin 225.4s)
  • Training cap-bound at 600s, all 3 seeds
  • Headline val_bpb is the standard token-level NLL → byte path; no byte-PPM mixture is claimed. The exploratory multibin-λ refinement of the PR #1835 mixer ("Record: SP8192 + PPM-D byte mixture — 1.00136 BPB (3-seed mean)") is included in train_gpt.py for reproducibility, but its mix_bpb is not the reported number; see the README section "Note on byte-PPM mixture" for the reasoning.
  • Score-first phased TTT only (lineage of PR #1413, "Record: SP8192 + QK-Gain 5 + Legal Score-First TTT — val_bpb 1.08279 (3-seed mean)"). No pre-quant TTT, no SLOT, no n-gram cache, no logit bias.
  • CaseOps tokenizer byte counting via the fineweb_val_bytes_*.bin sidecar that recovers original UTF-8 byte counts; the inflated piece.encode() path is explicitly bypassed (train_gpt.py:387-389, 2618-2626), and a sketch of the idea follows this list. Full audit and Eppie/mhuen-style normalization proof in the README.
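For readers who want the byte-counting idea from the last bullet in code form, here is a minimal sketch. It assumes the sidecar is a flat array of per-document original UTF-8 byte counts; the actual layout and the real counting code live in train_gpt.py (lines 387-389, 2618-2626).

```python
# Hedged sketch of CaseOps-safe byte counting: sum original UTF-8 byte counts
# recorded in the sidecar at data-prep time, instead of re-encoding tokenizer
# pieces. The flat-uint32 layout is an assumption for illustration only.
import numpy as np

def total_val_bytes(sidecar_path: str) -> int:
    byte_counts = np.fromfile(sidecar_path, dtype=np.uint32)  # one count per val document (assumed)
    return int(byte_counts.sum())

def inflated_piece_bytes(pieces: list[str]) -> int:
    # The bypassed path: summing bytes of tokenizer piece strings over-counts
    # whenever CaseOps markers or lossless-caps expansions add characters.
    return sum(len(p.encode("utf-8")) for p in pieces)
```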

What's new vs PR #1854

PR #1854's reported eval wallclock is ~700s (per its own log: ttt_phased 516s + ppm_mix 116s + diagnostics 67s), which exceeds the 600s budget. This submission reproduces the same neural stack with PHASED_TTT_PREFIX_DOCS=1500 and lands at the same post-TTT val_bpb (~1.067) cleanly under 600s. Closed PRs in #677 cite an over-budget eval as grounds for rejection (e.g. PR #503), so a budget-compliant 1.067 is a more defensible record candidate.

Files

records/track_10min_16mb/2026-04-28_PR1854_BudgetCompliant_1.0678/

  • README.md — full methodology, normalization proof, lineage and credits
  • submission.json — metadata
  • train_gpt.py — neural stack (mixer-independent headline path)
  • lossless_caps.py, prepare_caseops_data.py, tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model — verbatim from PR #1854 ("Record: PR #1797 base + PPM-D byte mixture — val_bpb 0.90236 (3-seed mean)")
  • train_seed{42,1337,314}.log — per-seed train+eval logs
  • final_model.int6.ptz — quantized model artifact

Test plan

  • Trains within 600s wallclock on 8×H100 80GB SXM (cap-bound, all 3 seeds)
  • All 3 artifacts under 16 MB cap
  • Eval completes within 600s wallclock cap (max 374.6s)
  • 3-seed mean reproduced; per-seed numbers verified in attached logs
  • Full-vocab softmax normalization (standard F.cross_entropy over V=8192) — README §"Normalization proof"
  • Byte denominator equals original UTF-8 bytes via sidecar — README §"Normalization proof" (2); the NLL → byte conversion is sketched below
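A minimal sketch of the metric path those last two checks refer to: full-vocab cross-entropy in nats, converted to bits per byte with the original UTF-8 byte count as the denominator. Names and shapes are illustrative, not the actual train_gpt.py code.

```python
# val_bpb = (sum of token NLL in nats) / (ln(2) * original UTF-8 bytes).
# F.cross_entropy normalizes over the full V=8192 vocab by construction,
# which is the "full-vocab softmax normalization" checked above.
import math
import torch
import torch.nn.functional as F

def val_bpb(logits: torch.Tensor, targets: torch.Tensor, total_utf8_bytes: int) -> float:
    # logits: (N, 8192) full-vocab scores; targets: (N,) next-token ids
    nll_nats = F.cross_entropy(logits, targets, reduction="sum")
    return nll_nats.item() / (math.log(2) * total_utf8_bytes)
```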

robbiebusinessacc and others added 2 commits April 28, 2026 00:59
…tion — val_bpb 1.06777 (3-seed mean)

3-seed validated reproduction of PR openai#1854's neural stack with PHASED_TTT_PREFIX_DOCS=1500 to fit the 600s eval budget. Beats merged SOTA PR openai#1493 (bigbag, 1.0810) by 0.01323 BPB at ~13σ statistical significance.

Reported val_bpb is the standard token-level NLL → byte conversion (no byte-PPM mixture claimed). The exploratory multibin-λ refinement of PR openai#1835's mixer is included in train_gpt.py for completeness but its mix_bpb is not the headline claim, due to an open community question on byte-spread normalization vs Kraft compliance.
