Record: PR #1854 neural stack — budget-compliant 1.06777 (3-seed mean) #1883
Open
robbiebusinessacc wants to merge 2 commits into openai:main from
Conversation
…tion — val_bpb 1.06777 (3-seed mean)

3-seed validated reproduction of PR openai#1854's neural stack with PHASED_TTT_PREFIX_DOCS=1500 to fit the 600s eval budget. Beats merged SOTA PR openai#1493 (bigbag, 1.0810) by 0.01323 BPB at ~13σ statistical significance. Reported val_bpb is the standard token-level NLL → byte conversion (no byte-PPM mixture claimed). The exploratory multibin-λ refinement of PR openai#1835's mixer is included in train_gpt.py for completeness, but its mix_bpb is not the headline claim, due to an open community question on byte-spread normalization vs. Kraft compliance.
Summary
3-seed validated reproduction of PR #1854's neural stack, with PHASED_TTT_PREFIX_DOCS reduced from 2000 → 1500 to fit cleanly under the 600s evaluation budget.

- val_bpb 1.06777 (3-seed mean, std 0.00106) on 8×H100 SXM.
- vs merged-leaderboard SOTA PR #1493 (@bigbag, 1.0810): −0.01323 BPB at ~13σ statistical significance, p ≪ 0.0001 against the 0.005-nat threshold.
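As a hedged reading of the ~13σ figure (an assumption on my part: the quoted value appears to be the BPB delta divided by the per-seed standard deviation, rather than the standard error of the 3-seed mean), the arithmetic can be checked directly:

```python
# Hedged reconstruction of the significance arithmetic quoted above.
# Assumption: "~13σ" = (SOTA bpb − this submission's mean bpb) / per-seed std.
sota_bpb = 1.0810      # merged-leaderboard SOTA, PR #1493
mean_bpb = 1.06777     # 3-seed mean, this submission
seed_std = 0.00106     # 3-seed standard deviation

delta = sota_bpb - mean_bpb   # 0.01323 BPB improvement
z = delta / seed_std          # ≈ 12.5, consistent with "~13σ"
print(f"delta={delta:.5f} BPB, z≈{z:.1f}")
```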
Compliance
val_bpb is the standard token-level NLL → byte path. No byte-PPM mixture is claimed. The exploratory multibin-λ refinement of PR #1835's mixer is included in train_gpt.py for reproducibility, but its mix_bpb is not the reported number — see the README "Note on byte-PPM mixture" for the reasoning.

Byte accounting uses the fineweb_val_bytes_*.bin sidecar that recovers original UTF-8 byte counts; the inflated piece.encode() path is explicitly bypassed (train_gpt.py:387-389, 2618-2626). Full audit and Eppie/mhuen-style normalization proof in the README.

What's new vs PR #1854
PR #1854's reported eval wallclock is ~700s (per its own log: ttt_phased 516s + ppm_mix 116s + diagnostics 67s), over the 600s budget. This submission reproduces the same neural stack with PHASED_TTT_PREFIX_DOCS=1500 and lands at the same post-TTT val_bpb (~1.067) cleanly under 600s. Closed PRs in #677 cite eval over-budget as grounds for rejection (e.g. PR #503), so a budget-compliant 1.067 is a more defensible record candidate.

Files
- records/track_10min_16mb/2026-04-28_PR1854_BudgetCompliant_1.0678/README.md — full methodology, normalization proof, lineage and credits
- submission.json — metadata
- train_gpt.py — neural stack (mixer-independent headline path)
- lossless_caps.py, prepare_caseops_data.py, tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model — verbatim from PR #1854
- train_seed{42,1337,314}.log — per-seed train+eval logs
- final_model.int6.ptz — quantized model artifact

Test plan
F.cross_entropy over V=8192) — README §"Normalization proof"
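For context, a minimal sketch of the standard token-level NLL → byte conversion named above, assuming summed F.cross_entropy NLL in nats over V=8192 logits divided by ln 2 × the sidecar's original UTF-8 byte count. Names and shapes here are illustrative, not the submission's actual code:

```python
import math
import torch
import torch.nn.functional as F

def val_bpb(logits: torch.Tensor, targets: torch.Tensor, n_bytes: int) -> float:
    """Token-level NLL -> bits per byte.

    logits:  (N, 8192) raw model outputs over the V=8192 vocab
    targets: (N,) gold token ids
    n_bytes: original UTF-8 byte count of the eval text
             (e.g. recovered from a fineweb_val_bytes_*.bin sidecar,
             NOT the inflated piece.encode() length)
    """
    nll_nats = F.cross_entropy(logits, targets, reduction="sum").item()
    return nll_nats / (math.log(2) * n_bytes)

# Toy check: a uniform model over V=8192, one token per byte,
# gives exactly log2(8192) = 13 bits per byte.
logits = torch.zeros(4, 8192)
targets = torch.tensor([0, 1, 2, 3])
print(round(val_bpb(logits, targets, n_bytes=4), 4))  # 13.0
```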