Record: SP8192 V21 + Inside-timer N-gram TTT (no Gated XSA) — val_bpb 1.05692 (3-seed mean) #2041
Open
jorge-asenjo wants to merge 1 commit into openai:main
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on May 1, 2026:
…ixer Self-contained reference for byte-level NN scoring without the C1/C2 leak in PR openai#2039 / openai#1967 / openai#2018 / openai#2041. Shows ~-0.097 BPB legitimate gain on spec 250 seed_0 (1M val tokens), independent of include_space leak. Files: README, proper_ppm_mixer_rigorous.py (canonical), byte_bpb_proper.py (NN-only baseline), show_big_gains.py (inspection), test_byte0_3way.py (5-config leak validation).
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on May 1, 2026:
…ence After user feedback that LEAK calls relied too heavily on lineage inheritance and path heuristics, applied a stricter criterion: a LEAK verdict requires at least one of:
- (a) explicit shell-script invocation of prepare_caseops_data.py without --val-docs=50000,
- (b) README "Data setup" matching the actual train log path,
- (c) audit/submission.json admission text,
- (d) train log path with `_caseops/datasets/datasets/<name>` triple-nesting OR single `<root>/datasets/<name>` (which only local prep produces; HF always gives double-nesting).

Records that previously got LEAK by lineage inheritance alone are now AMBIGUOUS unless they meet at least one of those tests.

Changes:
- openai#1945 LEAK → CLEAN (finalize_v18.sh has snapshot_download from HF; actual run path matches HF target; README's prepare_caseops_data.py section is stale documentation)
- openai#1953 LEAK → AMBIGUOUS (PR ships only train_gpt.py + logs; no prep evidence; path matches HF target; parent openai#1945 confirmed CLEAN; leans CLEAN but no direct PR evidence)
- openai#2041 LEAK → AMBIGUOUS (no prep invocation; double-nested path consistent with EITHER HF or local prep)
- openai#2075 LEAK → AMBIGUOUS (ships prep file but no explicit invocation; path matches HF target)

Updated tally: CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1 (was 8/25/0/1).

Headline impact: realistic clean SOTA is at most ~0.012 bpb below the claimed frontier openai#2118 (1.04350). Best clean BPB candidates in order:
- openai#2019 1.05847 (HF, confirmed)
- openai#1953 1.05855 (AMBIGUOUS, leans CLEAN)
- openai#1945 1.05943 (HF, confirmed via re-audit)
- openai#2031 1.05985 (HF, confirmed)
- openai#1908 1.06081 (HF, confirmed)
- openai#1851 1.06128 (HF, MERGED SOTA)
Collaborator
Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The n-gram path is not token-only in the submitted logs: seed42 reports nonzero within/word/agree gates (…
Summary
3-seed mean val_bpb 1.05692 on track_10min_16mb, full validation partition (47,851,520 tokens). All 3 seeds: train ≤ 596.094 s, eval ≤ 592.2 s, max artifact 15,977,032 bytes.

Vs. the current merged SOTA (PR #1855, 1.06108): −0.00416 BPB ≈ 0.0091 nats, which clears the 0.005-nat README threshold.
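The nat figure depends on how BPB relates to per-token loss. Assuming the usual parameter-golf convention that val_bpb = (val_loss / ln 2) × (tokens / bytes), with val_loss in nats per token, a BPB delta converts to nats per token by multiplying by ln 2 and the average bytes per token. The bytes-per-token ratio is not stated in this PR, so the sketch below (with hypothetical helpers `bpb_delta_to_nats` and `bytes_per_token_for`) only backs out the ratio implied by the quoted numbers and the smallest ratio that would still clear the threshold.

```python
import math

def bpb_delta_to_nats(delta_bpb: float, bytes_per_token: float) -> float:
    """Convert a bits-per-byte delta into a nats-per-token delta.

    Assumes val_bpb = (val_loss / ln 2) * (tokens / bytes), where val_loss is
    the mean cross-entropy in nats per token.
    """
    return delta_bpb * math.log(2) * bytes_per_token

def bytes_per_token_for(delta_bpb: float, delta_nats: float) -> float:
    """Bytes-per-token ratio at which delta_bpb equals delta_nats nats/token."""
    return delta_nats / (delta_bpb * math.log(2))

merged_sota_bpb = 1.06108   # PR #1855, as quoted above
submission_bpb = 1.05692    # 3-seed mean from this PR
delta_bpb = merged_sota_bpb - submission_bpb

# Ratio implied by the quoted "≈ 0.0091 nats" figure.
implied_ratio = bytes_per_token_for(delta_bpb, 0.0091)
# Smallest ratio at which this BPB gain would still clear 0.005 nats.
min_ratio = bytes_per_token_for(delta_bpb, 0.005)

print(f"delta_bpb = {delta_bpb:.5f}")
print(f"implied bytes/token = {implied_ratio:.2f}, minimum to clear = {min_ratio:.2f}")
print(f"delta in nats at implied ratio = {bpb_delta_to_nats(delta_bpb, implied_ratio):.4f}")
```

With the quoted numbers the implied ratio comes out near 3.16 bytes per token, and under this convention any tokenizer averaging more than roughly 1.73 bytes per token would keep a 0.00416 BPB gain above the 0.005-nat bar.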
Stack
This stacks the eval-time recipe from PR #2018 on top of the PR #1967 V21 + LeakyReLU 0.3 base, without Gated XSA:
- PHASED_TTT_NUM_PHASES=1
- PHASED_TTT_PREFIX_DOCS=1000
- NGRAM_HINT_PRECOMPUTE_OUTSIDE=0 (inside timer)
- LQER_TOP_K=1
- GPTQ_RESERVE_SECONDS=4.0

The N-gram precompute (~150-160 s) is measured inside the eval timer, per the merged A2 record convention (records/track_10min_16mb/2026-04-09_A2_Muon097_3Seed/README.md, line 106). This submission does not include PR #2018's Gated XSA; we report a clean reproduction of the inside-timer eval recipe on the PR #1967 base as additional 3-seed evidence.
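To make the "inside the timer" accounting concrete, here is a minimal sketch assuming the eval harness simply starts its clock before or after the hint precompute, depending on NGRAM_HINT_PRECOMPUTE_OUTSIDE. `timed_eval`, the fake workloads, and the 600 s `EVAL_BUDGET_S` constant are hypothetical stand-ins, not code from train_gpt.py.

```python
import time
from typing import Any, Callable, Tuple

EVAL_BUDGET_S = 600.0  # assumed 10-minute eval cap for track_10min_16mb

def timed_eval(
    precompute_hints: Callable[[], Any],
    score: Callable[[Any], float],
    precompute_outside: bool,
) -> Tuple[float, float]:
    """Run hint precompute + scoring; return (score, seconds charged to the eval timer)."""
    if precompute_outside:
        hints = precompute_hints()   # runs before the clock starts: not charged
        t0 = time.perf_counter()
    else:
        t0 = time.perf_counter()     # clock starts first (like NGRAM_HINT_PRECOMPUTE_OUTSIDE=0)
        hints = precompute_hints()   # so the precompute is charged to the budget
    result = score(hints)
    return result, time.perf_counter() - t0

def fake_precompute() -> dict:
    time.sleep(0.05)  # stand-in for the ~150-160 s n-gram precompute
    return {"hints": []}

def fake_score(hints: Any) -> float:
    return 1.0  # stand-in for the actual scoring pass

_, charged_inside = timed_eval(fake_precompute, fake_score, precompute_outside=False)
_, charged_outside = timed_eval(fake_precompute, fake_score, precompute_outside=True)
within_budget = charged_inside <= EVAL_BUDGET_S
print(f"inside: {charged_inside:.3f}s, outside: {charged_outside:.3f}s, ok: {within_budget}")
```

With the precompute inside the timer, its cost is charged against the same budget as scoring, which is why the reported eval times (≤ 592.2 s) already include the ~150-160 s precompute.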
Compliance (Issue #1017)
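The n-gram tilt boosts the hinted token h by a factor e^β and renormalizes over the full vocabulary, so the scored distribution is a proper one:

$$p'(a) = \frac{\exp\big(\beta\,\mathbb{1}[a=h]\big)\,p(a)}{Z}, \qquad Z = \sum_{b} \exp\big(\beta\,\mathbb{1}[b=h]\big)\,p(b), \qquad \sum_a p'(a) = 1.$$

Full README, per-seed logs, and train_gpt.py are in
records/track_10min_16mb/2026-05-01_SP8192_InsiderNgram_V21_NoGatedXSA_1.0569/.

Test plan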
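A check that could belong under a test plan like this one is that the tilt from the Compliance section stays a proper distribution: multiplying p(h) by e^β and dividing by Z = 1 + (e^β − 1)p(h) keeps Σ p'(a) = 1 while leaving the relative odds of every other token unchanged. The sketch below is illustrative only and is not the submission's code; `tilt`, the toy distribution, and treating h as a single token id are assumptions.

```python
import math
from typing import List

def tilt(p: List[float], h: int, beta: float) -> List[float]:
    """Return p'(a) = exp(beta * 1[a == h]) * p(a) / Z for a categorical p."""
    weights = [math.exp(beta) * pa if a == h else pa for a, pa in enumerate(p)]
    z = sum(weights)  # equals 1 + (exp(beta) - 1) * p[h] when sum(p) == 1
    return [w / z for w in weights]

p = [0.5, 0.3, 0.2]              # hypothetical toy next-token distribution
p_tilted = tilt(p, h=2, beta=1.0)
print(p_tilted, sum(p_tilted))
```

Setting beta to 0 recovers p exactly, so the tilt only ever moves probability mass toward or away from h, never creating or losing total mass.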