Record: SP8192 V21 + Inside-timer N-gram TTT (no Gated XSA) — val_bpb 1.05692 (3-seed mean) by jorge-asenjo · Pull Request #2041 · openai/parameter-golf

jorge-asenjo · 2026-04-30T23:20:00Z

Summary

3-seed mean val_bpb 1.05692 on track_10min_16mb, full validation partition (47,851,520 tokens). All 3 seeds: train ≤596.094s, eval ≤592.2s, max artifact 15,977,032 bytes.

Versus the current merged SOTA (PR #1855, 1.06108): −0.00416 BPB ≈ −0.0091 nats, which clears the 0.005-nat README threshold.
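The BPB-to-nats conversion can be checked with a short sketch. It assumes that val_loss is logged in nats per token and that the nats-per-BPB ratio can be read off the 3-seed means in the table below (2.31294 / 1.05692). Both are assumptions about the logging, not something the PR states explicitly.

```python
# Values quoted in this PR description.
MEAN_VAL_BPB = 1.05692     # 3-seed mean of this submission
MEAN_VAL_LOSS = 2.31294    # 3-seed mean val_loss, assumed to be nats per token
SOTA_VAL_BPB = 1.06108     # merged SOTA, PR #1855
THRESHOLD_NATS = 0.005     # README record threshold

# Assumption: nats per token scales linearly with BPB at a fixed ratio
# (same tokenizer and same validation bytes for both runs).
nats_per_bpb = MEAN_VAL_LOSS / MEAN_VAL_BPB

delta_bpb = MEAN_VAL_BPB - SOTA_VAL_BPB
delta_nats = delta_bpb * nats_per_bpb
clears_threshold = abs(delta_nats) >= THRESHOLD_NATS

print(f"delta_bpb={delta_bpb:+.5f} delta_nats={delta_nats:+.4f} clears={clears_threshold}")
```

If the two records used different tokenizers, the fixed ratio would no longer hold and the nats figure would have to be recomputed from each run's own val_loss.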

Seed	val_bpb	val_loss	train	eval	artifact
42	1.05610	2.31114	596.058s	592.2s	15,977,032 B
0	1.05736	2.31390	596.017s	565.2s	15,975,966 B
1234	1.05730	2.31377	596.094s	518.8s	15,972,820 B
mean	1.05692	2.31294	—	—	—
std (pop)	0.000580	—	—	—	—
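The mean and std rows can be reproduced from the three per-seed rows. A minimal sketch using only the Python standard library; "std (pop)" is taken to mean the population standard deviation (divide by n), and the sample standard deviation (divide by n − 1) is computed alongside it for comparison.

```python
from statistics import fmean, pstdev, stdev

# Per-seed values copied from the table above.
val_bpb = {42: 1.05610, 0: 1.05736, 1234: 1.05730}
val_loss = {42: 2.31114, 0: 2.31390, 1234: 2.31377}

bpb = list(val_bpb.values())
mean_bpb = fmean(bpb)
mean_loss = fmean(list(val_loss.values()))
std_pop = pstdev(bpb)      # population std, as reported in the table
std_sample = stdev(bpb)    # sample std (n - 1), larger for n = 3

print(f"mean_bpb={mean_bpb:.5f} mean_loss={mean_loss:.5f}")
print(f"std_pop={std_pop:.6f} std_sample={std_sample:.6f}")
```

With only three seeds the choice of estimator matters: a significance test would normally use the sample standard deviation rather than the population one.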

Stack

This stacks the eval-time recipe from PR #2018 on top of the PR #1967 V21 + LeakyReLU 0.3 base, without Gated XSA:

PR #2018 ("Record: Gated XSA + LQER top-1 + strict token-only n-gram TTT (val_bpb: 1.047)"): PHASED_TTT_NUM_PHASES=1, PHASED_TTT_PREFIX_DOCS=1000, NGRAM_HINT_PRECOMPUTE_OUTSIDE=0 (inside timer), LQER_TOP_K=1, GPTQ_RESERVE_SECONDS=4.0.
PR #1967 ("Record: V21 + N-gram Tilt + LeakyReLU 0.3 — val_bpb 1.05851 (3-seed mean)") V21 base: AsymLogit Rescale (#1923, "Record: SP8192 #1855 Base + Asymmetric Logit Rescale + AWQ-lite — val_bpb 1.05971 (3-seed mean, full val)"); AWQ-lite (#1908, "Record: PR #1855 base + activation-aware GPTQ mixed precision - val_bpb 1.06081 (3-seed mean)"); 7-knob TTT/QK (#1953, "Record: PR #1945 base + 2560 long-context + no_qv TTT mask + TTT LR 0.75 + QK_GAIN 5.25 — val_bpb 1.05855 (3-seed mean)"); LeakyReLU squared 0.3 (#1948, "Record: Leaky ReLU Slope + GPTQ Reverse-Cholesky Speedup + PR #1938 (val_bpb = 1.06242)"); CaseOps SP8192 tokenizer.
Hardware: 8×H100 SXM 80GB, torch 2.9.1+cu128, FA3.

The N-gram precompute (~150-160s) is measured inside the eval timer per the merged A2 record convention (records/track_10min_16mb/2026-04-09_A2_Muon097_3Seed/README.md line 106).
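The difference between charging the precompute to the eval timer and running it beforehand can be illustrated with a small sketch. The function names, the fake clock, and the 400 s scoring time are hypothetical and are not taken from train_gpt.py; only the convention (precompute counted inside the timed region when NGRAM_HINT_PRECOMPUTE_OUTSIDE=0) and the ~150-160s precompute cost come from the description above.

```python
import time
from typing import Callable, Tuple


def timed_eval(
    precompute: Callable[[], object],
    score: Callable[[object], float],
    precompute_outside: bool,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (score, seconds charged to the eval budget)."""
    # Outside mode: the precompute runs before the timer starts (not charged).
    hints = precompute() if precompute_outside else None
    start = clock()
    if not precompute_outside:
        # Inside mode: the precompute is charged to the eval budget.
        hints = precompute()
    result = score(hints)
    return result, clock() - start


class FakeClock:
    """Deterministic clock so the example does not depend on real wall time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def charged_seconds(precompute_outside: bool) -> float:
    clock = FakeClock()

    def precompute() -> str:
        clock.advance(155.0)  # illustrative value inside the quoted ~150-160s range
        return "hints"

    def score(hints: object) -> float:
        clock.advance(400.0)  # hypothetical scoring time
        return 0.0

    _, charged = timed_eval(precompute, score, precompute_outside, clock)
    return charged


inside = charged_seconds(precompute_outside=False)
outside = charged_seconds(precompute_outside=True)
print(inside, outside)
```

Under the inside-timer convention the precompute eats directly into the 600 s eval cap, which is why the reported eval times leave only a few seconds of headroom on the slowest seed.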

This submission does not include PR #2018's Gated XSA; we report a clean reproduction of the inside-timer eval recipe on the PR #1967 base for additional 3-seed evidence.

Compliance (Issue #1017)

C1 strict causal: standard varlen + per-doc cu_seqlens.
C2 full normalized distribution over SP8192 vocab; n-gram tilt is closed-form p'(a) = exp(β·1[a=h])·p(a)/Z, Σ p'(a)=1.
C3 score-before-update: Phased TTT scores each chunk before any LoRA gradient step.
C4 single pass: each val token contributes exactly one BPB term.
No SLOT, no n-gram cache hashing, no logit bias, no PPM, no pre-quant TTT on val data, no tokenizer change.
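The closed-form tilt in C2 can be sketched on a toy distribution. Because p already sums to 1, the normalizer reduces to Z = 1 + p(h)·(e^β − 1), so only the hinted token's mass is rescaled. The 3-token vocabulary, the hint index, and β = 0.7 are illustrative assumptions, not values from the submission.

```python
import math
from typing import List


def ngram_tilt(probs: List[float], hint: int, beta: float) -> List[float]:
    """Apply p'(a) = exp(beta * 1[a == hint]) * p(a) / Z to a normalized distribution."""
    boost = math.exp(beta)
    # Closed-form normalizer, valid because sum(probs) == 1.
    z = 1.0 + probs[hint] * (boost - 1.0)
    return [(boost * p if a == hint else p) / z for a, p in enumerate(probs)]


base = [0.5, 0.3, 0.2]                       # toy vocabulary, not SP8192
tilted = ngram_tilt(base, hint=1, beta=0.7)  # hypothetical hint and beta
unchanged = ngram_tilt(base, hint=1, beta=0.0)

print(tilted, sum(tilted))
```

Setting β = 0 recovers the base distribution. Note that normalization alone addresses C2 only; whether the hint h is computed causally is a separate question that this sketch does not cover.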

Full README + per-seed logs + train_gpt.py in records/track_10min_16mb/2026-05-01_SP8192_InsiderNgram_V21_NoGatedXSA_1.0569/.

Test plan

3 seeds (42, 0, 1234), full val (47,851,520 tokens)
All seeds train ≤600s wallclock cap
All seeds eval ≤600s wallclock cap
All seeds artifact ≤16,000,000 bytes
Score-before-update preserved (Phased TTT)
N-gram precompute inside eval timer (NGRAM_HINT_PRECOMPUTE_OUTSIDE=0)
No tokenizer change
Welch t-test vs PR #1855 ("Record: SP8192 + LQER + Sparse Attn Gate + BOS-Fixed SmearGate + 9-Hparam Greedy Stack — val_bpb 1.06108 (3-seed mean)") clears p < 0.01 (Δ ≈ 0.0091 nats; 3-seed population std 0.000580 BPB)
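The three cap checks in the list above can be verified mechanically from the per-seed table. A minimal sketch; the `SeedRun` dataclass and `within_caps` helper are hypothetical names introduced for illustration, while the numbers are copied from the Summary table.

```python
from dataclasses import dataclass

TRAIN_CAP_S = 600.0
EVAL_CAP_S = 600.0
ARTIFACT_CAP_BYTES = 16_000_000


@dataclass(frozen=True)
class SeedRun:
    seed: int
    train_s: float
    eval_s: float
    artifact_bytes: int


# Per-seed values from the Summary table.
RUNS = [
    SeedRun(42, 596.058, 592.2, 15_977_032),
    SeedRun(0, 596.017, 565.2, 15_975_966),
    SeedRun(1234, 596.094, 518.8, 15_972_820),
]


def within_caps(run: SeedRun) -> bool:
    """True if the run respects the train, eval, and artifact-size caps."""
    return (
        run.train_s <= TRAIN_CAP_S
        and run.eval_s <= EVAL_CAP_S
        and run.artifact_bytes <= ARTIFACT_CAP_BYTES
    )


all_ok = all(within_caps(r) for r in RUNS)
min_eval_headroom_s = min(EVAL_CAP_S - r.eval_s for r in RUNS)
min_artifact_headroom = min(ARTIFACT_CAP_BYTES - r.artifact_bytes for r in RUNS)

print(all_ok, round(min_eval_headroom_s, 1), min_artifact_headroom)
```

The tightest margins are on seed 42: under 8 s of eval time and about 23 KB of artifact size.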

…ixer Self-contained reference for byte-level NN scoring without the C1/C2 leak in PR openai#2039 / openai#1967 / openai#2018 / openai#2041. Shows ~-0.097 BPB legitimate gain on spec 250 seed_0 (1M val tokens), independent of include_space leak. Files: README, proper_ppm_mixer_rigorous.py (canonical), byte_bpb_proper.py (NN-only baseline), show_big_gains.py (inspection), test_byte0_3way.py (5-config leak validation).

…ence After user feedback that LEAK calls relied too heavily on lineage-inheritance and path heuristics, applied stricter criterion: a LEAK verdict requires at least one of
(a) explicit shell-script invocation of prepare_caseops_data.py without --val-docs=50000,
(b) README "Data setup" matching actual train log path,
(c) audit/submission.json admission text,
(d) train log path with `_caseops/datasets/datasets/<name>` triple-nesting OR single `<root>/datasets/<name>` (which only local prep produces; HF always gives double-nesting).
Records that previously got LEAK by lineage-inheritance alone are now AMBIGUOUS unless they meet at least one of those tests.

Changes:
- openai#1945 LEAK → CLEAN (finalize_v18.sh has snapshot_download from HF; actual run path matches HF target; README's prepare_caseops_data.py section is stale documentation)
- openai#1953 LEAK → AMBIGUOUS (PR ships only train_gpt.py + logs; no prep evidence; path matches HF target; parent openai#1945 confirmed CLEAN; leans CLEAN but no direct PR evidence)
- openai#2041 LEAK → AMBIGUOUS (no prep invocation; double-nested path consistent with EITHER HF or local prep)
- openai#2075 LEAK → AMBIGUOUS (ships prep file but no explicit invocation; path matches HF target)

Updated tally: CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1 (was 8/25/0/1).

Headline impact: realistic clean SOTA is at most ~0.012 bpb below the claimed frontier openai#2118 (1.04350).

Best clean BPB candidates in order:
- openai#2019 1.05847 (HF, confirmed)
- openai#1953 1.05855 (AMBIGUOUS, leans CLEAN)
- openai#1945 1.05943 (HF, confirmed via re-audit)
- openai#2031 1.05985 (HF, confirmed)
- openai#1908 1.06081 (HF, confirmed)
- openai#1851 1.06128 (HF, MERGED SOTA)

cocohearts · 2026-05-02T18:15:04Z

Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The n-gram path is not token-only in the submitted logs: seed42 reports nonzero within/word/agree gates (within_gate=9866847, word_gate=2891588, agree2plus=303177). Those gates depend on current target-token class/word information and hit the same C1 issue as the earlier within/word n-gram submissions. A token-only legal path should have those gates disabled/zero.
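The check described in this audit note can be sketched as a small log scan. The three counter names and the seed 42 values are quoted from the comment above; the `key=value` line format and the helper names are assumptions, since the actual log layout is not reproduced here.

```python
import re
from typing import Dict

# Counters named in the audit note; a token-only path should report all as zero.
GATE_KEYS = ("within_gate", "word_gate", "agree2plus")


def gate_counts(log_text: str) -> Dict[str, int]:
    """Extract the first `key=<int>` occurrence for each gate counter."""
    counts: Dict[str, int] = {}
    for key in GATE_KEYS:
        match = re.search(rf"\b{re.escape(key)}=(\d+)", log_text)
        if match:
            counts[key] = int(match.group(1))
    return counts


def is_token_only(log_text: str) -> bool:
    """True only if every gate counter is present in the log and equals zero."""
    counts = gate_counts(log_text)
    return all(key in counts and counts[key] == 0 for key in GATE_KEYS)


# Values reported for seed 42 in the audit note (line format assumed).
seed42_line = "within_gate=9866847, word_gate=2891588, agree2plus=303177"
clean_line = "within_gate=0, word_gate=0, agree2plus=0"

print(gate_counts(seed42_line), is_token_only(seed42_line), is_token_only(clean_line))
```

A log with missing counters is treated as not verified rather than as clean, so an absent line cannot pass the check by accident.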

Commit 5bf97fd: Record: SP8192 V21 + Inside-timer N-gram TTT (no Gated XSA) — val_bpb 1.05692 (3-seed mean)

himanshudongre mentioned this pull request May 1, 2026

Non-record: competition research notes #2111

Open

leon2k2k2k mentioned this pull request May 1, 2026

Train/val data leakage in CaseOps records — prepare_caseops_data.py default overlaps 80% of val docs with training data #2127

Open

cocohearts mentioned this pull request May 2, 2026

Update leaderboard with May 1 audited rows #2146

Merged

jorge-asenjo wants to merge 1 commit into openai:main from jorge-asenjo:submit/insidetimer-ngram-v21-1.0569