Record: PR #2014 stack + LeakyReLU 0.3 + strict in-timer n-gram TTT (val_bpb 1.0560) #2140
simon-marcus wants to merge 1 commit into openai:main
Conversation
simon-marcus force-pushed from c1ac531 to 0eac71e
simon-marcus force-pushed from 0eac71e to fbedd5e
Flagging what looks like a Condition 1 compliance issue. This PR uses the n-gram tilt's within-word and word-start channels.
The merged precedent for this exact n-gram code (PR #1514, merged 2026-04-29) explicitly excludes both channels.
The same target-dependent gating is still present in this PR's code.
The README's compliance section addresses future-token leakage ("does not inspect future tokens") but not the target-token-at-position-t issue that PR #1420's review identified. PR #2018 is cited as additional precedent but is currently OPEN/unmerged; only PR #1514 is binding precedent on this code, and it disabled these channels. Could the author and maintainers take a look? Happy to be corrected if I've misread something.
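To make the distinction concrete, here is a minimal sketch of the pattern being flagged (all names are hypothetical, not this PR's actual identifiers): a compliant channel gates the tilt on context-only information, while the flagged channels gate it on a property of the target token at position t.

```python
import numpy as np

def tilt_context_gated(model_logp, ngram_logp, context_ids, lam=0.3):
    # Compliant pattern: the gate reads only tokens < t, so the adjusted
    # distribution for position t carries no information about the target.
    gate = float(len(context_ids) >= 2)            # example context-only gate
    mixed = model_logp + gate * lam * ngram_logp   # tilt the full distribution
    return mixed - np.logaddexp.reduce(mixed)      # renormalize to log-probs

def tilt_target_gated(model_logp, ngram_logp, target_id, word_start_ids, lam=0.3):
    # Flagged pattern: the gate switches on a property of the token being
    # scored (e.g. "is the target a word start?"), so the scoring-time
    # adjustment is conditioned on the very token whose NLL is measured.
    gate = float(target_id in word_start_ids)
    mixed = model_logp + gate * lam * ngram_logp
    return mixed - np.logaddexp.reduce(mixed)
```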
…es, paper scan. Post-deadline PR activity: PR openai#2138 (Lock-In Byte Mixer) has a confirmed BPB bug (corrected ~1.0671, not 0.979556); PR openai#2135 (codemath3000, 1.05651) narrowly misses the 0.005 threshold; PR openai#2139 (TTT Peer-LoRA Ensemble) is a novel technique; PR openai#2140 flagged for a target-token n-gram gating violation. New papers: BBQ quantization (ICLR 2026, arXiv:2603.01599), EntroLLM (arXiv:2505.02380), In-Place TTT NTP-aligned (arXiv:2604.06169). https://claude.ai/code/session_01CxuVyZaKMxMMc8Q4sMb2dF
Record: PR #2014 stack + LeakyReLU 0.3 + strict in-timer n-gram TTT (val_bpb 1.05601)
3-seed mean: val_bpb 1.05601155 | max 15,997,965 bytes | 8xH100 SXM | 600s train + in-timer eval
Results
Compared with the last merged leaderboard record (#1855, 1.06107587 BPB), this 3-seed mean improves val_bpb by 0.00506432.
Summary
This submission starts from the PR #2014 strict-compliance stack and adds two changes:
- LeakyReLU with negative slope 0.3.
- A strict in-timer n-gram TTT tilt: hints are built inside the eval timer (`NGRAM_HINT_PRECOMPUTE_OUTSIDE=0`) and applied as a scoring-time posterior adjustment to per-token NLL (see the sketch below).

The n-gram path does not add model parameters and has no artifact-size cost beyond source files. The run keeps the PR #2014 global prefix phase (`PHASED_TTT_PREFIX_DOCS=2500`) and uses larger TTT chunks to fit hint construction and scoring inside the 600s eval budget.
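Roughly, the scoring-time tilt behaves like the sketch below: a minimal bigram stand-in, with all class and parameter names illustrative. The PR's actual helpers are `online_ngram_tilt.py` and `online_ngram_state.c`.

```python
import numpy as np
from collections import defaultdict

class OnlineBigramTilt:
    """Minimal causal bigram tilt built entirely inside the eval timer.

    Counts update only after a position is scored, so the hint for
    position t depends on tokens < t and never on the target itself.
    """

    def __init__(self, vocab_size, lam=0.2, alpha=1.0):
        self.vocab_size, self.lam, self.alpha = vocab_size, lam, alpha
        self.counts = defaultdict(lambda: np.zeros(vocab_size))
        self.totals = defaultdict(float)

    def hint_logp(self, prev_tok):
        # Add-alpha smoothed bigram distribution from already-scored tokens.
        c = self.counts[prev_tok]
        z = self.totals[prev_tok] + self.alpha * self.vocab_size
        return np.log((c + self.alpha) / z)

    def adjusted_nll(self, model_logp, prev_tok, tok):
        # Scoring-time posterior adjustment: log-linear mix with the hint,
        # renormalized so per-token NLL is measured on a valid distribution.
        mixed = model_logp + self.lam * self.hint_logp(prev_tok)
        mixed -= np.logaddexp.reduce(mixed)
        nll = -float(mixed[tok])
        self.counts[prev_tok][tok] += 1   # update strictly after scoring
        self.totals[prev_tok] += 1
        return nll
```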
What changed vs PR #2014
- `NGRAM_HINT_PRECOMPUTE_OUTSIDE=0`

Compliance notes
- Train wall-clock stays under the 600s budget on all three seeds (596.142s, 596.003s, 596.061s).
- In-timer eval also stays under budget on all three seeds (580.667s, 583.138s, 545.161s). All runs use `NGRAM_HINT_PRECOMPUTE_OUTSIDE=0`, so n-gram hint generation happens inside the measured eval timer.
- Total submission size (quantized + per-group) is 15,997,965 bytes, under 16 MB.

Key settings
Files
- `train_gpt.py`: full script for the candidate.
- `online_ngram_tilt.py`, `online_ngram_state.c`: online causal n-gram hint builder and scoring-time tilt helper, from the in-timer n-gram tilt work in PR #2018 ("Record: Gated XSA + LQER top-1 + strict token-only n-gram TTT (val_bpb: 1.047)").
- `train_seed42.log`: seed-42 training + quantization log for the artifact reused by the eval sweep.
- `eval_seed42_ngram_p0_c64.log`: earlier seed-42 in-timer n-gram TTT eval log with the prefix disabled.
- `eval_seed42_ngram_p2500_c64.log`: seed-42 frozen-settings in-timer n-gram TTT eval log.
- `train_eval_seed314.log`: seed-314 training, quantization, and in-timer TTT eval log.
- `train_eval_seed0.log`: seed-0 training, quantization, and in-timer TTT eval log.
- `prepare_caseops_data.py`, `lossless_caps.py`, `tokenizers/...model`: CaseOps data/tokenizer helpers from the merged PR #1855 lineage ("Record: SP8192 + LQER + Sparse Attn Gate + BOS-Fixed SmearGate + 9-Hparam Greedy Stack — val_bpb 1.06108 (3-seed mean)").
- `submission.json`: structured 3-seed metadata.

Reproducing
After preparing the CaseOps data and tokenizer, run with the environment above:
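The launch command itself is not preserved in this excerpt. As a stand-in, here is a minimal sketch of a single-node launch; the `torchrun` entry point is an assumption, while the env var names come from this PR's description:

```python
# Hypothetical launch driver; the real command belongs in the elided block.
import os
import subprocess

env = dict(os.environ,
           NGRAM_HINT_PRECOMPUTE_OUTSIDE="0",  # build n-gram hints inside the eval timer
           PHASED_TTT_PREFIX_DOCS="2500")      # keep the PR #2014 global prefix phase
subprocess.run(["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py"],
               env=env, check=True)
```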
For the eval-only sweep used here, load the saved quantized artifact and run with:
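The exact eval-only command is likewise missing here; a hedged sketch, assuming hypothetical `--eval-only` and `--load-checkpoint` flags and an illustrative artifact path (none of which are confirmed by the PR):

```python
# Hypothetical eval-only driver: load the saved quantized artifact and run
# the in-timer TTT eval. Flag names and the checkpoint path are assumptions.
import os
import subprocess

env = dict(os.environ,
           NGRAM_HINT_PRECOMPUTE_OUTSIDE="0",  # n-gram hints built in-timer
           PHASED_TTT_PREFIX_DOCS="2500")      # frozen prefix-phase setting
subprocess.run(["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py",
                "--eval-only", "--load-checkpoint", "artifacts/seed42_quantized.pt"],
               env=env, check=True)
```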
Credits
This is a stack on top of the recent strict-compliance CaseOps line. Most directly: