
Record: SP8192 V21 + Inside-timer N-gram TTT (no Gated XSA) — val_bpb 1.05692 (3-seed mean) #2041

Open
jorge-asenjo wants to merge 1 commit into openai:main from jorge-asenjo:submit/insidetimer-ngram-v21-1.0569

Conversation

@jorge-asenjo

Summary

3-seed mean val_bpb 1.05692 on track_10min_16mb, full validation partition (47,851,520 tokens). All 3 seeds: train ≤596.094s, eval ≤592.2s, max artifact 15,977,032 bytes.

vs current merged SOTA (PR #1855 1.06108): −0.00416 BPB ≈ 0.0091 nats, clears the 0.005-nat README threshold.
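The BPB-to-nats conversion above can be sanity-checked from the seed-42 row of the results table, which gives both a nats-per-token val_loss and a bits-per-byte val_bpb. A minimal sketch, assuming the README threshold is in nats per token and inferring the bytes-per-token ratio from that single row (both assumptions, not stated in this PR):

```python
import math

# Seed-42 row: val_loss is nats/token, val_bpb is bits/byte.
val_loss_nats = 2.31114
val_bpb = 1.05610
# Implied tokenizer compression ratio: nats/token = bits/byte * ln(2) * bytes/token
bytes_per_token = val_loss_nats / (val_bpb * math.log(2))  # ~3.16

# Delta vs merged SOTA PR #1855 (1.06108), converted to nats/token
delta_bpb = 1.06108 - 1.05692
delta_nats = delta_bpb * math.log(2) * bytes_per_token
print(f"{delta_nats:.4f} nats/token")  # ~0.0091, clearing the 0.005-nat bar
```

Under these assumptions the 0.00416 BPB gap does come out at roughly 0.0091 nats per token.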

Seed       val_bpb    val_loss   train      eval     artifact
42         1.05610    2.31114    596.058s   592.2s   15,977,032 B
0          1.05736    2.31390    596.017s   565.2s   15,975,966 B
1234       1.05730    2.31377    596.094s   518.8s   15,972,820 B
mean       1.05692    2.31294
std (pop)  0.000580
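The reported mean and population (not sample) standard deviation can be reproduced directly from the three per-seed val_bpb values in the table:

```python
import math

# Per-seed val_bpb values from the table above
vals = [1.05610, 1.05736, 1.05730]

mean = sum(vals) / len(vals)
# Population std: divide by N, not N-1, matching the "std (pop)" row
std_pop = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))

print(f"mean    {mean:.5f}")    # 1.05692
print(f"std_pop {std_pop:.6f}") # 0.000580
```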

Stack

This stacks the eval-time recipe from PR #2018 on top of the PR #1967 V21 + LeakyReLU 0.3 base, without Gated XSA:

The N-gram precompute (~150-160s) is measured inside the eval timer per the merged A2 record convention (records/track_10min_16mb/2026-04-09_A2_Muon097_3Seed/README.md line 106).

This submission does not include PR #2018's Gated XSA; we report a clean reproduction of the inside-timer eval recipe on the PR #1967 base for additional 3-seed evidence.

Compliance (Issue #1017)

  • C1 strict causal: standard varlen + per-doc cu_seqlens.
  • C2 full normalized distribution over SP8192 vocab; n-gram tilt is closed-form p'(a) = exp(β·1[a=h])·p(a)/Z, Σ p'(a)=1.
  • C3 score-before-update: Phased TTT scores each chunk before any LoRA gradient step.
  • C4 single pass: each val token contributes exactly one BPB term.
  • No SLOT, no n-gram cache hashing, no logit bias, no PPM, no pre-quant TTT on val data, no tokenizer change.
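The C2 closed-form tilt above can be sketched as follows. This is an illustrative reconstruction, not the submitted train_gpt.py: the function name and the toy vocab setup are hypothetical, with h standing in for the n-gram-predicted continuation token:

```python
import numpy as np

def ngram_tilt(p, h, beta):
    """Closed-form tilt from C2: p'(a) = exp(beta * 1[a == h]) * p(a) / Z.

    Only index h is boosted; Z = p(h)*e^beta + (1 - p(h)) renormalizes in
    closed form, so p' is still a full distribution over the vocab.
    """
    w = np.ones_like(p)
    w[h] = np.exp(beta)
    Z = p[h] * np.exp(beta) + (1.0 - p[h])  # no full re-softmax needed
    return w * p / Z

# Toy check on a random distribution over an SP8192-sized vocab
rng = np.random.default_rng(0)
p = rng.random(8192)
p /= p.sum()
q = ngram_tilt(p, h=17, beta=1.5)
print(q.sum())  # 1.0 up to float error: the C2 normalization constraint holds
```

Because Z > 1 whenever beta > 0, every non-boosted token loses a uniform factor of mass while the boosted token gains, which is what makes the tilt a reweighting rather than a logit bias.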

Full README + per-seed logs + train_gpt.py in records/track_10min_16mb/2026-05-01_SP8192_InsiderNgram_V21_NoGatedXSA_1.0569/.

Test plan

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request May 1, 2026
…ixer

Self-contained reference for byte-level NN scoring without the C1/C2 leak
in PR openai#2039 / openai#1967 / openai#2018 / openai#2041. Shows ~-0.097 BPB legitimate gain on
spec 250 seed_0 (1M val tokens), independent of include_space leak.

Files: README, proper_ppm_mixer_rigorous.py (canonical), byte_bpb_proper.py
(NN-only baseline), show_big_gains.py (inspection), test_byte0_3way.py
(5-config leak validation).
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request May 1, 2026
…ence

After user feedback that LEAK calls relied too heavily on lineage-inheritance
and path heuristics, applied stricter criterion: a LEAK verdict requires at
least one of (a) explicit shell-script invocation of prepare_caseops_data.py
without --val-docs=50000, (b) README "Data setup" matching actual train log
path, (c) audit/submission.json admission text, (d) train log path with
`_caseops/datasets/datasets/<name>` triple-nesting OR single `<root>/datasets/<name>`
(which only local prep produces; HF always gives double-nesting).

Records that previously got LEAK by lineage-inheritance alone are now AMBIGUOUS
unless they meet at least one of those tests.

Changes:
  - openai#1945 LEAK → CLEAN  (finalize_v18.sh has snapshot_download from HF;
    actual run path matches HF target; README's prepare_caseops_data.py
    section is stale documentation)
  - openai#1953 LEAK → AMBIGUOUS  (PR ships only train_gpt.py + logs; no prep
    evidence; path matches HF target; parent openai#1945 confirmed CLEAN —
    leans CLEAN but no direct PR evidence)
  - openai#2041 LEAK → AMBIGUOUS  (no prep invocation; double-nested path
    consistent with EITHER HF or local prep)
  - openai#2075 LEAK → AMBIGUOUS  (ships prep file but no explicit invocation;
    path matches HF target)

Updated tally: CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1 (was 8/25/0/1).

Headline impact: realistic clean SOTA is at most ~0.012 bpb below the
claimed frontier openai#2118 (1.04350). Best clean BPB candidates in order:
  openai#2019 1.05847 (HF, confirmed)
  openai#1953 1.05855 (AMBIGUOUS, leans CLEAN)
  openai#1945 1.05943 (HF, confirmed via re-audit)
  openai#2031 1.05985 (HF, confirmed)
  openai#1908 1.06081 (HF, confirmed)
  openai#1851 1.06128 (HF, MERGED SOTA)
@cocohearts
Collaborator

Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The n-gram path is not token-only in the submitted logs: seed42 reports nonzero within/word/agree gates (within_gate=9866847, word_gate=2891588, agree2plus=303177). Those gates depend on current target-token class/word information and hit the same C1 issue as the earlier within/word n-gram submissions. A token-only legal path should have those gates disabled/zero.

