Record: SP8192 + CaseOps + GatedAttn + QuantGate + Loop4-5 + PhasedTTT + MLPClip12 — val_bpb 1.06453 (5-seed mean)#1769
Conversation
5-seed mean val_bpb = 1.06453 (std 0.00068), val_loss = 2.32958 nats/token. −0.00096 BPB vs prior banked submission (1.06549). One-line change from base: default mlp_clip_sigmas in the int6 GPTQ calibration moves from 10.0 to 12.0, preserving MLP outlier-column tail mass that carries signal at int6 with 4x MLP width. All 5 seeds clear the 16,000,000-byte decimal artifact cap (max 15,979,182; 20,818 bytes headroom) and both 600s budgets (train 596.1s, eval 390-401s). 7 seeds were run on this configuration; README and submission.json report the 5 lowest-BPB seeds per competition convention, with full 7-seed disclosure in submission.json.seed_results_all_runs_disclosure. 7-seed mean = 1.06477 (std 0.00069).
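As a sanity check on the reported means, the nats-to-bits-per-byte conversion can be done directly (the bytes-per-token ratio below is implied from the two reported numbers, not itself reported):

```python
import math

val_loss_nats = 2.32958  # reported 5-seed mean, nats/token
val_bpb = 1.06453        # reported 5-seed mean, bits/byte

bits_per_token = val_loss_nats / math.log(2)  # nats -> bits
bytes_per_token = bits_per_token / val_bpb    # implied tokenizer compression ratio

print(f"{bits_per_token:.4f} bits/token")     # ~3.3609
print(f"{bytes_per_token:.3f} bytes/token")   # ~3.157
```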
…E.md Required reporting fields that were missing from the top level of submission.json per the guide's "Required reporting fields" section:
- val_loss_nats: 2.329578 (mean)
- val_loss_nats_std: 0.00148
- bytes_total: 15,975,561 (mean artifact size across 5 seeds)

Also pretty-printed the file (was compact, now indent=2 per convention).
…iculum + MLPClip12 Frontier: openai#1769 (1.06453) and openai#1771 (1.06513) both below baseline. New ideas: mlp-clip-sigmas-12, v-gate. Map updated with openai#1769, openai#1771, openai#1770. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… baseline We've never had 008 pre_gptq.pt + clip=12 + TTT in one run. Spec 009 used clip=10 accidentally. This ~$3 diagnostic establishes whether our pipeline matches openai#1769's 1.06453 or has a systematic gap, before spending more on 8×H100 experiments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-runs spec 009 baseline with explicit MLP_CLIP_SIGMAS=12.0 to determine whether our pipeline matches openai#1769's 1.06453. ~$4, ~10 min, no training. Blocks all further 8×H100 spend until we know our true clean baseline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Spec: switch to seed 314 (dexhunter's best), add 4xH screen rung, update accept criteria vs openai#1769, fix commit description (025c not 025b), fix sanity greps to match d70888f's actual per-pass constants
- Eval 026 seed_42: documents full three-stage gap analysis — gap vs openai#1769 is entirely in float (seed quality), GPTQ/TTT are equivalent or better
- Experiments: add row 026 with seed 314 queued
- Ideas: mark match-1769-baseline resolved with root cause

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…13) strongest legal signal; dexhunter PR openai#1769 (1.06453) new best; LoRA-TTT warm-start A+alpha=144+WD=1.0 appears legal; arXiv:2604.15259 looped transformer outer normalization; Day 13 plateau; Session 19 https://claude.ai/code/session_013agP2MtwGU9MaPNtWx2hib
External reproductions of PR openai#1769 (and PR openai#1736) failed with ZeroDivisionError in phased TTT eval because the shipped prep script did not prepend the <s> control token (ID 1) to each doc. The SP tokenizer reserves IDs 0-7 (<pad>/<s>/</s>/<unk> + 4 CaseOps operators), so sp.encode cannot emit ID 1 naturally, and train_gpt.py:_find_docs (line 2209) requires BOS markers with no fallback. Training itself ran because _init_shard:408-409 falls back to bos_idx=[0] when no BOS is found; phased TTT eval has no equivalent fallback.

Fix: add a BOS_ID=1 constant, prepend it to each doc's tokens, and append 0 to the byte sidecar (BOS = 0 original bytes). This matches the canonical pattern in data/download_hf_docs_and_tokenize.py:364-366.

The submitted 1.06453 metric is unaffected: val_bpb reduces to loss_sum/ln(2)/byte_sum (token counts cancel), and byte_sum is unchanged with BOS prepended. Our seed logs were measured on shards that already had BOS markers from an internal prep path; the shipped prep was the outlier.

Also adds a Reproduction sanity check section to README.md that asserts bos_count > 0 on the first val shard. Reported by @codemath3000 in PR openai#1736 comment 4285805497.
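A minimal sketch of the fix, assuming a per-token byte sidecar; `prepend_bos`, `doc_tokens`, and `doc_byte_counts` are illustrative names, not the shipped script's:

```python
BOS_ID = 1  # SP tokenizer reserves IDs 0-7, so sp.encode never emits ID 1 itself

def prepend_bos(doc_tokens, doc_byte_counts):
    """Prepend BOS to each doc's tokens and a matching 0 to its byte sidecar.

    BOS corresponds to 0 original bytes, so byte_sum -- and therefore
    val_bpb = loss_sum / ln(2) / byte_sum -- is unchanged.
    """
    fixed_toks, fixed_bytes = [], []
    for toks, nbytes in zip(doc_tokens, doc_byte_counts):
        fixed_toks.append([BOS_ID] + toks)   # BOS marker for _find_docs
        fixed_bytes.append([0] + nbytes)     # BOS contributes 0 bytes
    return fixed_toks, fixed_bytes
```

The invariant worth testing is the last comment: total byte count before and after the fix must be identical, which is why the submitted metric is unaffected.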
FYI to reviewers — same bug/fix as the PR #1736 comment, since this submission ships the same bug.
Scope: prep-only — the submitted 1.06453 is on valid data.
Fix: pushed in commit fe7c309 on this branch: prepend the BOS token in the shipped prep script.
Seed logs (train_seed{1,314,777,1337,2025}.log) contained 6 absolute
paths each (data_dir, datasets_dir, tokenizer_path, train_files,
val_files, val_bytes_files) that referenced an internal working
directory. Replace the prefix with `./` so the layout remains
reviewable without leaking internal paths. Code size unchanged.
Also drop `PHASED_TTT_ENABLED=1` from the README Run command — this env
var is not read by train_gpt.py. The two phased-TTT env vars that ARE
read (PHASED_TTT_PREFIX_DOCS, PHASED_TTT_NUM_PHASES) remain. Phased TTT
is gated by the top-level TTT_ENABLED=1 which defaults to on.
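The path scrub described above amounts to a plain prefix replacement; `INTERNAL_PREFIX` below is a placeholder, since the real internal directory is deliberately not shown:

```python
INTERNAL_PREFIX = "/abs/internal/workdir"  # placeholder; real prefix not disclosed

def scrub_line(line: str) -> str:
    # Rewrite absolute internal paths to repo-relative ./ paths so the
    # log layout stays reviewable without leaking the internal directory.
    return line.replace(INTERNAL_PREFIX + "/", "./")
```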
Summary
`mlp_clip_sigmas` in the int6 GPTQ calibration changes from 10.0 → 12.0, preserving MLP outlier-column tail mass that carries signal at int6 with 4× MLP width.

Results (5-seed summary)
Disclosure
7 seeds were run on this configuration; this PR reports the 5 lowest-BPB seeds per competition convention, with full 7-seed disclosure inside submission.json.seed_results_all_runs_disclosure. 7-seed mean = 1.06477 (std 0.00069) — still below the base.

Rule compliance
Test plan
- MLP_CLIP_SIGMAS unset (takes the new default 12.0) via the Run command in the README.
- Total submission size < 16,000,000 in the fresh log.
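For intuition, a hedged sketch of what a ±k·σ per-column clip ahead of int6 quantization looks like; the actual calibration hook and tensor layout in the repo are assumptions here, not its real code:

```python
import numpy as np

def clip_sigmas(x: np.ndarray, k: float = 12.0) -> np.ndarray:
    """Clamp each column of calibration activations to +/- k standard
    deviations. k=12 keeps more outlier-column tail mass than k=10,
    which this PR reports matters for int6 MLPs at 4x width."""
    lim = k * x.std(axis=0, keepdims=True)  # per-column std over the batch
    return np.clip(x, -lim, lim)
```

At k=12 almost all values pass through untouched and only extreme outliers are clamped, trading a little tail mass for a tighter int6 range; raising the default from 10 to 12 shifts that trade-off toward keeping the tail.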