
Record: SP8192 + CaseOps + GatedAttn + QuantGate + Loop4-5 + PhasedTTT + MLPClip12 — val_bpb 1.06453 (5-seed mean) #1769

Open
dexhunter wants to merge 4 commits into openai:main from dexhunter:dexhunter/caseops-mlpclip12-1.06453

Conversation

@dexhunter
Contributor

Summary

  • One-line retune on top of our merge-ready base (CaseOps + GatedAttn + QuantGate + Loop4-5 + PhasedTTT): the default mlp_clip_sigmas in the int6 GPTQ calibration changes from 10.0 → 12.0, preserving MLP outlier-column tail mass that carries signal at int6 with 4× MLP width.
  • 5-seed mean val_bpb = 1.06453 (std 0.00068), val_loss = 2.32958 nats/token. −0.00096 BPB vs our prior submission (1.06549).
  • All 5 seeds clear the 16,000,000-byte decimal artifact cap (max 15,979,182; 20,818 bytes headroom) and both 600s budgets (train 596.1s, eval 390–401s).
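The effect of the mlp_clip_sigmas change can be illustrated with a minimal sketch of per-column sigma clipping before quantization. This is an assumed form of the calibration step (the function name and column-wise convention are hypothetical, not taken from the submission's code); it only shows why a 12σ threshold preserves more outlier tail mass than 10σ.

```python
import numpy as np

def sigma_clip_columns(w: np.ndarray, clip_sigmas: float = 12.0) -> np.ndarray:
    """Clip each weight column to mean +/- clip_sigmas * std.

    Hypothetical sketch: a looser threshold (12 vs 10) keeps more of
    the outlier-column tail before int6 GPTQ quantization.
    """
    mu = w.mean(axis=0, keepdims=True)
    sigma = w.std(axis=0, keepdims=True)
    return np.clip(w, mu - clip_sigmas * sigma, mu + clip_sigmas * sigma)

rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 1024))
w[0, 0] = 50.0  # inject an outlier entry into column 0
clipped_10 = sigma_clip_columns(w, 10.0)
clipped_12 = sigma_clip_columns(w, 12.0)
# The 12-sigma threshold retains more of the outlier's magnitude;
# in-distribution entries pass through unchanged under both settings.
```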

Results (5-seed summary)

| Seed | Post-TTT val_bpb | val_loss (nats/tok) | Artifact (bytes) | Train (s) | Eval (s) |
|---|---|---|---|---|---|
| 314 | 1.06356801 | 2.32748105 | 15,975,xxx | 596.1 | 400.7 |
| 2025 | 1.06413130 | 2.32871372 | 15,975,xxx | 596.1 | 394.7 |
| 777 | 1.06466993 | 2.32989245 | 15,975,xxx | 596.1 | 394.6 |
| 1 | 1.06509678 | 2.33082656 | 15,975,xxx | 596.1 | 391.2 |
| 1337 | 1.06516558 | 2.33097712 | 15,975,xxx | 596.1 | 390.2 |
| mean | 1.06453 | 2.32958 | 15,975,561 | 596.1 | 394.3 |
| std | 0.00068 | 0.00148 | | | |
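The summary rows can be reproduced directly from the per-seed values; the reported std is the sample standard deviation:

```python
from statistics import mean, stdev

val_bpb = [1.06356801, 1.06413130, 1.06466993, 1.06509678, 1.06516558]
val_loss = [2.32748105, 2.32871372, 2.32989245, 2.33082656, 2.33097712]

print(round(mean(val_bpb), 5))    # 1.06453
print(round(stdev(val_bpb), 5))   # 0.00068 (sample std, n-1)
print(round(mean(val_loss), 5))   # 2.32958
print(round(stdev(val_loss), 5))  # 0.00148
```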

Disclosure

7 seeds were run on this configuration; this PR reports the 5 lowest-BPB seeds per competition convention, with full 7-seed disclosure inside submission.json.seed_results_all_runs_disclosure. 7-seed mean = 1.06477 (std 0.00069) — still below the base.

Rule compliance

  • Score-first phased TTT (Condition 3) inherited unchanged from the base.
  • No change to tokenizer, BPB accounting, or the TTT loop.
  • All hard gates pass: artifact ≤ 16 MB (decimal), train ≤ 600s, eval ≤ 600s, no val data during training.
  • See the README's Rule Compliance section for the Issue A Field Guide to Valid Submissions #1017 Conditions 1–4 + Section V walkthrough.
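The hard gates above are mechanical and can be checked with a small helper. This is a sketch of the checks as stated in this PR (decimal-byte artifact cap, 600s train and eval budgets), not the competition's actual validator:

```python
def check_hard_gates(artifact_bytes: int, train_s: float, eval_s: float) -> list[str]:
    """Return a list of gate violations; an empty list means all gates pass."""
    failures = []
    if artifact_bytes > 16_000_000:  # decimal cap, not 16 MiB
        failures.append(f"artifact {artifact_bytes} > 16,000,000 bytes")
    if train_s > 600.0:
        failures.append(f"train {train_s}s > 600s budget")
    if eval_s > 600.0:
        failures.append(f"eval {eval_s}s > 600s budget")
    return failures

# Worst-case numbers from the results table clear every gate.
print(check_hard_gates(15_979_182, 596.1, 400.7))  # []
```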

Test plan

  • Reviewer reproduces any single seed with MLP_CLIP_SIGMAS unset (takes the new default 12.0) via the Run Command in the README.
  • Verify Total submission size < 16,000,000 in the fresh log.

@dexhunter dexhunter changed the title Record: SP8192 CaseOps stack retune (MLP clip 10→12) → 1.06453 Record: SP8192 + CaseOps + GatedAttn + QuantGate + Loop4-5 + PhasedTTT + MLPClip12 — val_bpb 1.06453 (5-seed mean) Apr 22, 2026
…E.md

Required reporting fields that were missing from top level of
submission.json per the guide's "Required reporting fields" section:
- val_loss_nats: 2.329578 (mean)
- val_loss_nats_std: 0.00148
- bytes_total: 15,975,561 (mean artifact size across 5 seeds)

Also pretty-printed the file (was compact, now indent=2 per convention).
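The field additions and reformat described in this commit amount to a few lines; a sketch (the surrounding top-level keys of submission.json are assumed, only the three added fields and indent=2 come from the commit message):

```python
import json

submission = {"val_bpb": 1.06453}  # existing top-level fields elided
submission.update({
    "val_loss_nats": 2.329578,
    "val_loss_nats_std": 0.00148,
    "bytes_total": 15_975_561,  # mean artifact size across 5 seeds
})
# Pretty-print per the guide's convention (was compact, now indent=2).
text = json.dumps(submission, indent=2)
print(text)
```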
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
…iculum + MLPClip12

Frontier: openai#1769 (1.06453) and openai#1771 (1.06513) both below baseline.
New ideas: mlp-clip-sigmas-12, v-gate.
Map updated with openai#1769, openai#1771, openai#1770.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
… baseline

We've never had 008 pre_gptq.pt + clip=12 + TTT in one run. Spec 009 used clip=10
accidentally. This ~$3 diagnostic establishes whether our pipeline matches openai#1769's
1.06453 or has a systematic gap, before spending more on 8×H100 experiments.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
Re-runs spec 009 baseline with explicit MLP_CLIP_SIGMAS=12.0 to determine whether
our pipeline matches openai#1769's 1.06453. ~$4, ~10 min, no training. Blocks all further
8×H100 spend until we know our true clean baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
- Spec: switch to seed 314 (dexhunter's best), add 4xH screen rung, update
  accept criteria vs openai#1769, fix commit description (025c not 025b), fix sanity
  greps to match d70888f's actual per-pass constants
- Eval 026 seed_42: documents full three-stage gap analysis — gap vs openai#1769 is
  entirely in float (seed quality), GPTQ/TTT are equivalent or better
- Experiments: add row 026 with seed 314 queued
- Ideas: mark match-1769-baseline resolved with root cause

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 22, 2026
…13) strongest legal signal; dexhunter PR openai#1769 (1.06453) new best; LoRA-TTT warm-start A+alpha=144+WD=1.0 appears legal; arXiv:2604.15259 looped transformer outer normalization; Day 13 plateau; Session 19

https://claude.ai/code/session_013agP2MtwGU9MaPNtWx2hib
External reproductions of PR openai#1769 (and PR openai#1736) failed with
ZeroDivisionError in phased TTT eval because the shipped prep script
did not prepend the <s> control token (ID 1) to each doc. The SP
tokenizer reserves IDs 0-7 (pad/s/</s>/unk + 4 CaseOps operators),
so sp.encode cannot emit ID 1 naturally, and train_gpt.py:_find_docs
(line 2209) requires BOS markers with no fallback. Training itself
ran because _init_shard:408-409 falls back to bos_idx=[0] when no
BOS is found; phased TTT eval has no equivalent fallback.

Fix: add BOS_ID=1 constant, prepend to each doc's tokens, append 0
to the byte sidecar (BOS = 0 original bytes). Matches the canonical
pattern in data/download_hf_docs_and_tokenize.py:364-366.

The submitted 1.06453 metric is unaffected — val_bpb reduces to
loss_sum/ln(2)/byte_sum (token counts cancel) and byte_sum is
unchanged with BOS prepended. Our seed logs were measured on shards
that already had BOS markers from an internal prep path; the shipped
prep was the outlier.

Also adds a Reproduction sanity check section to README.md that
asserts bos_count > 0 on the first val shard.
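The sanity check described above reduces to counting BOS markers in a token shard. A minimal sketch, assuming the shard is a flat integer token array (the actual shard loading and file layout are not shown here):

```python
import numpy as np

BOS_ID = 1  # <s> control token; the SP tokenizer reserves IDs 0-7

def bos_count(shard_tokens: np.ndarray) -> int:
    """Count BOS markers in a shard; phased TTT eval needs this > 0."""
    return int((shard_tokens == BOS_ID).sum())

# A shard prepared without BOS prepending would fail the assertion.
good = np.array([1, 42, 99, 1, 17], dtype=np.uint16)
bad = np.array([42, 99, 17], dtype=np.uint16)
assert bos_count(good) > 0, "no BOS markers: phased TTT eval would divide by zero"
```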

Reported by @codemath3000 in PR openai#1736 comment 4285805497.
@dexhunter
Contributor Author

FYI to reviewers — same bug/fix as PR #1736 comment, since this submission ships the same prepare_caseops_data.py. Reported there by @codemath3000.

Bug. prepare_caseops_data.py line 157 doesn't prepend BOS_ID = 1. train_gpt.py:_find_docs (line 2209) then returns [] and _loss_bpb_from_sums (line 2303) divides by zero in the phased TTT eval path. Training survives via the _init_shard:408–409 fallback; phased TTT eval does not.

Scope. Prep-only — submitted 1.06453 is on valid data. val_bpb = loss_sum / ln(2) / byte_sum (token counts cancel at line 2303), and byte_sum is unchanged with BOS prepended (BOS = 0 original bytes).

Fix. Pushed in commit fe7c309 on this branch: prepend BOS_ID = 1 to each doc's tokens and append 0 to the byte-count sidecar. README now includes a bos_count > 0 sanity check for the first val shard. Full diff and rationale in the PR #1736 comment linked above.
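The shape of the fix, and why the metric is unaffected, can be sketched in a few lines. This is an illustration under assumed data structures (per-doc token lists and a parallel byte-count sidecar), not the literal diff in fe7c309:

```python
BOS_ID = 1  # <s>; sp.encode cannot emit it since IDs 0-7 are reserved

def prepend_bos(doc_tokens: list[int], doc_byte_counts: list[int]):
    """Prepend BOS to a doc's tokens and a matching 0 to the byte sidecar.

    BOS contributes 0 original bytes, so byte_sum -- and therefore
    val_bpb = loss_sum / ln(2) / byte_sum -- is unchanged.
    """
    return [BOS_ID] + doc_tokens, [0] + doc_byte_counts

tokens, byte_counts = prepend_bos([523, 91, 1044], [3, 1, 5])
assert sum(byte_counts) == 9  # byte_sum unchanged by the BOS entry
```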

Seed logs (train_seed{1,314,777,1337,2025}.log) contained 6 absolute
paths each (data_dir, datasets_dir, tokenizer_path, train_files,
val_files, val_bytes_files) that referenced an internal working
directory. Replace the prefix with `./` so the layout remains
reviewable without leaking internal paths. Code size unchanged.

Also drop `PHASED_TTT_ENABLED=1` from the README Run command — this env
var is not read by train_gpt.py. The two phased-TTT env vars that ARE
read (PHASED_TTT_PREFIX_DOCS, PHASED_TTT_NUM_PHASES) remain. Phased TTT
is gated by the top-level TTT_ENABLED=1 which defaults to on.
