Record candidate: 1.06032 CaseOps + Matrix-LR 0.028 + TTT n=1 by simon-marcus · Pull Request #1925 · openai/parameter-golf

simon-marcus · 2026-04-29T11:36:39Z

Record candidate: CaseOps + Matrix-LR 0.028 + TTT n=1 / LoRA LR 8e-5

val_bpb: 1.06032 (seed-matched 3-seed composite mean vs #1855, std 0.00114) | 15.90 MB max | 8xH100 SXM | 600s train | score-first TTT eval

This updates the original PR #1925 result by reusing the same trained/quantized artifacts and changing only the settings of the legal score-first TTT eval procedure:

PHASED_TTT_PREFIX_DOCS=3500
PHASED_TTT_NUM_PHASES=1
TTT_LORA_LR=0.00008

No retraining or re-quantization is included in the updated headline result. The composite logs make that explicit: each contains the original train/quant section, a COMPOSITE EVAL-ONLY UPDATE FOR PR #1925 marker, and the updated TTT eval continuation.

Seed-Matched 3-Seed Results

The primary score report uses the exact seed set from #1855 (42, 0, 1234) for a direct paired comparison.

| Seed | Updated log | Steps | Pre-quant BPB | Quantized BPB | Updated TTT BPB | Artifact bytes | Eval time | Delta vs #1855 |
|---|---|---|---|---|---|---|---|---|
| 42 | `train_seed42_ttt_n1_lora8e5.log` | 4,994 | 1.06350701 | 1.07204922 | 1.05906444 | 15,896,241 | 367.2s | -0.00083010 |
| 0 | `train_seed0_ttt_n1_lora8e5.log` | 4,975 | 1.06523234 | 1.07359331 | 1.06059202 | 15,898,523 | 370.1s | -0.00065411 |
| 1234 | `train_seed1234_ttt_n1_lora8e5.log` | 4,965 | 1.06571315 | 1.07432062 | 1.06129561 | 15,902,776 | 367.8s | -0.00079134 |
| Mean | | 4,978 | 1.06481750 | 1.07332105 | 1.06031736 | 15,899,180 | 368.4s | -0.00075852 |

Seed-matched std over the updated TTT BPBs is 0.00114. The matched #1855 mean is 1.06107587, so this improves the paired comparison by 0.00075852 BPB. It also improves the original PR #1925 mean 1.06049099 by 0.00017363 BPB, with all three matched seeds individually better.
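
The summary figures above can be rechecked from the per-seed rows. The following is a minimal sketch, not part of the submission: it copies the updated TTT BPBs and deltas from the table, and the variable names are illustrative. It assumes the reported std is the sample standard deviation (n - 1 denominator), which reproduces 0.00114 from these three values.

```python
from statistics import mean, stdev

# Updated TTT BPB per seed, copied from the results table.
updated_ttt_bpb = {42: 1.05906444, 0: 1.06059202, 1234: 1.06129561}
# Per-seed delta vs #1855 (negative means this PR is better), from the table.
delta_vs_1855 = {42: -0.00083010, 0: -0.00065411, 1234: -0.00079134}

bpb_mean = mean(updated_ttt_bpb.values())
bpb_std = stdev(updated_ttt_bpb.values())  # sample std, n - 1 denominator (assumption)
delta_mean = mean(delta_vs_1855.values())

# Back out the matched #1855 per-seed scores from the deltas.
baseline_bpb = {s: updated_ttt_bpb[s] - delta_vs_1855[s] for s in updated_ttt_bpb}
baseline_mean = mean(baseline_bpb.values())

print(f"updated mean BPB : {bpb_mean:.8f}")
print(f"sample std       : {bpb_std:.5f}")
print(f"mean delta       : {delta_mean:.8f}")
print(f"#1855 matched    : {baseline_mean:.8f}")
print("all seeds better :", all(d < 0 for d in delta_vs_1855.values()))
```

The backed-out baseline mean agrees with the matched #1855 mean quoted above to eight decimals, so the table and the prose are mutually consistent.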

What Changed Since Initial PR #1925

Updated train_gpt.py default TTT_LORA_LR from 0.0001 to 0.00008.
Kept the existing PHASED_TTT_PREFIX_DOCS=3500 and PHASED_TTT_NUM_PHASES=1 defaults.
Added TTT_EVAL_UPDATE.md with the compact matched eval-only table.
Added composite logs for seeds 0, 42, and 1234.
Did not add model artifacts; only logs/docs/code metadata are included.

Compliance / Legality

Training is capped at 600s on 8xH100; seed-matched logs show 599.45s to 599.64s.
Updated TTT eval is under 600s; observed range 367.2s to 370.1s.
All artifacts are under 16,000,000 bytes; seed-matched max observed 15,902,776.
TTT is score-first and single-pass: each chunk is evaluated before adaptation and not rescored.
No validation tokens are used for training or pre-quant adaptation.
No SLOT.
No n-gram cache and no logit bias.
No ETLB.
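
The numeric limits in this list can be checked mechanically against the reported figures. This is a hedged sketch rather than the challenge's official checker: the `violations` helper and the data layout are hypothetical, and treating every limit as a strict "less than" is an assumption made to stay on the conservative side.

```python
# Limits as stated in the Compliance section; strict "<" is an assumption.
TRAIN_CAP_S = 600.0
EVAL_CAP_S = 600.0
ARTIFACT_CAP_BYTES = 16_000_000

# Per-seed eval time and artifact size, copied from the results table.
runs = {
    42: {"eval_s": 367.2, "artifact_bytes": 15_896_241},
    0: {"eval_s": 370.1, "artifact_bytes": 15_898_523},
    1234: {"eval_s": 367.8, "artifact_bytes": 15_902_776},
}
# Only the range of training times is reported, not per-seed values.
train_range_s = (599.45, 599.64)


def violations(runs, train_range_s):
    """Return human-readable limit violations (empty list if compliant)."""
    problems = []
    if max(train_range_s) >= TRAIN_CAP_S:
        problems.append(f"train time {max(train_range_s)}s is not under {TRAIN_CAP_S}s")
    for seed, run in runs.items():
        if run["eval_s"] >= EVAL_CAP_S:
            problems.append(f"seed {seed}: eval {run['eval_s']}s is not under {EVAL_CAP_S}s")
        if run["artifact_bytes"] >= ARTIFACT_CAP_BYTES:
            problems.append(f"seed {seed}: artifact {run['artifact_bytes']} bytes is over the cap")
    return problems


problems = violations(runs, train_range_s)
headroom_bytes = ARTIFACT_CAP_BYTES - max(r["artifact_bytes"] for r in runs.values())
print(problems or "all limits satisfied")
print(f"smallest artifact headroom: {headroom_bytes} bytes")
```

With the reported values the check passes, and the largest artifact leaves 97,224 bytes of headroom under the cap.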

Key Techniques

CaseOps SP8192 tokenizer and byte-sidecar path, using the lossless caps reserved tokenizer.
11-layer 512d XSA stack with U-Net skips, parallel decoder, depth recurrence, SparseAttnGate, BOS-fixed SmearGate, and LeakyReLU(0.5)^2 MLP.
Polar-Express Newton-Schulz Muon plus the tuned quant/compression stack: GPTQ int6 matrices, int7 embeddings, int8 row gate, LQER asymmetric rank-4 correction, and per-group lrzip + brotli compression.
Final deltas: MATRIX_LR=0.028, PHASED_TTT_PREFIX_DOCS=3500, PHASED_TTT_NUM_PHASES=1, and TTT_LORA_LR=0.00008.
Score-first phased TTT stays on the post-quant model and scores every chunk before any update.
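
The score-first discipline (score each chunk, then adapt, never rescore) can be illustrated with a toy model. The sketch below is not the LoRA-based TTT in train_gpt.py: it swaps in an adaptive byte-level unigram model with add-one smoothing, and the function name and the use of bytes instead of tokens are assumptions made only to keep the example self-contained.

```python
import math
from collections import Counter


def score_first_bits_per_byte(data: bytes, chunk_size: int = 48) -> float:
    """Toy score-first evaluation with an adaptive unigram byte model."""
    if not data:
        raise ValueError("data must be non-empty")
    counts = Counter()  # adaptive state; starts with no knowledge of the data
    total = 0
    bits = 0.0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # 1) Score the chunk with the state frozen (add-one smoothing over 256 bytes).
        denom = total + 256
        for b in chunk:
            bits -= math.log2((counts[b] + 1) / denom)
        # 2) Only after scoring, adapt on the chunk. It is never rescored.
        counts.update(chunk)
        total += len(chunk)
    return bits / len(data)


print(score_first_bits_per_byte(b"a" * 480))       # adapts after the first chunk
print(score_first_bits_per_byte(b"abc", 48))       # single chunk, no adaptation benefit
```

Because the first chunk is always scored before any update, a single-chunk input gets the uniform 8 bits per byte. Any gain comes only from chunks scored after earlier ones were absorbed.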

Reproduction

DATA_PATH=./data/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved \
TOKENIZER_PATH=./data/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model \
CASEOPS_ENABLED=1 \
VOCAB_SIZE=8192 \
ITERATIONS=20000 \
MAX_WALLCLOCK_SECONDS=600 \
TTT_ENABLED=1 \
PHASED_TTT_ENABLED=1 \
PHASED_TTT_PREFIX_DOCS=3500 \
PHASED_TTT_NUM_PHASES=1 \
TTT_LORA_LR=0.00008 \
EMBED_BITS=7 \
MATRIX_LR=0.028 \
MIN_LR=0.1 \
MLP_CLIP_SIGMAS=11.5 \
ATTN_CLIP_SIGMAS=13.0 \
EMBED_CLIP_SIGMAS=14.0 \
GRAD_CLIP_NORM=0.3 \
TTT_CHUNK_SIZE=48 \
WARMUP_STEPS=20 \
MUON_BACKEND_STEPS=5 \
GLOBAL_TTT_MOMENTUM=0.9 \
WARMDOWN_FRAC=0.85 \
BETA2=0.99 \
TTT_BETA2=0.99 \
TTT_WEIGHT_DECAY=0.5 \
TTT_LORA_RANK=80 \
SPARSE_ATTN_GATE_SCALE=0.5 \
GPTQ_RESERVE_SECONDS=0.5 \
GPTQ_CALIBRATION_BATCHES=16 \
VAL_LOSS_EVERY=0 \
TRAIN_LOG_EVERY=50 \
GATED_ATTN_QUANT_GATE=1 \
SPARSE_ATTN_GATE_ENABLED=1 \
GATE_WINDOW=12 \
SMEAR_GATE_ENABLED=1 \
LQER_ENABLED=1 \
LQER_ASYM_ENABLED=1 \
LQER_RANK=4 \
LQER_FACTOR_BITS=4 \
LQER_ASYM_GROUP=64 \
LQER_TOP_K=3 \
FUSED_CE_ENABLED=1 \
COMPRESSOR=pergroup \
NCCL_NET=Socket \
SEED=42 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
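
The command above runs seed 42 only; the matched comparison also needs seeds 0 and 1234. A minimal sketch of one way to derive the per-seed launches follows. The helpers `parse_launch` and `per_seed_launches` are hypothetical, and `ABBREVIATED` is a shortened stand-in for the full command text, which would need to be pasted in unchanged for a real reproduction.

```python
import shlex

# Shortened stand-in for the full reproduction command (assumption: paste the
# complete text from the Reproduction section here for a real run).
ABBREVIATED = """MATRIX_LR=0.028 \\
PHASED_TTT_PREFIX_DOCS=3500 \\
PHASED_TTT_NUM_PHASES=1 \\
TTT_LORA_LR=0.00008 \\
SEED=42 \\
torchrun --standalone --nproc_per_node=8 train_gpt.py"""


def parse_launch(command_text):
    """Split 'KEY=VALUE ... program args' into (env overrides, argv)."""
    # Join backslash-continued lines before tokenizing.
    tokens = shlex.split(command_text.replace("\\\n", " "))
    env = {}
    i = 0
    while i < len(tokens):
        key, sep, value = tokens[i].partition("=")
        if not sep or not key.isidentifier():
            break  # first token that is not an assignment starts the program
        env[key] = value
        i += 1
    return env, tokens[i:]


def per_seed_launches(command_text, seeds=(42, 0, 1234)):
    """Return one (env overrides, argv) pair per seed, changing only SEED."""
    base_env, argv = parse_launch(command_text)
    return [({**base_env, "SEED": str(seed)}, argv) for seed in seeds]


for env, argv in per_seed_launches(ABBREVIATED):
    print(env["SEED"], " ".join(argv))
```

Each pair could then be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`; the sketch only prints the launches so that it runs without GPUs.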

- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot compound to use openai#1918's ~205s eval-time slack; safe fallback drops GLOBAL_TTT_EPOCHS if wallclock blows.
- new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109).
- new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT discipline; parked as fallback if global-SGD class is ruled out).
- frontier scan: 21 new PRs (openai#1906-openai#1931). Headline: PRs openai#1908+openai#1918 independently confirm AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit Rescale = empirical negative; openai#1929 banned SLOT+prequant-TTT.
- frontier-state.json: 21 PRs added; total 200.
- diary/2026-04-29-frontier-scan.md: full scan report.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…ams)

After 4 parallel research agents reviewed 30+ open PRs and compliance issues, two new findings:

1. PR openai#1923 (AsymLogit) flagged "empirical negative" by sunnypatneedi 4-29 frontier-scan, BUT only on PR openai#1855 base with default WD=1.0. Never tested on PR openai#1908 + WD=2.0 combo. V19's specific stack is NOT directly invalidated.
2. PR openai#1925 simon-marcus 1.06049 (3-seed verified, vs PR openai#1855 base 1.06108 = -0.00059 BPB). Just 2 hparam env vars:
   MATRIX_LR 0.026 -> 0.028
   PHASED_TTT_PREFIX_DOCS 2500 -> 3500
   Orthogonal axis to AsymLogit (LR/TTT prefix vs logit head).

Adds two new scout scripts:
- run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus + WD=2.0 (full stack, recommended first scout)
- run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0 (ablation if V19c wins partially)

Decision rule (CaseOps val baseline 0.97651, community floor 0.0006):
V19c < 0.97591 -> CLEAR WIN, run 3-seed
V19c 0.97591-0.9755 -> borderline, ablate via V19a/V19b
V19c > 0.9755 -> abandon stack, try Lead B (PR openai#1884)

Other research findings:
- PR openai#1898 SpinQuant flagged regression vs parent openai#1851 (skip)
- PR openai#1929 SLOT banned per openai#1722 precedent
- PR openai#1911 pre-quant TTT chain banned per openai#1735 precedent
- cocohearts 4-28 PR openai#1902 confirmed PR openai#1855 as official openai#1
- regina-openai + Alex Zhao 48h zero activity
- CaseOps de-facto legal (PR openai#1855 merged into chain)


simon-marcus · 2026-05-04T16:28:03Z

@cocohearts Gently flagging a possible missed row: #1925.

I think it may have fallen into a scan-range gap. #1902 says it scanned PRs #1494-#1908, and the subsequent audit PR #2146 says it scanned #1944-#2140; so my #1925 may have "fallen between the chairs," as they say.

The chronology is a little hard to see from final PR states because both #1925 and #1945 were updated after opening, but here's how it looks:

#1925 opened 2026-04-29T11:36:39Z
#1925 commit 77c39308 landed 2026-04-29T14:24:29Z with mean 1.06049099
#1945 opened later at 2026-04-29T19:20:30Z
#1925 final eval update commit 7cd4308d landed 2026-04-29T20:26:14Z, improving the same PR to mean 1.06031736
#1945’s first strict-under-600s V21 v2 result appears to be commit 70067534 at 2026-04-29T21:45:35Z; the earlier 3f49b5e2 result had the seed-42 602.048s issue discussed here
I flagged #1925 on #1902 at 2026-04-29T22:18:56Z

I don’t think #1925 affects the final top row. But if technically acceptable, it seems like it would be a chronological frontier/support row after 77c39308. The final #1925 mean is 1.06031736 using the #1855 matched seeds 42/0/1234, improving all three matched seeds vs #1855.

No worries if I’m missing an exclusion rationale; just flagging because the scan ranges appear to skip this PR numerically. Thanks.

simon-marcus added 2 commits April 29, 2026 07:33

Add CaseOps matrix LR TTT3500 submission

b30691f

Update CaseOps seed-matched results

77c3930

simon-marcus changed the title ~~Record candidate: CaseOps + Matrix-LR 0.028 + Phased TTT 3500~~ Record candidate: 1.06049 CaseOps + Matrix-LR 0.028 + Phased TTT 3500 Apr 29, 2026

Update PR1925 TTT eval result

7cd4308

simon-marcus changed the title ~~Record candidate: 1.06049 CaseOps + Matrix-LR 0.028 + Phased TTT 3500~~ Record candidate: 1.06032 CaseOps + Matrix-LR 0.028 + TTT n=1 Apr 29, 2026

simon-marcus mentioned this pull request Apr 29, 2026

Update Parameter Golf leaderboard with BOS fix #1902

Merged

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 30, 2026

spec 253: PR openai#1925 TTT-recipe port (single-phase, prefix=3500, …

ecfc5f7

…LoRA LR=8e-5) eval-only on spec 250 outputs

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request May 1, 2026

spec 253: PR openai#1925 TTT-recipe port (single-phase, prefix=3500, …

c372c4b

…LoRA LR=8e-5) eval-only on spec 250 outputs

simon-marcus wants to merge 3 commits into openai:main from simon-marcus:codex/caseops-matrixlr-ttt3500
