Record: SP8192 + SLOT scored-position + cross-batch EMA warmup: val_bpb=0.94569 by davie2009kh · Pull Request #1929 · openai/parameter-golf

davie2009kh · 2026-04-29T13:06:06Z

SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Pre-Quant TTT + Scored-Position SLOT

val_bpb: 0.94569 (3-seed mean: seeds 1337, 42, 2025)

Seed	val_bpb
1337	0.95036466
42	0.96014543
2025	0.92656976

What this submission does

This builds on the kilojoules/alertcat PR #1738 stack and adds Scored-Position SLOT at evaluation time:

Per-sample delta [bsz, 1, d_model] and logit_bias [bsz, 1, vocab] in fp32
AdamW optimizer, 24 steps, cosine LR 0.008 → 0.0008
Optimization target: scored positions only (past tokens, no look-ahead)
Cross-batch EMA warmup (decay=0.5): converged delta/logit_bias means are carried forward as initialization for the next batch, giving each batch a head start on convergence at zero extra parameter cost
SLOT runs only during eval_val_sliding on the quantized model; training is unmodified

Training wall-clock: ~588s (within 600s budget).
Eval wall-clock: ~1405s (SLOT optimization over full val set).
Artifact size: ~15.87MB (within 16MB budget).

Base stack

SP8192 + CaseOps tokenizer (@romeerp PR Record: CaseOps Tokenizer + Tapered WD - val_bpb 1.0678 (3-seed mean) #1729)
GPTQ SDClip INT6 + Brotli compression (@clarkkev PR Record: SP8192 + GPTQ Embeddings + Depth Recurrence + MuonEq-R + SDClip — val_bpb 1.08563 (5 seed mean) #1394)
3-Layer Depth Recurrence L3-5 (@dexhunter PR Record: MuonEq-R + 3-Layer Recurrence + WD=0.095 + MLR=0.022 + All-Int6 — val_bpb 1.0900 (3-seed mean) #1331)
Parallel Residuals L7+ (@Robby955 PR Record: SP8192 + Parallel Residuals + Hessian-Aware SDClip — val_bpb 1.08354 (3-seed mean) #1412)
QK-Gain 5.25 (@stukenov PR Record: Pre-quant AdamW TTT + QK-Gain 4.0 — val_bpb 1.1025 (3-seed mean) #1364)
EMA 0.9965 + Muon with Huber-WD (@bigbag PR Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean) #1493)
Legal score-first TTT framework (@AjAnubolu PR Record: SP8192 + Parallel Pre-Quant TTT — val_bpb 1.0429 (3-seed mean) #1735)
3-epoch parallel pre-quant AdamW TTT + budget guard (@alertcat PR Record: PR #1735 + CaseOps Tokenizer V15 (val_bpb 1.03540, mean of 3 seeds) #1738, @kilojoules)

SLOT lineage

Scored-Position SLOT is inspired by @resouer PR #1229. Key differences: per-sample (not shared) delta, cross-batch EMA prior warmup, and restriction to scored positions only via a boolean mask.

…on (SDPA-friendly) — val_bpb 1.07037 (3-seed mean)

- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot compound to use openai#1918's ~205s eval-time slack; safe fallback drops GLOBAL_TTT_EPOCHS if wallclock blows. - new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109). - new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT discipline; parked as fallback if global-SGD class is ruled out). - frontier scan: 21 new PRs (openai#1906-openai#1931). Headline: PRs openai#1908+openai#1918 independently confirm AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit Rescale = empirical negative; openai#1929 banned SLOT+prequant-TTT. - frontier-state.json: 21 PRs added; total 200. - diary/2026-04-29-frontier-scan.md: full scan report. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ams) After 4 parallel research agents reviewed 30+ open PRs and compliance issues, two new findings: 1. PR openai#1923 (AsymLogit) flagged "empirical negative" by sunnypatneedi 4-29 frontier-scan, BUT only on PR openai#1855 base with default WD=1.0. Never tested on PR openai#1908 + WD=2.0 combo. V19's specific stack is NOT directly invalidated. 2. PR openai#1925 simon-marcus 1.06049 (3-seed verified, vs PR openai#1855 base 1.06108 = -0.00059 BPB). Just 2 hparam env vars: MATRIX_LR 0.026 -> 0.028 PHASED_TTT_PREFIX_DOCS 2500 -> 3500 Orthogonal axis to AsymLogit (LR/TTT prefix vs logit head). Adds two new scout scripts: - run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus + WD=2.0 (full stack, recommended first scout) - run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0 (ablation if V19c wins partially) Decision rule (CaseOps val baseline 0.97651, community floor 0.0006): V19c < 0.97591 -> CLEAR WIN, run 3-seed V19c 0.97591-0.9755 -> borderline, ablate via V19a/V19b V19c > 0.9755 -> abandon stack, try Lead B (PR openai#1884) Other research findings: - PR openai#1898 SpinQuant flagged regression vs parent openai#1851 (skip) - PR openai#1929 SLOT banned per openai#1722 precedent - PR openai#1911 pre-quant TTT chain banned per openai#1735 precedent - cocohearts 4-28 PR openai#1902 confirmed PR openai#1855 as official openai#1 - regina-openai + Alex Zhao 48h zero activity - CaseOps de-facto legal (PR openai#1855 merged into chain)

anmarhindi · 2026-04-29T15:26:57Z

This appears to be a score-before-update violation. As described, pre_quant_adamw_ttt performs AdamW adaptation over the full validation stream before scoring the same stream. Therefore the model used to score token x_t has already been updated using x_t.

Under the rules, the predictive distribution for x_t must be fixed before observing or updating on x_t; only after the score is recorded may x_t be used to update state for future tokens. If the 28-epoch validation adaptation happens before scoring, this is score-after-adapt TTT, not legal score-first adaptation.

davie2009kh added 2 commits April 24, 2026 21:02

Record attempt: SP8192 + 3-Epoch Parallel Pre-Quant TTT + Huber WD Mu…

7e16e60

…on (SDPA-friendly) — val_bpb 1.07037 (3-seed mean)

SP8192 + SLOT scored-position + cross-batch EMA warmup: val_bpb=0.94569

abb4050

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: SP8192 + SLOT scored-position + cross-batch EMA warmup: val_bpb=0.94569#1929

Record: SP8192 + SLOT scored-position + cross-batch EMA warmup: val_bpb=0.94569#1929
davie2009kh wants to merge 2 commits intoopenai:mainfrom
davie2009kh:submission/slot-scored-position-ema

davie2009kh commented Apr 29, 2026

Uh oh!

anmarhindi commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

davie2009kh commented Apr 29, 2026

SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Pre-Quant TTT + Scored-Position SLOT

What this submission does

Base stack

SLOT lineage

Uh oh!

anmarhindi commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants