Record: val_bpb = 1.06128 SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT by aquariouseworkman · Pull Request #1851 · openai/parameter-golf

aquariouseworkman · 2026-04-27T06:56:51Z

Record: SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT

val_bpb = 1.06128 | ~15.95 MB | 8xH100 SXM

Result

Seed	Pre-TTT BPB	Post-TTT BPB	Artifact (bytes)
42	1.07406	1.06128	15,952,086

Merged SOTA (PR #1493): 1.0810 BPB. Delta: -0.0197 BPB. Clears the 0.005-nat threshold.
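Here is a quick check of the delta arithmetic. The sketch assumes the 0.005-nat threshold applies per byte, so the BPB improvement is converted with 1 bit = ln 2 nats. `clears_threshold` is a hypothetical helper and is not part of the repository.

```python
import math


def clears_threshold(new_bpb: float, sota_bpb: float, threshold_nats: float = 0.005) -> bool:
    """Hypothetical helper: does the BPB improvement exceed a threshold given in nats?"""
    # Improvement in bits per byte (positive means the new run is better).
    delta_bits = sota_bpb - new_bpb
    # Assumption: the threshold is per byte, so convert bits -> nats with ln 2.
    delta_nats = delta_bits * math.log(2)
    return delta_nats >= threshold_nats


print(round(1.0810 - 1.06128, 5))         # 0.01972
print(clears_threshold(1.06128, 1.0810))  # True
```

Under this assumption, the 0.0197 BPB gain works out to about 0.0137 nats per byte. That is well above 0.005.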

Key Change: SmearGate BOS Document Boundary Fix

Builds on the PR #1797 stack (PR #1787 base + SmearGate + LQER Asymmetric) but fixes the SmearGate cross-document leakage bug identified by @cocohearts in the PR #1797 audit.

The bug: SmearGate's 1-token causal lookback does not mask BOS positions, so the final token of document N smears into the BOS token of document N+1.

The fix (applied in both forward_logits and forward_ttt):

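bos_mask = (input_ids[:, 1:] == 1).unsqueeze(-1)  # True where the current token is BOS (id 1)
g = g.masked_fill(bos_mask, 0.0)  # zero the gate there, so no previous-document token smears in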
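Below is a minimal PyTorch sketch of a SmearGate with the BOS fix in place. It assumes the form described later in this thread: x_t <- x_t + lambda * sigmoid(W * x_t[:12]) * x_{t-1}, with a 12-wide gate input, a zero-initialized scalar lambda, and BOS id 1. The class layout and argument names are hypothetical. Only the two masking lines mirror the fix above.

```python
import torch
import torch.nn as nn


class SmearGate(nn.Module):
    """Sketch: x_t <- x_t + lambda * sigmoid(W x_t[:width]) * x_{t-1}, gate zeroed at BOS."""

    def __init__(self, width: int = 12, bos_token_id: int = 1):
        super().__init__()
        self.width = width
        self.bos_token_id = bos_token_id
        self.smear_gate = nn.Linear(width, 1, bias=False)  # 12 params
        self.smear_lambda = nn.Parameter(torch.zeros(1))   # 1 param; zero-init = identity at start

    def forward(self, x: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
        # x: [B, T, D] embeddings; input_ids: [B, T] token ids.
        cur, prev = x[:, 1:], x[:, :-1]                      # x_t and x_{t-1} for t >= 1
        g = torch.sigmoid(self.smear_gate(cur[..., : self.width]))  # [B, T-1, 1]
        # BOS fix: never smear the last token of document N into the BOS of document N+1.
        bos_mask = (input_ids[:, 1:] == self.bos_token_id).unsqueeze(-1)
        g = g.masked_fill(bos_mask, 0.0)
        mixed = cur + self.smear_lambda * g * prev
        # Position 0 has no predecessor and passes through unchanged.
        return torch.cat([x[:, :1], mixed], dim=1)
```

Because the gate is exactly zero at every BOS position, the representation of a document's first token depends only on that token. This holds no matter how documents are packed into a sequence.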

Technique Stack

Component	Origin
CaseOps bijective case transform	PR #1729 / PR #1736
SparseAttnGate	PR #1787 (nprime06)
SmearGate + BOS fix	PR #1797 + this submission
LQER asymmetric rank-4	PR #1797
Phased TTT (score-first, 3 phases)	PR #1394 / PR #1736
PolarNS + MIN_LR=0.1 + FusedCE	PR #1787
Full Hessian GPTQ + Brotli	PR #1019 / PR #1530

Architecture

11L x 512d x 8H/4KV, MLP 4x, LeakyReLU(0.5)^2, Partial RoPE (16/64 dims), layerwise LN scale, tied embeddings, logit softcap=30.0. Depth recurrence: layers 3-5 looped x2 (activated at frac=0.35). Parallel residuals from layer 8. XSA on all 11 layers. SmearGate window=12.
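The depth-recurrence setting can be read as a layer execution order. This sketch makes two assumptions. First, "layers 3-5 looped x2" means the block 3-5 runs twice back to back with shared weights. Second, "activated at frac=0.35" refers to the fraction of training completed. `layer_schedule` is a hypothetical helper, not code from the submission.

```python
def layer_schedule(num_layers: int = 11, loop_start: int = 3, loop_end: int = 5,
                   repeats: int = 2, progress: float = 1.0,
                   activate_frac: float = 0.35) -> list:
    """Hypothetical helper: order in which the physical layers are applied."""
    order = list(range(num_layers))
    if progress < activate_frac:
        # Before activation, every layer runs exactly once.
        return order
    looped = list(range(loop_start, loop_end + 1))
    # Assumption: the looped block is repeated `repeats` times in sequence.
    return order[:loop_start] + looped * repeats + order[loop_end + 1:]


print(layer_schedule(progress=0.2))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(layer_schedule(progress=0.5))  # [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 8, 9, 10]
```

Under this reading, the 11 stored layers yield 14 layer applications once recurrence is active, without adding parameters to the artifact.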

Compliance

Artifact <= 16,000,000 bytes: 15,952,086 bytes
train_time <= 600s: 599.6s
eval_time <= 600s: 519.5s
Issue #1017 (A Field Guide to Valid Submissions), Conditions 1-4: all satisfied. The SmearGate BOS mask ensures no cross-document leakage.
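The three budget checks above reduce to simple headroom arithmetic. This sketch copies the limits and measured values from the list. `check_budgets` and the dictionary keys are hypothetical names.

```python
# Limits and measured values copied from the Compliance list above.
LIMITS = {"artifact_bytes": 16_000_000, "train_time_s": 600.0, "eval_time_s": 600.0}
MEASURED = {"artifact_bytes": 15_952_086, "train_time_s": 599.6, "eval_time_s": 519.5}


def check_budgets(limits: dict, measured: dict) -> dict:
    """Hypothetical helper: remaining headroom per budget; raises if a limit is exceeded."""
    headroom = {}
    for key, limit in limits.items():
        value = measured[key]
        if value > limit:
            raise ValueError(f"{key}: {value} exceeds limit {limit}")
        headroom[key] = limit - value
    return headroom


for key, left in check_budgets(LIMITS, MEASURED).items():
    print(f"{key}: {left:,.1f} left")
```

The run leaves 47,914 bytes of artifact headroom and 80.5 s of evaluation headroom, but only 0.4 s of training headroom.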

Credits

@nprime06 -- PR #1787 base stack (Record: PR #1736 + Polar Express NS + MIN_LR + Sparse Attn Gate + Fused CE + PR #1767 TTT, val_bpb 1.06335)
@romeerp -- CaseOps transform (PR #1729, Record: CaseOps Tokenizer + Tapered WD, val_bpb 1.0678, 3-seed mean)
@dexhunter -- SmearGate + LQER (PR #1797, Record: PR #1787 base + Smear Gate + LQER Asym, val_bpb 1.06157)
@cocohearts -- Identifying SmearGate BOS bug
@abaybektursun -- Score-first TTT (PR #549, Record: LeakyReLU² + Legal Score-First TTT + Parallel Muon, val_bpb 1.1194, 3-seed mean)
@clarkkev -- GPTQ SDClip + SP8192 (PR #1394, Record: SP8192 + GPTQ Embeddings + Depth Recurrence + MuonEq-R + SDClip, val_bpb 1.08563, 5-seed mean)


h1beee · 2026-04-27T08:11:43Z

need results on 3 different seeds

… required; PR openai#1848 BPB risk; Day 18 plateau; Session 23
- Merged SOTA still 1.0810 (Day 18, no change since Apr 9)
- PPM-D byte mixture confirmed by dexhunter at 1.0322 (PR openai#1857, self-closed)
- SmearGate BOS bug documented: prev-token leaks at document boundaries; fix required
- PR openai#1848 (newjordan, 0.87980) flagged BPB risk: sibling PR openai#1846 closed same day
- PR openai#1858 (0.9946) only covers 8M/40.5M tokens, so it is not leaderboard-comparable
- PR openai#1855 (codemath3000, 1.06108) and openai#1851 (aquariouseworkman, 1.06128) both clean
- PPM-D wave: PRs openai#1850, openai#1854, openai#1835 await organizer ruling
- Added Session 23 lessons to CLAUDE.md
- 3 days to deadline (Apr 30): final GPU run window
https://claude.ai/code/session_01RmJtLYUmKNzDgDVTnWoKzU

aquariouseworkman · 2026-04-27T20:31:02Z

> need results on 3 different seeds

I ran out of credits on RunPod. This amazing person validated the other 2 for me!

see: #1851

Forward-1-token residual mixer at the embedding lane: x_t <- x_t + lambda * sigmoid(W * x_t[:12]) * x_{t-1}. The model gets a learnable bias toward bigram features without needing attention to discover them. It is tiny (13 params total: a 12-wide linear plus a scalar lambda), and the zero-initialized lambda makes it transparent at the start of training. The BOS fix prevents cross-document leakage during packed training: the gate is masked to 0 at positions where input_ids == BOS_TOKEN_ID (default 1). Both smear_gate.weight and smear_lambda match the 'smear' pattern, so they route to scalar AdamW, not Muon. Both live at the GPT level (not in blocks), so they are explicitly appended to scalar_params in Optimizers.
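The optimizer routing rule can be sketched as a name-pattern check that takes priority over tensor shape. This matters because smear_gate.weight is a 2-D (1 x 12) matrix and would otherwise look like a Muon candidate. The sketch simply scans every named parameter. In contrast, the source appends the two GPT-level smear parameters to scalar_params explicitly. `route_params` and the block parameter names below are hypothetical.

```python
def route_params(named_shapes, scalar_patterns=("smear",)):
    """Hypothetical helper: split parameter names into a Muon group and a scalar-AdamW group."""
    groups = {"muon": [], "scalar_adamw": []}
    for name, shape in named_shapes:
        # Name patterns win over shape; vectors and scalars also go to AdamW.
        if any(pattern in name for pattern in scalar_patterns) or len(shape) < 2:
            groups["scalar_adamw"].append(name)
        else:
            groups["muon"].append(name)
    return groups


params = [
    ("blocks.0.attn.c_q.weight", (512, 512)),  # hypothetical matrix parameter
    ("blocks.0.attn_scale", (512,)),           # hypothetical vector parameter
    ("smear_gate.weight", (1, 12)),
    ("smear_lambda", (1,)),
]
print(route_params(params))
```

In this example, only the 512 x 512 matrix lands in the Muon group. Both smear parameters go to scalar AdamW along with the vector.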

@aquariouseworkman

- Adds a 2-line BOS mask in both the forward_logits and forward_ttt SmearGate paths. Before the fix, the last token of doc N smeared into the BOS of doc N+1: a model-quality bug, not a C1 issue. Identical fix to PR openai#1851 @aquariouseworkman, audit by @cocohearts.
- runpod/phase_g_3seed.sh: full 3-seed driver. Sets the PR openai#1797 stack env vars plus the PR openai#1855 9-hparam greedy stack delta:
  MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 WARMDOWN_FRAC=0.85 BETA2=0.99 TTT_BETA2=0.99 TTT_WEIGHT_DECAY=0.5 TTT_LORA_RANK=80 SPARSE_ATTN_GATE_SCALE=0.5 PHASED_TTT_PREFIX_DOCS=2500
  Mixers (NGRAM/TEMP) stay OFF: pure neural baseline + bug fix + hparam stack. Auto-runs a Welch t-test vs PR openai#1797 (1.06157±0.00066).
- TTT 4-epoch (PR openai#1812) explicitly NOT adopted: that scheme targets the PR openai#1493 SGD-on-whole-model TTT path, not the PR openai#1797 LoRA-phased per-doc-reset path we're on. No clean mapping.

Legality: all 16/16 unit tests still pass. The BOS fix preserves causality: it only zeroes a gate at positions where the current token is BOS and never references future tokens.
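The Welch t-test that the driver runs against PR #1797 can be sketched with the standard library alone. This is an illustration, not the script's code. `welch_t` is a hypothetical helper that returns the t statistic and the Welch–Satterthwaite degrees of freedom. Turning t into a p-value needs a Student-t CDF, such as `scipy.stats.t.sf`. Alternatively, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test in one call.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each mean (statistics.variance uses the n-1 sample variance).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With per-seed val_bpb lists for the new run as `a` and the baseline as `b`, a negative t means the new run has the lower (better) mean. Each list needs at least two seeds so that the sample variance is defined.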

Christopher-Lee-McClendon mentioned this pull request Apr 27, 2026

Non-record: Polar Express NS Coefficient Ablation on #1809 (val_bpb 1.08154) #1831

Open

Christopher-Lee-McClendon mentioned this pull request Apr 27, 2026

Record: SmearGate BOS Fix 3-Seed Reproduction — val_bpb 1.06145 (3-seed mean) #1868

Open

Meirzhan05 mentioned this pull request Apr 28, 2026

Record: AttnOutGate + SmearGate + Softcap 15 — val_bpb 1.07750 (3-seed mean) #1880

Open

someone114514 mentioned this pull request Apr 28, 2026

Experiment: SmearGate BOS Fix + train-only logit calibration #1884

Open

aquariouseworkman wants to merge 1 commit into openai:main from aquariouseworkman:main
