
Record: val_bpb = 1.06128 SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT#1851

Open
aquariouseworkman wants to merge 1 commit into openai:main from aquariouseworkman:main

Conversation

@aquariouseworkman
Contributor

Record: SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT

val_bpb = 1.06128 | ~15.95 MB | 8xH100 SXM

Result

| Seed | Pre-TTT BPB | Post-TTT BPB | Artifact (bytes) |
|------|-------------|--------------|------------------|
| 42   | 1.07406     | 1.06128      | 15,952,086       |

Merged SOTA (PR #1493): 1.0810 BPB. Delta: -0.0197 BPB. Clears the 0.005-nat threshold.
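The arithmetic behind the claimed delta can be checked directly (values taken from the table and the merged SOTA figure above):

```python
# Check the reported improvement against the merged SOTA (PR #1493).
sota_bpb = 1.0810    # merged SOTA
this_bpb = 1.06128   # this submission, post-TTT, seed 42
delta = this_bpb - sota_bpb
print(f"{delta:+.4f}")   # -0.0197
```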

Key Change: SmearGate BOS Document Boundary Fix

Builds on the PR #1797 stack (PR #1787 base + SmearGate + LQER Asymmetric) but fixes the SmearGate cross-document leakage bug identified by @cocohearts in the PR #1797 audit.

The bug: SmearGate's 1-token causal lookback does not mask BOS positions, so the final token of document N smears into the BOS of document N+1.

The fix (applied in both forward_logits and forward_ttt):

```python
# Zero the smear gate wherever the current token is BOS (token id 1),
# so the last token of doc N cannot leak into doc N+1.
bos_mask = (input_ids[:, 1:] == 1).unsqueeze(-1)
g = g.masked_fill(bos_mask, 0.0)
```
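For context, a minimal sketch of where the two mask lines sit inside a smear-gate forward pass. The SmearGate internals here (gate computed from the first 12 dims of the current token's embedding, mixing in the previous token) are inferred from the descriptions in this thread, not copied from the actual repo; `apply_smear_gate` and its signature are illustrative.

```python
import torch

def apply_smear_gate(x, input_ids, gate_w, lam, bos_id=1):
    """x: (B, T, D) embeddings; gate_w: (12, 1); lam: scalar gate strength."""
    # Gate from the first 12 dims of the *current* token's embedding.
    g = lam * torch.sigmoid(x[:, 1:, :12] @ gate_w)        # (B, T-1, 1)
    # The fix: zero the gate wherever the current token is BOS, so the
    # last token of doc N cannot smear into the BOS of doc N+1.
    bos_mask = (input_ids[:, 1:] == bos_id).unsqueeze(-1)  # (B, T-1, 1)
    g = g.masked_fill(bos_mask, 0.0)
    # Mix in the previous token's embedding (1-token causal lookback);
    # position 0 has no predecessor and passes through unchanged.
    return torch.cat([x[:, :1], x[:, 1:] + g * x[:, :-1]], dim=1)
```

Causality is preserved: the gate at position t reads only x_t and x_{t-1}, never a future token.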

Technique Stack

| Component | Origin |
|-----------|--------|
| CaseOps bijective case transform | PR #1729 / PR #1736 |
| SparseAttnGate | PR #1787 (nprime06) |
| SmearGate + BOS fix | PR #1797 + this submission |
| LQER asymmetric rank-4 | PR #1797 |
| Phased TTT (score-first, 3 phases) | PR #1394 / PR #1736 |
| PolarNS + MIN_LR=0.1 + FusedCE | PR #1787 |
| Full Hessian GPTQ + Brotli | PR #1019 / PR #1530 |

Architecture

11L x 512d x 8H/4KV, MLP 4x, LeakyReLU(0.5)^2, Partial RoPE (16/64 dims), layerwise LN scale, tied embeddings, logit softcap=30.0. Depth recurrence: layers 3-5 looped x2 (activated at frac=0.35). Parallel residuals from layer 8. XSA on all 11 layers. SmearGate window=12.
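The architecture line above can be restated as a config sketch. Field names are illustrative, not the actual repo's, and only the dimensions stated in this PR are captured:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values copied from the architecture description in this PR.
    n_layer: int = 11
    n_embd: int = 512
    n_head: int = 8
    n_kv_head: int = 4            # grouped-query: 8 query / 4 KV heads
    head_dim: int = 64            # 512 / 8
    mlp_mult: int = 4             # MLP hidden = 4 * n_embd
    rope_dims: int = 16           # partial RoPE: 16 of 64 dims rotated
    logit_softcap: float = 30.0
    loop_layers: tuple = (3, 5)   # layers 3-5 looped x2
    loop_start_frac: float = 0.35
    parallel_resid_from: int = 8
    smear_window: int = 12
```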

Compliance

  • Artifact <= 16,000,000 bytes: 15,952,086 bytes
  • train_time <= 600s: 599.6s
  • eval_time <= 600s: 519.5s
  • Issue #1017 (A Field Guide to Valid Submissions), Conditions 1-4: all satisfied. The SmearGate BOS mask ensures no cross-document leakage.

Credits
@nprime06 -- PR openai#1787 base stack
@romeerp -- CaseOps transform (PR openai#1729)
@dexhunter -- SmearGate + LQER (PR openai#1797)
@cocohearts -- Identifying SmearGate BOS bug
@abaybektursun -- Score-first TTT (PR openai#549)
@clarkkev -- GPTQ SDClip + SP8192 (PR openai#1394)
@h1beee

h1beee commented Apr 27, 2026

need results on 3 different seeds

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 27, 2026
… required; PR openai#1848 BPB risk; Day 18 plateau; Session 23

- Merged SOTA still 1.0810 (Day 18, no change since Apr 9)
- PPM-D byte mixture confirmed by dexhunter at 1.0322 (PR openai#1857, self-closed)
- SmearGate BOS bug documented: prev-token leaks at document boundaries; fix required
- PR openai#1848 (newjordan, 0.87980) flagged BPB risk: sibling PR openai#1846 closed same day
- PR openai#1858 (0.9946) only covers 8M/40.5M tokens — not leaderboard-comparable
- PR openai#1855 (codemath3000, 1.06108) and openai#1851 (aquariouseworkman, 1.06128) both clean
- PPM-D wave: PRs openai#1850, openai#1854, openai#1835 await organizer ruling
- Added Session 23 lessons to CLAUDE.md
- 3 days to deadline (Apr 30) — final GPU run window

https://claude.ai/code/session_01RmJtLYUmKNzDgDVTnWoKzU
@aquariouseworkman
Contributor Author

> need results on 3 different seeds

I ran out of credits on RunPod. This amazing person validated the other 2 for me!

see: #1851

Meirzhan05 added a commit to Meirzhan05/parameter-golf that referenced this pull request Apr 28, 2026
Forward-1-token residual mixer at embedding lane:
  x_t <- x_t + lambda * sigmoid(W * x_t[:12]) * x_{t-1}

The model gets a learnable bias toward bigram features without needing
attention to discover it. Tiny (13 params total: 12-wide linear + scalar lambda).
Zero-init lambda = transparent at start.

BOS-fix prevents cross-document leakage during packed training: gate is
masked to 0 at positions where input_ids == BOS_TOKEN_ID (default 1).

Both smear_gate.weight and smear_lambda match 'smear' pattern -> route to
scalar AdamW, not Muon. Both at GPT-level (not blocks), so explicitly
appended to scalar_params in Optimizers.
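The name-pattern routing described in that commit can be sketched as follows. The module and parameter names are illustrative (borrowed from the commit's description of `smear_gate.weight` and `smear_lambda`), and the AdamW group stands in for the "scalar AdamW, not Muon" routing:

```python
import torch
import torch.nn as nn

class SmearMixer(nn.Module):
    """13-param mixer: 12-wide linear gate + scalar lambda."""
    def __init__(self, gate_width: int = 12):
        super().__init__()
        self.smear_gate = nn.Linear(gate_width, 1, bias=False)
        # Zero-init lambda: the mixer is transparent at the start of training.
        self.smear_lambda = nn.Parameter(torch.zeros(()))

model = SmearMixer()
# Anything matching 'smear' goes to the scalar AdamW group, not Muon.
scalar_params = [p for n, p in model.named_parameters() if "smear" in n]
adamw = torch.optim.AdamW(scalar_params, lr=1e-3)
```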
Fija pushed a commit to Fija/parameter-golf that referenced this pull request Apr 28, 2026
- Adds 2-line BOS mask in both forward_logits and forward_ttt SmearGate
  paths. Before fix, the last token of doc N smeared into the BOS of doc
  N+1 — model-quality bug, not a C1 issue. Identical fix to PR openai#1851
  @aquariouseworkman, audit by @cocohearts.

- runpod/phase_g_3seed.sh: full 3-seed driver. Sets PR openai#1797 stack env
  vars + the PR openai#1855 9-hparam greedy stack delta:
    MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 WARMDOWN_FRAC=0.85
    BETA2=0.99 TTT_BETA2=0.99 TTT_WEIGHT_DECAY=0.5 TTT_LORA_RANK=80
    SPARSE_ATTN_GATE_SCALE=0.5 PHASED_TTT_PREFIX_DOCS=2500
  Mixers (NGRAM/TEMP) stay OFF — pure neural baseline + bug fix +
  hparam stack. Auto-runs Welch t-test vs PR openai#1797 (1.06157±0.00066).

- TTT 4-epoch (PR openai#1812) explicitly NOT adopted: that scheme targets the
  PR openai#1493 SGD-on-whole-model TTT path, not the PR openai#1797 LoRA-phased
  per-doc-reset path we're on. No clean mapping.

Legality: all 16/16 unit tests still pass. BOS fix preserves causality
(it only zeroes a gate at positions where current token is BOS, never
references future tokens).
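The Welch t-test that the driver script is said to auto-run can be sketched in pure stdlib Python. The baseline triple below is reconstructed to match the stated PR #1797 mean and spread (1.06157 ± 0.00066); the candidate values are made up for illustration:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's unequal-variance t statistic and its degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va / len(a) + vb / len(b)) ** 2 / (
        (va / len(a)) ** 2 / (len(a) - 1) +
        (vb / len(b)) ** 2 / (len(b) - 1))
    return t, df

baseline = [1.06091, 1.06157, 1.06223]   # mean 1.06157, sd 0.00066
candidate = [1.06021, 1.06088, 1.06105]  # hypothetical 3-seed results
t, df = welch_t(candidate, baseline)
```

A negative t here means the candidate's mean BPB is below the baseline's; with only 3 seeds per side, df tops out at 4.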
