Record: val_bpb = 1.06128 SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT#1851
Open
aquariouseworkman wants to merge 1 commit intoopenai:mainfrom
Open
Record: val_bpb = 1.06128 SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT#1851aquariouseworkman wants to merge 1 commit intoopenai:mainfrom
aquariouseworkman wants to merge 1 commit intoopenai:mainfrom
Conversation
…symmetric + Phased TTT val_bpb = 1.06128 | ~15.95 MB | 8xH100 SXM Key Change: SmearGate BOS Document Boundary Fix Builds on PR openai#1797 stack (PR openai#1787 base + SmearGate + LQER Asymmetric) but fixes the SmearGate cross-document leakage bug identified by @cocohearts in PR openai#1797 audit. The bug: SmearGate 1-token causal lookback does not mask BOS positions, so the final token of document N smears into BOS of document N+1. Credits @nprime06 -- PR openai#1787 base stack @romeerp -- CaseOps transform (PR openai#1729) @dexhunter -- SmearGate + LQER (PR openai#1797) @cocohearts -- Identifying SmearGate BOS bug @abaybektursun -- Score-first TTT (PR openai#549) @clarkkev -- GPTQ SDClip + SP8192 (PR openai#1394)
|
need results on 3 different seeds |
sunnypatneedi
pushed a commit
to sunnypatneedi/parameter-golf
that referenced
this pull request
Apr 27, 2026
… required; PR openai#1848 BPB risk; Day 18 plateau; Session 23 - Merged SOTA still 1.0810 (Day 18, no change since Apr 9) - PPM-D byte mixture confirmed by dexhunter at 1.0322 (PR openai#1857, self-closed) - SmearGate BOS bug documented: prev-token leaks at document boundaries; fix required - PR openai#1848 (newjordan, 0.87980) flagged BPB risk: sibling PR openai#1846 closed same day - PR openai#1858 (0.9946) only covers 8M/40.5M tokens — not leaderboard-comparable - PR openai#1855 (codemath3000, 1.06108) and openai#1851 (aquariouseworkman, 1.06128) both clean - PPM-D wave: PRs openai#1850, openai#1854, openai#1835 await organizer ruling - Added Session 23 lessons to CLAUDE.md - 3 days to deadline (Apr 30) — final GPU run window https://claude.ai/code/session_01RmJtLYUmKNzDgDVTnWoKzU
Contributor
Author
I ran out of credits on RunPod. This amazing person validated the other 2 for me! see: #1851 |
Meirzhan05
added a commit
to Meirzhan05/parameter-golf
that referenced
this pull request
Apr 28, 2026
Forward-1-token residual mixer at embedding lane:
x_t <- x_t + lambda * sigmoid(W * x_t[:12]) * x_{t-1}
The model gets a learnable bias toward bigram features without needing
attention to discover it. Tiny (13 params total: 12-wide linear + scalar lambda).
Zero-init lambda = transparent at start.
BOS-fix prevents cross-document leakage during packed training: gate is
masked to 0 at positions where input_ids == BOS_TOKEN_ID (default 1).
Both smear_gate.weight and smear_lambda match 'smear' pattern -> route to
scalar AdamW, not Muon. Both at GPT-level (not blocks), so explicitly
appended to scalar_params in Optimizers.
Fija
pushed a commit
to Fija/parameter-golf
that referenced
this pull request
Apr 28, 2026
- Adds 2-line BOS mask in both forward_logits and forward_ttt SmearGate paths. Before fix, the last token of doc N smeared into the BOS of doc N+1 — model-quality bug, not a C1 issue. Identical fix to PR openai#1851 @aquariouseworkman, audit by @cocohearts. - runpod/phase_g_3seed.sh: full 3-seed driver. Sets PR openai#1797 stack env vars + the PR openai#1855 9-hparam greedy stack delta: MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 WARMDOWN_FRAC=0.85 BETA2=0.99 TTT_BETA2=0.99 TTT_WEIGHT_DECAY=0.5 TTT_LORA_RANK=80 SPARSE_ATTN_GATE_SCALE=0.5 PHASED_TTT_PREFIX_DOCS=2500 Mixers (NGRAM/TEMP) stay OFF — pure neural baseline + bug fix + hparam stack. Auto-runs Welch t-test vs PR openai#1797 (1.06157±0.00066). - TTT 4-epoch (PR openai#1812) explicitly NOT adopted: that scheme targets the PR openai#1493 SGD-on-whole-model TTT path, not the PR openai#1797 LoRA-phased per-doc-reset path we're on. No clean mapping. Legality: all 16/16 unit tests still pass. BOS fix preserves causality (it only zeroes a gate at positions where current token is BOS, never references future tokens).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Record: SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT
val_bpb = 1.06128 | ~15.95 MB | 8xH100 SXM
Result
Merged SOTA (PR #1493): 1.0810 BPP. Delta: -0.0197 BPP. Clears the 0.005-nat threshold.
Key Change: SmearGate BOS Document Boundary Fix
Builds on PR #1797 stack (PR #1787 base + SmearGate + LQER Asymmetric) but fixes the SmearGate cross-document leakage bug identified by @cocohearts in PR #1797 audit.
The bug: SmearGate 1-token causal lookback does not mask BOS positions, so the final token of document N smears into BOS of document N+1.
The fix (applied in both forward_logits and forward_ttt):
Technique Stack
Architecture
11L x 512d x 8H/4KV, MLP 4x, LeakyReLU(0.5)^2, Partial RoPE (16/64 dims), layerwise LN scale, tied embeddings, logit softcap=30.0. Depth recurrence: layers 3-5 looped x2 (activated at frac=0.35). Parallel residuals from layer 8. XSA on all 11 layers. SmearGate window=12.
Compliance
Credits