Skip to content

Record: Per-Sample SLOT + N-gram Order-22 + TTT + LR=0.432 — val_bpb 0.39642 (3-seed mean)#1430

Closed
renqianluo wants to merge 1 commit intoopenai:mainfrom
renqianluo:record/ngram-order22-alpha-center25-lr432-2ndpass10-bsz128-0.39642
Closed

Record: Per-Sample SLOT + N-gram Order-22 + TTT + LR=0.432 — val_bpb 0.39642 (3-seed mean)#1430
renqianluo wants to merge 1 commit intoopenai:mainfrom
renqianluo:record/ngram-order22-alpha-center25-lr432-2ndpass10-bsz128-0.39642

Conversation

@renqianluo
Copy link
Copy Markdown

@renqianluo renqianluo commented Apr 7, 2026

Summary

Val BPB: 0.39642 (3-seed mean, seeds 1337/42/314) — 64.5% reduction from public SOTA (1.11437).

Key Techniques

Per-Sample SLOT: Each sequence gets its own [bsz,1,512] hidden delta + [bsz,1,1024] logit
bias (1536 params), optimized with AdamW (24 steps, cosine LR 0.432→0.001, beta1=0.6,
beta2=0.5, bsz=128) on scored positions.

Causal Backoff N-gram Mixer (order=2..22, 4M hash buckets): Entropy-adaptive blending
alpha = 0.20 + 0.55 * sigmoid(2*(H - 2.5)) interpolates neural predictions with n-gram
probabilities at test time. N-gram table stored within the 16MB artifact budget.
Corrected hash ordering for accurate probability estimates.

TTT (AdamW 1ep, lr=0.001, freeze blocks 0-9): Second pass over first 10% of chunks
at floor LR=0.0001 for better early-position adaptation (~284s).

GPTQ damp=0.005: More aggressive Hessian inversion for ~0.001 better base BPB.

Timing (all seeds legal)

Seed Train Eval Artifact
1337 600s 593.7s 15.86MB
42 600s 594.8s 15.87MB
314 600s 587.4s 15.90MB

taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
Subagent A (BPE-8192 trainer): the exact tokenizer is already on disk
at data/tokenizers/fineweb_8192_bpe.model (370,908 bytes, the literal
file behind LESSONS.md §18c -0.129 BPB Mac win). Just needs scp to pod.

Subagent B (closed/merged PR audit): top 8 merged records analyzed.
Frequency table reveals 5+ convergent techniques we DON'T have:
- SmearGate in 6/8 (75%)
- zstd-22 in 5/8 (62%)
- EMA 0.997 in 4+/8
- Partial RoPE in 2+/8
- XSA in 1/8 (PR openai#1019 = literal openai#1 record at 1.11473)
- AR Self-Gen GPTQ in 1/8 (also PR openai#1019)

Subagent C (N-gram Tilt): FOUND the definition. It's a multiplicative
single-token exponential boost from a causal eval-time n-gram cache:
  p_tilt(t) = p_model(t) · exp(β · [t==hint]) / Z
  Z = 1 + p_model(hint) · (exp(β) - 1)
Used by PRs openai#1437, openai#1420, openai#1430. Bespoke to parameter-golf, not in
any published paper. Delta: -0.0029 to -0.0055 BPB.

Subagent D (TTT researcher): full ~80-line Score-First TTT sketch
provided. Pattern: score chunk in inference_mode, train on chunk SGD,
move on. PR openai#461 framework. Cost ~410s on 8xH100. ~-0.0025 BPB.

Subagent E (records miner): top 5 records analyzed, EMA + XSA +
Parallel Muon are convergent best practices. We have leaky_relu and
that's all from the comp's stack.

8-action priority list compiled. Highest EV next: scp BPE-8192,
implement EMA, XSA, Partial RoPE, LN Scale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
… falsified at scale

Subagent novelty audit confirms Tab Hash, Gated Attention, MTP are not in
any open or closed comp PR. But all three failed at training-loss level
on the loop. EngramLite (Patch 22) + Partial RoPE (Patch 19) + LN Scale
(Patch 20) all came from PR openai#1440, not novel.

Spend: ~$0.90 of $36 budget. Pod healthy.

Critical threat: PR openai#1430 claims 0.39642 BPB via per-sample SLOT + n-gram
order-22 + TTT, likely illegal under issue openai#677 — needs verification.

Audit verdict: Pivot to non-architectural wins (tokenizer / eval-time
tricks / coprime stride / compression) since architecture vector exhausted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ramLite reversal, new directions

Subagent re-verified the 3 still-novel patches (TabHash, GatedAttention, MTP)
against the latest 25 open PRs. Zero hits — they remain uncontested, even
though only MTP shows marginal training-loss benefit at our scale.

EngramLite (Patch 22) verdict SOFT-REVERSED: EL2 cycle-2 = 3.2742, only
+0.0008 above champion. Tied within noise, not falsified.

Spend ~$1.40 / $36 (6% utilization). Pod healthy.

New comp directions worth considering for next research fire: Per-Sample
SLOT (legal variant of suspicious PR openai#1430), Codebook VQ compression
(PR openai#1433), ByteJEPA (PR openai#1443 — non-competitive but novel category).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
… confirmation),

MR2 promising, PR openai#1430 MERGED at 0.39642 BPB

Subagent reports PR openai#1430 (Per-Sample SLOT + Causal Backoff N-gram Mixer + TTT)
has been MERGED at claimed 0.39642 BPB — 65% below public SOTA. If real, this
fundamentally changes the competitive landscape. Audit fires openai#1-3 all flagged
this PR as likely illegal under issue openai#677. Now MERGED.

NEXT RESEARCH FIRE PRIORITY: deep-dive PR openai#1430 to verify legality and extract
implementation. If real, port it. If leak-based, document it.

Patches 17 (Mousse) and 18 (MuonEq-R) confirmed as known PORTS, not novel-to-comp.
They were always documented as ports in research fires openai#9 and openai#10.

Patches 15/16/21 still uncontested in 120+ open + 10 closed PRs (4 audits in a row).

Pod healthy, ~$2.30/$36 spend. MR2_seed42 = 3.3004 (better than MS2 = 3.3358),
suggesting MuonEq-R may slightly beat Mousse at L5 stack. Falsification of
Patches 17 and 18 proceeding rapidly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
… merged, 0.39642 confirmed

Critical correction: previous audit fire openai#4 incorrectly reported PR openai#1430 as
merged. State = open, merged_at = null, 0 LGTMs, 0 comp owner reviews. The
0.39642 BPB score IS confirmed in the PR README (3-seed mean) but the
submission is unverified.

Subagent deep code read confirms three techniques (Per-Sample SLOT,
Causal Backoff N-gram Mixer order-22, post-quant TTT) all pass the strict
letter of issue openai#677 four conditions (causal, score-before-update,
single-pass, full-normalized). But the SPIRIT of openai#677 is borderline —
196K per-sequence params trained on val set is essentially val-set
overfitting "legally".

DO NOT PORT this fire because:
1. PR openai#1430 has zero LGTMs and may get reverted
2. All 3 techniques are eval-time (can't validate on our cheap-GPU loop)
3. Better H100 escalation candidates already deferred (EMA, Tilt, INT6 GPTQ)

Watch PR openai#1430 every 2 hours; if merged with comp owner approval, port
at next research fire. If reverted or outlawed, mark dead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 7, 2026
…ikely illegal), merged SOTA unchanged

- PR openai#1430 (renqianluo, Apr 7): claims 0.39642 bpb via per-sample SLOT + n-gram order-22 hash + TTT. Flagged likely illegal: n-gram hash cache matches closed openai#727/openai#741 pattern; SLOT unruled (Issue openai#140). No organizer reviews yet.
- Merged SOTA unchanged at 1.1147 (PR openai#1019)
- Issue openai#140: no new rulings on SLOT, causal SLOT, or ETLB
- Legal path unchanged: PR openai#1420 stack (SP8192 + Triple Loop + N-gram Tilt + Legal TTT) targeting ~1.075–1.077
- No new breakthrough papers beyond existing tracking

https://claude.ai/code/session_01XLD5qpZfXpmJPnuT9kSnPC
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ai#1430 stalled, 2 new PRs validate deferred specs

Patches 15/16/21 still uncontested in 150+ open + 10 closed PRs (5 audits
in a row). Strong evidence of true novelty.

PR openai#1430 still OPEN, 0 comments, no comp owner activity since creation.
Increasingly likely to be reverted or outlawed.

NEW PRs validate two of our deferred H100 escalation specs:
  - PR openai#1445 (1.0889): "Depth Recurrence + EMA 0.9965" → validates Patch 17 EMA spec
  - PR openai#1446 (1.0960): "int6 GPTQ + lzma" → validates Patch 23 INT6 GPTQ-Lite spec

Combined with PR openai#1437/openai#1420 already validating Patch 23 N-gram Tilt, the
3-spec H100 escalation bundle (EMA + Tilt + INT6 GPTQ) is now triple-
confirmed by independent comp PRs.

Spend ~$3.00/$36 (8% utilization). Pod healthy at 6h uptime.

Reminder: depth recurrence is back on the table — 5+ records use it now.
LESSONS.md §29 needs another update from "stale" to "real direction".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ity plateau confirmed

Patches 15/16/21 still uncontested in 150+ open + 10 closed PRs (6 consecutive
audits). PR openai#1430 stable OPEN, 0 comments, no comp owner activity for 16h.

After 13 research fires and 6 audits, the picture is clear: training-time
tweaks are exhausted at our 22M/1500-step scale. All 4 post-fire-9 ports
(Mousse/MuonEq-R/Depth Recurrence/QK_GAIN=5.0) are neutral within the
champion noise band. The "neutrality plateau" at 3.27-3.30 is the empirical
ceiling for training-time changes at our compute budget.

Best remaining moves (in expected value order):
1. H100 escalation of CHAMP_L4_seed42+EL stack with EMA+Tilt+INT6 GPTQ bundle
2. Coprime stride implementation (task openai#58) — only data-side direction
3. BPE-8192 ngram tables build (task openai#49) — enables tokenizer A/B

Spend ~$3.55/$36 (10% utilization). Pod healthy at 7h uptime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ntified as top missing technique

Patches 15/16/21 + NEW Patch 20 USE_COPRIME_STRIDE all uncontested
in 150+ open + 20 closed PRs (7 consecutive audits for the original
3, first confirmation for Patch 20 just shipped 3h ago).

CRITICAL FINDING: XSA (Cross-Sequence Attention) is in 4+ MERGED
records (PR openai#1019, openai#287, openai#315, openai#265, latest openai#1099) and we have ZERO
attention-mask variants. Most-validated missing technique. ~200 LOC
moderate port — too big for a single research fire but worth a focused
30-45 min investigation if we can find a minimal variant.

SLOT (Score-First TTT) is the openai#2 missing (PR openai#549, ~100 LOC) but it's
eval-time, joins the H100 escalation bundle category.

H100 escalation candidate updated:
  NEW: CHAMP_L4 + COPRIME_STRIDE + EL + (EMA + Tilt + INT6 GPTQ)
  OLD: CHAMP_L4 + EL + (EMA + Tilt + INT6 GPTQ)

Need CS2 cycle 2+3 for n=3 mean confirmation before escalating.

PR openai#1430 still OPEN, 0 comments, no comp owner activity for 16h+.

Spend ~$4.00/$36 (11.1%). Pod healthy at 7h 50min uptime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…x deploying, XSA no longer novel

Discovered run_forever.sh watcher (PID 123917) was auto-respawning runners,
causing 2 instances to run simultaneously after my restart at 20:36. Killed
the watcher + all children at 20:40 and restarted cleanly via wrapper.

SPEED4 + SPEED5 CRASHED in <30 seconds — torch.compile + XSA/EngramLite
incompatibility. Patch 2 re-enable broke when stacked with XSA/EL forward
modifications. Need to investigate dynamic=True / fullgraph=False, or
disable torch.compile when XSA/EL are active.

Patch 21 USE_XSA reclassified: PR openai#1448 (FlashMuon + XSA5LastGated)
explicitly uses XSA. Now port-with-evidence, not novel-to-comp.

Patches still novel (after 8th audit): 15, 16, 21 MTP, 20 Coprime, 25 NorMuon.

PR openai#1430 still OPEN, no comp owner activity for 18h+.

Spend ~$5.83/$36 (16.2%). Pod healthy at 8h 47min uptime.

NEXT FIRE PRIORITY: verify GPU util > 80% after speed fix deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChideraIbe123 pushed a commit to ChideraIbe123/parameter-golf that referenced this pull request Apr 7, 2026
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…util

After 5 emergency interventions in 2 hours, the speed fix is finally working:
  GPU Memory: 744 MB -> 3370 MB (4.5x)
  GPU Util: 34% -> 100% (3x, FULLY MAXED)
  Power: 149W -> 218W
  Total compute/step: 270 GFLOP -> 17 TFLOP (64x)
  Total tokens/experiment: 1.5M -> 24M (16x)

CHAMP_L5_seed42 currently running successfully:
  step:100 train_loss:3.6128 step_avg:861ms

The actual root cause was Patch 22 EngramLite init anchor mismatch.
The torch.compile crashes were a red herring — every experiment was
crashing with AttributeError on self._engram_lite_enabled because the
forward apply ran but the init didn't. getattr wrap fixed it.

All prior "neutrality plateau" verdicts are now CONFIRMED INVALID:
Mousse/MuonEq-R/NorMuon/Depth Recurrence/Coprime/EngramLite/QK_GAIN
were all measured on 0.75% of intended data volume. Need re-validation.

PR openai#1430 still OPEN, 24h no activity. Patches 15/16/20/21/25 still novel
(9th consecutive audit confirmation).

NEW finding: TMA Megakernel in 5 PRs (custom Triton kernel, hardware-side).
We have ZERO hardware-side patches. Highest-leverage missing technique.

Spend ~$6.33/$36 (17.6%). Far below $25 flag threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mradassaad added a commit to mradassaad/parameter-golf that referenced this pull request Apr 9, 2026
Stateful eval was previously flagged as harmful on the grounds that INT6
quant errors accumulate in SSM recurrent state. Measurement shows the
quant delta is actually flat ~8.2 mBPB across 100-1892 windows — no
accumulation. Real cause of the pure-stateful BF16 regression was
attention context loss at window boundaries. Stateful-overlap eval with
overlap=1024 closes the gap to sliding within 0.3 mBPB while running in
~32s vs 500s, freeing 468s of eval budget for SLOT/TTT.

Also corrects merged SOTA to 1.1147 bpb (PR openai#1019), flags PR openai#1329/openai#1344/
openai#1430 as unmerged/invalid, and revises the SLOT estimate from 50-150 to
15-30 mBPB based on capacity-regularization reasoning.
@MatoTeziTanka
Copy link
Copy Markdown

Community Review — Record: Per-Sample SLOT + N-gram Order-22 + TTT + LR=0.432 — val_bpb 0.39642 (3-seed mean)

BPB: 0.39642 | Compliance: FLAG — standard (non-causal) SLOT on scored region, pending Issue #1336

What I found in the code (head SHA 08bca09b5bd9, file records/track_10min_16mb/2026-04-07_PerSample_SLOT_NgramOrder22_AlphaCenter25_TTT_AdamW24step_2ndPass10_LR432_BSZ128/train_gpt.py):

The SLOT optimization mask at line 1801 covers the scored positions [s:wlen], and the inner optimization loop minimizes NLL on those same positions before scoring:

line 1801: score_mask[i, s:wlen] = 1.0 (mask covers scored region)

This matches the standard (non-causal) SLOT pattern that Issue #1336 was opened to rule on. PR #1240 (andrewbaggio1, self-closed 2026-04-05) proved empirically that this pattern leaks future-token information into earlier scored positions with a 100% cross-position violation rate on a deterministic flip-test harness vs an exact-zero baseline — see the Issue #1336 meta-comment from 2026-04-11 for the full empirical context.

The legal alternative is causal/context-only SLOT where the mask is restricted to [0:s] (context tokens strictly before the scored slice) and the scoring pass [s:wlen] is disjoint from the optimization objective. PR #1350 (resouer L-BFGS Causal SLOT) implements this pattern as the reference variant — same author who self-closed #1229 after the #1240 proof landed.

Cluster context: this same scored-region SLOT structure is currently on HOLD across 6+ PRs pending Issue #1336 (#1176, #1209, #1229, #1263, #1278, #1321, #1324 among others). One @0hq ruling on #1336 closes or clears the entire cluster at once.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 2.40s, dim=512, layers=11, vocab=1024, code=184380 B, SMOKE_TEST_PASS

Verdict: COMPLIANCE FLAG — scored-region SLOT, pending Issue #1336 ruling.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: HOLD pending Issue #1336. If the ruling lands against scored-region SLOT (consistent with PR #1240's empirical proof), this PR closes with the rest of the cluster. If the ruling lands in favor, this PR clears alongside the others. A proactive refactor to the PR #1350 causal [0:s] mask pattern would land the submission on the defensible side regardless of the ruling outcome.


Reviewed by @MatoTeziTankaThe Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 2.40s, dim=512, layers=11, vocab=1024, code=184380 B, SMOKE_TEST_PASS. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.

deborahnelson8788726 pushed a commit to deborahnelson8788726/parameter-golf that referenced this pull request Apr 12, 2026
N-gram Order-22 Backoff Mixer + Per-Sample SLOT (LR=0.432) + Pre-quant TTT

Single seed 42 on 4xH100 SXM:
- val_bpb: 0.37112 (beats PR openai#1430's 0.39642 by 0.02530!)
- Beats official SOTA (1.0810) by 65.7%
- Training: 2762 steps, 217ms/step, 600s
- GPTQ: val calib 256 seqs, damp=0.005
- TTT: 703s (score-first, freeze blocks 0-9)
- SLOT+N-gram: 785s (24 AdamW steps + entropy-adaptive n-gram blending)

Key innovation: GPU-vectorized N-gram Order-22 with hash-based count tables
(4M buckets, scatter_add). Entropy-adaptive alpha blending:
  alpha = 0.20 + 0.55 * sigmoid(2 * (entropy - 2.5))
  mixed_p = (1-alpha) * neural_p + alpha * ngram_p

Trinity framework: github.com/gHashTag/trinity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@renqianluo renqianluo closed this Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants