
Record: SP8192 + PE + SmearGate + AttnOutGate + 4ep TTT — val_bpb 1.0770 (3-seed mean)#1826

Open
EthanYangTW wants to merge 2 commits into openai:main from EthanYangTW:submission/v2-clean

Conversation

@EthanYangTW

Summary

  • 3-seed mean val_bpb: 1.0770 (std 0.0004) on 8×H100 SXM
  • All 3 seeds: 1.0772, 1.0765, 1.0772
  • Artifact: ~15.98 MB

3-Seed Results

| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
|------|-------------|---------|------------------|
| 1337 | 1.0785 | 1.0772 | 15,982,989 |
| 42 | 1.0777 | 1.0765 | 15,984,317 |
| 2024 | 1.0784 | 1.0772 | 15,985,404 |
| Mean | 1.0782 | 1.0770 | 15,984,237 |
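For reference, the reported mean and standard deviation follow directly from the per-seed TTT BPB values (sample standard deviation, n-1):

```python
import statistics

# Per-seed TTT BPB values from the table above
ttt_bpb = {1337: 1.0772, 42: 1.0765, 2024: 1.0772}

mean = statistics.mean(ttt_bpb.values())
std = statistics.stdev(ttt_bpb.values())  # sample std (n-1), matching the reported 0.0004

print(f"mean={mean:.4f} std={std:.4f}")  # mean=1.0770 std=0.0004
```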

Innovation

  1. Polar Express NS coefficients (PR #1344)
  2. MIN_LR=0.10 warmdown floor (PR #1787)
  3. QK-Gain 5.25 (PR #1493)
  4. SmearGate — causal content-gated residual, zero-init (PR #1667)
  5. Attention Output Gate — per-head sigmoid gate, width=12, zero-init (PR #1667)
  6. 4 TTT epochs (was 3)
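For readers outside this record lineage, the attention output gate (item 5) can be sketched as a per-head sigmoid gate with a width-12 bottleneck, zero-initialized so training starts from a neutral gate. Shapes and layer names below are assumptions, not taken from the PR's actual train_gpt.py:

```python
import torch
import torch.nn as nn

class AttnOutGate(nn.Module):
    """Sketch of a per-head sigmoid attention-output gate (width-12 bottleneck).
    Names/shapes are illustrative assumptions, not the PR's implementation."""
    def __init__(self, dim: int, n_heads: int, gate_width: int = 12):
        super().__init__()
        self.down = nn.Linear(dim, gate_width, bias=False)
        self.up = nn.Linear(gate_width, n_heads, bias=False)
        nn.init.zeros_(self.up.weight)  # zero-init: every gate starts at sigmoid(0) = 0.5

    def forward(self, x: torch.Tensor, attn_out: torch.Tensor) -> torch.Tensor:
        # x: (B, T, dim) residual stream; attn_out: (B, T, n_heads, head_dim)
        gate = torch.sigmoid(self.up(self.down(x)))  # (B, T, n_heads), content-dependent
        return attn_out * gate.unsqueeze(-1)
```

Zero-initializing the final projection makes the gate exactly 0.5 for all heads at step 0, so the module perturbs the pretrained signal only as the gate weights learn.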

Compliance (Issue #1017, Track B)

  • Train < 600s (599.6s)
  • Eval < 600s (531s)
  • Artifact < 16MB (15.98MB)
  • Score before update (each chunk scored under no_grad before TTT)
  • No SLOT, no pre-quant TTT, no n-gram cache, no CaseOps, no global TTT
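The "score before update" rule above can be sketched as a loop where each chunk is evaluated under `no_grad` with the current weights before any TTT step touches it. Helper names (`loss_fn`, `chunks`) are illustrative, not from the PR's train_gpt.py:

```python
import torch

def score_first_ttt(model, chunks, loss_fn, opt, epochs=4):
    """Sketch of score-first test-time training: every chunk is scored with
    the pre-update weights, then the model adapts on that same chunk."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        model.eval()
        with torch.no_grad():  # score BEFORE any update on this chunk
            total_loss += loss_fn(model, chunk).item() * chunk.numel()
            total_tokens += chunk.numel()
        model.train()
        for _ in range(epochs):  # then take TTT steps on the scored chunk
            opt.zero_grad()
            loss_fn(model, chunk).backward()
            opt.step()
    return total_loss / total_tokens
```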

Attribution

@abaybektursun (PR #1420), @clarkkev (PR #1394), @dexhunter (PR #1331), @aryanbhosale (PR #1477), @resouer (PR #1460), @orangekame3 (PR #1344), @nprime06 (PR #1787), @MarioPaerle (PR #1667), @bigbag (PR #1493)

@EthanYangTW EthanYangTW marked this pull request as ready for review April 26, 2026 23:05
Copilot AI review requested due to automatic review settings April 26, 2026 23:05
Contributor

Copilot AI left a comment


Pull request overview

Adds a new Track B record folder for the 10min/16MB benchmark, documenting a SP8192-based submission (PE + MIN_LR + SmearGate + AttnOutGate + 4-epoch TTT) and including the packed train_gpt.py launcher.

Changes:

  • Add a new record README with reported 3-seed results, architecture summary, compliance notes, and reproduction command.
  • Add a compressed-code train_gpt.py wrapper to execute the submission.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/train_gpt.py | Adds a packed launcher that decompresses and executes the full training/eval script. |
| records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/README.md | Documents results, deltas, compliance, and reproduction steps for the new record. |


Comment on lines +7 to +9
| Seed | Steps | Sliding BPB | **TTT BPB** | Artifact (bytes) |
|------|-------|-------------|-------------|-------------------|
| 1337 | 4631 | 1.0785 | **1.0772** | 15,982,989 |

Copilot AI Apr 26, 2026


The markdown table is malformed: rows start with || which creates an extra empty column and renders inconsistently. Use a single leading | for each row (including the header separator) to match other record READMEs in this repo.

Comment on lines +58 to +65
## Reproduction

```bash
pip install brotli sentencepiece
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192 --train-shards 80
SEED=1337 TTT_ENABLED=1 HASH_EMBED_ENABLED=1 TTT_LR=0.01 TTT_EPOCHS=4 TTT_OPTIMIZER=sgd MUON_MOMENTUM=0.97 GLOBAL_TTT_ENABLED=0 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Copilot AI Apr 26, 2026


This records folder is missing required submission artifacts. The repo submission guidelines require (at minimum) a submission.json and train log(s) alongside README.md and train_gpt.py (see root README.md around the submission checklist). Please add submission.json and the run logs used to support the 3-seed claim, otherwise the submission can’t be verified/accepted.

Comment on lines +41 to +45
EMA 0.997, warmdown 66.7%, loop at 35%
SDClip GPTQ int6 (k=12.85) + int8 embed (k=20) + brotli
Score-first TTT: SGD lr=0.01, mom=0.9, 4ep, 32K chunks
Hash embedding: 16384x512, zero-init, trained in TTT
~36M params, ~15.98MB artifact

Copilot AI Apr 26, 2026


The README/repro section enables and describes a TTT hash embedding (HASH_EMBED_ENABLED=1, and the architecture list mentions a 16384×512 hash embedding), but the PR title/description "Innovation" list doesn’t mention this component. Please align the PR description (and/or README) so reviewers can clearly understand whether hash embedding is part of the claimed improvement and compliance story.
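For context on the component Copilot flags: a hashed auxiliary embedding of the stated shape (16384×512, zero-init, trained only during TTT) can be sketched as below. The bigram hash is a toy stand-in; the PR's actual hashing scheme is not shown in this thread.

```python
import torch
import torch.nn as nn

class HashEmbed(nn.Module):
    """Sketch of a hashed auxiliary embedding table (16384 x 512, zero-init).
    The bigram hash here is illustrative only, not the PR's scheme."""
    def __init__(self, n_buckets: int = 16384, dim: int = 512):
        super().__init__()
        self.n_buckets = n_buckets
        self.table = nn.Embedding(n_buckets, dim)
        nn.init.zeros_(self.table.weight)  # contributes nothing until TTT updates it

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T) int64 token ids; hash each (prev, current) bigram to a bucket
        prev = torch.roll(tokens, shifts=1, dims=1)
        prev[:, 0] = 0
        buckets = (tokens * 1000003 + prev) % self.n_buckets
        return self.table(buckets)  # (B, T, dim), all zeros before TTT training
```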
