
Record: SP8192 + PE + SmearGate + AttnOutGate + 4ep TTT — val_bpb 1.0770 (3-seed mean)#1826

Open
EthanYangTW wants to merge 2 commits into openai:main from EthanYangTW:submission/v2-clean

Conversation

@EthanYangTW

Summary

  • 3-seed mean val_bpb: 1.0770 (std 0.0004) on 8×H100 SXM
  • All 3 seeds: 1.0772, 1.0765, 1.0772
  • Artifact: ~15.98 MB

3-Seed Results

| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
|------|-------------|---------|------------------|
| 1337 | 1.0785 | 1.0772 | 15,982,989 |
| 42 | 1.0777 | 1.0765 | 15,984,317 |
| 2024 | 1.0784 | 1.0772 | 15,985,404 |
| Mean | 1.0782 | 1.0770 | 15,984,237 |
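For reference, the reported mean and standard deviation follow directly from the per-seed TTT BPB values (sample standard deviation, n-1):

```python
import statistics

# Per-seed TTT BPB values from the table above
ttt_bpb = {1337: 1.0772, 42: 1.0765, 2024: 1.0772}

mean = statistics.mean(ttt_bpb.values())
std = statistics.stdev(ttt_bpb.values())  # sample std (n-1), matching the reported 0.0004

print(f"mean={mean:.4f} std={std:.4f}")  # mean=1.0770 std=0.0004
```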

Innovation

  1. Polar Express NS coefficients (PR #1344)
  2. MIN_LR=0.10 warmdown floor (PR #1787)
  3. QK-Gain 5.25 (PR #1493)
  4. SmearGate — causal content-gated residual, zero-init (PR #1667)
  5. Attention Output Gate — per-head sigmoid gate, width=12, zero-init (PR #1667)
  6. 4 TTT epochs (was 3)
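For readers outside this record lineage, the attention output gate (item 5) can be sketched as a per-head sigmoid gate with a width-12 bottleneck, zero-initialized so training starts from a neutral gate. Shapes and layer names below are assumptions, not taken from the PR's actual train_gpt.py:

```python
import torch
import torch.nn as nn

class AttnOutGate(nn.Module):
    """Sketch of a per-head sigmoid attention-output gate (width-12 bottleneck).
    Names/shapes are illustrative assumptions, not the PR's implementation."""
    def __init__(self, dim: int, n_heads: int, gate_width: int = 12):
        super().__init__()
        self.down = nn.Linear(dim, gate_width, bias=False)
        self.up = nn.Linear(gate_width, n_heads, bias=False)
        nn.init.zeros_(self.up.weight)  # zero-init: every gate starts at sigmoid(0) = 0.5

    def forward(self, x: torch.Tensor, attn_out: torch.Tensor) -> torch.Tensor:
        # x: (B, T, dim) residual stream; attn_out: (B, T, n_heads, head_dim)
        gate = torch.sigmoid(self.up(self.down(x)))  # (B, T, n_heads), content-dependent
        return attn_out * gate.unsqueeze(-1)
```

Zero-initializing the final projection makes the gate exactly 0.5 for all heads at step 0, so the module perturbs the pretrained signal only as the gate weights learn.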

Compliance (Issue #1017, Track B)

  • Train < 600s (599.6s)
  • Eval < 600s (531s)
  • Artifact < 16MB (15.98MB)
  • Score before update (each chunk scored under no_grad before TTT)
  • No SLOT, no pre-quant TTT, no n-gram cache, no CaseOps, no global TTT
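The "score before update" rule above can be sketched as a loop where each chunk is evaluated under `no_grad` with the current weights before any TTT step touches it. Helper names (`loss_fn`, `chunks`) are illustrative, not from the PR's train_gpt.py:

```python
import torch

def score_first_ttt(model, chunks, loss_fn, opt, epochs=4):
    """Sketch of score-first test-time training: every chunk is scored with
    the pre-update weights, then the model adapts on that same chunk."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        model.eval()
        with torch.no_grad():  # score BEFORE any update on this chunk
            total_loss += loss_fn(model, chunk).item() * chunk.numel()
            total_tokens += chunk.numel()
        model.train()
        for _ in range(epochs):  # then take TTT steps on the scored chunk
            opt.zero_grad()
            loss_fn(model, chunk).backward()
            opt.step()
    return total_loss / total_tokens
```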

Attribution

@abaybektursun (PR #1420), @clarkkev (PR #1394), @dexhunter (PR #1331), @aryanbhosale (PR #1477), @resouer (PR #1460), @orangekame3 (PR #1344), @nprime06 (PR #1787), @MarioPaerle (PR #1667), @bigbag (PR #1493)

@EthanYangTW EthanYangTW marked this pull request as ready for review April 26, 2026 23:05
Copilot AI review requested due to automatic review settings April 26, 2026 23:05
Contributor

Copilot AI left a comment


Pull request overview

Adds a new Track B record folder for the 10min/16MB benchmark, documenting a SP8192-based submission (PE + MIN_LR + SmearGate + AttnOutGate + 4-epoch TTT) and including the packed train_gpt.py launcher.

Changes:

  • Add a new record README with reported 3-seed results, architecture summary, compliance notes, and reproduction command.
  • Add a compressed-code train_gpt.py wrapper to execute the submission.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/train_gpt.py | Adds a packed launcher that decompresses and executes the full training/eval script. |
| records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/README.md | Documents results, deltas, compliance, and reproduction steps for the new record. |


Comment on lines +7 to +9
| Seed | Steps | Sliding BPB | **TTT BPB** | Artifact (bytes) |
|------|-------|-------------|-------------|-------------------|
| 1337 | 4631 | 1.0785 | **1.0772** | 15,982,989 |

Copilot AI Apr 26, 2026


The markdown table is malformed: rows start with || which creates an extra empty column and renders inconsistently. Use a single leading | for each row (including the header separator) to match other record READMEs in this repo.

Comment on lines +58 to +65
## Reproduction

```bash
pip install brotli sentencepiece
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192 --train-shards 80
SEED=1337 TTT_ENABLED=1 HASH_EMBED_ENABLED=1 TTT_LR=0.01 TTT_EPOCHS=4 TTT_OPTIMIZER=sgd MUON_MOMENTUM=0.97 GLOBAL_TTT_ENABLED=0 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Copilot AI Apr 26, 2026


This records folder is missing required submission artifacts. The repo submission guidelines require (at minimum) a submission.json and train log(s) alongside README.md and train_gpt.py (see root README.md around the submission checklist). Please add submission.json and the run logs used to support the 3-seed claim, otherwise the submission can’t be verified/accepted.

Comment on lines +41 to +45
EMA 0.997, warmdown 66.7%, loop at 35%
SDClip GPTQ int6 (k=12.85) + int8 embed (k=20) + brotli
Score-first TTT: SGD lr=0.01, mom=0.9, 4ep, 32K chunks
Hash embedding: 16384x512, zero-init, trained in TTT
~36M params, ~15.98MB artifact

Copilot AI Apr 26, 2026


The README/repro section enables and describes a TTT hash embedding (HASH_EMBED_ENABLED=1, and the architecture list mentions a 16384×512 hash embedding), but the PR title/description "Innovation" list doesn’t mention this component. Please align the PR description (and/or README) so reviewers can clearly understand whether hash embedding is part of the claimed improvement and compliance story.
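For context on the component Copilot flags: a hashed auxiliary embedding of the stated shape (16384×512, zero-init, trained only during TTT) can be sketched as below. The bigram hash is a toy stand-in; the PR's actual hashing scheme is not shown in this thread.

```python
import torch
import torch.nn as nn

class HashEmbed(nn.Module):
    """Sketch of a hashed auxiliary embedding table (16384 x 512, zero-init).
    The bigram hash here is illustrative only, not the PR's scheme."""
    def __init__(self, n_buckets: int = 16384, dim: int = 512):
        super().__init__()
        self.n_buckets = n_buckets
        self.table = nn.Embedding(n_buckets, dim)
        nn.init.zeros_(self.table.weight)  # contributes nothing until TTT updates it

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T) int64 token ids; hash each (prev, current) bigram to a bucket
        prev = torch.roll(tokens, shifts=1, dims=1)
        prev[:, 0] = 0
        buckets = (tokens * 1000003 + prev) % self.n_buckets
        return self.table(buckets)  # (B, T, dim), all zeros before TTT training
```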
