Record: SP8192 + PE + SmearGate + AttnOutGate + 4ep TTT — val_bpb 1.0770 (3-seed mean) #1826
EthanYangTW wants to merge 2 commits into openai:main
Conversation
Pull request overview
Adds a new Track B record folder for the 10min/16MB benchmark, documenting an SP8192-based submission (PE + MIN_LR + SmearGate + AttnOutGate + 4-epoch TTT) and including the packed train_gpt.py launcher.
Changes:
- Add a new record README with reported 3-seed results, architecture summary, compliance notes, and reproduction command.
- Add a compressed-code train_gpt.py wrapper to execute the submission.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/train_gpt.py | Adds a packed launcher that decompresses and executes the full training/eval script. |
| records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/README.md | Documents results, deltas, compliance, and reproduction steps for the new record. |
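As a rough illustration of what such a packed launcher does, here is a minimal sketch that embeds a script as a compressed blob and executes it at launch time. The base64/brotli encoding, names, and structure below are assumptions for illustration only, not the actual contents of the submitted train_gpt.py:

```python
# Illustrative sketch of a packed launcher (not the submitted file): the full
# training/eval script is stored as a base64-encoded, brotli-compressed blob
# and is unpacked and executed when the wrapper runs.
import base64
import brotli

# For illustration, pack a tiny stand-in script the same way the real wrapper
# would embed the full training/eval source.
ORIGINAL_SOURCE = 'print("running unpacked training script")\n'
PACKED_BLOB = base64.b64encode(brotli.compress(ORIGINAL_SOURCE.encode("utf-8"))).decode("ascii")

def run_packed(blob: str) -> None:
    source = brotli.decompress(base64.b64decode(blob)).decode("utf-8")
    # Execute as the top-level script so any `if __name__ == "__main__"` block runs.
    exec(compile(source, "train_gpt_unpacked.py", "exec"), {"__name__": "__main__"})

if __name__ == "__main__":
    run_packed(PACKED_BLOB)
```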
| | Seed | Steps | Sliding BPB | **TTT BPB** | Artifact (bytes) | |
| |------|-------|-------------|-------------|-------------------| |
| | 1337 | 4631 | 1.0785 | **1.0772** | 15,982,989 | |
The markdown table is malformed: rows start with || which creates an extra empty column and renders inconsistently. Use a single leading | for each row (including the header separator) to match other record READMEs in this repo.
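For illustration, the same rows with a single leading pipe per line would be a well-formed markdown table:

```markdown
| Seed | Steps | Sliding BPB | **TTT BPB** | Artifact (bytes) |
|------|-------|-------------|-------------|-------------------|
| 1337 | 4631  | 1.0785      | **1.0772**  | 15,982,989        |
```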
## Reproduction

```bash
pip install brotli sentencepiece
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192 --train-shards 80
SEED=1337 TTT_ENABLED=1 HASH_EMBED_ENABLED=1 TTT_LR=0.01 TTT_EPOCHS=4 TTT_OPTIMIZER=sgd MUON_MOMENTUM=0.97 GLOBAL_TTT_ENABLED=0 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```
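The flags in this command are plain environment variables. Below is a minimal sketch of how a training script typically reads such flags; the defaults mirror the command above and are assumptions, not the submitted script's actual parsing or defaults:

```python
# Sketch only: reading the launch flags from the reproduction command.
# Defaults mirror that command; the real train_gpt.py may differ.
import os

SEED = int(os.environ.get("SEED", "1337"))
TTT_ENABLED = os.environ.get("TTT_ENABLED", "1") == "1"
HASH_EMBED_ENABLED = os.environ.get("HASH_EMBED_ENABLED", "1") == "1"
TTT_LR = float(os.environ.get("TTT_LR", "0.01"))
TTT_EPOCHS = int(os.environ.get("TTT_EPOCHS", "4"))
TTT_OPTIMIZER = os.environ.get("TTT_OPTIMIZER", "sgd")
MUON_MOMENTUM = float(os.environ.get("MUON_MOMENTUM", "0.97"))
GLOBAL_TTT_ENABLED = os.environ.get("GLOBAL_TTT_ENABLED", "0") == "1"
```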
This records folder is missing required submission artifacts. The repo submission guidelines require (at minimum) a submission.json and train log(s) alongside README.md and train_gpt.py (see root README.md around the submission checklist). Please add submission.json and the run logs used to support the 3-seed claim, otherwise the submission can’t be verified/accepted.
EMA 0.997, warmdown 66.7%, loop at 35%
SDClip GPTQ int6 (k=12.85) + int8 embed (k=20) + brotli
Score-first TTT: SGD lr=0.01, mom=0.9, 4ep, 32K chunks
Hash embedding: 16384x512, zero-init, trained in TTT
~36M params, ~15.98MB artifact
The README/repro section enables and describes a TTT hash embedding (HASH_EMBED_ENABLED=1, and the architecture list mentions a 16384×512 hash embedding), but the PR title/description "Innovation" list doesn’t mention this component. Please align the PR description (and/or README) so reviewers can clearly understand whether hash embedding is part of the claimed improvement and compliance story.
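To make the component under discussion concrete, here is a minimal sketch of the zero-initialised hash embedding trained with SGD at test time, using the hyperparameters from the README excerpt above (16384×512 table, lr=0.01, momentum 0.9, 4 epochs, 32K-token chunks). The hashing scheme and the `extra_embed` model hook are assumptions for illustration, not the submission's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashEmbedding(nn.Module):
    """Zero-initialised hash embedding: token ids are hashed into a fixed table
    and the looked-up vector is added to the regular token embedding. Zero init
    leaves the base model unchanged until the table is trained during TTT."""

    def __init__(self, num_buckets: int = 16384, dim: int = 512):
        super().__init__()
        self.table = nn.Embedding(num_buckets, dim)
        nn.init.zeros_(self.table.weight)  # no effect before test-time training
        self.num_buckets = num_buckets

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Cheap multiplicative hash into the bucket range (illustrative choice).
        buckets = (token_ids * 0x9E3779B1) % self.num_buckets
        return self.table(buckets)

def ttt_update(model, hash_emb, chunks, lr=0.01, momentum=0.9, epochs=4):
    """Test-time training loop: only the hash table is optimised with SGD over
    fixed-size chunks of the evaluation stream (e.g. 32K-token chunks)."""
    opt = torch.optim.SGD(hash_emb.parameters(), lr=lr, momentum=momentum)
    for _ in range(epochs):
        for tokens, targets in chunks:
            # `extra_embed` is an assumed hook for injecting the hash vectors.
            logits = model(tokens, extra_embed=hash_emb(tokens))
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
            opt.zero_grad(set_to_none=True)
            loss.backward()
            opt.step()
```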
Summary
3-Seed Results
Innovation
Compliance (Issue #1017, Track B)
Attribution
@abaybektursun (PR #1420), @clarkkev (PR #1394), @dexhunter (PR #1331), @aryanbhosale (PR #1477), @resouer (PR #1460), @orangekame3 (PR #1344), @nprime06 (PR #1787), @MarioPaerle (PR #1667), @bigbag (PR #1493)