65 changes: 65 additions & 0 deletions records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/README.md
@@ -0,0 +1,65 @@
# Record: SP8192 + PE + MIN_LR + SmearGate + AttnOutGate + 4ep TTT — val_bpb 1.0770 (3-seed mean)

**val_bpb = 1.0770** (3-seed mean, std 0.0004) | **~15.98 MB** | 8xH100 SXM

## 3-Seed Results

| Seed | Steps | Sliding BPB | **TTT BPB** | Artifact (bytes) |
|------|-------|-------------|-------------|-------------------|
| 1337 | 4631 | 1.0785 | **1.0772** | 15,982,989 |
| 42 | 4637 | 1.0777 | **1.0765** | 15,984,317 |
| 2024 | 4633 | 1.0784 | **1.0772** | 15,985,404 |
| **Mean** | **4634** | **1.0782** | **1.0770** | **15,984,237** |
| **Std** | | 0.0004 | **0.0004** | |

Comment on lines +7 to +9
Copilot AI Apr 26, 2026

The markdown table is malformed: rows start with `||`, which creates an extra empty column and renders inconsistently. Use a single leading `|` for each row (including the header separator) to match other record READMEs in this repo.

Delta vs previous SOTA (1.0783): **-0.0013 BPB**
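
The summary rows can be recomputed from the three per-seed entries. The sketch below is a minimal check, assuming the reported standard deviation is the sample (n - 1) form; the variable names are illustrative and do not come from the repository.

```python
from statistics import mean, stdev

# Per-seed values copied from the 3-seed results table.
ttt_bpb = [1.0772, 1.0765, 1.0772]                  # seeds 1337, 42, 2024
artifact_bytes = [15_982_989, 15_984_317, 15_985_404]
previous_sota = 1.0783                              # quoted in the delta line

ttt_mean = mean(ttt_bpb)
ttt_std = stdev(ttt_bpb)        # assumption: sample std (n - 1 denominator)
delta = ttt_mean - previous_sota
artifact_mean = mean(artifact_bytes)

print(f"mean={ttt_mean:.4f} std={ttt_std:.4f} delta={delta:+.4f}")
print(f"mean artifact size={artifact_mean:,.0f} bytes")
```

Under that assumption the recomputed mean, standard deviation, and delta agree with the table to four decimal places.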

## Changes from previous SOTA (2026-04-12)

### Training improvements
- **Polar Express NS coefficients** — 5 per-iteration minimax-optimal tuples + row normalization (was: fixed 3.4445/-4.775/2.0315)
- **MIN_LR=0.10** warmdown floor (was: 0.0 — LR dropped to zero)
- **QK_GAIN_INIT=5.25** (was: 5.0)
- **GPTQ_RESERVE_SECONDS=0.5** (was: 12.0)
- **VAL_LOSS_EVERY=0** — skip periodic val during training
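
To illustrate what the Newton-Schulz (NS) coefficients control, assume the standard quintic step used by Muon-style optimizers, X ← aX + b(XXᵀ)X + c(XXᵀ)²X. Under that assumption each step acts on every singular value σ of X as the scalar map σ ← aσ + bσ³ + cσ⁵. The sketch below applies that map with a list of per-iteration `(a, b, c)` tuples. It uses only the fixed baseline tuple quoted above, repeated five times; the five Polar Express tuples are not listed in this README, so they are not reproduced here. `ns_singular_value` is a hypothetical helper name.

```python
def ns_singular_value(sigma, coeffs):
    """Track one singular value through a sequence of quintic NS steps.

    coeffs is a list of (a, b, c) tuples, one per iteration, so a
    per-iteration schedule is just a list with differing entries.
    """
    for a, b, c in coeffs:
        sigma = a * sigma + b * sigma**3 + c * sigma**5
    return sigma

# Baseline from the README: the same fixed tuple for every iteration.
BASELINE = [(3.4445, -4.775, 2.0315)] * 5

for s0 in (0.05, 0.1, 0.5, 1.0):
    print(s0, round(ns_singular_value(s0, BASELINE), 3))
```

With the fixed tuple, starting values across this range end up in a band around 1 rather than converging exactly to 1. Coefficients chosen separately for each iteration are one way to tighten that band.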

### Architecture additions
- **SmearGate** — causal content-gated residual, zero-init transparent
- **Attention Output Gate** — per-head sigmoid gate on attn output (width=12), zero-init
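
A minimal sketch of a per-head sigmoid gate on the attention output, in plain Python for a single token. It assumes that `width=12` means the gate reads the first 12 channels of the layer input and that the gate is a plain sigmoid; neither detail is stated in this README, and whether the record rescales the gate (for example to make it exactly transparent at init) is also not stated. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate_attn_output(x, head_outputs, gate_weight, gate_width=12):
    """Scale each head's output by a scalar gate computed from the input.

    x            : layer input for one token (list of floats)
    head_outputs : one output vector per attention head
    gate_weight  : [num_heads][gate_width] matrix (zero-initialised here)
    """
    gate_in = x[:gate_width]  # assumption: gate sees only the first channels
    gated = []
    for w_row, head_out in zip(gate_weight, head_outputs):
        g = sigmoid(sum(w * xi for w, xi in zip(w_row, gate_in)))
        gated.append([g * v for v in head_out])
    return gated

# Shapes taken from the architecture summary: 8 query heads, head_dim=64, 512 dim.
num_heads, head_dim, model_dim = 8, 64, 512
x = [0.01 * i for i in range(model_dim)]
heads = [[1.0] * head_dim for _ in range(num_heads)]
zero_w = [[0.0] * 12 for _ in range(num_heads)]
out = gate_attn_output(x, heads, zero_w)
```

With zero-initialised weights every gate equals sigmoid(0) = 0.5 regardless of the input, so at the start of training the gate is a uniform scale and carries no content dependence.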

### TTT improvement
- **4 epochs** (was: 3) of score-first SGD TTT

## Architecture (unchanged from base)

```
SP8192 tokenizer, 11 physical / 17 virtual layers
512 dim, MLP 4x (2048 hidden), GQA 8Q/4KV, head_dim=64
Parallel residuals L7+, QK-Gain 5.25, XSA all 11 layers
LeakyReLU(0.5)², skip gates, logit softcap 30
MuonEq-R (lr=0.022, wd=0.095, momentum=0.97) + AdamW
EMA 0.997, warmdown 66.7%, loop at 35%
SDClip GPTQ int6 (k=12.85) + int8 embed (k=20) + brotli
Score-first TTT: SGD lr=0.01, mom=0.9, 4ep, 32K chunks
Hash embedding: 16384x512, zero-init, trained in TTT
~36M params, ~15.98MB artifact
```

Comment on lines +41 to +45
Copilot AI Apr 26, 2026

The README/repro section enables and describes a TTT hash embedding (`HASH_EMBED_ENABLED=1`, and the architecture list mentions a 16384×512 hash embedding), but the PR title/description "Innovation" list doesn’t mention this component. Please align the PR description (and/or README) so reviewers can clearly understand whether hash embedding is part of the claimed improvement and compliance story.
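
The `warmdown 66.7%` entry in the architecture summary, combined with the `MIN_LR=0.10` floor from the training changes, can be pictured as a learning-rate multiplier over training progress. The sketch below assumes a linear warmdown over the final 66.7% of training and treats `MIN_LR` as a fraction of the peak learning rate that the schedule interpolates down to. The actual schedule shape and floor semantics in `train_gpt.py` are not given in this README, and `lr_multiplier` is a hypothetical name.

```python
def lr_multiplier(progress, warmdown_frac=0.667, min_lr=0.10):
    """Factor applied to the peak LR at a given training progress in [0, 1]."""
    warmdown_start = 1.0 - warmdown_frac
    if progress <= warmdown_start:
        return 1.0                       # constant phase before warmdown
    # Fraction of the warmdown phase completed, clamped to 1.
    t = min(1.0, (progress - warmdown_start) / warmdown_frac)
    # Assumed: linear interpolation from 1.0 down to the floor.
    return 1.0 - t * (1.0 - min_lr)

for p in (0.0, 0.333, 0.5, 0.75, 1.0):
    print(p, round(lr_multiplier(p), 3), round(lr_multiplier(p, min_lr=0.0), 3))
```

The second printed column shows the floored schedule and the third the previous `MIN_LR=0.0` behaviour, where the multiplier reaches zero at the end of training.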

## Compliance (Track B — Score-First TTT)

Per Issue #1017:
- **Condition 1:** Hash key uses prefix tokens only
- **Condition 2:** Full normalized softmax distribution
- **Condition 3:** Each chunk scored under no_grad() before TTT update
- **Condition 4:** Single left-to-right pass, no rescoring

No SLOT, no pre-quant TTT, no n-gram caches, no CaseOps, no global TTT, no multi-phase.
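
Conditions 3 and 4 describe an ordering constraint that can be shown on a toy model. The sketch below is not the record's implementation: it replaces the network with a single Bernoulli logit, uses plain SGD without the momentum listed in the architecture summary, and all names are hypothetical. What it preserves is the order of operations: each chunk is scored with the current parameters first, the score is recorded once, and only then is the model updated on that chunk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_first_ttt(chunks, lr=0.01, epochs=4):
    """Toy score-first test-time training on chunks of 0/1 tokens.

    Returns (total loss in bits, list of per-chunk losses).
    """
    theta = 0.0                               # single logit for P(token = 1)
    scores = []
    for chunk in chunks:                      # single left-to-right pass
        # 1) Score the chunk with the current parameters; no update yet.
        p = sigmoid(theta)
        bits = -sum(math.log2(p if b else 1.0 - p) for b in chunk)
        scores.append(bits)                   # recorded once, never rescored
        # 2) Only afterwards adapt on the chunk that was just scored.
        for _ in range(epochs):
            for b in chunk:
                grad = sigmoid(theta) - b     # d(NLL)/d(theta) for a Bernoulli logit
                theta -= lr * grad
    return sum(scores), scores

total, per_chunk = score_first_ttt([[1, 1, 1, 1]] * 3, lr=0.5)
print(per_chunk)
```

Because the update for a chunk happens after its score is recorded, the reported loss for each chunk depends only on earlier chunks. Later chunks benefit from the adaptation, and the first chunk is scored by the unadapted model.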

## Reproduction

```bash
pip install brotli sentencepiece
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192 --train-shards 80
SEED=1337 TTT_ENABLED=1 HASH_EMBED_ENABLED=1 TTT_LR=0.01 TTT_EPOCHS=4 TTT_OPTIMIZER=sgd MUON_MOMENTUM=0.97 GLOBAL_TTT_ENABLED=0 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Comment on lines +58 to +65
Copilot AI Apr 26, 2026

This records folder is missing required submission artifacts. The repo submission guidelines require (at minimum) a `submission.json` and train log(s) alongside `README.md` and `train_gpt.py` (see the submission checklist in the root `README.md`). Please add `submission.json` and the run logs used to support the 3-seed claim; otherwise the submission can’t be verified or accepted.