150 commits
c3135f4
Update README.md
0hq Mar 18, 2026
164ba0b
Remove scripts
0hq Mar 18, 2026
06b0c30
Update README typo
oof-baroomf Mar 18, 2026
d710db1
match timing to main script to exclude eval timing
berniwal Mar 18, 2026
9a4963e
Fix MLX validation loss accumulation
yhn112 Mar 18, 2026
713dd27
Log MLX validation progress
yhn112 Mar 18, 2026
a80e308
Merge pull request #18 from berniwal/main
0hq Mar 18, 2026
bd94775
Merge pull request #9 from oof-baroomf/patch-1
0hq Mar 18, 2026
9e39111
Merge pull request #32 from yhn112/fix-mlx-eval-memory-growth
0hq Mar 18, 2026
87f49e0
Update README.md
0hq Mar 18, 2026
e70cf85
Merge pull request #35 from openai/0hq-patch-1
0hq Mar 18, 2026
add8f64
Update train_gpt.py
0hq Mar 18, 2026
309775a
Update train_gpt_mlx.py
0hq Mar 18, 2026
e971157
Update README.md
0hq Mar 18, 2026
d028086
Record: Seq4096 + Sliding Window Eval, val_bpb=1.1808
aquariouseworkman Mar 19, 2026
77f0fba
Non-record: SwiGLU + warmdown fix + quarter batch (1x5090, 1.3281 bpb)
NishantDahal Mar 19, 2026
35601b5
Record: Mixed Quant (int6+int8) + Sliding Window, val_bpb=1.1630
aquariouseworkman Mar 19, 2026
7660d2b
Submission: 10L + lower LR + fp16 embed + int6 middle layers (val_bpb…
Mar 19, 2026
b9c36b9
Add MLX_EAGER_EVAL flag to further reduce memory pressure by force-ev…
sandsevenone Mar 19, 2026
94b39ce
Merge pull request #100 from sandsevenone/mlx_eager_eval
0hq Mar 19, 2026
db158f7
fp16 tied embedding + lr/warmdown tuning — val_bpb 1.2197 (#42)
chonchiog Mar 19, 2026
dd8d0c4
Update README.md (#105)
0hq Mar 19, 2026
e8c997f
clarify torch version
cocohearts Mar 19, 2026
c547b22
SOTA attempt (val_bpb=1.2064) (#49)
spokane-way Mar 19, 2026
a7830de
Update README.md
0hq Mar 19, 2026
608a57d
Add record: Sliding Window Eval (stride=64), val_bpb=1.1925 (#50)
mattqlf Mar 19, 2026
dc001d0
Update submission: MLP 3x + QAT + Int6 + Sliding Window (val_bpb 1.1652)
Mar 19, 2026
0cf4a17
Fix: score final partial window in sliding window eval (#124)
mattqlf Mar 19, 2026
6e8241d
Update README.md
0hq Mar 19, 2026
567525d
New SOTA attempt (#52)
spokane-way Mar 19, 2026
7eb2ae4
Update README.md
0hq Mar 19, 2026
318f080
Update README.md
0hq Mar 19, 2026
ca53ff9
Update README.md
0hq Mar 19, 2026
c870612
Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (…
notapplica Mar 19, 2026
102ac20
Update README.md
0hq Mar 19, 2026
0083466
Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61)
saml212 Mar 19, 2026
ed6b55d
Update README.md
0hq Mar 19, 2026
a7f71b2
Update README.md
0hq Mar 19, 2026
c88a30f
Update README.md
0hq Mar 19, 2026
cc11948
Update README.md
0hq Mar 19, 2026
34ea8bd
Add Seq2048 + FP16 Tied Embedding submission (mean val_bpb 1.2067)
Mar 19, 2026
b3c22d2
Update submission: 10L + int6 mid + sliding window (mean val_bpb 1.1793)
Mar 19, 2026
581da16
Update: full int6+zstd, MLP 1344, Muon 0.99 (mean val_bpb 1.1632)
Mar 19, 2026
eb87032
Update: STE int6 QAT, zero quant gap (mean val_bpb 1.1598)
Mar 19, 2026
960fe21
Update README.md
0hq Mar 19, 2026
824683f
Record: 10L Mixed Precision: val_bpb=1.2147 (10 layers + int6 middle …
nanlliu Mar 19, 2026
d8ae405
commit ttt record (#77)
samacqua Mar 19, 2026
cf918e7
Update README.md
0hq Mar 19, 2026
8601dbf
Update README.md
cocohearts Mar 19, 2026
a6a01a9
SmearGate + OrthoInit + Muon WD + Int6 STE QAT + MLP 3x + Sliding Window
aquariouseworkman Mar 20, 2026
4180773
Merge branch 'openai:main' into main
aquariouseworkman Mar 20, 2026
c36e4d4
Add submission: 2026-03-20_Int6_MLP3x_SmearGate_BigramHash_MuonWD_SWA…
Mar 20, 2026
34bff8b
Update submission: 2026-03-20_Int6_MLP3x_SmearGate_BigramHash_MuonWD_…
Mar 20, 2026
a87f998
Record: 10L Int5-MLP + MuonWD=0.04 + SWA/50 (val_bpb=1.1453)
thwu1 Mar 20, 2026
c2a71a7
Update records/track_10min_16mb/2026-03-20_10L_Int5MLP_MuonWD04_SWA50…
thwu1 Mar 20, 2026
ce61b89
Update: 11L MLP3x + WD=0.04 + zstd-22 (val_bpb 1.1502)
Mar 20, 2026
1b32e0f
Update: val_bpb=1.1428 (mean 3 seeds) — bigram=10240 + SWA(0.4) + WD=…
Mar 20, 2026
a3ac988
Add 3-seed training logs (seed=42, 1337, 2024)
Mar 20, 2026
3a79188
Merge pull request #63 from yahya010/submission/seq2048-fp16emb
cocohearts Mar 20, 2026
d539217
Merge pull request #65 from aquariouseworkman/main
cocohearts Mar 20, 2026
9a189cd
Merge pull request #73 from NishantDahal/swiglu-warmdown-1x5090
cocohearts Mar 20, 2026
f871ae8
Merge pull request #86 from aruniyer/submission/10L-lowlr-fp16embed-int6
cocohearts Mar 20, 2026
ea9a291
Merge pull request #162 from raahilshah/submission/2026-03-20_Int6_ML…
cocohearts Mar 20, 2026
f31ceed
Merge pull request #180 from thwu1/10L-int5mlp-wd04-swa50
cocohearts Mar 20, 2026
fb38271
Update README.md
valerio-oai Mar 20, 2026
2812091
Update README.md
valerio-oai Mar 20, 2026
844cc04
Create leaderboard-best-score-over-time.svg
valerio-oai Mar 20, 2026
f544936
Update README.md
valerio-oai Mar 20, 2026
06647a4
Delete assets directory
valerio-oai Mar 20, 2026
b9f2192
Record: 11L + Efficient Partial XSA (val_bpb: 1.1307) — NEW SOTA
unnir Mar 20, 2026
3071cb1
Merge pull request #255 from openai/valerio-oai-patch-1
cocohearts Mar 20, 2026
6232018
Restore train_gpt.py before d8ae405
yuzhougu-oai Mar 20, 2026
b1cb161
Merge pull request #269 from openai/revert-train-gpt-pre-d8ae405
cocohearts Mar 20, 2026
ece9ea0
Update README.md
cocohearts Mar 20, 2026
2df9b3b
Update README.md
0hq Mar 20, 2026
9ba6d97
Record: 11L XSA + EMA + Int6 MLP3x + WD=0.04 (val_bpb: 1.1271)
jfprincz Mar 20, 2026
758691c
Update README.md
0hq Mar 20, 2026
3d3e0ae
Fix grammar in README
sha-huang Mar 21, 2026
0299231
Merge pull request #350 from sha-huang/hs-patch-grammar
cocohearts Mar 21, 2026
9cae1f0
Record: 11L Partial RoPE + LN Scale + EMA + XSA4 (val_bpb: 1.1248)
jfprincz Mar 21, 2026
5161482
Record: 11L EMA + GPTQ-lite + warmdown3500 + [email protected] (val_bpb=1.1233…
signalrush Mar 22, 2026
45542fb
Record: LeakyReLU² + Legal TTT + Parallel Muon — val_bpb 1.1194 (3-se…
abaybektursun Mar 23, 2026
155a932
Fix pre-TTT BPB, TTT gains, and steps to match logs exactly
abaybektursun Mar 23, 2026
3fa732d
Fix author attributions: PR #493 @parinzee, PR #461 @Christopher-Lee-…
abaybektursun Mar 23, 2026
97b1db3
Merge pull request #265 from unnir/submission/v22-XSA3-beats-top1
cocohearts Mar 23, 2026
b614372
Merge pull request #287 from jfprincz/submission/11l-xsa4-ema-int6-ml…
cocohearts Mar 23, 2026
04d3709
Merge pull request #315 from jfprincz/submission/11l-partialrope-late…
cocohearts Mar 23, 2026
02fbb8d
Merge pull request #414 from signalrush/submission/ema-gptqlite-1.1233
cocohearts Mar 23, 2026
724b6d6
Update README leaderboard with merged records
cocohearts Mar 23, 2026
6b0b368
Use GitHub usernames in new leaderboard rows
cocohearts Mar 23, 2026
ee4e30e
Describe leaderboard entries by base-run diff
cocohearts Mar 23, 2026
28bd944
Merge pull request #561 from openai/codex/update-readme-leaderboard-m…
cocohearts Mar 23, 2026
461a327
Merge pull request #549 from abaybektursun/submission/leaky-relu-lega…
valerio-oai Mar 24, 2026
e94192d
Update README.md
valerio-oai Mar 24, 2026
c250885
Merge pull request #616 from openai/valerio-oai-patch-1
valerio-oai Mar 24, 2026
918eddd
Update README.md
valerio-oai Mar 24, 2026
dc6d490
Notable Non-Record Submission: 1.1239 BPB - 106.2M Binary Asymmetric …
CiprianFlorin-Ifrim Mar 25, 2026
7115075
Update README.md
0hq Mar 25, 2026
c341a12
Record Submission: 1.1570 BPB - 73.7M Ternary U-Net (10L 768d 8192BPE…
CiprianFlorin-Ifrim Mar 25, 2026
8069fa6
Update README.md
0hq Mar 25, 2026
830e27b
Update README.md
0hq Mar 25, 2026
87cf706
Update README.md
0hq Mar 25, 2026
cb86497
Update README.md
0hq Mar 25, 2026
a37c13c
Update README.md
0hq Mar 25, 2026
6685118
Non-record: Depth Recurrence in Parameter-Constrained Transformers — …
evangelinehelsinki Mar 25, 2026
6d0eeba
Record: AR Self-Gen GPTQ + XSA-all + BigramHash 3072×112 — val_bpb 1.…
abaybektursun Mar 28, 2026
89ac5c0
Merge pull request #1019 from abaybektursun/record/ar-selfgen-gptq-xs…
valerio-oai Mar 30, 2026
c82b159
Update README.md
valerio-oai Mar 30, 2026
c9ab94e
Record: Split-LR + BigramHash(2816x160) + Full GPTQ + Brotli — val_bp…
dexhunter Mar 31, 2026
7561f3a
fix: clarify PR #1019 attribution (not our merged PR)
dexhunter Mar 31, 2026
e480837
Copy PR 1179 record trainer to repo root
msisovic Mar 31, 2026
2d72c84
Decompress PR 1179 root trainer wrapper
msisovic Mar 31, 2026
1b23b7a
Port mixed quant to PR 1179 root trainer
msisovic Mar 31, 2026
7c7d909
Port depth recurrence to PR 1179 root trainer
msisovic Mar 31, 2026
33a0ca6
Log full untie depth recurrence result
msisovic Mar 31, 2026
9ebd740
Prioritize shared params in GPTQ
msisovic Mar 31, 2026
287fff1
Somewhat working
msisovic Mar 31, 2026
1a61316
Log partitioned residual result
msisovic Mar 31, 2026
e7b13cf
Train wallclock log
msisovic Mar 31, 2026
4f44fc8
logging fix
msisovic Mar 31, 2026
393062d
First run in
msisovic Mar 31, 2026
6d5e9a2
Parallel Residuals readme entry
msisovic Mar 31, 2026
4a87526
Update submission README and add seed logs
msisovic Apr 1, 2026
63d1db0
Update submission reproducibility notes
msisovic Apr 1, 2026
d3fc095
Add submission metadata for ParallelResiduals run
msisovic Apr 1, 2026
b67f9e8
Clean root for submission branch
msisovic Apr 1, 2026
76b1127
Restore root files for submission
msisovic Apr 1, 2026
6074fc8
Record: 4096-Vocab + Larger Model + High WD + Simplifications — val_b…
clarkkev Apr 1, 2026
b370566
Add train_gpt.py to submission
msisovic Apr 2, 2026
5e6ced5
Record: MuonEq-R + Depth Recurrence + WD=0.090 + All-Int6 — val_bpb 1…
dexhunter Apr 3, 2026
0a96614
Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R — v…
aryanbhosale Apr 4, 2026
813de88
Record: SP8192 + GPTQ Embeddings + SDClip + Loop45x2 — val_bpb 1.0856…
clarkkev Apr 5, 2026
ff0e071
Non-record: Parallel Residuals + Hessian-Aware SDClip (3-seed mean 1.…
Robby955 Apr 6, 2026
4ec6ed5
Fix LaTeX rendering
Robby955 Apr 6, 2026
8ddc5bc
Record: SP8192 + QK-Gain 5 + Legal Score-First TTT — val_bpb 1.08279 …
dexhunter Apr 6, 2026
a4deb2b
Record: SP8192 + Parallel Residuals + Score-First TTT — val_bpb 1.082…
aryanbhosale Apr 8, 2026
d33706d
Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.…
Apr 9, 2026
16e50cb
Merge pull request #1179 from dexhunter/submission/splitlr-dim160-gpt…
cocohearts Apr 9, 2026
5db4cc0
Merge pull request #1218 from clarkkev/submission/vocab4096-mlpmult4-…
cocohearts Apr 9, 2026
9f7f551
Merge pull request #1204 from msisovic/hyperconnections_submission
cocohearts Apr 9, 2026
433e93c
Merge pull request #1285 from dexhunter/muoneqr-recurrence-wd090-allint6
cocohearts Apr 9, 2026
8b6ada0
Merge pull request #1334 from aryanbhosale/submission/sp4096-no-slot-v4
cocohearts Apr 9, 2026
890c6d7
Merge pull request #1394 from clarkkev/submission/sp8192-gptq-emb-sdc…
cocohearts Apr 9, 2026
2381927
Merge pull request #1413 from dexhunter/record/sp8192-qk5-legal-ttt-1…
cocohearts Apr 9, 2026
96da7a8
Merge pull request #1477 from aryanbhosale/submission/sp8192-parallel…
cocohearts Apr 9, 2026
69593a6
Merge pull request #1493 from bigbag/submission/sp8192-ttt-clean
cocohearts Apr 9, 2026
0ef8564
Merge pull request #1412 from Robby955/submission/parallel-residuals-…
cocohearts Apr 9, 2026
1ef2a91
Update README leaderboard for April records
cocohearts Apr 9, 2026
52cc58a
Merge pull request #1511 from openai/codex/update-april-leaderboard-r…
cocohearts Apr 9, 2026
1504011
Non-record: Nemotron-H Mamba-3 Hybrid + Hinge Point Depth Recurrence …
inin-zou Apr 14, 2026
82 changes: 70 additions & 12 deletions README.md


59 changes: 59 additions & 0 deletions records/track_10min_16mb/2026-03-17_LoRA_TTT/README.md
@@ -0,0 +1,59 @@
This record captures `LoRA TTT`: the naive baseline model with document-aware LoRA test-time training at evaluation.

## Method

**Training** is identical to the naive baseline.

**Evaluation** adds per-document LoRA test-time training (TTT). For each document in the validation set:
1. Find document boundaries using BOS tokens
2. Split the document into overlapping chunks (chunk_size=256 within eval_seq_len=1024 context windows)
3. For each chunk, score it (accumulate loss/bytes for BPB), *then* train rank-8 LoRA adapters on that chunk's loss (so you only train on the context -- no leakage)
4. Reset LoRA parameters between documents (no leakake across documents)

Documents are batched (batch_size=64) and sorted by length for efficiency. The LoRA adapters target `lm_head`, `c_q`, and `c_v` projections in all transformer blocks. A single Adam optimizer with `lr=0.01, betas=(0.9, 0.95)` trains all LoRA parameters with one gradient step per chunk.
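The score-then-train loop above can be sketched as follows. This is an illustrative reconstruction, not the submitted `train_gpt.py`: `LoRALinear` and `eval_with_lora_ttt` are hypothetical names, the model and batching are simplified, and the chunks here are non-overlapping for brevity (the record uses overlapping 256-token chunks inside 1024-token context windows).

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable rank-r LoRA adapter.
    B starts at zero, so the wrapped layer initially computes exactly
    the base projection."""

    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        return self.base(x) + (x @ self.A.T) @ self.B.T

    def reset_(self):
        """Re-initialize the adapter (called between documents to avoid leakage)."""
        with torch.no_grad():
            self.A.normal_(std=0.02)
            self.B.zero_()


def eval_with_lora_ttt(model, docs, lora_layers, chunk=256, lr=0.01):
    """Score each chunk *before* training on it, then reset adapters per document."""
    opt = torch.optim.Adam(
        [p for layer in lora_layers for p in (layer.A, layer.B)],
        lr=lr, betas=(0.9, 0.95),
    )
    total_loss, total_tokens = 0.0, 0
    for doc in docs:  # doc: 1D LongTensor of token ids for one document
        for start in range(0, len(doc) - 1, chunk):
            ids = doc[start : start + chunk + 1]
            x, y = ids[:-1], ids[1:]
            logits = model(x.unsqueeze(0))
            loss = nn.functional.cross_entropy(logits.squeeze(0), y)
            total_loss += loss.item() * len(y)  # score first: no leakage
            total_tokens += len(y)
            opt.zero_grad()
            loss.backward()  # ...then one gradient step on the same chunk
            opt.step()
        for layer in lora_layers:  # fresh adapters for the next document
            layer.reset_()
    return total_loss / max(total_tokens, 1)
```

Because `B` is zero at (re)initialization, the adapter is a no-op until the first gradient step, so the first chunk of every document is scored by the unmodified base model.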

## Notes

This is very similar to [a record I submitted to the modded nano-gpt speedrun repo](https://samacquaviva.com/projects/nanogpt/).
The major addition is making the test-time training ~5x faster by using LoRAs: this lets you have per-sequence adaptation (no leaking between validation sequences) while still batching.

This is not a heavily optimized run: I just wanted to plant the TTT seed.
It uses ~1/10th of the evaluation budget.

## Ablations

The majority of this improvement doesn't come from the TTT itself, but from:
1. Only conditioning on the current document
2. Doing strided evaluations

| Condition | val_loss | val_bpb | Delta bpb |
| --------- | -------- | ------- | --------- |
| Baseline (cross-doc, flat stream) | 2.0731 | 1.2278 | — |
| + Doc-isolated | 2.0561 | 1.2168 | -0.0110 |
| + Stride (chunk=256) | 2.0177 | 1.1941 | -0.0337 |
| + LoRA TTT | 2.0126 | 1.1910 | -0.0368 |

![ablations](ablations.png)
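The strided-evaluation bookkeeping from the ablation (score 256 new tokens per step, each conditioned on up to 1024 tokens of in-document left context) can be sketched with a small helper. `chunk_spans` is a hypothetical name for illustration, not a function from the submission:

```python
def chunk_spans(doc_len: int, chunk: int = 256, ctx: int = 1024):
    """For each scoring step, return (ctx_start, score_start, end):
    tokens [score_start, end) are scored, conditioned only on tokens
    [ctx_start, score_start) of the same document (never across documents)."""
    spans = []
    for score_start in range(0, doc_len, chunk):
        end = min(score_start + chunk, doc_len)
        ctx_start = max(0, end - ctx)  # window is at most `ctx` tokens wide
        spans.append((ctx_start, score_start, end))
    return spans
```

For example, the sixth step of a 2048-token document scores tokens 1280..1536 with context starting at token 512, so every scored token sees close to a full 1024-token window instead of the short contexts a flat non-overlapping split would give.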

## Results

Validated on the full 50k-document fineweb_val split. Submitting at `bpb=1.195`.

```text
bpb: [1.1927, 1.1935, 1.1921, 1.1929]
mean: 1.1928
std: 0.0005
p-value < 1.195: 0.00234486
```
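The reported p-value appears consistent with a one-sided, one-sample t-test of the four seed bpbs against the 1.195 threshold (the printed std of 0.0005 is the population std; the test uses the n-1 sample std). A quick re-derivation, using the closed-form Student-t CDF at 3 degrees of freedom to avoid a SciPy dependency:

```python
import math

bpb = [1.1927, 1.1935, 1.1921, 1.1929]
n = len(bpb)
mean = sum(bpb) / n
var = sum((x - mean) ** 2 for x in bpb) / (n - 1)  # sample variance
t = (mean - 1.195) / math.sqrt(var / n)            # t-statistic vs. threshold

# Student-t CDF has a closed form at df = n - 1 = 3:
u = t / math.sqrt(3)
p = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi  # P(T <= t)

print(f"mean={mean:.4f} t={t:.3f} p={p:.6f}")  # p is about 0.0023
```

The same number falls out of `scipy.stats.ttest_1samp(bpb, 1.195, alternative='less')`, so the log's `p-value < 1.195: 0.00234486` reads as "probability the true mean bpb is not below 1.195".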

## Command

```bash
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

## Included files

- `train_gpt.py`
- `train_v*.txt` (note that `train_v0.txt` is on 2xH100)
- `submission.json`
11 changes: 11 additions & 0 deletions records/track_10min_16mb/2026-03-17_LoRA_TTT/submission.json
@@ -0,0 +1,11 @@
{
"author": "sam",
"github_id": "samacqua",
"name": "LoRA TTT",
"blurb": "Naive baseline + per-document LoRA test-time training at eval. Rank-8 LoRA on lm_head/Q/V with Adam lr=0.01, overlapping 256-token chunks in 1024-token context windows. Same training, smarter eval.",
"date": "2026-03-19T10:00:00Z",
"val_loss": 2.0142,
"val_bpb": 1.1929,
"bytes_total": 15882446,
"bytes_code": 58509
}