Non-record: SP8192 + dim=464 + Pre-Quantization TTT + Brotli (1.1863 BPB) by BrandtChristian · Pull Request #1760 · openai/parameter-golf

BrandtChristian · 2026-04-21T09:07:58Z

Summary

val_bpb: 1.1863 (roundtrip, seed 1337) | 15.92 MB | 1×RTX 5090, 12k steps

Post-TTT: 1.1524 BPB (score-first TTT, 3 epochs on preq-adapted weights)

Submitting to non-record track: trained 12k steps on a single RTX 5090 (~33 min), exceeding the 10-min budget. The technique is designed to run on 8×H100 with MAX_WALLCLOCK_SECONDS=600 and PREQ_TTT_EPOCHS=21.

Key Technique: Pre-Quantization TTT

After training ends, before INT6 quantization, adapt the FP32 weights on the full validation set using standard (non-score-first) TTT. This conditions the weights to the val distribution before the precision loss from quantization locks them in.

Scaling law (dim=464, 12k steps, 1×RTX 5090):

preq-TTT epochs	Roundtrip BPB	Delta
0	1.2347	—
3	1.2097	−0.025
5	1.1968	−0.013
7	1.1863	−0.011

Still scaling at 7 epochs. On 8×H100 (DDP-interleaved chunks, all_reduce per epoch), 21 epochs ≈ 240s — expected ~1.15 BPB.

Stack

SP8192 tokenizer · dim=464 · 11 layers · MLP 3× LeakyReLU(0.5)² · BigramHash(1536) · XSA last 4 layers · depth recurrence layers 3–5 ×2 · parallel residuals from layer 7 · QAT INT6 (all layers) · INT8 embeddings · brotli+byte-shuffle compression · EMA+SWA · MuonEq-R

Artifact

15,915,528 bytes (84 KB under 16 MB limit)

…otli (1.1863 BPB) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ommand Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

BrandtChristian and others added 2 commits April 21, 2026 09:59

Add non-record submission: SP8192 dim=464 + Pre-Quantization TTT + Br…

427c3e8

…otli (1.1863 BPB) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Update submission: full scaling law, 20-shard data, competition run c…

97a0d8a

…ommand Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-record: SP8192 + dim=464 + Pre-Quantization TTT + Brotli (1.1863 BPB)#1760

Non-record: SP8192 + dim=464 + Pre-Quantization TTT + Brotli (1.1863 BPB)#1760
BrandtChristian wants to merge 2 commits intoopenai:mainfrom
BrandtChristian:submission/sp8192-dim464-preq-ttt-upstream

BrandtChristian commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

BrandtChristian commented Apr 21, 2026

Summary

Key Technique: Pre-Quantization TTT

Stack

Artifact

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant