Skip to content

Non-record: SP8192 + dim=464 + Pre-Quantization TTT + Brotli (1.1863 BPB)#1760

Open
BrandtChristian wants to merge 2 commits intoopenai:mainfrom
BrandtChristian:submission/sp8192-dim464-preq-ttt-upstream
Open

Non-record: SP8192 + dim=464 + Pre-Quantization TTT + Brotli (1.1863 BPB)#1760
BrandtChristian wants to merge 2 commits intoopenai:mainfrom
BrandtChristian:submission/sp8192-dim464-preq-ttt-upstream

Conversation

@BrandtChristian
Copy link
Copy Markdown

Summary

val_bpb: 1.1863 (roundtrip, seed 1337) | 15.92 MB | 1×RTX 5090, 12k steps

Post-TTT: 1.1524 BPB (score-first TTT, 3 epochs on preq-adapted weights)

Submitting to non-record track: trained 12k steps on a single RTX 5090 (~33 min), exceeding the 10-min budget. The technique is designed to run on 8×H100 with MAX_WALLCLOCK_SECONDS=600 and PREQ_TTT_EPOCHS=21.

Key Technique: Pre-Quantization TTT

After training ends, before INT6 quantization, adapt the FP32 weights on the full validation set using standard (non-score-first) TTT. This conditions the weights to the val distribution before the precision loss from quantization locks them in.

Scaling law (dim=464, 12k steps, 1×RTX 5090):

preq-TTT epochs Roundtrip BPB Delta
0 1.2347
3 1.2097 −0.025
5 1.1968 −0.013
7 1.1863 −0.011

Still scaling at 7 epochs. On 8×H100 (DDP-interleaved chunks, all_reduce per epoch), 21 epochs ≈ 240s — expected ~1.15 BPB.

Stack

SP8192 tokenizer · dim=464 · 11 layers · MLP 3× LeakyReLU(0.5)² · BigramHash(1536) · XSA last 4 layers · depth recurrence layers 3–5 ×2 · parallel residuals from layer 7 · QAT INT6 (all layers) · INT8 embeddings · brotli+byte-shuffle compression · EMA+SWA · MuonEq-R

Artifact

15,915,528 bytes (84 KB under 16 MB limit)

BrandtChristian and others added 2 commits April 21, 2026 09:59
…otli (1.1863 BPB)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ommand

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant