Record: SP10240 + SimCTG + QAHSP + post-quant TTT — 1.07197 ttt-sliding-window (3-seed mean, std 0.00023) #2022
BharathSShankar wants to merge 1 commit into openai:main
Submission Stack: PR openai#1855 lineage (11L x 512d x 8H, 3-Layer Recurrence loops 3-5, Parallel Residuals from layer 7, LeakyReLU(0.5)^2, Partial RoPE 16/64, XSA all-layers, SP10240 tokenizer, tied embeddings) + SimCTG (lambda=0.3, margin=0.4) + QAHSP quant-aware activation regularizer (lambda=0.3) + post-quant TTT (TTT_ENABLED=1) + Polar Express NS Muon + GPTQ int6/int7 + brotli compression. train_gpt.py is in SOTA-standard self-extracting format (lzma+base85+exec).
Summary
Record submission for the 10-min / 16 MB track combining:

- PR openai#1855 architecture lineage: 11L x 512d x 8H, 3-Layer Recurrence (loops 3-5), Parallel Residuals (from layer 7), LeakyReLU(0.5)² SwiGLU, Partial RoPE (16/64), XSA on all 11 layers, tied embeddings, SP10240 tokenizer.
- SimCTG contrastive token loss (λ=0.3, margin=0.4); sketched after this list.
- QAHSP quant-aware activation regularizer (λ=0.3) — an STE penalty MSE(h, STE-quantize(h, int6)) pushing hidden states onto an int6 grid during training; also sketched below.
- Post-quant TTT (TTT_ENABLED=1) on already-graded eval tokens, after the legal pre-quant grade pass.
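For reference, a minimal sketch of the SimCTG token-level contrastive term as published (Su et al., 2022): pairwise cosine similarity of distinct token states is pushed below the margin. The exact N9_SIMCTG_* wiring inside train_gpt.py is not reproduced in this PR body, so the function name and the way λ is applied are assumptions.

```python
import torch
import torch.nn.functional as F

def simctg_contrastive_loss(hidden: torch.Tensor, margin: float = 0.4) -> torch.Tensor:
    """SimCTG token-level contrastive loss.

    hidden: (B, T, D) last-layer hidden states. After L2 normalization the
    self-similarity s(h_i, h_i) is exactly 1, so the published hinge
    max(0, margin - s_ii + s_ij) reduces to max(0, margin - 1 + s_ij).
    """
    h = F.normalize(hidden, p=2, dim=-1)        # unit-norm token states
    sim = h @ h.transpose(1, 2)                 # (B, T, T) cosine similarities
    hinge = F.relu(margin - 1.0 + sim)
    T = sim.size(1)
    eye = torch.eye(T, dtype=torch.bool, device=sim.device)
    hinge = hinge.masked_fill(eye, 0.0)         # drop i == j pairs
    return hinge.sum() / (sim.size(0) * T * (T - 1))

# Assumed wiring of N9_SIMCTG_LAMBDA / N9_SIMCTG_MARGIN:
# loss = ce_loss + 0.3 * simctg_contrastive_loss(hidden, margin=0.4)
```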
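QAHSP itself is novel to this submission and its implementation is not shown here, so the following is only a sketch of the penalty as described above: a per-token symmetric int6 fake-quantizer with a straight-through estimator, plus an MSE pull toward the grid. All names are hypothetical.

```python
import torch
import torch.nn.functional as F

def ste_quantize_int6(h: torch.Tensor) -> torch.Tensor:
    """Fake-quantize onto a symmetric int6 grid (levels -32..31) with a
    per-token absmax scale; straight-through estimator on the backward."""
    scale = h.detach().abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 31.0
    q = (h / scale).round().clamp(-32, 31) * scale
    return h + (q - h).detach()                 # forward = q, backward = identity

def qahsp_penalty(h: torch.Tensor) -> torch.Tensor:
    """MSE(h, STE-quantize(h, int6)); the target is detached so the gradient
    pulls each hidden state toward its nearest int6 grid point."""
    return F.mse_loss(h, ste_quantize_int6(h).detach())

# Assumed wiring of REG_QAHSP_LAMBDA:
# loss = ce_loss + 0.3 * qahsp_penalty(hidden)
```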
Numbers, cap-fit, and 3-seed std are filled in README.md and the train logs.

Compliance

- Wall-clock within the cap (MAX_WALLCLOCK_SECONDS=600).
- TTT runs only after the legal grading pass per Issue #1017 (A Field Guide to Valid Submissions) / README eval rules; see the score-first sketch after this list.
- PR opened with open_prs.sh.
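To illustrate the score-first framing (credited to PR #1413 below), here is a sketch of a legal sliding-window TTT loop: each window is graded with the current weights before any gradient step touches it. This is not the actual TTT_ENABLED=1 code path, the pre-/post-quant sequencing is elided, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def ttt_sliding_window_eval(model, windows, lr=1e-4):
    """Score-first sliding-window TTT: every eval window is scored with the
    current weights BEFORE the model may adapt on it, so no gradient step
    ever precedes the grading of the tokens it trains on."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total, count = 0.0, 0
    for x, y in windows:                        # (input, target) token windows
        model.eval()
        with torch.no_grad():                   # grading pass for this window
            nll = F.cross_entropy(model(x).flatten(0, 1), y.flatten(),
                                  reduction="sum")
        total, count = total + nll.item(), count + y.numel()
        model.train()                           # TTT step on just-graded tokens
        loss = F.cross_entropy(model(x).flatten(0, 1), y.flatten())
        opt.zero_grad(); loss.backward(); opt.step()
    return total / count                        # mean eval NLL
```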
Files

- final_model.int6.ptz — brotli-compressed quantized model (load check sketched below)
- train_gpt.py — self-extracting (lzma+base85+exec, SOTA-standard format; round-trip sketched below)
- submission.json — leaderboard metadata
- train_seed{42,1337,2025}.log — 3-seed training logs
- README.md — full record card with cap accounting + 3-seed table
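For reviewers, a quick sanity check on the model artifact. This assumes final_model.int6.ptz is brotli over a torch.save payload, which matches the file description but is not confirmed by the PR; check_artifact is a hypothetical helper.

```python
import io
import brotli                                   # pip install brotli
import torch

def check_artifact(path="final_model.int6.ptz", cap_mb=16):
    """Confirm the artifact fits the size cap and decompresses to a
    loadable state dict."""
    blob = open(path, "rb").read()
    size_mb = len(blob) / 2**20
    assert size_mb <= cap_mb, f"{size_mb:.2f} MiB exceeds the {cap_mb} MB cap"
    state = torch.load(io.BytesIO(brotli.decompress(blob)), map_location="cpu")
    print(f"{size_mb:.2f} MiB compressed, {len(state)} entries")
```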
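The self-extracting format round-trips as follows; pack/unpack are hypothetical helpers, but unpack mirrors the decode one-liner in the test plan.

```python
import base64
import lzma
import re

def pack(src="train_gpt_full.py", out="train_gpt.py"):
    """Wrap a plain training script in the lzma+base85+exec stub.
    (b85 output never contains quotes or backslashes, so the literal is safe.)"""
    blob = base64.b85encode(lzma.compress(open(src, "rb").read(), preset=9)).decode()
    stub = ('import lzma,base64\n'
            f'exec(lzma.decompress(base64.b85decode("{blob}")).decode())\n')
    open(out, "w").write(stub)

def unpack(path="train_gpt.py") -> str:
    """Recover the original source without executing it, using the same
    regex as the test-plan decode check."""
    blob = re.search(r'b85decode\("([^"]+)"\)', open(path).read()).group(1)
    return lzma.decompress(base64.b85decode(blob)).decode()
```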
Credits

PR #1855 (architecture lineage), PR #1493 (sliding-window stride 64 + 3-Layer Recurrence base), PR #1394 (SP-CaseOps line), PR #287 (Partial RoPE), PR #1412 (Parallel Residuals), PR #549 (LeakyReLU(0.5)²), PR #1413 (legal score-first TTT framing).
The QAHSP regularizer is novel to this submission; see Submission C (Cross-Base Regularizer Transferability) for the cross-base ablation characterizing where it helps and where it hurts.
Test plan
- Decode check for the self-extracting train_gpt.py:
  `python3 -c "import lzma,base64,re;exec(lzma.decompress(base64.b85decode(re.search(r'b85decode\(\"([^\"]+)\"\)', open('train_gpt.py').read()).group(1))).decode())"`
- Full run with `MAX_WALLCLOCK_SECONDS=600 SP_VOCAB_SIZE=10240 N9_SIMCTG_LAMBDA=0.3 N9_SIMCTG_MARGIN=0.4 REG_QAHSP_LAMBDA=0.3 TTT_ENABLED=1`

🤖 Generated with Claude Code