
Record: SP10240 + SimCTG + QAHSP + post-quant TTT — 1.07197 ttt-sliding-window (3-seed mean, std 0.00023) #2022

Open

BharathSShankar wants to merge 1 commit into openai:main from
BharathSShankar:submission/2026-04-30_SP10240_SimCTG_QAHSP_PostQuantTTT_OptioAI

Conversation

@BharathSShankar

Summary

Record submission for the 10-min / 16 MB track combining:

  • Architecture: PR #1855 lineage (Record: SP8192 + LQER + Sparse Attn Gate +
    BOS-Fixed SmearGate + 9-Hparam Greedy Stack — val_bpb 1.06108, 3-seed mean)
    — 11L × 512d × 8H / 4KV with 3-Layer Recurrence (loops 3–5), Parallel
    Residuals (from layer 7), LeakyReLU(0.5)² SwiGLU, Partial RoPE (16/64),
    XSA on all 11 layers, tied embeddings, SP10240 tokenizer.
  • Optimizer: Polar Express NS Muon.
  • Regularizers: SimCTG contrastive (λ=0.3, margin=0.4) + QAHSP quant-aware
    activation regularizer (λ=0.3) — an STE penalty MSE(h, STE-quantize(h, int6))
    that pushes hidden states onto an int6 grid during training; see the
    regularizer sketch after this list.
  • Test-time: post-quant TTT (TTT_ENABLED=1, default 3 epochs LR 5e-3)
    on already-graded eval tokens, after the legal pre-quant grade pass.
  • Quant + compression: GPTQ int6 (matrices) + int7 (token embeddings) + brotli.
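
A minimal sketch of the two regularizers as the bullets above describe them, in PyTorch. The λ=0.3 weights, margin=0.4, and int6 target come from this PR; the per-tensor absmax scale, the helper names, and applying both terms to the final hidden states are assumptions, not the submission's actual code:

```python
import torch
import torch.nn.functional as F

def simctg_loss(h: torch.Tensor, margin: float = 0.4) -> torch.Tensor:
    # SimCTG contrastive term: since cos(h_i, h_i) = 1, each off-diagonal
    # token pair contributes max(0, margin - 1 + cos(h_i, h_j)).
    hn = F.normalize(h, dim=-1)                    # (T, d) token states
    sim = hn @ hn.t()                              # (T, T) cosine similarities
    off_diag = ~torch.eye(h.size(0), dtype=torch.bool, device=h.device)
    return torch.relu(margin - 1.0 + sim[off_diag]).mean()

def fake_quantize_int6(h: torch.Tensor) -> torch.Tensor:
    # Project h onto a symmetric int6 grid (per-tensor absmax scale is an
    # assumption; the PR does not specify the calibration).
    qmax = 2 ** (6 - 1) - 1                        # 31
    scale = h.detach().abs().amax().clamp(min=1e-8) / qmax
    return (h / scale).round().clamp(-qmax - 1, qmax) * scale

def qahsp_penalty(h: torch.Tensor, lam: float = 0.3) -> torch.Tensor:
    # STE convention: round() passes no gradient, so the quantized target
    # is detached and the MSE pulls h toward its own int6 grid image.
    return lam * F.mse_loss(h, fake_quantize_int6(h).detach())

# loss = ce + 0.3 * simctg_loss(h) + qahsp_penalty(h)   # this PR's λ values
```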

Numbers, cap accounting, and the 3-seed std are recorded in README.md and the train logs.

Compliance

  • Trains in <600s on 8×H100 (MAX_WALLCLOCK_SECONDS=600).
  • Post-quant TTT operates on tokens only after the legal pre-quantization,
    post-EMA grading pass, per Issue #1017 (A Field Guide to Valid Submissions)
    and the README eval rules; the score-first ordering is sketched after
    this list.
  • Artifact ≤ 16,000,000 bytes (validated by open_prs.sh).
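
A minimal sketch of that score-first ordering under the ttt-sliding-window metric. The 3-epoch / LR 5e-3 defaults and the TTT_ENABLED gate come from this PR; the window size, SGD optimizer, and loss_fn interface are assumptions:

```python
import torch

def ttt_sliding_window(model, eval_tokens, loss_fn,
                       window: int = 64, epochs: int = 3, lr: float = 5e-3):
    # Score-first (PR #1413 framing): every window is graded before the
    # model takes any gradient step on it, so no token influences its
    # own grade; adaptation only helps subsequent windows.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total, n_tok = 0.0, 0
    for chunk in eval_tokens.split(window):
        with torch.no_grad():                      # grade this window first
            total += loss_fn(model, chunk).item() * chunk.numel()
            n_tok += chunk.numel()
        for _ in range(epochs):                    # then adapt (TTT_ENABLED=1)
            opt.zero_grad()
            loss_fn(model, chunk).backward()
            opt.step()
    return total / n_tok                           # mean loss; convert to bpb
```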

Files

  • final_model.int6.ptz — brotli-compressed quantized model (a cap-check
    sketch follows this list)
  • train_gpt.py — self-extracting (lzma+base85+exec, SOTA-standard format)
  • submission.json — leaderboard metadata
  • train_seed{42,1337,2025}.log — 3-seed training logs
  • README.md — full record card with cap accounting + 3-seed table
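
A minimal sketch of packing and checking the artifact against the byte cap, assuming an already GPTQ-quantized state dict; open_prs.sh remains the authoritative validator, and the brotli quality setting is an assumption:

```python
import io
import brotli      # pip install brotli
import torch

CAP_BYTES = 16_000_000

def pack_artifact(state_dict, path: str = "final_model.int6.ptz") -> int:
    # Serialize the (already GPTQ-quantized) weights, brotli-compress,
    # and enforce the 16 MB track cap before writing.
    buf = io.BytesIO()
    torch.save(state_dict, buf)
    blob = brotli.compress(buf.getvalue(), quality=11)
    assert len(blob) <= CAP_BYTES, f"{len(blob):,} bytes exceeds the cap"
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)
```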

Credits

PR #1855 (architecture lineage), PR #1493 (sliding-window stride 64 + 3-Layer
Recurrence base), PR #1394 (SP-CaseOps line), PR #287 (Partial RoPE),
PR #1412 (Parallel Residuals), PR #549 (LeakyReLU(0.5)²),
PR #1413 (legal score-first TTT framing).

QAHSP regularizer is novel to this submission; see Submission C
(Cross-Base Regularizer Transferability) for the cross-base ablation
characterizing where it helps and where it hurts.

Test plan

  • Decode train_gpt.py: python3 -c "import lzma,base64,re;exec(lzma.decompress(base64.b85decode(re.search(r'b85decode\(\"([^\"]+)\"\)', open('train_gpt.py').read()).group(1))).decode())"
  • Reviewer confirms reproduction at any of the 3 seeds under
    MAX_WALLCLOCK_SECONDS=600 SP_VOCAB_SIZE=10240 N9_SIMCTG_LAMBDA=0.3
    N9_SIMCTG_MARGIN=0.4 REG_QAHSP_LAMBDA=0.3 TTT_ENABLED=1 (the env-var
    mapping is sketched after this list)
  • Reviewer confirms total bundle bytes ≤ 16,000,000
  • Reviewer confirms the 3-seed std (0.00023) is tight enough that the
    record margin is statistically significant
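
How those environment variables might be consumed inside train_gpt.py; the defaults are this PR's values, but the parsing itself is an illustrative assumption:

```python
import os

def env_float(name: str, default: float) -> float:
    # Read a float hyperparameter from the environment, with a default.
    return float(os.environ.get(name, default))

MAX_WALLCLOCK_SECONDS = int(os.environ.get("MAX_WALLCLOCK_SECONDS", 600))
SP_VOCAB_SIZE         = int(os.environ.get("SP_VOCAB_SIZE", 10240))
SIMCTG_LAMBDA         = env_float("N9_SIMCTG_LAMBDA", 0.3)
SIMCTG_MARGIN         = env_float("N9_SIMCTG_MARGIN", 0.4)
QAHSP_LAMBDA          = env_float("REG_QAHSP_LAMBDA", 0.3)
TTT_ENABLED           = os.environ.get("TTT_ENABLED", "0") == "1"
```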

🤖 Generated with Claude Code

…ssion

Stack: PR openai#1855 lineage (11L x 512d x 8H, 3-Layer Recurrence loops 3-5,
Parallel Residuals from layer 7, LeakyReLU(0.5)^2, Partial RoPE 16/64,
XSA all-layers, SP10240 tokenizer, tied embeddings) + SimCTG (lambda=0.3,
margin=0.4) + QAHSP quant-aware activation regularizer (lambda=0.3) +
post-quant TTT (TTT_ENABLED=1) + Polar Express NS Muon + GPTQ int6/int7 +
brotli compression.

train_gpt.py is in SOTA-standard self-extracting format (lzma+base85+exec).
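
For reference, a minimal sketch of producing that wrapper; the packer below is illustrative (file names assumed), showing only the lzma+base85+exec shape that the test-plan decode one-liner unpacks:

```python
import base64
import lzma

def make_self_extracting(src: str = "train_gpt_full.py",
                         out: str = "train_gpt.py") -> None:
    # Compress the real script, base85-encode it (the b85 alphabet contains
    # no quote characters, so it embeds safely in a string literal), and
    # emit a stub that inflates and exec()s it when run.
    payload = base64.b85encode(lzma.compress(open(src, "rb").read())).decode()
    stub = ("import lzma, base64\n"
            f'exec(lzma.decompress(base64.b85decode("{payload}")).decode())\n')
    with open(out, "w") as f:
        f.write(stub)
```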