
Record: Partial SpinQuant (start_layer=5) + PR#1851 Stack — val_bpb 1.06614 (3-seed mean) #1898

Open
X-Abhishek-X wants to merge 3 commits into openai:main from X-Abhishek-X:spinquant-partial-embed6-hparam-greedy-3seed

Conversation

@X-Abhishek-X

3-seed mean: 1.06614 (std 0.00131), seeds 1337/42/2024
All artifacts under 16MB, training under 600s, eval under 600s
Score-first Phased TTT (3 phases, post-quant only)

What's new in this PR

Introduces SPINQUANT_START_LAYER=5 — Hadamard pre-rotation applied to layers 5–10 only (12 modules) instead of all 66. Reduces brotli entropy overhead from ~1MB to ~200KB, freeing enough headroom for EMBED_BITS=6 under the 16MB cap. Zero serialized bytes — rotation regenerated from seed at eval time. Built on SpinQuant V1 (PR #1695).
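The zero-byte trick works because an orthogonal rotation can be rebuilt deterministically from a seed at eval time. A minimal numpy sketch of the idea (illustrative names only; the real module selection and wiring live in the SpinQuant V1 code from PR #1695):

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def rotation_from_seed(seed, dim):
    # Randomized Hadamard rotation: (H / sqrt(n)) @ diag(random signs).
    # Orthogonal, and fully determined by the seed -> nothing to serialize.
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=dim)
    return (hadamard(dim) / np.sqrt(dim)) * signs  # scales columns by signs

def apply_partial_spinquant(layer_weights, seed, start_layer=5):
    # Rotate only layers >= start_layer; earlier layers pass through.
    dim = layer_weights[0].shape[1]
    R = rotation_from_seed(seed, dim)
    return [W @ R if i >= start_layer else W
            for i, W in enumerate(layer_weights)]
```

Because R is orthogonal, eval can undo or fold the rotation exactly (`(W @ R) @ R.T == W`), which is why no rotated bytes ever hit the artifact.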

Results

| Seed | TTT BPB | Artifact (bytes) |
|------|---------|------------------|
| 42   | 1.06484 | 15,627,137 ✅ |
| 2024 | 1.06611 | 15,623,946 ✅ |
| 1337 | 1.06746 | 15,626,137 ✅ |
| **Mean** | **1.06614** | |

Merged SOTA (PR #1493): 1.0810 → delta −0.01486
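The headline numbers are internally consistent; the quoted std is the sample (n−1) standard deviation:

```python
import statistics

bpb = {42: 1.06484, 2024: 1.06611, 1337: 1.06746}  # per-seed TTT BPB
mean = statistics.fmean(bpb.values())
std = statistics.stdev(bpb.values())  # Bessel-corrected, n-1 denominator
print(round(mean, 5), round(std, 5))  # 1.06614 0.00131
```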

Credits

X-Abhishek-X and others added 3 commits April 26, 2026 21:12
…b 1.06182, seed 42

SpinQuant V1 Hadamard pre-rotation grafted onto PR openai#1851 stack (CaseOps +
SmearGate-BOS-fix + LQER-Asym + 3-phase Phased TTT). Proves SpinQuant
composes cleanly with LQER: GPTQ damage reduced from +0.00916 to +0.00640
(30% improvement). Artifact 957KB oversize due to Brotli entropy on rotated
tensors — submitted as ablation study.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arams + PR#1851 Base — val_bpb 1.06614 (3-seed mean)

3-seed mean: 1.06614 (std 0.00131), seeds 1337/42/2024
All artifacts under 16MB, training under 600s, eval under 600s
@X-Abhishek-X changed the title from "Record: Partial SpinQuant (start_layer=5) + EMBED_BITS=6 + PR#1855 Hparams + PR#1851 Base — val_bpb 1.06614 (3-seed mean)" to "Record: Partial SpinQuant (start_layer=5) + PR#1851 Stack — val_bpb 1.06614 (3-seed mean)" on Apr 28, 2026
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Six follow-on specs to spec 060A (openai#1855 port):

- 060B: SDClip ATTN tightening (config-only, eval via RESUME_FROM_CKPT)
- 060C: 046L deploy-time quant repair (~150 lines code port from
  exp/046-quant-repair @ fcb816f); eval-side, free
- 060D: 046G-tighter SDClip (config-only, fits within openai#1855 lrzip headroom)
- 060E: full stack (060B + 060C combined)
- 060F: LQER bumps (RANK=5, TOP_K=4, ASYM_GROUP=32; config-only)
- 060G: Partial SpinQuant from PR openai#1898 (~100 lines code port)

Plus tmp_exec/launch_060_eval.sh: shared eval-only launcher for
RESUME_FROM_CKPT mode, used by 060B/D/E/F. Loads 060A's final_model.pt,
re-quantizes + re-evals with overridden env vars. ~-3 per arm vs
~ for full retrain.

All specs reference 060A's checkpoint at runs/060A-1855-port/seed_42/
final_model.pt as their hotstart.
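The RESUME_FROM_CKPT pattern described above (one checkpoint, several quant configs driven by env vars) can be sketched as follows. The variable names come from this thread; the helper itself and its fallback defaults are hypothetical:

```python
import os

def quant_config_from_env():
    # Hypothetical helper: read the overridable knobs mentioned in this
    # thread from the environment. Defaults here are placeholders, not
    # the repo's actual values.
    return {
        "resume_from_ckpt": os.environ.get("RESUME_FROM_CKPT"),  # eval-only if set
        "embed_bits": int(os.environ.get("EMBED_BITS", "8")),
        "spinquant_start_layer": int(os.environ.get("SPINQUANT_START_LAYER", "5")),
        "lqer_rank": int(os.environ.get("LQER_RANK", "4")),
    }
```

This is what makes the 060B/D/E/F arms cheap: the launcher points RESUME_FROM_CKPT at 060A's final_model.pt and only the env overrides change between arms.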
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Three arms test whether int6 embedding (saves ~150-300 KB after pergroup
compression) can be recovered by LQER asym capacity bumps:

- H1: EMBED_BITS=6 alone (worst case)
- H2: EMBED_BITS=6 + LQER_RANK=5 (recommended)
- H3: EMBED_BITS=6 + LQER_RANK=5 + TOP_K=4 (most capacity)
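For context on what the LQER_RANK bumps buy: LQER stores the quantized weights plus a low-rank factorization of the quantization error. A generic numpy sketch of that idea (not the repo's implementation; `quantize` is a stand-in round-to-grid):

```python
import numpy as np

def quantize(W, step=0.25):
    # Stand-in uniform quantizer: round each weight to a fixed grid.
    return np.round(W / step) * step

def lqer_factorize(W, rank=5):
    # LQER idea: keep Q(W) plus a rank-r SVD of the quantization error
    # E = W - Q(W). Dequantization is Q + U @ V.
    Q = quantize(W)
    U, s, Vt = np.linalg.svd(W - Q, full_matrices=False)
    return Q, U[:, :rank] * s[:rank], Vt[:rank, :]

def dequant(Q, U, V):
    return Q + U @ V
```

By Eckart–Young, the rank-r correction is the best possible at that rank, so raising RANK can only tighten the reconstruction, at the cost of storing larger U/V factors.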

PR openai#1898 made int6-embed work via SpinQuant rotation (roughly 2x lower quant noise on the tok_emb distribution). Without rotation, LQER alone may or may not recover the loss;
this spec measures it. If H2/H3 are quality-neutral or better, stacking
with 060G (Partial SpinQuant) becomes very interesting — could compound
to 060A val_bpb -0.005 with bytes saved.

Eval-only via RESUME_FROM_CKPT (~-3 per arm).
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
PR openai#1898 (X-Abhishek-X) ran Partial SpinQuant + EMBED_BITS=6 reinvest on
the same chain and reported val_bpb 1.06614 vs their base openai#1851's 1.06128
= +0.00486 REGRESSION. Their PR framed it as -0.01486 vs the 2-week-old
merged SOTA openai#1493 (1.0810) instead of vs their actual parent.
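Both framings are arithmetically correct; the disagreement is purely about which baseline the delta is taken against:

```python
this_pr = 1.06614  # 3-seed mean from PR #1898
parent  = 1.06128  # its actual parent, PR #1851
sota    = 1.08100  # 2-week-old merged SOTA, PR #1493

print(round(this_pr - parent, 5))  # 0.00486 regression vs parent
print(round(this_pr - sota, 5))   # -0.01486 vs old SOTA
```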

Implications:
- 060G (Partial SpinQuant): empirically null/negative on this chain.
- 060H (EMBED_BITS=6 alone or with LQER reinvest): even riskier without
  SpinQuant's rotation protection.

Both specs marked as DEPRECATED at the top. Not deleted; kept as
documentation in case conditions change later (e.g., deploy-time repair
specifically targeting tok_emb precision).