
Record: Partial SpinQuant (start_layer=5) + PR#1851 Stack — val_bpb 1.06614 (3-seed mean) #1898

Open
X-Abhishek-X wants to merge 3 commits into openai:main from X-Abhishek-X:spinquant-partial-embed6-hparam-greedy-3seed

Conversation

@X-Abhishek-X

3-seed mean: 1.06614 (std 0.00131), seeds 1337/42/2024
All artifacts under 16MB, training under 600s, eval under 600s
Score-first Phased TTT (3 phases, post-quant only)

What's new in this PR

Introduces SPINQUANT_START_LAYER=5 — Hadamard pre-rotation applied to layers 5–10 only (12 modules) instead of all 66. Reduces brotli entropy overhead from ~1MB to ~200KB, freeing enough headroom for EMBED_BITS=6 under the 16MB cap. Zero serialized bytes — rotation regenerated from seed at eval time. Built on SpinQuant V1 (PR #1695).
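The zero-byte trick works because an orthogonal rotation can be rebuilt deterministically from a seed at eval time. A minimal numpy sketch of the idea (illustrative names only; the real module selection and wiring live in the SpinQuant V1 code from PR #1695):

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def rotation_from_seed(seed, dim):
    # Randomized Hadamard rotation: (H / sqrt(n)) @ diag(random signs).
    # Orthogonal, and fully determined by the seed -> nothing to serialize.
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=dim)
    return (hadamard(dim) / np.sqrt(dim)) * signs  # scales columns by signs

def apply_partial_spinquant(layer_weights, seed, start_layer=5):
    # Rotate only layers >= start_layer; earlier layers pass through.
    dim = layer_weights[0].shape[1]
    R = rotation_from_seed(seed, dim)
    return [W @ R if i >= start_layer else W
            for i, W in enumerate(layer_weights)]
```

Because R is orthogonal, eval can undo or fold the rotation exactly (`(W @ R) @ R.T == W`), which is why no rotated bytes ever hit the artifact.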

Results

| Seed | TTT BPB | Artifact (bytes) |
|------|---------|------------------|
| 42   | 1.06484 | 15,627,137 ✅ |
| 2024 | 1.06611 | 15,623,946 ✅ |
| 1337 | 1.06746 | 15,626,137 ✅ |
| **Mean** | **1.06614** | |

Merged SOTA (PR #1493): 1.0810 → delta −0.01486
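The headline numbers are internally consistent; the quoted std is the sample (n−1) standard deviation:

```python
import statistics

bpb = {42: 1.06484, 2024: 1.06611, 1337: 1.06746}  # per-seed TTT BPB
mean = statistics.fmean(bpb.values())
std = statistics.stdev(bpb.values())  # Bessel-corrected, n-1 denominator
print(round(mean, 5), round(std, 5))  # 1.06614 0.00131
```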

Credits

X-Abhishek-X and others added 3 commits April 26, 2026 21:12
…b 1.06182, seed 42

SpinQuant V1 Hadamard pre-rotation grafted onto PR openai#1851 stack (CaseOps +
SmearGate-BOS-fix + LQER-Asym + 3-phase Phased TTT). Proves SpinQuant
composes cleanly with LQER: GPTQ damage reduced from +0.00916 to +0.00640
(30% improvement). Artifact 957KB oversize due to Brotli entropy on rotated
tensors — submitted as ablation study.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arams + PR#1851 Base — val_bpb 1.06614 (3-seed mean)

3-seed mean: 1.06614 (std 0.00131), seeds 1337/42/2024
All artifacts under 16MB, training under 600s, eval under 600s
@X-Abhishek-X changed the title from "Record: Partial SpinQuant (start_layer=5) + EMBED_BITS=6 + PR#1855 Hparams + PR#1851 Base — val_bpb 1.06614 (3-seed mean)" to "Record: Partial SpinQuant (start_layer=5) + PR#1851 Stack — val_bpb 1.06614 (3-seed mean)" on Apr 28, 2026
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Six follow-on specs to spec 060A (openai#1855 port):

- 060B: SDClip ATTN tightening (config-only, eval via RESUME_FROM_CKPT)
- 060C: 046L deploy-time quant repair (~150 lines code port from
  exp/046-quant-repair @ fcb816f); eval-side, free
- 060D: 046G-tighter SDClip (config-only, fits within openai#1855 lrzip headroom)
- 060E: full stack (060B + 060C combined)
- 060F: LQER bumps (RANK=5, TOP_K=4, ASYM_GROUP=32; config-only)
- 060G: Partial SpinQuant from PR openai#1898 (~100 lines code port)

Plus tmp_exec/launch_060_eval.sh: shared eval-only launcher for
RESUME_FROM_CKPT mode, used by 060B/D/E/F. Loads 060A's final_model.pt,
re-quantizes + re-evals with overridden env vars. ~-3 per arm vs
~ for full retrain.

All specs reference 060A's checkpoint at runs/060A-1855-port/seed_42/
final_model.pt as their hotstart.
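The RESUME_FROM_CKPT pattern described above (one checkpoint, several quant configs driven by env vars) can be sketched as follows. The variable names come from this thread; the helper itself and its fallback defaults are hypothetical:

```python
import os

def quant_config_from_env():
    # Hypothetical helper: read the overridable knobs mentioned in this
    # thread from the environment. Defaults here are placeholders, not
    # the repo's actual values.
    return {
        "resume_from_ckpt": os.environ.get("RESUME_FROM_CKPT"),  # eval-only if set
        "embed_bits": int(os.environ.get("EMBED_BITS", "8")),
        "spinquant_start_layer": int(os.environ.get("SPINQUANT_START_LAYER", "5")),
        "lqer_rank": int(os.environ.get("LQER_RANK", "4")),
    }
```

This is what makes the 060B/D/E/F arms cheap: the launcher points RESUME_FROM_CKPT at 060A's final_model.pt and only the env overrides change between arms.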
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Three arms test whether int6 embedding (saves ~150-300 KB after pergroup
compression) can be recovered by LQER asym capacity bumps:

- H1: EMBED_BITS=6 alone (worst case)
- H2: EMBED_BITS=6 + LQER_RANK=5 (recommended)
- H3: EMBED_BITS=6 + LQER_RANK=5 + TOP_K=4 (most capacity)
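For context on what the LQER_RANK bumps buy: LQER stores the quantized weights plus a low-rank factorization of the quantization error. A generic numpy sketch of that idea (not the repo's implementation; `quantize` is a stand-in round-to-grid):

```python
import numpy as np

def quantize(W, step=0.25):
    # Stand-in uniform quantizer: round each weight to a fixed grid.
    return np.round(W / step) * step

def lqer_factorize(W, rank=5):
    # LQER idea: keep Q(W) plus a rank-r SVD of the quantization error
    # E = W - Q(W). Dequantization is Q + U @ V.
    Q = quantize(W)
    U, s, Vt = np.linalg.svd(W - Q, full_matrices=False)
    return Q, U[:, :rank] * s[:rank], Vt[:rank, :]

def dequant(Q, U, V):
    return Q + U @ V
```

By Eckart–Young, the rank-r correction is the best possible at that rank, so raising RANK can only tighten the reconstruction, at the cost of storing larger U/V factors.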

PR openai#1898 made int6-embed work via SpinQuant rotation (roughly 2x lower quant noise on the tok_emb distribution). Without rotation, LQER alone may or may not recover the loss;
this spec measures it. If H2/H3 are quality-neutral or better, stacking
with 060G (Partial SpinQuant) becomes very interesting — could compound
to 060A val_bpb -0.005 with bytes saved.

Eval-only via RESUME_FROM_CKPT (~-3 per arm).
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
PR openai#1898 (X-Abhishek-X) ran Partial SpinQuant + EMBED_BITS=6 reinvest on
the same chain and reported val_bpb 1.06614 vs their base openai#1851's 1.06128
= +0.00486 REGRESSION. Their PR framed it as -0.01486 vs the 2-week-old
merged SOTA openai#1493 (1.0810) instead of vs their actual parent.
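Both framings are arithmetically correct; the disagreement is purely about which baseline the delta is taken against:

```python
this_pr = 1.06614  # 3-seed mean from PR #1898
parent  = 1.06128  # its actual parent, PR #1851
sota    = 1.08100  # 2-week-old merged SOTA, PR #1493

print(round(this_pr - parent, 5))  # 0.00486 regression vs parent
print(round(this_pr - sota, 5))   # -0.01486 vs old SOTA
```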

Implications:
- 060G (Partial SpinQuant): empirically null/negative on this chain.
- 060H (EMBED_BITS=6 alone or with LQER reinvest): even riskier without
  SpinQuant's rotation protection.

Both specs marked as DEPRECATED at the top. Not deleted; kept as
documentation in case conditions change later (e.g., deploy-time repair
specifically targeting tok_emb precision).