Record: Partial SpinQuant (start_layer=5) + PR#1851 Stack — val_bpb 1.06614 (3-seed mean) #1898
Open
X-Abhishek-X wants to merge 3 commits into openai:main from
Conversation
…07063, healing-property observation
…b 1.06182, seed 42

SpinQuant V1 Hadamard pre-rotation grafted onto PR openai#1851 stack (CaseOps + SmearGate-BOS-fix + LQER-Asym + 3-phase Phased TTT). Proves SpinQuant composes cleanly with LQER: GPTQ damage reduced from +0.00916 to +0.00640 (30% improvement). Artifact 957KB oversize due to Brotli entropy on rotated tensors — submitted as ablation study.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
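The pre-rotation idea in this commit can be sketched in a few lines. This is a toy NumPy illustration, not the PR's code: it uses a small Sylvester Hadamard matrix and a naive per-tensor quantizer as a stand-in for GPTQ, just to show why rotating an outlier-heavy weight before quantizing reduces damage.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal

def quantize(x, bits=4):
    # symmetric per-tensor uniform quantizer (a crude stand-in for GPTQ)
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(42)
W = rng.normal(size=(64, 64))
W[:, 0] *= 20.0                      # one outlier channel: the case rotation helps

R = hadamard(64)
err_plain = np.linalg.norm(quantize(W) - W)
# rotate, quantize, rotate back; R is orthogonal, so (W @ R) @ R.T == W exactly
err_rot = np.linalg.norm(quantize(W @ R) @ R.T - W)
# the rotation smears the outlier channel across all columns, so the
# quantization grid is much finer and err_rot comes out well below err_plain
```

The catch the commit also notes: the rotated tensor looks like dense Gaussian noise, which is exactly what entropy coders like Brotli compress worst, hence the 957KB artifact overshoot.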
…arams + PR#1851 Base — val_bpb 1.06614 (3-seed mean)

3-seed mean: 1.06614 (std 0.00131), seeds 1337/42/2024
All artifacts under 16MB, training under 600s, eval under 600s
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request (Apr 28, 2026)
Six follow-on specs to spec 060A (openai#1855 port):
- 060B: SDClip ATTN tightening (config-only, eval via RESUME_FROM_CKPT)
- 060C: 046L deploy-time quant repair (~150 lines code port from exp/046-quant-repair @ fcb816f); eval-side, free
- 060D: 046G-tighter SDClip (config-only, fits within openai#1855 lrzip headroom)
- 060E: full stack (060B + 060C combined)
- 060F: LQER bumps (RANK=5, TOP_K=4, ASYM_GROUP=32; config-only)
- 060G: Partial SpinQuant from PR openai#1898 (~100 lines code port)

Plus tmp_exec/launch_060_eval.sh: shared eval-only launcher for RESUME_FROM_CKPT mode, used by 060B/D/E/F. Loads 060A's final_model.pt, re-quantizes + re-evals with overridden env vars. ~-3 per arm vs ~ for full retrain.

All specs reference 060A's checkpoint at runs/060A-1855-port/seed_42/final_model.pt as their hotstart.
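The eval-only pattern the launcher uses can be sketched as a small Python driver. This is a hypothetical mirror of tmp_exec/launch_060_eval.sh, not its contents: the 060F override values come from the spec above, but the exact env-var names and the arm table are assumptions for illustration.

```python
import os

# hypothetical arm table: each arm is a set of env-var overrides applied on
# top of 060A's frozen checkpoint (names assumed from the spec text)
ARMS = {
    "060F": {"LQER_RANK": "5", "TOP_K": "4", "ASYM_GROUP": "32"},
}

def arm_env(arm, base=None):
    # eval-only mode: resume from 060A's final model, no retraining
    env = dict(base if base is not None else os.environ)
    env["RESUME_FROM_CKPT"] = "runs/060A-1855-port/seed_42/final_model.pt"
    env.update(ARMS[arm])
    return env
```

A launcher built this way would pass `arm_env("060F")` to the eval subprocess, so every arm re-quantizes and re-evals the same weights under different knobs.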
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request (Apr 28, 2026)
Three arms test whether int6 embedding (saves ~150-300 KB after per-group compression) can be recovered by LQER asym capacity bumps:
- H1: EMBED_BITS=6 alone (worst case)
- H2: EMBED_BITS=6 + LQER_RANK=5 (recommended)
- H3: EMBED_BITS=6 + LQER_RANK=5 + TOP_K=4 (most capacity)

PR openai#1898 made int6-embed work via SpinQuant rotation (-2x quant noise on tok_emb distribution). Without rotation, LQER alone may or may not recover; this spec measures it.

If H2/H3 are quality-neutral or better, stacking with 060G (Partial SpinQuant) becomes very interesting — could compound to 060A val_bpb -0.005 with bytes saved. Eval-only via RESUME_FROM_CKPT (~-3 per arm).
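The repair mechanism these arms lean on can be shown in miniature. This is a minimal Frobenius-norm sketch of the LQER idea (store a low-rank fit of the quantization residual next to the quantized weights); the LQER-Asym variant referenced in the spec additionally uses asymmetric grouped quantization, which is omitted here.

```python
import numpy as np

def quantize(x, bits=6):
    # symmetric per-tensor uniform quantizer; bits=6 mirrors EMBED_BITS=6
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))      # toy stand-in for tok_emb
Q = quantize(W, bits=6)

# LQER-style repair: fit a rank-r correction to the quantization residual.
# Only the r factors are kept alongside Q, not the full residual matrix.
E = W - Q
U, S, Vt = np.linalg.svd(E, full_matrices=False)
r = 5                                # LQER_RANK=5, as in arm H2
W_repaired = Q + (U[:, :r] * S[:r]) @ Vt[:r]

err_plain = np.linalg.norm(W - Q)
err_lqer = np.linalg.norm(W - W_repaired)
# err_lqer < err_plain always holds (best rank-r fit), but on near-white
# quantization noise the gain is small -- which is what this spec measures
```

This is also why the outcome is genuinely uncertain without rotation: quantization noise on an unrotated tok_emb is close to full-rank, so a rank-5 correction may recover very little.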
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request (Apr 28, 2026)
PR openai#1898 (X-Abhishek-X) ran Partial SpinQuant + EMBED_BITS=6 reinvest on the same chain and reported val_bpb 1.06614 vs their base openai#1851's 1.06128 = +0.00486 REGRESSION. Their PR framed it as -0.01486 vs the 2-week-old merged SOTA openai#1493 (1.0810) instead of vs their actual parent.

Implications:
- 060G (Partial SpinQuant): empirically null/negative on this chain.
- 060H (EMBED_BITS=6 alone or with LQER reinvest): even riskier without SpinQuant's rotation protection.

Both specs marked as DEPRECATED at the top. Not deleted (kept as documentation in case conditions change later, e.g., deploy-time repair specifically targeting tok_emb precision).
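The baseline dispute comes down to two subtractions over the numbers already quoted above; lower bpb is better, so a positive delta against the parent is a regression even while the delta against older SOTA looks like a win.

```python
# the three val_bpb figures quoted in this thread
parent_1851 = 1.06128   # openai#1851, the chain this PR actually builds on
sota_1493   = 1.08100   # openai#1493, merged SOTA from two weeks earlier
this_pr     = 1.06614   # openai#1898, 3-seed mean

delta_vs_parent = round(this_pr - parent_1851, 5)   # +0.00486: worse than parent
delta_vs_sota   = round(this_pr - sota_1493, 5)     # -0.01486: the headline number
```

Both deltas are arithmetically correct; the commit's point is only that the second one hides the first.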
3-seed mean: 1.06614 (std 0.00131), seeds 1337/42/2024
All artifacts under 16MB, training under 600s, eval under 600s
Score-first Phased TTT (3 phases, post-quant only)
What's new in this PR
Introduces SPINQUANT_START_LAYER=5 — Hadamard pre-rotation applied to layers 5–10 only (12 modules) instead of all 66. Reduces Brotli entropy overhead from ~1MB to ~200KB, freeing enough headroom for EMBED_BITS=6 under the 16MB cap. Zero serialized bytes — rotation regenerated from seed at eval time. Built on SpinQuant V1 (PR #1695).

Results
Merged SOTA (PR #1493): 1.0810 → delta −0.01486
Credits