
WIP: Sequential GPTQ with Groupwise Int6 — improved post-training quantization on SP4096 base #1664

Open
zoharb157 wants to merge 2 commits into openai:main from zoharb157:submission/sp4096-sequential-gptq-groupwise-int6

Conversation


zoharb157 commented Apr 16, 2026

Summary

Improve post-training quantization on PR #1218 base (SP4096, MLP 4×, WD 0.085, XSA-all, brotli). Three algorithmic improvements with zero training-time cost:

  • Sequential cross-layer GPTQ propagation: Quantize layers one at a time, inject quantized weights back into the model, then collect Hessians for later layers. Later layers' Hessians reflect actual quantized activations, capturing cross-layer error accumulation.
  • Groupwise int6 scales (group_size=128): Per-group fp16 scales instead of per-row, giving finer control over heterogeneous weight distributions. ~2% scale storage overhead for significant MSE reduction.
  • Hessian-weighted scale selection: Minimize sum(H_diag * (W-Q)^2) instead of MSE when selecting per-row clip percentiles, directly optimizing output reconstruction quality (a combined sketch of all three follows below).
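
For concreteness, here is a minimal sketch of how the three pieces could compose, assuming a toy nn.Sequential of nn.Linear layers and a diagonal Hessian approximation (the diagonal of A^T A over calibration activations). The names quantize_groupwise_int6, sequential_gptq, and calib_batches are illustrative only and are not the identifiers used in this PR's implementation.

```python
import torch
import torch.nn as nn

def quantize_groupwise_int6(W, H_diag, group_size=128,
                            clip_grid=(1.00, 0.99, 0.98, 0.97, 0.96, 0.95)):
    """Symmetric int6 quantization with per-(row, group) fp16 scales.

    For each group of `group_size` input columns, the clip fraction is chosen
    to minimize the Hessian-diagonal-weighted error sum(H_diag * (W - Q)^2)
    rather than plain MSE. Scale storage is one fp16 value per 128 int6
    weights, i.e. 16 / (128 * 6) ~= 2.1% overhead.
    """
    qmax = 31.0                                    # int6 range is [-32, 31]
    out_features, in_features = W.shape
    Q = torch.empty_like(W)
    scales = []
    for g0 in range(0, in_features, group_size):
        g1 = min(g0 + group_size, in_features)
        Wg, Hg = W[:, g0:g1], H_diag[g0:g1]
        absmax = Wg.abs().amax(dim=1, keepdim=True)            # per row, this group
        best = None
        for clip in clip_grid:
            s = (clip * absmax / qmax).half().float().clamp_min(1e-8)
            q = torch.clamp(torch.round(Wg / s), -qmax - 1, qmax) * s
            err = (Hg * (Wg - q) ** 2).sum()                    # Hessian-weighted, not MSE
            if best is None or err < best[0]:
                best = (err, q, s)
        Q[:, g0:g1] = best[1]
        scales.append(best[2].half())
    return Q, scales

@torch.no_grad()
def sequential_gptq(model: nn.Sequential, calib_batches):
    """Sequential cross-layer propagation: quantize layers front to back and
    write each quantized weight matrix back into the model *before* collecting
    the next layer's Hessian, so later Hessians see already-quantized
    activations and absorb the accumulated error."""
    layers = list(model)
    for i, layer in enumerate(layers):
        if not isinstance(layer, nn.Linear):
            continue
        H_diag = torch.zeros(layer.in_features)
        for x in calib_batches:
            a = x
            for prev in layers[:i]:                 # prefix is already quantized
                a = prev(a)
            H_diag += (a.reshape(-1, a.shape[-1]) ** 2).sum(dim=0)  # diag of A^T A
        Q, _scales = quantize_groupwise_int6(layer.weight.data, H_diag)
        layer.weight.data.copy_(Q)                  # inject before moving on
```

The real pipeline would use the full GPTQ column-by-column update with the complete Hessian; the diagonal-only version keeps the sketch short while still showing the sequential write-back and the Hessian-weighted clip search.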

Implementation is complete (280 lines changed). Requesting compute credits for 3-seed validation on 8×H100.

Expected −0.004 to −0.008 dBPB improvement from recovering quantization damage (pre-quant→post-quant gap is 0.012 BPB in baseline).

Test plan

  • Reproduce the #1218 baseline (Record: 4096-Vocab + 4.0-MLP-mult + 0.085-WD + Simplifications — val_bpb 1.09785, 3-seed mean) at 3 seeds (1337, 42, 2025)
  • Ablate: sequential propagation only (GPTQ_SEQUENTIAL=1 GPTQ_GROUP_SIZE=0)
  • Ablate: groupwise scales only (GPTQ_SEQUENTIAL=0 GPTQ_GROUP_SIZE=128)
  • Ablate: Hessian-weighted selection only (per-row mode)
  • Full stack: all three combined (default config)
  • Paired t-test across 3 seeds, p < 0.01, dBPB > 0.003 (see the check sketched after this list)
  • Verify artifact stays under 16MB cap with groupwise scale overhead
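
For the statistical acceptance criterion above, a check along these lines could be used; the function name acceptance_check and its exact signature are assumptions, only the thresholds come from the plan.

```python
import numpy as np
from scipy.stats import ttest_rel

def acceptance_check(baseline_bpb, candidate_bpb,
                     p_threshold=0.01, dbpb_threshold=0.003):
    """Paired (two-sided) t-test over seed-matched val_bpb values.

    Passes only if the difference is significant at p < 0.01 and the mean
    dBPB (baseline minus candidate) exceeds 0.003.
    """
    baseline = np.asarray(baseline_bpb, dtype=float)
    candidate = np.asarray(candidate_bpb, dtype=float)
    t_stat, p_value = ttest_rel(baseline, candidate)
    dbpb = float((baseline - candidate).mean())
    return (p_value < p_threshold) and (dbpb > dbpb_threshold), dbpb, p_value
```

This would be called once per ablation row with the three seed-matched val_bpb values (seeds 1337, 42, 2025) for baseline and candidate.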

Commits

Improve post-training quantization on PR openai#1218 base (SP4096, MLP 4x, WD 0.085).
Three changes: sequential cross-layer error propagation, groupwise int6 scales
(group_size=128), and Hessian-weighted scale selection. Expected -0.004 to -0.008
dBPB with zero training-time cost.

Made-with: Cursor
Three improvements to the post-training quantization pipeline on PR openai#1218:

1. Sequential cross-layer GPTQ: quantize layers one at a time, injecting
   quantized weights back before collecting later layers' Hessians. This
   propagates quantization error forward so later Hessians are accurate.

2. Groupwise int6 scales (group_size=128): per-group fp16 scales instead
   of per-row, giving finer control over weight variance within rows.

3. Hessian-weighted scale selection: minimize H_diag-weighted error instead
   of MSE when selecting per-row clip percentiles.

Zero training-time cost. Expected -0.004 to -0.008 dBPB.

Made-with: Cursor
