
Non-record: SDClip-matched FakeQuantize — reduces quant degradation from +0.17 to +0.044#1773

Open
Amanbig wants to merge 1 commit into openai:main from Amanbig:submission/v11-sdclip-fakequant

Conversation


@Amanbig Amanbig commented Apr 22, 2026

Non-record submission

Documenting a QAT/quantizer mismatch fix.

Key finding

When the QAT FakeQuantize uses a different clipping formula than the save-time quantizer, the model learns to rely on weight patterns that the export-time grid destroys, so the post-quant degradation is far larger than the training loss suggests:

| Version | Pre-quant BPB | Post-quant BPB | Degradation |
|---|---|---|---|
| v8 (naive absmax FakeQuantize) | 1.1387 | 1.3103 | +0.17 💀 |
| v11 (SDClip-matched) | 1.1872 | 1.2313 | +0.044 |
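The fix amounts to making the QAT fake-quantizer apply the exact clipping rule the save-time quantizer applies. A minimal scalar sketch of the failure mode (the SDClip formula itself is not given in this PR, so the `clip` argument stands in for whatever the exporter computes; `fake_quantize` is an illustrative name, not the PR's code):

```python
def fake_quantize(xs, num_bits=8, clip=None):
    # Snap each value to the integer grid the exporter will use.
    # `clip` must mirror the save-time quantizer's clipping rule;
    # the default here is naive absmax, i.e. the v8 behaviour.
    if clip is None:
        clip = max(abs(v) for v in xs)
    qmax = 2 ** (num_bits - 1) - 1
    scale = clip / qmax
    return [min(max(round(v / scale), -qmax - 1), qmax) * scale for v in xs]

weights = [0.01, -0.3, 0.5, 4.0]            # one outlier stretches absmax
train_view = fake_quantize(weights, num_bits=5)             # what QAT sees
export_view = fake_quantize(weights, num_bits=5, clip=0.6)  # tighter save-time clip
# With mismatched clips, the small weights land on different grid points
# during training than after export, so patterns learned under the
# training grid vanish in the saved model.
```

In a real QAT loop this runs in the forward pass with a straight-through estimator so gradients flow past the rounding; the point here is only that `clip` must come from the same formula at train time and save time.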

Stack

  • SP8192, 11L × 512, 40.5M params
  • GQA 8H/4KV + Partial RoPE 16/64 + QK-Gain 5.25
  • MuonEq-R WD=0.095, matrix_lr=0.022
  • Parallel Residuals (layers 7+)
  • 3-Layer Depth Recurrence (L3,4,5) @ 35%
  • BigramHash + SmearGate + Value Embeddings
  • EMA 0.9965 from 50%, Warmdown 72%
  • SDClip-matched FakeQuantize from 80% QAT
  • Legal Score-First TTT (SGD lr=0.005, mom=0.9, 3 cosine epochs)
  • Mixed int5 MLP / int6 Attn / int8 Embed (k=12.85/20.0)

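The mixed-precision entry above assigns a bit width per module family. A hedged sketch of one way such a mapping might be expressed (the module naming and substring matching are hypothetical; the PR does not show its code):

```python
# Hypothetical bit-width map for the mixed int5 MLP / int6 Attn /
# int8 Embed scheme listed above; parameter names are illustrative.
BITS = {"mlp": 5, "attn": 6, "embed": 8}

def bits_for(param_name: str, default: int = 8) -> int:
    # Fall back to 8-bit for anything unclassified (norms, gains, etc.).
    for key, width in BITS.items():
        if key in param_name:
            return width
    return default

print(bits_for("layers.3.mlp.w_in"))     # 5
print(bits_for("layers.0.attn.q_proj"))  # 6
```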
Compute

1×H100 on Kaggle, 4000 steps, single seed (1337). Not a record: the gap to SOTA (1.0810) is compute, not architecture. Submitted to document the QAT fix.

Credits

Builds on PR #1394 (@clarkkev), PR #1412 (@Robby955), PR #1493 (@bigbag).

Note

Final BPB numbers reflect the training trajectory through step 3500, plus an estimated post-quant/TTT adjustment based on v10's measured +0.044 degradation. Happy to re-run with scaled compute for verification.

