WIP: FreqGPTQ + GatedDeltaNet + Adaptive Quantization#1743

Open
OleStan wants to merge 1 commit into openai:main from OleStan:freqgptq-gateddeltanet

Conversation


OleStan commented Apr 19, 2026

Summary

  • Built on PR #1698: GatedDeltaNet (FLA) + Legal Score-First TTT, val_bpb 1.00995 (3-seed mean)
  • FreqGPTQ: frequency-weighted Hessian calibration — top-100 tokens get 2x weight in GPTQ
  • PassthroughQuant: int8 for control tensors instead of fp16 (~40KB savings)
  • Sandwich quantization: int8 for final block to protect LM head signal
  • Adaptive embedding precision: int8 for top-100 frequent tokens, intN for rest
  • Configurable Int5/6 GPTQ with synced Late QAT clip range
  • LZMA self-extracting wrapper: ~73KB savings for model budget
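
The FreqGPTQ idea (weighting the GPTQ calibration Hessian by token frequency) could look roughly like the sketch below. This is not the PR's code: the function name, the normalization by total weight, and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def build_freq_weighted_hessian(activations, token_ids, top_tokens, boost=2.0):
    """Accumulate a frequency-weighted GPTQ Hessian H = sum_i w_i * x_i x_i^T,
    where activations produced by the most frequent tokens get extra weight.

    activations: (n_samples, d) calibration activations, one row per token position
    token_ids:   (n_samples,) token id that produced each activation row
    top_tokens:  set of the most frequent token ids (top-100 in this PR)
    boost:       weight multiplier for top tokens (2x in this PR)
    """
    weights = np.where(np.isin(token_ids, list(top_tokens)), boost, 1.0)
    # Weighted outer-product accumulation: H = X^T diag(w) X,
    # normalized by the total weight (an assumption; plain GPTQ divides by n).
    H = (activations * weights[:, None]).T @ activations
    return H / weights.sum()
```

The rest of the GPTQ pipeline (Cholesky of the damped inverse Hessian, column-wise rounding with error feedback) would consume this `H` unchanged; only the calibration statistics shift toward frequent tokens.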

Status

WIP — code complete, pending GPU validation. Will update with BPB results and 3-seed logs once compute is available.

Test plan

Built on PR openai#1698 (GatedDeltaNet + Legal TTT). Adds:
- FreqGPTQ: frequency-weighted Hessian calibration for GPTQ
- PassthroughQuant: int8 for control tensors (saves ~40KB)
- Sandwich quantization: int8 for final block
- Adaptive embedding precision: int8 top-100 / intN rest
- Configurable Int5/6 GPTQ with synced QAT
- LZMA wrapper saves ~73KB
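
A minimal sketch of the adaptive embedding precision scheme (int8 rows for the top-100 frequent tokens, a lower bit-width for the rest), assuming simple symmetric per-row quantization. The function names and the int5 default are hypothetical, not taken from the PR:

```python
import numpy as np

def quantize_embeddings(emb, top_token_ids, low_bits=5):
    """Per-row symmetric quantization of an embedding matrix.
    Rows for frequent tokens use 8 bits; all other rows use `low_bits`.
    Returns integer codes (stored as int8) and one scale per row."""
    vocab, dim = emb.shape
    codes = np.zeros((vocab, dim), dtype=np.int8)
    scales = np.ones(vocab)
    for i in range(vocab):
        bits = 8 if i in top_token_ids else low_bits
        qmax = 2 ** (bits - 1) - 1
        row_max = np.abs(emb[i]).max()
        scale = row_max / qmax if row_max > 0 else 1.0
        codes[i] = np.clip(np.round(emb[i] / scale), -qmax - 1, qmax)
        scales[i] = scale
    return codes, scales

def dequantize_embeddings(codes, scales):
    # Reconstruct float embeddings; error per entry is at most scale/2.
    return codes.astype(np.float32) * scales[:, None]
```

In a serialized checkpoint the low-bit rows would additionally be bit-packed to realize the size savings; the sketch above only shows the precision split.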

Pending GPU validation for BPB results.
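
The LZMA self-extracting wrapper can be illustrated with the following sketch: the compressed model bytes are embedded in a small Python stub that decompresses them at load time. `pack_self_extracting` is a hypothetical name; the PR's actual wrapper layout may differ.

```python
import lzma

def pack_self_extracting(payload: bytes, out_path: str) -> int:
    """Write a self-extracting Python stub containing `payload`
    compressed with LZMA. Executing the stub rebuilds `model_bytes`.
    Returns the compressed size in bytes."""
    blob = lzma.compress(payload, preset=9 | lzma.PRESET_EXTREME)
    stub = (
        "import lzma\n"
        # Embed the compressed blob as a bytes literal.
        "DATA = " + repr(blob) + "\n"
        "model_bytes = lzma.decompress(DATA)\n"
    )
    with open(out_path, "w") as f:
        f.write(stub)
    return len(blob)
```

The savings come from LZMA's strong compression of the quantized weight stream; the stub itself adds only a few dozen bytes of overhead on top of the blob.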
OleStan force-pushed the freqgptq-gateddeltanet branch from 80cad2a to 1cda344 on April 19, 2026 at 14:42
