SP8192 + Adaptive Hessian-Sensitivity GPTQ Clipping — 1.0822 bpb#1689

Open
chris-colinsky wants to merge 1 commit into openai:main from chris-colinsky:submission/2026-04-17_SP8192_AdaptiveHessianClip

@chris-colinsky

Summary

  • val_bpb = 1.0822 (3-seed mean, std 0.0009) on 8xH100 SXM
  • Novel contribution: per-tensor adaptive GPTQ clip_sigmas derived from a Hessian-sensitivity score (H_diag * row_var, exponent -0.15), with a
    binary-searched offset that preserves the compression budget
  • All 3 seeds produce artifacts under 15.92 MB (largest: 15,911,535 bytes); training takes ~588 s
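
The clip schedule summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: it assumes each tensor already has one scalar sensitivity score (for example, an aggregate of H_diag * row_var), that the score is normalized by the geometric mean across tensors before the -0.15 exponent is applied, and that the binary search fits a single additive offset shared by all tensors. The names `relative_clip_sigmas`, `fit_offset`, `toy_size`, and the base value 12.0 are hypothetical.

```python
import math

def relative_clip_sigmas(sensitivity, base_sigma, exponent=-0.15):
    """Scale base_sigma per tensor by (score / geometric mean) ** exponent.

    With a negative exponent, more sensitive tensors get a smaller clip_sigma.
    """
    log_mean = sum(math.log(s) for s in sensitivity.values()) / len(sensitivity)
    geo_mean = math.exp(log_mean)
    return {n: base_sigma * (s / geo_mean) ** exponent for n, s in sensitivity.items()}

def fit_offset(sigmas, size_fn, budget_bytes, lo=-4.0, hi=4.0, iters=50):
    """Binary-search a shared additive offset so the artifact fits the budget.

    Assumes size_fn is non-increasing in the offset, that lo is over budget,
    and that hi is within budget.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        shifted = {n: s + mid for n, s in sigmas.items()}
        if size_fn(shifted) > budget_bytes:
            lo = mid  # too large: clip wider
        else:
            hi = mid  # fits: try a tighter clip
    return hi

# Toy inputs, made up for illustration (not taken from the PR).
sensitivity = {"attn": 4.0, "mlp": 1.0, "embed": 0.25}
numel = {"attn": 1000, "mlp": 4000, "embed": 8000}

def toy_size(sigmas):
    # Stand-in for "quantize, then compress": wider clipping gives a smaller artifact.
    return sum(numel[n] * 10.0 / s for n, s in sigmas.items())

base = 12.0  # hypothetical uniform clip_sigma used as the reference point
budget = toy_size({n: base for n in numel})  # size of the uniform schedule
sigmas = relative_clip_sigmas(sensitivity, base)
offset = fit_offset(sigmas, toy_size, budget)
final = {n: s + offset for n, s in sigmas.items()}
```

In a real run, `toy_size` would be replaced by whatever measures the compressed artifact for a given set of clip_sigmas; the search only relies on that measurement shrinking as the offset grows.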

3-Seed Results

| Seed | Sliding BPB | Artifact (bytes) |
|------|-------------|------------------|
| 1337 | 1.0811      | 15,906,928       |
| 42   | 1.0826      | 15,909,023       |
| 999  | 1.0828      | 15,911,535       |
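
As a quick sanity check, the summary statistics can be recomputed from the three rows above. The sketch below uses only the values reported in this PR and assumes the quoted std is the sample standard deviation (n - 1 denominator).

```python
import statistics

# seed -> (sliding BPB, artifact size in bytes), copied from the results above
results = {
    1337: (1.0811, 15_906_928),
    42: (1.0826, 15_909_023),
    999: (1.0828, 15_911_535),
}

bpb = [b for b, _ in results.values()]
mean_bpb = statistics.mean(bpb)
std_bpb = statistics.stdev(bpb)    # sample standard deviation (n - 1)
pstd_bpb = statistics.pstdev(bpb)  # population standard deviation (n)
largest_artifact = max(size for _, size in results.values())

print(f"mean={mean_bpb:.4f} sample_std={std_bpb:.4f} largest={largest_artifact:,} bytes")
```

The reported 0.0009 matches the sample standard deviation; the population form would round to 0.0008.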

Attribution

Built on @clarkkev's SP8192+GPTQ+SDClip base (PR #1394), with depth recurrence (@dexhunter), parallel residuals (@Robby955, @msisovic), and hyperparameter
tuning (@X-Abhishek-X, PR #1445).

resouer added a commit to resouer/parameter-golf that referenced this pull request Apr 18, 2026
The open frontier is too inconsistent to optimize against directly, but openai#1689
contributes a quantization-only idea that is at least surface-coherent. This
ports its adaptive Hessian-sensitivity clip schedule into the trusted W23
control line so we can measure the quantization effect without changing the
training or eval-time mechanism stack.

Constraint: Keep the W23 training and eval surfaces unchanged; modify only GPTQ clip selection
Rejected: Import the entire openai#1689 public surface | it is weaker than the trusted baseline and would reintroduce public-surface drift
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Judge W81 as a quantization-only donor lane, not as a public-family reproduction
Tested: python3 -m py_compile train_gpt.py
Not-tested: Remote train/eval on Lepton