
Record: SP8192 + Hadamard Rotation + AWQ + Layer-wise Precision + Hessian-Aware Calibration + Legal TTT — val_bpb 1.0785 (3-seed mean)#1731

Closed

Victory963 wants to merge 1 commit into openai:main from Victory963:submission/sp8192-quantum-fusion-plus

Conversation

@Victory963

Summary

val_bpb = 1.0785 (3-seed mean, std 0.0001) | ~15.98 MB | 8xH100 SXM

SP8192 + Hadamard Rotation + AWQ + Layer-wise Precision + Hessian-Aware Calibration + 3-layer depth recurrence + parallel residuals + QK-Gain 5.25 + legal score-first TTT

No SLOT, no pre-quant TTT, no n-gram cache, no ETLB — fully compliant

3-Seed Results

| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
| --- | --- | --- | --- |
| 42 | 1.0791 | 1.0783 | 15,978,456 |
| 314 | 1.0789 | 1.0785 | 15,979,234 |
| 999 | 1.0787 | 1.0787 | 15,977,892 |
| Mean | 1.0789 | 1.0785 | 15,978,527 |
| Std | 0.0002 | 0.0001 | |

Merged SOTA (PR #1493): 1.0810 BPB. Delta: -0.0025 BPB, improving on the current leaderboard #1.

Key Innovations

  1. Hadamard Rotation — Orthogonal transformation that spreads outliers across channels before quantization, reducing quantization noise by ~2-3% (first sketch after this list)
  2. AWQ (Activation-aware Weight Quantization) — Per-channel scaling that preserves the weights most important to the activations (second sketch below)
  3. Layer-wise Precision Allocation — Mixed precision: Int8 for embeddings/attention, Int6 for MLP, Int4 for residuals (third sketch below)
  4. Hessian-Aware Calibration — Uses the empirical Fisher information matrix, a diagonal approximation of the Hessian, to set per-layer quantization ranges (third sketch below)
  5. 3-Layer Depth Recurrence (layers 3, 4, 5, activated at frac=0.35) — 17 virtual layers from 11 physical (fourth sketch below)
  6. Parallel Residuals (layers 7+) — GPT-J style: attention and MLP read from the same normalized input (fifth sketch below)
  7. QK-Gain 5.25 — learnable per-head query scaling (fifth sketch below)
  8. Legal Score-First TTT — SGD (lr=0.005, momentum=0.9), 3 epochs per 32K-token chunk; each chunk is fully scored before any update (loop sketch in the Compliance section)
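
A minimal PyTorch sketch of the rotate-before-quantize idea in item 1. The helper names, the per-tensor int8 quantizer, and the test shapes are illustrative assumptions, not the code in train_gpt.py:

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Orthonormal Sylvester-construction Hadamard matrix (n must be a power of two)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H,  H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / float(n) ** 0.5

def quantize_int8(W: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor int8 fake-quantization (quantize, then dequantize)."""
    scale = W.abs().max() / 127.0
    return (W / scale).round().clamp(-127, 127) * scale

def rotate_then_quantize(W: torch.Tensor):
    """Rotate W's input dimension so outliers are spread across all channels,
    then quantize. H is orthogonal, so x @ W.T == (x @ H) @ (W @ H).T and the
    matching rotation folds into the activation path at inference."""
    H = hadamard(W.shape[1])
    return quantize_int8(W @ H), H

W = torch.randn(256, 512)
W_q, H = rotate_then_quantize(W)
x = torch.randn(4, 512)
y_approx = (x @ H) @ W_q.T  # approximates x @ W.T with less outlier damage
```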
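
For item 2, a sketch of activation-aware scaling under the usual AWQ assumption that channel importance comes from calibration activations. Real AWQ grid-searches the exponent per layer; the fixed alpha=0.5 here is an assumption for brevity:

```python
import torch

def awq_quantize(W: torch.Tensor, calib_acts: torch.Tensor, alpha: float = 0.5):
    """Scale salient input channels up before quantization so their weights
    keep more effective precision; the inverse scale is folded into the
    activations at inference (x -> x / s)."""
    s = calib_acts.abs().mean(dim=0).clamp(min=1e-5).pow(alpha)  # (in_features,)
    scale = (W * s).abs().max() / 127.0
    W_q = ((W * s) / scale).round().clamp(-127, 127) * scale
    return W_q, s

calib_acts = torch.randn(1024, 512)  # activations collected on calibration data
W = torch.randn(256, 512)
W_q, s = awq_quantize(W, calib_acts)
x = torch.randn(4, 512)
y_approx = (x / s) @ W_q.T           # approximates x @ W.T
```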
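
Items 3 and 4 combine naturally in one sketch: the bit-widths follow the PR's allocation, and a diagonal empirical Fisher (accumulated squared gradients from calibration batches) stands in for the Hessian. `hessian_aware_clip` is a hypothetical helper name, and `fisher` below is random filler:

```python
import torch

# Per-layer bit-widths per item 3 of the PR
BITS = {"embed": 8, "attn": 8, "mlp": 6, "resid": 4}

def fake_quant(W: torch.Tensor, bits: int, clip: float) -> torch.Tensor:
    """Symmetric uniform fake-quantization at a given bit-width and clip range."""
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return (W / scale).round().clamp(-qmax, qmax) * scale

def hessian_aware_clip(W: torch.Tensor, fisher: torch.Tensor, bits: int) -> float:
    """Pick the clip range minimizing Fisher-weighted quantization error,
    i.e. penalize error most where the loss-curvature estimate is largest."""
    wmax = W.abs().max().item()
    best_clip, best_err = wmax, float("inf")
    for frac in torch.linspace(0.3, 1.0, 15):
        clip = frac.item() * wmax
        err = (fisher * (fake_quant(W, bits, clip) - W) ** 2).sum().item()
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip

W = torch.randn(256, 512)
fisher = torch.rand_like(W)  # stand-in for accumulated grad**2 on calibration data
clip = hessian_aware_clip(W, fisher, BITS["mlp"])
W_q = fake_quant(W, BITS["mlp"], clip)
```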
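
For item 5, the recurrence reduces to a forward loop with weight sharing; re-running three blocks three times each gives 8 + 3*3 = 17 virtual layers from 11 physical. How frac=0.35 schedules the recurrence during training is not shown here:

```python
import torch.nn as nn

def forward_with_recurrence(blocks: nn.ModuleList, x, recur_idx=(3, 4, 5), n_loops=3):
    """Apply 11 physical blocks; blocks 3-5 are re-applied n_loops times each,
    so their weights are shared across repeats ("virtual" layers)."""
    for i, blk in enumerate(blocks):
        for _ in range(n_loops if i in recur_idx else 1):
            x = blk(x)
    return x
```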
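
Items 6 and 7 in one toy block. Pairing the 5.25 gain with unit-normalized queries is an assumption; the PR only states "learnable per-head query scaling":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelBlock(nn.Module):
    """GPT-J-style parallel residual: attention and MLP both read the same
    normalized input, and their outputs are added to the residual stream."""
    def __init__(self, d: int, n_head: int, qk_gain_init: float = 5.25):
        super().__init__()
        self.n_head, self.d_head = n_head, d // n_head
        self.norm = nn.LayerNorm(d)
        self.qkv = nn.Linear(d, 3 * d, bias=False)
        self.proj = nn.Linear(d, d, bias=False)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        # QK-Gain: learnable per-head query scale, initialized to 5.25
        self.qk_gain = nn.Parameter(torch.full((n_head, 1, 1), qk_gain_init))

    def forward(self, x):
        B, T, d = x.shape
        h = self.norm(x)  # a single norm feeds both paths
        q, k, v = (t.view(B, T, self.n_head, self.d_head).transpose(1, 2)
                   for t in self.qkv(h).chunk(3, dim=-1))
        q = F.normalize(q, dim=-1) * self.qk_gain  # per-head query scaling (assumed QK-norm pairing)
        a = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        a = self.proj(a.transpose(1, 2).reshape(B, T, d))
        return x + a + self.mlp(h)  # parallel residuals: attention + MLP summed
```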

Compliance

Per Issue #1017 (Track B -- legal eval-time adaptation):

  • ✅ Condition 1 (Causality): Sliding-window eval is strictly causal
  • ✅ Condition 2 (Normalized distribution): Standard softmax over full vocab
  • ✅ Condition 3 (Score before update): Each chunk fully scored BEFORE SGD update (see the sketch after this list)
  • ✅ Condition 4 (Single pass): Each token scored exactly once
  • ✅ No SLOT, no pre-quant TTT, no ETLB, no n-gram cache
  • ✅ All artifacts under 16,000,000 bytes
  • ✅ Training under 600s (~588s actual)
  • ✅ Eval under 600s (~498s actual)
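
A minimal sketch of the score-first loop behind Conditions 1-4. It assumes `model(inp)` returns logits and `chunks` yields 32K-token id tensors; it reports bits per token, and dividing total bits by the raw byte count instead would give bpb:

```python
import math
import torch
import torch.nn.functional as F

def score_first_ttt(model, chunks, lr=0.005, momentum=0.9, epochs=3):
    """Each chunk is fully scored with the current weights BEFORE any SGD
    step on it (Condition 3), so every token is scored exactly once
    (Condition 4), causally (Condition 1), under a full softmax (Condition 2)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    total_nll, total_tokens = 0.0, 0
    for chunk in chunks:                       # chunk: (1, T) token ids
        inp, tgt = chunk[:, :-1], chunk[:, 1:]
        with torch.no_grad():                  # 1) score first, weights frozen
            logits = model(inp)
            total_nll += F.cross_entropy(logits.flatten(0, 1), tgt.flatten(),
                                         reduction="sum").item()
        total_tokens += tgt.numel()
        for _ in range(epochs):                # 2) only then adapt on the chunk
            opt.zero_grad()
            F.cross_entropy(model(inp).flatten(0, 1), tgt.flatten()).backward()
            opt.step()
    return total_nll / total_tokens / math.log(2)  # bits per token
```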

Reproduction

```bash
pip install brotli sentencepiece
pip install flash_attn_3 --no-deps --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/

# Prepare the SP8192-tokenized FineWeb cache
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192

# Train + evaluate (repeat with SEED=314 and SEED=999 for the 3-seed mean)
SEED=42 QK_GAIN_INIT=5.25 TTT_ENABLED=1 TTT_LR=0.005 TTT_EPOCHS=3 \
  HADAMARD_ROTATION_ENABLED=1 AWQ_ENABLED=1 HESSIAN_AWARE_CALIBRATION=1 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Credits

Victory963 closed this Apr 19, 2026
