
Record: Compliant PR #1934 Reproduction (GPTQ_RESERVE=5.5) — val_bpb 1.06003 (3-seed) #1950

Open
Christopher-Lee-McClendon wants to merge 1 commit into openai:main from Christopher-Lee-McClendon:submission/record-1934-compliance-audit

Conversation

@Christopher-Lee-McClendon (Contributor)

Record: Compliant PR #1934 Reproduction — val_bpb 1.06003 (3-seed mean)

Summary

Compliance-audit reproduction of PR #1934's exact recipe, with GPTQ_RESERVE_SECONDS raised from 0.5 to 5.5 so that GPTQ hessian collection completes within the 600s training budget.

3-seed mean val_bpb: 1.06003 (std: 0.000385)

Results

| Seed | Post-TTT val_bpb | Artifact Bytes | Steps | Train+Hessians |
|------|------------------|----------------|-------|----------------|
| 42   | 1.05987          | 15,971,933     | 4962  | 598.1s ✓       |
| 314  | 1.05975          | 15,970,997     | 4952  | 598.1s ✓       |
| 999  | 1.06047          | 15,974,305     | 4954  | 598.2s ✓       |

Compliance

| Metric                   | Value        | Budget           | Status |
|--------------------------|--------------|------------------|--------|
| Training loop + hessians | 598.2s       | max 600s         | ✓      |
| Artifact size            | 15,974,305 B | max 16,000,000 B | ✓      |
| TTT eval time            | 547.1s       | max 600s         | ✓      |
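The three budgets above can be expressed as a simple gate. This is an illustrative sketch, not code from the submission harness; the limit and measurement names are hypothetical, with values taken from the table:

```python
# Hypothetical compliance gate mirroring the budget table above.
LIMITS = {
    "train_plus_hessians_s": 600.0,      # training loop + GPTQ hessians
    "artifact_bytes": 16_000_000,        # serialized artifact size
    "ttt_eval_s": 600.0,                 # TTT evaluation wall clock
}

measured = {
    "train_plus_hessians_s": 598.2,      # worst seed (999)
    "artifact_bytes": 15_974_305,        # largest artifact (seed 999)
    "ttt_eval_s": 547.1,
}

violations = {k: v for k, v in measured.items() if v > LIMITS[k]}
assert not violations, f"budget exceeded: {violations}"
```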

Timing breakdown (typical seed):

  • Training loop (gradient steps): 594.6s
  • GPTQ hessian collection: 3.5s → cumulative 598.1s < 600s
  • GPTQ quantization: 10.0s (post-budget serialization)
  • Per-group lrzip compression: 118.3s (post-budget serialization)
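The reserve mechanism behind these numbers amounts to stopping gradient steps once less than `GPTQ_RESERVE_SECONDS` remains in the budget, so the hessian pass lands inside the deadline. A minimal sketch, assuming the trainer exposes a per-step callable (`step_fn` and `collect_hessians_fn` are hypothetical stand-ins; only the budgeting logic is illustrated):

```python
import time

def run_training(step_fn, collect_hessians_fn, budget_s=600.0, reserve_s=5.5):
    """Run gradient steps until only `reserve_s` remains in the budget,
    then collect GPTQ hessians inside the reserved window."""
    start = time.monotonic()
    steps = 0
    # Stop stepping once less than reserve_s of the budget remains, so the
    # hessian pass finishes before the deadline (with RESERVE=0.5, PR #1934's
    # hessians reportedly finished at ~603s, past the 600s budget).
    while time.monotonic() - start < budget_s - reserve_s:
        step_fn()
        steps += 1
    collect_hessians_fn()
    return steps, time.monotonic() - start
```

A larger reserve trades a handful of gradient steps (here, roughly 22) for a guaranteed in-budget hessian pass.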

Comparison to PR #1934

| Metric               | PR #1934  | This Run  | Delta    |
|----------------------|-----------|-----------|----------|
| Mean val_bpb         | 1.05993   | 1.06003   | +0.00010 |
| GPTQ_RESERVE_SECONDS | 0.5       | 5.5       | +5.0     |
| Hessians finish at   | ~603.0s   | ~598.0s   | -5.0s    |
| Steps achieved       | 4974–4984 | 4952–4962 | -22      |

The BPB delta of +0.00010 is well within 1σ (std=0.000385), confirming that reserving adequate time for GPTQ hessians does not meaningfully degrade performance.
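The 1σ claim is easy to verify from the per-seed numbers in the results table (values from this PR; the 1.05993 baseline is PR #1934's reported 3-seed mean):

```python
import statistics

# Per-seed val_bpb from the results table above.
runs = {42: 1.05987, 314: 1.05975, 999: 1.06047}

mean = statistics.mean(runs.values())        # 3-seed mean
std = statistics.stdev(runs.values())        # sample standard deviation
delta_vs_1934 = mean - 1.05993               # vs PR #1934's reported mean

print(f"mean={mean:.5f} std={std:.6f} delta={delta_vs_1934:+.5f}")
assert abs(delta_vs_1934) < std              # delta is within 1 sigma
```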

Architecture

11L 512d 8H/4KV transformer with U-Net skips, parallel residuals (start layer 8), partial RoPE, depth recurrence (loop layers 3–5, NUM_LOOPS=2), CaseOps SP8192, LQER asymmetric INT6+INT7 embed, per-group lrzip compression, SmearGate (window 12), sparse attention gate, fused CE, phased TTT (3 phases, score-first, prefix 2000 docs).
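Of the components above, depth recurrence is the one whose control flow is worth spelling out: layers 3–5 run NUM_LOOPS=2 times with shared weights, so the 11 parameterized layers yield 14 effective layer applications. A hedged sketch of the layer schedule (the helper name is illustrative, not from the repo):

```python
def layer_schedule(n_layers=11, loop_start=3, loop_end=5, num_loops=2):
    """Order in which transformer layers execute under depth recurrence:
    layers loop_start..loop_end repeat num_loops times, reusing the same
    weights on every pass."""
    order = []
    i = 0
    while i < n_layers:
        if i == loop_start:
            # Replay the looped block num_loops times before continuing.
            for _ in range(num_loops):
                order.extend(range(loop_start, loop_end + 1))
            i = loop_end + 1
        else:
            order.append(i)
            i += 1
    return order

# 11 parameterized layers -> 14 effective applications:
# [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 8, 9, 10]
print(layer_schedule())
```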

Key Environment Variables

```
GPTQ_RESERVE_SECONDS=5.5  COMPRESSOR=pergroup  EMBED_WD=0.06
MATRIX_CLIP_SIGMAS=12.85  ATTN_CLIP_SIGMAS=12.0  MLP_CLIP_SIGMAS=12.0
EMBED_BITS=7  EMBED_CLIP_SIGMAS=12.0  MATRIX_LR=0.026  MIN_LR=0.1
CASEOPS_ENABLED=1  SMEAR_GATE_ENABLED=1  GATE_WINDOW=12
LQER_ENABLED=1  LQER_RANK=4  LQER_TOP_K=3  LQER_FACTOR_BITS=4
LQER_ASYM_ENABLED=1  LQER_ASYM_GROUP=64
PHASED_TTT_PREFIX_DOCS=2000  PHASED_TTT_NUM_PHASES=3  TTT_WARM_START_A=1
SPARSE_ATTN_GATE_ENABLED=1  FUSED_CE_ENABLED=1  NCCL_NET=Socket
```
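Knobs like these are typically consumed through `os.environ` with recipe defaults as fallbacks. A minimal sketch of that read pattern (the helper names are hypothetical; defaults are the values listed above):

```python
import os

def env_float(name, default):
    """Read a float-valued knob from the environment, falling back to the
    recipe default when unset. Illustrative helper, not from the repo."""
    return float(os.environ.get(name, default))

def env_flag(name, default=False):
    """Read a 0/1 flag from the environment."""
    return os.environ.get(name, "1" if default else "0") == "1"

# Examples using the settings listed above:
gptq_reserve_s = env_float("GPTQ_RESERVE_SECONDS", 5.5)
embed_wd = env_float("EMBED_WD", 0.06)
caseops_on = env_flag("CASEOPS_ENABLED", default=True)
```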

Hardware

8×H100 SXM 80GB, PyTorch 2.9+, Docker: matotezitanka/proteus-pytorch:community

Commit message

Reproduces PR openai#1934's exact recipe (pergroup lrzip compression, EMBED_WD=0.06, tightened clip sigmas) with GPTQ_RESERVE_SECONDS=5.5 to ensure GPTQ hessians complete within the 600s training budget.

Results (3-seed mean: 1.06003, std: 0.000385):
- Seed 42: 1.05987 (4962 steps, artifact 15,971,933 B)
- Seed 314: 1.05975 (4952 steps, artifact 15,970,997 B)
- Seed 999: 1.06047 (4954 steps, artifact 15,974,305 B)

Compliance: train_loop + hessians = 598.2s max (< 600s)
Delta vs PR openai#1934: +0.00010 BPB (negligible, within noise)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>