
Record: Compliant PR #1934 Reproduction (GPTQ_RESERVE=5.5) — val_bpb 1.06003 (3-seed) #1950

Open
Christopher-Lee-McClendon wants to merge 1 commit into openai:main from Christopher-Lee-McClendon:submission/record-1934-compliance-audit

Conversation

@Christopher-Lee-McClendon (Contributor)

Record: Compliant PR #1934 Reproduction — val_bpb 1.06003 (3-seed mean)

Summary

Compliance-audit reproduction of PR #1934's exact recipe, with GPTQ_RESERVE_SECONDS raised from 0.5 to 5.5 so that GPTQ hessian collection completes within the 600s training budget.

3-seed mean val_bpb: 1.06003 (std: 0.000385)

Results

| Seed | Post-TTT val_bpb | Artifact Bytes | Steps | Train+Hessians |
|------|------------------|----------------|-------|----------------|
| 42   | 1.05987          | 15,971,933     | 4962  | 598.1s ✓       |
| 314  | 1.05975          | 15,970,997     | 4952  | 598.1s ✓       |
| 999  | 1.06047          | 15,974,305     | 4954  | 598.2s ✓       |

Compliance

| Metric                   | Value        | Budget           | Status |
|--------------------------|--------------|------------------|--------|
| Training loop + hessians | 598.2s       | max 600s         | ✓      |
| Artifact size            | 15,974,305 B | max 16,000,000 B | ✓      |
| TTT eval time            | 547.1s       | max 600s         | ✓      |
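The three budgets above can be expressed as a simple gate. This is an illustrative sketch, not code from the submission harness; the limit and measurement names are hypothetical, with values taken from the table:

```python
# Hypothetical compliance gate mirroring the budget table above.
LIMITS = {
    "train_plus_hessians_s": 600.0,      # training loop + GPTQ hessians
    "artifact_bytes": 16_000_000,        # serialized artifact size
    "ttt_eval_s": 600.0,                 # TTT evaluation wall clock
}

measured = {
    "train_plus_hessians_s": 598.2,      # worst seed (999)
    "artifact_bytes": 15_974_305,        # largest artifact (seed 999)
    "ttt_eval_s": 547.1,
}

violations = {k: v for k, v in measured.items() if v > LIMITS[k]}
assert not violations, f"budget exceeded: {violations}"
```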

Timing breakdown (typical seed):

  • Training loop (gradient steps): 594.6s
  • GPTQ hessian collection: 3.5s → cumulative 598.1s < 600s
  • GPTQ quantization: 10.0s (post-budget serialization)
  • Per-group lrzip compression: 118.3s (post-budget serialization)
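The reserve mechanism behind these numbers amounts to stopping gradient steps once less than `GPTQ_RESERVE_SECONDS` remains in the budget, so the hessian pass lands inside the deadline. A minimal sketch, assuming the trainer exposes a per-step callable (`step_fn` and `collect_hessians_fn` are hypothetical stand-ins; only the budgeting logic is illustrated):

```python
import time

def run_training(step_fn, collect_hessians_fn, budget_s=600.0, reserve_s=5.5):
    """Run gradient steps until only `reserve_s` remains in the budget,
    then collect GPTQ hessians inside the reserved window."""
    start = time.monotonic()
    steps = 0
    # Stop stepping once less than reserve_s of the budget remains, so the
    # hessian pass finishes before the deadline (with RESERVE=0.5, PR #1934's
    # hessians reportedly finished at ~603s, past the 600s budget).
    while time.monotonic() - start < budget_s - reserve_s:
        step_fn()
        steps += 1
    collect_hessians_fn()
    return steps, time.monotonic() - start
```

A larger reserve trades a handful of gradient steps (here, roughly 22) for a guaranteed in-budget hessian pass.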

Comparison to PR #1934

| Metric               | PR #1934  | This Run  | Delta    |
|----------------------|-----------|-----------|----------|
| Mean val_bpb         | 1.05993   | 1.06003   | +0.00010 |
| GPTQ_RESERVE_SECONDS | 0.5       | 5.5       | +5.0     |
| Hessians finish at   | ~603.0s   | ~598.0s   | -5.0s    |
| Steps achieved       | 4974–4984 | 4952–4962 | -22      |

The BPB delta of +0.00010 is well within 1σ (std=0.000385), confirming that reserving adequate time for GPTQ hessians does not meaningfully degrade performance.
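The 1σ claim is easy to verify from the per-seed numbers in the results table (values from this PR; the 1.05993 baseline is PR #1934's reported 3-seed mean):

```python
import statistics

# Per-seed val_bpb from the results table above.
runs = {42: 1.05987, 314: 1.05975, 999: 1.06047}

mean = statistics.mean(runs.values())        # 3-seed mean
std = statistics.stdev(runs.values())        # sample standard deviation
delta_vs_1934 = mean - 1.05993               # vs PR #1934's reported mean

print(f"mean={mean:.5f} std={std:.6f} delta={delta_vs_1934:+.5f}")
assert abs(delta_vs_1934) < std              # delta is within 1 sigma
```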

Architecture

11L 512d 8H/4KV transformer with U-Net skips, parallel residuals (start layer 8), partial RoPE, depth recurrence (loop layers 3–5, NUM_LOOPS=2), CaseOps SP8192, LQER asymmetric INT6+INT7 embed, per-group lrzip compression, SmearGate (window 12), sparse attention gate, fused CE, phased TTT (3 phases, score-first, prefix 2000 docs).
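Of the components above, depth recurrence is the one whose control flow is worth spelling out: layers 3–5 run NUM_LOOPS=2 times with shared weights, so the 11 parameterized layers yield 14 effective layer applications. A hedged sketch of the layer schedule (the helper name is illustrative, not from the repo):

```python
def layer_schedule(n_layers=11, loop_start=3, loop_end=5, num_loops=2):
    """Order in which transformer layers execute under depth recurrence:
    layers loop_start..loop_end repeat num_loops times, reusing the same
    weights on every pass."""
    order = []
    i = 0
    while i < n_layers:
        if i == loop_start:
            # Replay the looped block num_loops times before continuing.
            for _ in range(num_loops):
                order.extend(range(loop_start, loop_end + 1))
            i = loop_end + 1
        else:
            order.append(i)
            i += 1
    return order

# 11 parameterized layers -> 14 effective applications:
# [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 8, 9, 10]
print(layer_schedule())
```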

Key Environment Variables

```
GPTQ_RESERVE_SECONDS=5.5  COMPRESSOR=pergroup  EMBED_WD=0.06
MATRIX_CLIP_SIGMAS=12.85  ATTN_CLIP_SIGMAS=12.0  MLP_CLIP_SIGMAS=12.0
EMBED_BITS=7  EMBED_CLIP_SIGMAS=12.0  MATRIX_LR=0.026  MIN_LR=0.1
CASEOPS_ENABLED=1  SMEAR_GATE_ENABLED=1  GATE_WINDOW=12
LQER_ENABLED=1  LQER_RANK=4  LQER_TOP_K=3  LQER_FACTOR_BITS=4
LQER_ASYM_ENABLED=1  LQER_ASYM_GROUP=64
PHASED_TTT_PREFIX_DOCS=2000  PHASED_TTT_NUM_PHASES=3  TTT_WARM_START_A=1
SPARSE_ATTN_GATE_ENABLED=1  FUSED_CE_ENABLED=1  NCCL_NET=Socket
```
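Knobs like these are typically consumed through `os.environ` with recipe defaults as fallbacks. A minimal sketch of that read pattern (the helper names are hypothetical; defaults are the values listed above):

```python
import os

def env_float(name, default):
    """Read a float-valued knob from the environment, falling back to the
    recipe default when unset. Illustrative helper, not from the repo."""
    return float(os.environ.get(name, default))

def env_flag(name, default=False):
    """Read a 0/1 flag from the environment."""
    return os.environ.get(name, "1" if default else "0") == "1"

# Examples using the settings listed above:
gptq_reserve_s = env_float("GPTQ_RESERVE_SECONDS", 5.5)
embed_wd = env_float("EMBED_WD", 0.06)
caseops_on = env_flag("CASEOPS_ENABLED", default=True)
```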

Hardware

8×H100 SXM 80GB, PyTorch 2.9+, Docker: matotezitanka/proteus-pytorch:community

Commit message

Reproduces PR openai#1934's exact recipe (pergroup lrzip compression, EMBED_WD=0.06, tightened clip sigmas) with GPTQ_RESERVE_SECONDS=5.5 to ensure GPTQ hessians complete within the 600s training budget.

Results (3-seed mean: 1.06003, std: 0.000385):
- Seed 42: 1.05987 (4962 steps, artifact 15,971,933 B)
- Seed 314: 1.05975 (4952 steps, artifact 15,970,997 B)
- Seed 999: 1.06047 (4954 steps, artifact 15,974,305 B)

Compliance: train_loop + hessians = 598.2s max (< 600s)
Delta vs PR openai#1934: +0.00010 BPB (negligible, within noise)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>