Record: Compliant PR #1934 Reproduction (GPTQ_RESERVE=5.5) — val_bpb 1.06003 (3-seed)#1950
Open
Christopher-Lee-McClendon wants to merge 1 commit into openai:main
Reproduces PR openai#1934's exact recipe (pergroup lrzip compression, EMBED_WD=0.06, tightened clip sigmas) with GPTQ_RESERVE_SECONDS=5.5 to ensure GPTQ hessians complete within the 600s training budget.
Results (3-seed mean: 1.06003, std: 0.000385):
- Seed 42: 1.05987 (4962 steps, artifact 15,971,933 B)
- Seed 314: 1.05975 (4952 steps, artifact 15,970,997 B)
- Seed 999: 1.06047 (4954 steps, artifact 15,974,305 B)
Compliance: train_loop + hessians = 598.2s max (< 600s)
Delta vs PR openai#1934: +0.00010 BPB (negligible, within noise)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Record: Compliant PR #1934 Reproduction — val_bpb 1.06003 (3-seed mean)
Summary
Compliance audit reproduction of PR #1934's exact recipe with GPTQ_RESERVE_SECONDS=5.5 (vs #1934's 0.5) to ensure GPTQ hessian collection completes within the 600s training budget.
3-seed mean val_bpb: 1.06003 (std: 0.000385)
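The effect of the reserve can be illustrated with a minimal sketch. It assumes the reserve is simply subtracted from the wall-clock budget to get a deadline for the training loop, so that hessian collection afterwards still fits under 600s. The function names, the 100 ms step time, and the 4 s hessian time are hypothetical round numbers, not values from the PR's training script.

```python
def train_loop_deadline_ms(budget_ms: int, reserve_ms: int) -> int:
    """Latest elapsed time (ms) by which the training loop must have stopped."""
    if not 0 <= reserve_ms <= budget_ms:
        raise ValueError("reserve must lie in [0, budget]")
    return budget_ms - reserve_ms


def simulate(budget_ms: int, reserve_ms: int, step_ms: int, hessian_ms: int):
    """Return (steps completed, total elapsed ms including hessian collection).

    Integer milliseconds are used to avoid floating-point drift.
    """
    deadline = train_loop_deadline_ms(budget_ms, reserve_ms)
    steps = deadline // step_ms          # only whole steps that finish before the deadline
    train_elapsed = steps * step_ms
    return steps, train_elapsed + hessian_ms


BUDGET_MS = 600_000      # 600s training budget (from the PR)
STEP_MS = 100            # hypothetical step time
HESSIAN_MS = 4_000       # hypothetical hessian collection time

for reserve_ms in (500, 5_500):  # 0.5s (PR #1934) vs 5.5s (this PR)
    steps, total_ms = simulate(BUDGET_MS, reserve_ms, STEP_MS, HESSIAN_MS)
    status = "within budget" if total_ms <= BUDGET_MS else "over budget"
    print(f"reserve={reserve_ms / 1000:.1f}s steps={steps} total={total_ms / 1000:.1f}s ({status})")
```

Under these illustrative numbers the larger reserve gives up a small number of training steps in exchange for keeping the total under the budget.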
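Results
- Seed 42: val_bpb 1.05987, 4962 steps, artifact 15,971,933 B
- Seed 314: val_bpb 1.05975, 4952 steps, artifact 15,970,997 B
- Seed 999: val_bpb 1.06047, 4954 steps, artifact 15,974,305 B
- Mean: val_bpb 1.06003 (std: 0.000385)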
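The reported 3-seed statistics can be re-derived from the per-seed val_bpb values given in the commit message. This is a small sketch that assumes the reported std is the sample standard deviation (ddof=1); the PR does not say which estimator was used.

```python
import statistics

# Per-seed val_bpb values as reported in the commit message.
val_bpb = {42: 1.05987, 314: 1.05975, 999: 1.06047}

values = list(val_bpb.values())
mean_bpb = statistics.mean(values)
std_bpb = statistics.stdev(values)  # sample std (ddof=1); an assumption, see above

print(f"mean={mean_bpb:.5f} std={std_bpb:.6f}")
```

With the rounded per-seed values this gives a mean of 1.06003 and a std of about 0.000386, consistent with the reported 0.000385 up to rounding of the inputs.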
Compliance
Across the three seeds, train_loop + hessians takes at most 598.2s, under the 600s budget.
Timing breakdown (typical seed):
Comparison to PR #1934
The BPB delta of +0.00010 is well within 1σ (std=0.000385), confirming that reserving adequate time for GPTQ hessians does not meaningfully degrade performance.
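As a quick check of the "within 1σ" claim, the delta can be expressed in units of the 3-seed std. The implied PR #1934 mean below is derived from the two numbers in this PR and is not quoted from PR #1934 itself.

```python
mean_bpb = 1.06003    # this PR, 3-seed mean
delta_bpb = 0.00010   # this PR minus PR #1934, as reported
std_bpb = 0.000385    # this PR, 3-seed std

ratio = delta_bpb / std_bpb                 # delta in units of sigma
implied_1934_mean = mean_bpb - delta_bpb    # derived, not quoted from PR #1934

print(f"delta = {ratio:.2f} sigma, implied #1934 mean = {implied_1934_mean:.5f}")
```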
Architecture
11L 512d 8H/4KV transformer with U-Net skips, parallel residuals (start layer 8), partial RoPE, depth recurrence (loop layers 3–5, NUM_LOOPS=2), CaseOps SP8192, LQER asymmetric INT6+INT7 embed, per-group lrzip compression, SmearGate (window 12), sparse attention gate, fused CE, phased TTT (3 phases, score-first, prefix 2000 docs).
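A few of the numbers above can be unpacked with a small sketch. The class and field names are hypothetical, and two points are assumptions rather than facts from the PR: layers are taken to be 0-indexed, and NUM_LOOPS=2 is read as two total passes through layers 3–5. The actual model code may define either differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSketch:
    num_layers: int = 11     # 11L
    d_model: int = 512       # 512d
    num_heads: int = 8       # 8H
    num_kv_heads: int = 4    # 4KV
    loop_start: int = 3      # first looped layer
    loop_end: int = 5        # last looped layer (inclusive)
    num_loops: int = 2       # NUM_LOOPS

    @property
    def head_dim(self) -> int:
        return self.d_model // self.num_heads

    @property
    def queries_per_kv(self) -> int:
        # Grouped-query attention: query heads sharing each KV head.
        return self.num_heads // self.num_kv_heads

    def layer_order(self) -> list[int]:
        # ASSUMPTION: 0-indexed layers, num_loops = total passes through the block.
        before = list(range(self.loop_start))
        block = list(range(self.loop_start, self.loop_end + 1))
        after = list(range(self.loop_end + 1, self.num_layers))
        return before + block * self.num_loops + after


arch = ArchSketch()
print(arch.head_dim, arch.queries_per_kv, arch.layer_order())
```

Under these assumptions the head dimension is 64, each KV head serves two query heads, and the unrolled forward pass applies 14 layer calls using 11 sets of layer weights.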
Key Environment Variables
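The variables named elsewhere in this PR (GPTQ_RESERVE_SECONDS, EMBED_WD, NUM_LOOPS) could be read along the following lines. This is a hedged sketch: `read_config` is a hypothetical helper, and the fallback values are the settings reported in this PR, not necessarily the training script's own defaults.

```python
import os


def read_config(env=None) -> dict:
    """Read the settings named in this PR from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    return {
        # Seconds held back from the 600s budget for GPTQ hessian collection.
        "GPTQ_RESERVE_SECONDS": float(env.get("GPTQ_RESERVE_SECONDS", "5.5")),
        # Embedding weight decay.
        "EMBED_WD": float(env.get("EMBED_WD", "0.06")),
        # Depth-recurrence loop count.
        "NUM_LOOPS": int(env.get("NUM_LOOPS", "2")),
    }


print(read_config({}))                                  # settings reported in this PR
print(read_config({"GPTQ_RESERVE_SECONDS": "0.5"}))     # PR #1934's reserve
```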
Credits
Hardware
8×H100 SXM 80GB, PyTorch 2.9+, Docker: matotezitanka/proteus-pytorch:community