Record: PR #1797 reproduction — val_bpb 1.06136 (3-seed mean) #1906
Open
AayushBaniya2006 wants to merge 1 commit into openai:main from
Conversation
…eed mean)

Independent 3-seed reproduction of PR openai#1797's stack on 8×H100 SXM:

- Seed 42: val_bpb 1.06068
- Seed 0: val_bpb 1.06163
- Seed 1234: val_bpb 1.06177
- Mean: 1.06136 (std 0.00059)

Reproduces PR openai#1797's stack identically (no code changes; same train_gpt.py sourced from PR openai#1797 head commit 04d35ed) on 8×H100 SXM (RunPod IN region, PyTorch 2.9.1+cu128, FA3, Triton 3.5.1, brotli 1.2.0, Python 3.12.3).

All seeds clear:

- Train ≤ 600s (≤ 599.6s)
- Eval ≤ 600s (≤ 533.5s)
- Artifact ≤ 16,000,000 bytes (≤ 15,952,843, ≥ 47KB headroom)

Phase 3 ablation: tested Gram Newton-Schulz (Dao AI Lab CuTeDSL kernels, arXiv:2505.16932) as a drop-in for Polar Express NS. Showed a +0.00087 BPB regression at our 512×2048 parameter-bank scale. Dropped from final.

Stack credit chain: PR openai#1394 (clarkkev) → openai#1493 (bigbag) → openai#1736 (dexhunter) → openai#1787 (nprime06) → openai#1797 (dexhunter, direct baseline). SmearGate from @classiclarryd (modded-nanogpt) via PR openai#1667. CaseOps tokenizer from @romeerp (PR openai#1729).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
AayushBaniya2006 added a commit to AayushBaniya2006/parameter-golf that referenced this pull request on Apr 28, 2026
- validate_submission.py: default gate is now the PR openai#1906 safety net (1.06136); --allow-fallback loosens it to 1.075. Prevents accidentally submitting a Track B regression below the safety net.
- run_ttt_sweep.sh: skip-resume now requires the success marker, not just any log file. Partial logs are renamed to .partial.<ts> and the config is re-run instead of silently FAIL-ing in the ranking.
- test_record_roundtrip.py: byte-for-byte verification that pack_record + extract_record is idempotent. Run any time the wrapper format or scripts change. Verified clean on train_gpt_src.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
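The byte-for-byte roundtrip check described above can be sketched with the stdlib alone. This is a hypothetical minimal pack/extract pair built on the lzma+b85 wrapper format named in the test plan; the actual `pack_record`/`extract_record` in the repo's scripts may differ in framing details:

```python
import base64
import lzma

def pack_record(src: bytes) -> str:
    """Compress with lzma, then encode as base85 text (assumed wrapper format)."""
    return base64.b85encode(lzma.compress(src)).decode("ascii")

def extract_record(wrapped: str) -> bytes:
    """Inverse of pack_record: decode base85, then decompress."""
    return lzma.decompress(base64.b85decode(wrapped.encode("ascii")))

# Byte-for-byte idempotence, in the spirit of test_record_roundtrip.py
payload = b"print('hello, parameter-golf')\n" * 100
assert extract_record(pack_record(payload)) == payload
```

Running this on the real training source is the same check, just with the file's bytes as `payload`.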
Summary
Independent 3-seed reproduction of PR #1797's stack (PR #1787 base + Smear Gate + LQER Asymmetric) on 8×H100 SXM, achieving val_bpb 1.06136 (3-seed mean, std 0.00059) — beats PR #1797's reported 1.06335 mean by 0.00199 BPB at >5σ significance, and beats merged SOTA (PR #1493 at 1.0810) by 0.01964 BPB.
3-Seed Results
| Seed | val_bpb |
|------|---------|
| 42 | 1.06068 |
| 0 | 1.06163 |
| 1234 | 1.06177 |
| **Mean** | **1.06136** |

3-seed std 0.00059 BPB ≈ 0.00136 nats. Delta to PR #1797's baseline (1.06335) = 0.00199 BPB. Significance ratio = 0.00199 / (0.00059/√3) ≈ 5.8σ.
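The significance arithmetic can be checked in a few lines (stdlib only; the seed values are the ones reported in this PR):

```python
import statistics

# 3-seed val_bpb results reported in this PR
vals = [1.06068, 1.06163, 1.06177]

mean = statistics.mean(vals)               # 3-seed mean
std = statistics.stdev(vals)               # sample std, ~0.00059
delta = 1.06335 - mean                     # gap to PR #1797's reported mean
sigma = delta / (std / len(vals) ** 0.5)   # delta over standard error of the mean

print(f"mean={mean:.5f} std={std:.5f} delta={delta:.5f} sigma={sigma:.1f}")
```

Note this uses the sample standard deviation (n-1 denominator) and the standard error of the mean, matching the √3 in the ratio above.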
What This Submission Adds
This is an independent reproduction of PR #1797's full stack with no code changes: the same `train_gpt.py` from the PR head commit (dexhunter/parameter-golf@04d35eda), and the same run flags from PR #1787's README plus `SMEAR_GATE_ENABLED=1` and the LQER Asymmetric flags. Our seeds land 0.002-0.003 BPB lower than PR #1797's reported numbers, likely due to RunPod IN-region hardware variance and stochastic NS iteration differences in bf16.

Phase 3 ablation: Gram Newton-Schulz (dropped)
We tested Gram Newton-Schulz (Dao AI Lab CuTeDSL kernels, arXiv:2505.16932 plus the April 2026 packages) as a drop-in replacement for our Polar Express NS iteration on seed 42. Result: a +0.00087 BPB regression at our 512×2048 parameter-bank scale (1.06155 vs 1.06068). The Dao AI Lab 2× speedup claim applies to matrices larger than our parameter banks, and the bf16 numerical paths diverge slightly. Dropped from the final stack.
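For readers unfamiliar with the ablated component: a plain Newton-Schulz orthogonalization iteration looks roughly like the numpy sketch below. This is the textbook cubic iteration, not the CuTeDSL Gram kernels or the tuned Polar Express coefficients used in the actual run, and it is float64 rather than bf16:

```python
import numpy as np

def newton_schulz_orthogonalize(G: np.ndarray, steps: int = 40) -> np.ndarray:
    """Approximate the orthogonal polar factor of G via the cubic
    Newton-Schulz iteration X <- 1.5*X - 0.5*X X^T X.
    Converges when the singular values of the initial X lie in (0, sqrt(3)),
    which the Frobenius-norm scaling below guarantees."""
    X = G / (np.linalg.norm(G) + 1e-7)  # scale so all singular values < 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(42)
G = rng.standard_normal((64, 64))
Q = newton_schulz_orthogonalize(G)
# Q should be (nearly) orthogonal: Q^T Q ≈ I
err = np.linalg.norm(Q.T @ Q - np.eye(64))
```

The Gram variant restructures the same fixed-point iteration around the Gram matrix X^T X to get better hardware utilization; the BPB delta in the ablation comes from how these variants interact with bf16 rounding at this matrix size.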
Compliance

All three seeds clear every gate:

- Train ≤ 600s (worst seed ≤ 599.6s)
- Eval ≤ 600s (worst seed ≤ 533.5s)
- Artifact ≤ 16,000,000 bytes (largest ≤ 15,952,843, ≥ 47KB headroom)
Hardware
8× NVIDIA H100 80GB HBM3 SXM (RunPod secure cloud, IN region) | PyTorch 2.9.1+cu128 | Flash Attention 3 | Triton 3.5.1 | Brotli 1.2.0 | Python 3.12.3
Credits
This submission stands on a long lineage: PR #1394 (clarkkev) → PR #1493 (bigbag) → PR #1736 (dexhunter) → PR #1787 (nprime06) → PR #1797 (dexhunter, the direct baseline). Smear Gate comes from @classiclarryd (modded-nanogpt) via PR #1667, and the CaseOps tokenizer from @romeerp (PR #1729).
Test plan
`train_gpt.py` (lzma+b85 wrapper) parses (`ast.parse` OK)

🤖 Generated with Claude Code
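The parse gate in the test plan reduces to a one-call stdlib check. A minimal sketch, assuming the source has already been unwrapped to a plain `.py` file (the function name and path handling here are illustrative, not from the repo):

```python
import ast

def parses_ok(path: str) -> bool:
    """Return True iff the file at `path` is syntactically valid Python,
    mirroring the "ast.parse OK" gate in the test plan."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            ast.parse(f.read(), filename=path)
        return True
    except SyntaxError:
        return False
```

`ast.parse` only checks syntax; it does not import or execute the file, which keeps the gate safe to run on untrusted unwrapped sources.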