
Record: PR #1797 reproduction — val_bpb 1.06136 (3-seed mean)#1906

Open
AayushBaniya2006 wants to merge 1 commit into openai:main from AayushBaniya2006:submission/pr1797-repro-1.06136

Conversation

@AayushBaniya2006

Summary

Independent 3-seed reproduction of PR #1797's stack (PR #1787 base + Smear Gate + LQER Asymmetric) on 8×H100 SXM, achieving val_bpb 1.06136 (3-seed mean, std 0.00059) — beats PR #1797's reported 1.06335 mean by 0.00199 BPB at >5σ significance, and beats merged SOTA (PR #1493 at 1.0810) by 0.01964 BPB.

3-Seed Results

| Seed | Steps | Pre-quant BPB | Quantized BPB | Post-TTT BPB | TTT eval | Train | Artifact (bytes) |
|------|-------|---------------|---------------|--------------|----------|-------|------------------|
| 42   | 4948  | 1.06451       | 1.07345       | 1.06068      | 533.5s   | 599.6s | 15,951,346      |
| 0    | 4920  | 1.06560       | 1.07458       | 1.06163      | 435.1s   | 599.5s | 15,947,797      |
| 1234 | 4916  | 1.06557       | 1.07472       | 1.06177      | 468.9s   | 599.6s | 15,952,843      |
| Mean | 4928  | 1.06523       | 1.07425       | 1.06136      | 479.2s   | 599.6s | 15,950,662      |
| Std  |       | 0.00053       | 0.00057       | 0.00059      |          |        |                 |

The 3-seed std of 0.00059 BPB ≈ 0.00041 nats per byte (× ln 2). Delta to PR #1797's baseline (1.06335) = 0.00199 BPB. Significance ratio = 0.00199 / (0.00059/√3) ≈ 5.8σ.
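The significance figure can be reproduced from the per-seed post-TTT numbers alone. A minimal stdlib sketch (the baseline 1.06335 is PR #1797's reported 3-seed mean, taken from the text above):

```python
import math
from statistics import mean, stdev

# Post-TTT val_bpb for seeds 42, 0, 1234 (from the results table)
seeds = [1.06068, 1.06163, 1.06177]
baseline = 1.06335  # PR #1797's reported 3-seed mean

m, s = mean(seeds), stdev(seeds)             # sample std (n-1 denominator)
delta = baseline - m                         # improvement in BPB
sigma = delta / (s / math.sqrt(len(seeds)))  # z-score vs. the std error of the mean

print(f"mean={m:.5f} std={s:.5f} delta={delta:.5f} significance={sigma:.1f}sigma")
```

Running this reproduces the headline numbers: mean 1.06136, std 0.00059, and a significance ratio just above 5.8σ.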

What This Submission Adds

This is an independent reproduction of PR #1797's full stack with no code changes — same train_gpt.py from the PR head commit (dexhunter/parameter-golf@04d35eda), same run flags from PR #1787's README plus SMEAR_GATE_ENABLED=1 and the LQER Asymmetric flags. Our seeds land 0.002-0.003 BPB lower than PR #1797's reported numbers, likely due to RunPod IN region hardware variance and stochastic NS iteration differences in bf16.

Phase 3 ablation: Gram Newton-Schulz (dropped)

We tested Gram Newton-Schulz (Dao AI Lab CuTeDSL kernels, arxiv 2505.16932 + April 2026 packages) as a drop-in replacement for our Polar Express NS iteration on seed 42. Result: +0.00087 BPB regression at our 512×2048 parameter-bank scale (1.06155 vs 1.06068). The Dao AI Lab claim of 2× speedup applies to larger matrices than our parameter banks, and bf16 numerical paths diverge slightly. Dropped from final.
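For context on what the ablation swaps out, here is the standard quintic Newton-Schulz orthogonalization in NumPy, using the widely published Muon coefficients. This is only an illustrative sketch: the PR's Polar Express variant and the Gram/CuTeDSL kernels differ in coefficients, kernel implementation, and bf16 numerics.

```python
import numpy as np

def newton_schulz_orthogonalize(G: np.ndarray, steps: int = 5) -> np.ndarray:
    """Quintic Newton-Schulz iteration that pushes all singular values of G
    toward 1 (an approximate polar factor). Coefficients are the standard
    Muon values; production variants run this in bf16 on-device."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)   # Frobenius norm bounds the spectral norm
    transposed = X.shape[0] > X.shape[1]
    if transposed:                        # iterate on the short side so X @ X.T stays small
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X
```

After 5 steps the singular values of a well-conditioned input land near 1, which is all a Muon-style update needs; bf16 rounding inside this loop is one plausible source of the small seed-to-seed divergence noted above.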

Compliance

  • Train ≤ 600s (≤599.6s on all 3 seeds)
  • TTT eval ≤ 600s (≤533.5s on all 3 seeds)
  • Artifact ≤ 16,000,000 bytes decimal (≤15,952,843, ≥47,157 bytes headroom)
  • Score-first TTT (phased per PR #1767's framework, per-chunk snapshot before LoRA update)
  • No N-gram cache, no SLOT/RLS/ETLB, no logit biasing
  • Sliding-window eval: strictly causal, stride 64, single pass, normalized softmax over full vocab
  • CaseOps byte sidecar for honest BPB on original bytes
  • No val data in training, no external network during eval
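The sliding-window eval constraint above can be sketched as follows. This is a minimal NumPy illustration, not the submission's kernel: `model` is any callable returning per-position logits, `window=256` is an illustrative context length, and only the stride of 64 and the single-pass / full-vocab-softmax constraints come from the PR text.

```python
import numpy as np

def sliding_window_bpt(model, ids, window=256, stride=64):
    """Strictly causal, single-pass sliding-window eval: every target token is
    scored exactly once, conditioned only on preceding tokens, with the softmax
    normalized over the full vocabulary. Returns bits per token; divide by mean
    bytes per token to get bits per byte."""
    total_bits, n_scored = 0.0, 0
    pos = 1                                   # token 0 has no left context
    while pos < len(ids):
        ctx_start = max(0, pos - window)
        x = np.asarray(ids[ctx_start : pos + stride - 1])
        logits = model(x)                     # (T, vocab); position t predicts ids[ctx_start + t + 1]
        z = logits - logits.max(axis=-1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
        for i in range(pos, min(pos + stride, len(ids))):
            total_bits += -logp[i - ctx_start - 1, ids[i]] / np.log(2)
            n_scored += 1
        pos += stride
    return total_bits / n_scored
```

A uniform model over a 16-symbol vocabulary scores exactly log2(16) = 4 bits per token under this loop, which is a cheap sanity check that no token is double-counted.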

Hardware

8× NVIDIA H100 80GB HBM3 SXM (RunPod secure cloud, IN region) | PyTorch 2.9.1+cu128 | Flash Attention 3 | Triton 3.5.1 | Brotli 1.2.0 | Python 3.12.3

Credits

This submission stands on a long lineage: PR #1394 (clarkkev) → PR #1493 (bigbag) → PR #1736 (dexhunter) → PR #1787 (nprime06) → PR #1797 (dexhunter, direct baseline). SmearGate comes from @classiclarryd (modded-nanogpt) via PR #1667, and the CaseOps tokenizer from @romeerp (PR #1729).

Test plan

  • 3-seed full training pipeline on 8×H100 SXM (seeds 42, 0, 1234)
  • All artifacts ≤ 16,000,000 bytes decimal
  • All runs ≤ 600s train, ≤ 600s TTT eval
  • All 3 seeds individually beat PR #1797's best reported seed (1.06297)
  • Verified train_gpt.py (lzma+b85 wrapper) parses (ast.parse OK)
  • Phase 3 GramNS ablation completed and documented (regression confirmed)
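The wrapper-parse check in the test plan amounts to a round trip plus `ast.parse`. A minimal sketch, assuming the wrapper is base85-encoded, lzma-compressed Python source (the exact container format lives in the repo's pack/extract scripts, so the helper names here are illustrative):

```python
import ast
import base64
import lzma

def pack_record(src: bytes) -> bytes:
    # Assumed wrapper layout: lzma-compress, then base85-encode.
    return base64.b85encode(lzma.compress(src))

def extract_record(blob: bytes) -> bytes:
    return lzma.decompress(base64.b85decode(blob))

def wrapper_parses(blob: bytes) -> bool:
    # Mirrors the "ast.parse OK" check: the unwrapped payload must be valid Python.
    ast.parse(extract_record(blob).decode("utf-8"))
    return True

src = b"def train():\n    return 'gpt'\n"  # tiny stand-in for train_gpt_src.py
blob = pack_record(src)
assert extract_record(blob) == src          # byte-for-byte round trip
assert wrapper_parses(blob)
```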

🤖 Generated with Claude Code

…eed mean)

Independent 3-seed reproduction of PR openai#1797's stack on 8xH100 SXM:
- Seed 42:   val_bpb 1.06068
- Seed 0:    val_bpb 1.06163
- Seed 1234: val_bpb 1.06177
- Mean: 1.06136 (std 0.00059)

Reproduces PR openai#1797's stack identically (no code changes; same train_gpt.py
sourced from PR openai#1797 head commit 04d35ed) on 8xH100 SXM (RunPod IN region,
PyTorch 2.9.1+cu128, FA3, Triton 3.5.1, brotli 1.2.0, Python 3.12.3).

All seeds clear:
- Train ≤ 600s (≤599.6s)
- Eval ≤ 600s (≤533.5s)
- Artifact ≤ 16,000,000 bytes (≤15,952,843, ≥47KB headroom)

Phase 3 ablation: tested Gram Newton-Schulz (Dao AI Lab CuTeDSL kernels,
arxiv 2505.16932) as drop-in for Polar Express NS. Showed +0.00087 BPB
regression at our 512×2048 parameter-bank scale. Dropped from final.

Stack credit chain: PR openai#1394 (clarkkev) → openai#1493 (bigbag) → openai#1736 (dexhunter)
→ openai#1787 (nprime06) → openai#1797 (dexhunter, direct baseline). SmearGate from
@classiclarryd (modded-nanogpt) via PR openai#1667. CaseOps tokenizer from
@romeerp (PR openai#1729).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
AayushBaniya2006 added a commit to AayushBaniya2006/parameter-golf that referenced this pull request Apr 28, 2026
- validate_submission.py: default gate is now PR openai#1906 safety-net
  (1.06136); --allow-fallback loosens to 1.075. Prevents accidentally
  submitting a Track B regression below the safety net.
- run_ttt_sweep.sh: skip-resume now requires the success marker, not
  just any log file. Partial logs are renamed to .partial.<ts> and
  the config is re-run instead of silently FAIL-ing in the ranking.
- test_record_roundtrip.py: byte-for-byte verification that
  pack_record + extract_record is idempotent. Run any time the
  wrapper format or scripts change. Verified clean on train_gpt_src.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
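The gate described in the validate_submission.py bullet is simple enough to state inline. A hypothetical sketch (the two thresholds and the fallback behavior are taken from the commit text; the function name is illustrative):

```python
def passes_bpb_gate(val_bpb: float, allow_fallback: bool = False) -> bool:
    """Track B safety net: by default require val_bpb at or below the
    PR #1906 result (1.06136); --allow-fallback loosens the gate to 1.075,
    catching accidental regressions before submission."""
    limit = 1.075 if allow_fallback else 1.06136
    return val_bpb <= limit

assert passes_bpb_gate(1.06068)                    # seed 42 clears the default gate
assert not passes_bpb_gate(1.07)                   # regression caught by the safety net
assert passes_bpb_gate(1.07, allow_fallback=True)  # but permitted under the fallback
```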
