Record: SP4096 + Polar Express NS + MuonEq-R + WD=0.090 — 1.0959 BPB (3-seed mean)#1332

Closed
Omrigotlieb wants to merge 1 commit into openai:main from Omrigotlieb:v2-submission
@Omrigotlieb

Summary

  • val_bpb: 1.0959 (3-seed mean, std 0.0003) — beats SOTA (1.1147) by 0.0188 BPB
  • Artifact: 15.97 MB (under 16,000,000 bytes)
  • Clean submission — no SLOT, no TTT
  • 8×H100 SXM, PyTorch 2.9.1+cu128, 600s training

Results

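| Seed | Sliding BPB | Steps | Artifact (bytes) |
|------|-------------|-------|------------------|
| 1337 | 1.0961 | 5,910 | 15,974,826 |
| 42 | 1.0959 | 5,919 | 15,977,408 |
| 2025 | 1.0956 | 5,915 | 15,969,915 |
| Mean | 1.0959 ± 0.0003 | | |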
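
The mean and spread in the results can be re-derived from the per-seed values. The snippet below is a minimal sketch that assumes the ± figure is the sample (n-1) standard deviation; the PR does not say which estimator was used, and the variable names are illustrative only.

```python
from statistics import mean, stdev

# Per-seed results copied from the Results section: seed -> (sliding BPB, artifact bytes)
runs = {
    1337: (1.0961, 15_974_826),
    42: (1.0959, 15_977_408),
    2025: (1.0956, 15_969_915),
}
SIZE_LIMIT = 16_000_000  # byte cap stated in the Summary

bpbs = [bpb for bpb, _ in runs.values()]
mean_bpb = mean(bpbs)
std_bpb = stdev(bpbs)  # assumption: sample (n-1) standard deviation
largest = max(size for _, size in runs.values())

print(f"mean BPB = {mean_bpb:.4f}, std = {std_bpb:.4f}")
print(f"largest artifact = {largest:,} bytes, headroom = {SIZE_LIMIT - largest:,}")
```

Under that assumption the rounded values match the reported 1.0959 and 0.0003, and every artifact stays below the cap.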

Key Innovations (on clarkkev PR #1218 SP4096 base)

  1. Polar Express Newton-Schulz (arXiv:2505.16932) — per-iteration minimax polynomials, 4 steps instead of 5
  2. MuonEq-R — row-normalize gradient before NS orthogonalization
  3. WD=0.090 — higher weight decay for quantization-friendly compression (dexhunter insight)
  4. XSA all 11 layers — zero new parameters
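
Items 1 and 2 can be illustrated together: normalize each row of the gradient, then run a short Newton-Schulz iteration whose polynomial coefficients may differ per step. This is only a sketch in NumPy, not the code from `train_gpt.py`. The function names, the cubic polynomial form, and the coefficient schedule are all assumptions: the schedule is the classical cubic Newton-Schulz step repeated many times, standing in for the Polar Express per-iteration coefficients, which are not reproduced here.

```python
import numpy as np

def row_normalize(grad, eps=1e-7):
    """Scale each row of the gradient matrix to unit L2 norm."""
    norms = np.linalg.norm(grad, axis=1, keepdims=True)
    return grad / (norms + eps)

def newton_schulz(grad, coeffs, eps=1e-7):
    """Approximately orthogonalize grad, applying one odd polynomial per step.

    coeffs is a list of (a, b) pairs; step k applies X <- a*X + b*(X X^T) X.
    """
    x = grad / (np.linalg.norm(grad) + eps)  # Frobenius scaling keeps singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # work with the wide orientation so X X^T is the smaller Gram matrix
        x = x.T
    for a, b in coeffs:
        x = a * x + b * (x @ x.T) @ x
    return x.T if transposed else x

# Placeholder schedule: classical cubic Newton-Schulz, NOT the Polar Express coefficients.
CLASSIC_CUBIC = [(1.5, -0.5)] * 30

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 8))
update = newton_schulz(row_normalize(grad), CLASSIC_CUBIC)
print(np.round(update @ update.T, 3))  # should be close to the 4x4 identity
```

Passing the coefficients as a list is what lets a tuned per-step schedule replace the fixed one; a schedule optimized for each iteration is how the PR describes reaching 4 steps instead of 5.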

Run Command

SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py

Test plan

  • 3-seed validation (1337, 42, 2025)
  • All artifacts under 16,000,000 bytes
  • Script compiles and runs from records folder
  • No SLOT, no TTT — fully clean
  • Statistical significance: gap=0.0188, z=62.7 (p << 0.01)
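
The significance figure can be reproduced from the numbers in the Summary. This sketch assumes the z value is the gap divided by the reported per-run standard deviation; dividing by the standard error of the mean instead would give a larger value.

```python
SOTA_BPB = 1.1147  # previous best, from the Summary
MEAN_BPB = 1.0959  # 3-seed mean
STD_BPB = 0.0003   # reported std across seeds

gap = SOTA_BPB - MEAN_BPB
z = gap / STD_BPB  # assumption: per-run std in the denominator, not the standard error
print(f"gap = {gap:.4f}, z = {z:.1f}")
```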

🤖 Generated with Claude Code

@Omrigotlieb Omrigotlieb closed this Apr 4, 2026