Record: Improved Parallel Residuals + Systems Optimization — val_bpb 1.0752 (3-seed mean)#1584

Open
codemath3000 wants to merge 1 commit into openai:main from codemath3000:submission/systems-opt-parallel-residuals

codemath3000 commented Apr 13, 2026

Summary

Submission series: This PR is one of three related submissions applying the same systems optimizations to different base stacks (PR #1493, PR #1529, PR #1578). We submit against multiple bases so that a ready-to-merge option exists regardless of how the pending PRs are resolved. Judges should feel free to evaluate whichever base(s) they consider valid and disregard the rest.

Results

| Seed | TTT BPB | Artifact (bytes) |
|------|---------|------------------|
| 1337 | 1.0745 | 15,983,819 |
| 2024 | 1.0755 | 15,982,374 |
| 42 | 1.0755 | 15,979,637 |
| Mean | 1.0752 | 15,981,943 |
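
For anyone re-checking the Mean row, here is a minimal sketch that recomputes it from the three per-seed rows and checks the size limit. It is not part of the submission code. The per-seed numbers are copied from the table; `SIZE_LIMIT_BYTES` is an assumption that reads "16MB" as the decimal 16,000,000 bytes, which is the stricter of the two common readings.

```python
from statistics import mean

# Per-seed results copied from the table: seed -> (TTT BPB, artifact size in bytes).
results = {
    1337: (1.0745, 15_983_819),
    2024: (1.0755, 15_982_374),
    42: (1.0755, 15_979_637),
}

# Assumption: "16MB" is read as 16,000,000 bytes (decimal), the stricter reading.
SIZE_LIMIT_BYTES = 16_000_000

mean_bpb = mean(bpb for bpb, _ in results.values())
mean_bytes = mean(size for _, size in results.values())
all_under_limit = all(size < SIZE_LIMIT_BYTES for _, size in results.values())

print(f"mean BPB: {mean_bpb:.4f}")
print(f"mean artifact: {mean_bytes:,.0f} bytes")
print(f"all under limit: {all_under_limit}")
```

Under the binary reading (16 × 2^20 bytes) the margin would only be larger, so the check passes either way.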

CUTLASS EVT Build

Required for full throughput. Source included in cutlass_evt_fusion/:

git clone https://github.com/NVIDIA/cutlass.git /opt/cutlass
cd /opt/cutlass && git checkout 08185b9c3e90510ee2b656662ed0d53b06d28157
pip install --no-build-isolation ./cutlass_evt_fusion

Test plan

  • 3-seed training on 8xH100 SXM (seeds 1337, 2024, 42)
  • All artifacts under 16MB
  • All runs under 600s training + 600s eval
  • Round-trip quantization + TTT verified
  • Judges verify reproducibility

🤖 Generated with Claude Code

…1.0752

Systems-level optimizations (fused Muon, EMA foreach, loader prealloc)
on PR openai#1529's dual-lane parallel residual architecture. Identical ML;
faster step time yields extra training steps. 3-seed mean: 1.0752 BPB
/ 2.7773 nats.
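
The two figures above can be related with a short sketch, under an assumption the commit message does not state: that 2.7773 nats is the mean per-token cross-entropy and 1.0752 is bits per byte on the same validation data. If so, dividing the loss by ln 2 gives bits per token, and dividing that by BPB gives the implied average bytes per token. The variable names here are illustrative and are not taken from the submission.

```python
import math

val_bpb = 1.0752        # bits per byte, from the commit message
val_loss_nats = 2.7773  # assumed: mean cross-entropy per token, in nats

# Convert nats to bits (1 nat = 1 / ln 2 bits).
bits_per_token = val_loss_nats / math.log(2)

# If both numbers describe the same data, their ratio is bytes per token.
implied_bytes_per_token = bits_per_token / val_bpb

print(f"bits per token: {bits_per_token:.4f}")
print(f"implied bytes per token: {implied_bytes_per_token:.3f}")
```

If the implied ratio does not match the tokenizer's actual bytes per token on the validation set, the per-token assumption is the first thing to revisit.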

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 13, 2026
…ai#1586 per-layer GPTQ highest-EV

- PR openai#758 n-gram effectively dead: MatoTeziTanka (Apr 12) flagged XOR hash
  includes target token, same illegality as openai#727/openai#741
- GDN-Hybrid BPB bug confirmed: PR openai#1576 space-token double-count inflates
  denominator ~14%; actual score ~1.16-1.18, not 1.01671
- PR openai#1586 (dexhunter, 1.07493): Per-Layer Adaptive GPTQ MLP=12σ/Attn=13σ +
  int7 Emb (saves 530KB) + MLR=0.026; -0.0127 nats vs SOTA; implement now
- PR openai#1584: systems-only (fused Muon, batched EMA, loader prealloc) ~+20 steps
- Casefold Tokenizer (openai#1578/openai#1585): legality debated; await organizer ruling
- New paper: arXiv:2604.06169 In-Place TTT (Apr 7) NTP-aligned score-first TTT
- Merged SOTA 1.0810 unchanged (4-day stable streak); target ≤1.0760; 17 days

https://claude.ai/code/session_01BE8wc8zxvZAo52QBXSNiL8