[REVIEW-ONLY] Record: SP8192 + Polar Express NS + MIN_LR + Tight GPTQ on PR #1790 — val_bpb 1.06892 (3-seed mean) by resouer · Pull Request #12 · resouer/parameter-golf

resouer · 2026-04-25T17:14:28Z

⚠️ Self-review PR — DO NOT merge here. This is a preview PR within `resouer/parameter-golf` against a `submission-base` branch that mirrors `openai/parameter-golf:main`. Once you approve the diff, the actual upstream submission can be opened against `openai/parameter-golf:main`.

Summary

3-seed mean: val_bpb = 1.06892 (std 0.00031), which beats the merged SOTA PR openai/parameter-golf#1493 (val_bpb 1.0810, 3-seed mean) by 0.01208 bpb.
Statistically significant at p < 0.01 against the SOTA − 0.005 threshold (1.0760): one-sample t-stat ≈ 40 with n = 3 seeds.
Mechanism: cross-stack combination of the PR openai/parameter-golf#1790 base (SP8192 + SmearGate + AttnOutGate(w24) + LoRA-TTT improvements + Phased TTT; val_bpb 1.06991, 3-seed mean) with 4 changes ported from PR openai/parameter-golf#1792 (Polar Express NS + MIN_LR + GatedAttn + Alpha LoRA; val_bpb 1.07006, 3-seed mean): Polar Express NS coefficients, VAL_LOSS_EVERY=0, MIN_LR=0.10, GPTQ_RESERVE_SECONDS=1.5.
All 3 artifacts < 16 MB; max 15,941,018 bytes.
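
The mean, standard deviation, and t-statistic quoted in the summary can be re-derived from the three per-seed scores. The sketch below assumes a one-sample t-test with the sample (n − 1) standard deviation, which is consistent with the reported std of 0.00031 and t ≈ 40; the variable names are illustrative only.

```python
import math
import statistics

# Per-seed val_bpb values as reported in this PR.
val_bpb = {1337: 1.06887604, 42: 1.06924420, 2025: 1.06863616}

SOTA_BPB = 1.0810               # merged SOTA (openai/parameter-golf#1493)
MARGIN = 0.005                  # required improvement over SOTA
threshold = SOTA_BPB - MARGIN   # 1.0760

scores = list(val_bpb.values())
mean = statistics.mean(scores)
std = statistics.stdev(scores)            # sample std, n - 1 denominator
std_err = std / math.sqrt(len(scores))    # standard error of the mean

# One-sample t-statistic: how many standard errors the mean sits below
# the threshold.
t_stat = (threshold - mean) / std_err

print(f"mean={mean:.8f} std={std:.8f} t={t_stat:.1f}")
```

With only two degrees of freedom, the one-sided critical value at p = 0.01 is about 6.96, so a t-statistic near 40 clears it by a wide margin.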

3-seed results (across 3 nodes / 2 groups)

| Seed | val_bpb | bytes_total | Node | Node group |
|---|---|---|---|---|
| 1337 | 1.06887604 | 15,938,246 | node-ip-10-0-120-106 | gcp-iad-leptondev-002 |
| 42 | 1.06924420 | 15,941,018 | node-ip-10-0-112-30 | gcp-iad-leptondev-002 (different node) |
| 2025 | 1.06863616 | 15,938,565 | node-ip-10-0-119-97 | training-dev-0 (different group) |
| mean | 1.06891880 | max = 15,941,018 | 3 nodes / 2 groups | |
| std | 0.00030627 | | | |

Cross-node within-group delta (seed 42, gcp 120-106 → gcp 112-30): −0.00032 bpb.
Cross-group delta (seed 2025, gcp → training-dev): −0.00040 bpb.
Both deltas fall within typical run-to-run noise (~0.0003-0.0005 bpb), so the result does not appear to depend on the specific node or node group.

Mechanism details

PR openai#1790 base (open, @miaoyuxun, claim 1.06991): SP8192 + SmearGate + AttnOutGate(w24) + LoRA-TTT improvements + Phased TTT.

Recipe ported from PR openai#1792 (open):

Polar Express NS: a _PE_COEFFS 5-tuple of per-iteration coefficients replaces the single fixed (3.4445, -4.775, 2.0315) Newton-Schulz tuple in zeropower_via_newtonschulz5. Originally from PR openai/parameter-golf#1344 (SP4096 + Polar Express + MuonEq-R + Depth Recurrence, 1.0923 BPB, 3-seed; @orangekame3 et al.).
VAL_LOSS_EVERY=0: skip mid-training val-loss diagnostic prints. Pure systems optimization.
MIN_LR=0.10: floors the warmdown LR schedule at 10% of peak.
GPTQ_RESERVE_SECONDS=1.5: tightened from the 4.0 s default, reclaiming 2.5 s of budget while keeping a 1.5 s safety cushion.
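
To illustrate what "per-iteration coefficients" changes in the Newton-Schulz step, here is a minimal sketch on a single singular value rather than a full matrix. It relies on the fact that the quintic update a·X + b·(XXᵀ)X + c·(XXᵀ)²X maps each singular value s to a·s + b·s³ + c·s⁵. The actual _PE_COEFFS values are not listed in this PR description, so `PER_STEP_COEFFS` below is a placeholder that simply repeats the baseline tuple. The function name is hypothetical and is not taken from train_gpt.py.

```python
# Baseline quintic Newton-Schulz coefficients quoted in the PR text.
FIXED_COEFFS = (3.4445, -4.775, 2.0315)

# Placeholder schedule (assumption): the real _PE_COEFFS values are not
# given here, so the baseline tuple is repeated for all five iterations.
PER_STEP_COEFFS = [FIXED_COEFFS] * 5


def ns_on_singular_value(sigma, coeffs_per_step):
    """Apply the quintic NS map to one singular value, one tuple per step.

    For X = U diag(s) V^T, the matrix update
    a*X + b*(X X^T) X + c*(X X^T)^2 X acts on each singular value as
    s -> a*s + b*s**3 + c*s**5.
    """
    for a, b, c in coeffs_per_step:
        sigma = a * sigma + b * sigma**3 + c * sigma**5
    return sigma


# Singular values of a normalized matrix lie in (0, 1]; each is pushed
# toward the neighbourhood of 1 after five steps.
for s0 in (0.01, 0.1, 0.5, 1.0):
    print(f"{s0:5.2f} -> {ns_on_singular_value(s0, PER_STEP_COEFFS):.3f}")
```

Swapping in a list of five distinct tuples changes only the data passed in, not the loop, which is why the port can be described as a coefficient replacement.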

No upstream PR has this combination on the openai#1790 stack. PR openai#1792 itself targets a different parent (openai#1768 GatedAttn + Alpha-LoRA stack).
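
The MIN_LR=0.10 floor from the recipe above can be pictured with a small schedule sketch. It assumes a step-based linear warmdown and a floor applied with `max`. The shipped schedule may be time-based or may interpolate toward the floor instead, and `lr_scale` and its parameters are hypothetical names.

```python
def lr_scale(step, total_steps, warmdown_steps, min_lr=0.10):
    """LR multiplier relative to peak: flat, then linear warmdown, floored.

    Assumption: the floor is applied as max(decay, min_lr); the real
    train_gpt.py schedule may differ in form.
    """
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    decay = (total_steps - step) / warmdown_steps  # 1.0 -> 0.0 over warmdown
    return max(decay, min_lr)


# Compare the old default (min_lr=0.0) with the new floor (min_lr=0.10).
for step in (0, 800, 900, 950, 1000):
    old = lr_scale(step, 1000, 200, min_lr=0.0)
    new = lr_scale(step, 1000, 200)
    print(step, old, new)
```

Under these assumptions the two schedules agree until the decay drops below 10% of peak, so only the tail of the warmdown is affected.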

Reproduction

torchrun --nproc_per_node=8 train_gpt.py

Zero environment variables required. The four configuration tweaks are baked into shipped train_gpt.py defaults.
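
As a rough illustration of defaults being "baked in", the sketch below reads three of the settings from the environment and falls back to the values listed in this PR when nothing is set. The `DEFAULTS` table and `read_config` helper are hypothetical; the real train_gpt.py may organise its hyperparameters differently. The Polar Express change is a code change rather than a setting, so it is not shown.

```python
import os

# Hypothetical defaults table (assumption) mirroring the values in this PR.
DEFAULTS = {
    "VAL_LOSS_EVERY": "0",           # previously 4000
    "MIN_LR": "0.10",                # previously 0.0
    "GPTQ_RESERVE_SECONDS": "1.5",   # previously 4.0
}


def read_config(environ=os.environ):
    """Return each setting, preferring an environment override if present."""
    return {
        "VAL_LOSS_EVERY": int(environ.get("VAL_LOSS_EVERY", DEFAULTS["VAL_LOSS_EVERY"])),
        "MIN_LR": float(environ.get("MIN_LR", DEFAULTS["MIN_LR"])),
        "GPTQ_RESERVE_SECONDS": float(
            environ.get("GPTQ_RESERVE_SECONDS", DEFAULTS["GPTQ_RESERVE_SECONDS"])
        ),
    }


# An empty environment yields the submitted configuration.
print(read_config({}))
```

With this pattern a bare `torchrun` invocation reproduces the submission, while an explicit override such as `MIN_LR=0.0` would restore the old behaviour for an ablation.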

Compliance

✓ Score-first per-chunk TTT (no pre-quant adapt-then-score on val): same legal pattern as merged SOTA PR openai/parameter-golf#1493.
✓ No n-gram hash keyed on target tokens (grep -nE "hash.*target|key.*\btargets?\b|ngram.*label" clean).
✓ No CaseOps / SLOT (grep -nE "CaseOp|case_op|caseops" clean).
✓ No pre-quant TTT on val (TTT runs strictly post-quantization).
✓ Python 3.10 py_compile passes on shipped train_gpt.py.
✓ Artifact accounting: max-seed bytes 15,941,018 < 16,000,000.
✓ Tokenizer/dataset unchanged (stock SP8192 SentencePiece + fineweb10B_sp8192).
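
The artifact-accounting line can be checked with a few lines of arithmetic. This sketch takes the cap to be 16,000,000 bytes, as written in the compliance list, and uses the per-seed byte totals from the results table; the variable names are illustrative.

```python
CAP_BYTES = 16_000_000  # cap as stated in the compliance list

# bytes_total per seed, as reported in this PR.
artifact_bytes = {1337: 15_938_246, 42: 15_941_018, 2025: 15_938_565}

# The largest artifact is the binding one for the size check.
worst_seed = max(artifact_bytes, key=artifact_bytes.get)
headroom = CAP_BYTES - artifact_bytes[worst_seed]

print(f"largest artifact: seed {worst_seed}, headroom {headroom:,} bytes")
```

The largest artifact (seed 42) leaves 58,982 bytes of headroom under the cap.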

Records folder (7 files)

train_gpt.py (125 KB)
README.md
submission.json
requirements.txt
train_seed1337.log / train_seed42.log / train_seed2025.log

Credits

PR openai/parameter-golf#1790 (@miaoyuxun; val_bpb 1.06991, 3-seed mean): base SP8192 + SmearGate + AttnOutGate(w24) + LoRA-TTT + Phased TTT stack.
PR openai/parameter-golf#1792 (Polar Express NS + MIN_LR + GatedAttn + Alpha LoRA; val_bpb 1.07006, 3-seed mean): VAL_LOSS_EVERY=0 + MIN_LR=0.10 + tight GPTQ_RESERVE_SECONDS recipe.
PR openai/parameter-golf#1344 (@orangekame3; SP4096 + Polar Express + MuonEq-R + Depth Recurrence, 1.0923 BPB, 3-seed): original Polar Express Newton-Schulz coefficients.

🤖 Generated with Claude Code

…1790 stack: val_bpb 1.06892 (3-seed mean)

On PR openai#1790 (miaoyuxun) base stack (SP8192 + SmearGate + AttnOutGate + LoRA-TTT + Phased TTT), combined with four mechanisms ported from PR openai#1792:
- Polar Express NS (5 per-iteration coefficient tuples; from PR openai#1344)
- VAL_LOSS_EVERY default 4000 -> 0
- MIN_LR default 0.0 -> 0.10
- GPTQ_RESERVE_SECONDS default 4.0 -> 1.5

3-seed mean: 1.06891880, std: 0.00030627
Seeds: 1337=1.06887604, 42=1.06924420, 2025=1.06863616
Artifact: 15,941,018 bytes (max), under 16 MB cap.

Cross-node and cross-group reproducibility validated:
  seed 1337 on gcp-iad-leptondev-002 / 120-106
  seed 42 on gcp-iad-leptondev-002 / 112-30 (different physical node)
  seed 2025 on training-dev-0 / 119-97 (different node group)

resouer · 2026-04-25T17:32:16Z

Closing per user direction: improvement over base too small. Restarting novelty hunt for genuinely impressive mechanism.

resouer closed this Apr 25, 2026

resouer wants to merge 1 commit into submission-base from submission/2026-04-25-safetri-polarns-minlr-gptq15-on-1790
