[REVIEW-ONLY] Record: SP8192 + Polar Express NS + MIN_LR + Tight GPTQ on PR #1790 - val_bpb 1.06892 (3-seed mean) #12
Closed
resouer wants to merge 1 commit into submission-base from
…1790 stack: val_bpb 1.06892 (3-seed mean)

On PR openai#1790 (miaoyuxun) base stack (SP8192 + SmearGate + AttnOutGate + LoRA-TTT + Phased TTT), combined with four mechanisms ported from PR openai#1792:
- Polar Express NS (5 per-iteration coefficient tuples; from PR openai#1344)
- VAL_LOSS_EVERY default 4000 -> 0
- MIN_LR default 0.0 -> 0.10
- GPTQ_RESERVE_SECONDS default 4.0 -> 1.5

3-seed mean: 1.06891880, std: 0.00030627
Seeds: 1337=1.06887604, 42=1.06924420, 2025=1.06863616
Artifact: 15,941,018 bytes (max), under 16 MB cap.

Cross-node and cross-group reproducibility validated:
- seed 1337 on gcp-iad-leptondev-002 / 120-106
- seed 42 on gcp-iad-leptondev-002 / 112-30 (different physical node)
- seed 2025 on training-dev-0 / 119-97 (different node group)
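As a rough illustration of what the three scalar defaults in the commit message mean in practice, the sketch below reads them with environment-variable overrides and applies the LR floor. The variable names and values come from the commit message; the `Config` dataclass, the `load_config` and `lr_multiplier` helpers, the linear warmdown shape, and the override mechanism are assumptions made for illustration, not code taken from train_gpt.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Defaults after this change; the previous defaults were 4000, 0.0 and 4.0.
    val_loss_every: int = 0            # 0 = skip mid-training val-loss diagnostics
    min_lr: float = 0.10               # LR floor, as a fraction of peak LR
    gptq_reserve_seconds: float = 1.5  # wall-clock seconds held back for GPTQ


def load_config(env=os.environ) -> Config:
    """Hypothetical loader: environment variables override the baked-in defaults."""
    d = Config()
    return Config(
        val_loss_every=int(env.get("VAL_LOSS_EVERY", d.val_loss_every)),
        min_lr=float(env.get("MIN_LR", d.min_lr)),
        gptq_reserve_seconds=float(
            env.get("GPTQ_RESERVE_SECONDS", d.gptq_reserve_seconds)
        ),
    )


def lr_multiplier(progress: float, warmdown_frac: float, min_lr: float) -> float:
    """Assumed linear warmdown over the last `warmdown_frac` of training.

    Returns 1.0 before the warmdown starts, then a straight line toward 0
    that is clamped from below at `min_lr`.
    """
    if progress < 1.0 - warmdown_frac:
        return 1.0
    remaining = (1.0 - progress) / warmdown_frac
    return max(remaining, min_lr)


if __name__ == "__main__":
    cfg = load_config()
    # With the old MIN_LR=0.0 the multiplier would reach 0 at the end of training.
    print(lr_multiplier(1.0, warmdown_frac=0.5, min_lr=cfg.min_lr))
    # Seconds returned to training by the tighter GPTQ reserve (4.0 -> 1.5).
    print(4.0 - cfg.gptq_reserve_seconds)
```

Under these assumptions, running with no environment variables set reproduces the shipped defaults, and setting `MIN_LR=0.0` would restore the previous unfloored warmdown.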
Closing per user direction: improvement over base too small. Restarting novelty hunt for genuinely impressive mechanism.
resouer/parameter-golf against a submission-base branch that mirrors openai/parameter-golf:main. Once you approve the diff, the actual upstream submission can be opened against openai/parameter-golf:main.
Summary
VAL_LOSS_EVERY=0, MIN_LR=0.10, GPTQ_RESERVE_SECONDS=1.5).
3-seed results (across 3 nodes / 2 groups)
Cross-node within-group delta (seed 42, gcp 120-106 → gcp 112-30): −0.00032 nats.
Cross-group delta (seed 2025, gcp → training-dev): −0.00040 nats.
Both deltas are within typical run-to-run noise (~0.0003-0.0005), so the result does not appear to depend on hardware.
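The headline statistics can be rechecked from the per-seed values reported in the commit message. The short script below is a sketch of that arithmetic; it assumes the reported std is the sample standard deviation (n - 1 denominator), and it compares the mean against the 1.06991 claimed by the PR openai#1790 base.

```python
import statistics

# Per-seed val_bpb values as reported in the commit message.
val_bpb = {1337: 1.06887604, 42: 1.06924420, 2025: 1.06863616}
values = list(val_bpb.values())

mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation (n - 1)

base_claim = 1.06991  # val_bpb claimed by the PR #1790 base stack
gain = base_claim - mean

print(f"mean={mean:.8f} std={std:.8f} gain_vs_base={gain:.5f}")
```

The resulting gain over the base is roughly 0.001 val_bpb, only a few multiples of the seed-to-seed std.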
Mechanism details
PR openai#1790 base (open, @miaoyuxun, claim 1.06991): SP8192 + SmearGate + AttnOutGate(w24) + LoRA-TTT improvements + Phased TTT.
Recipe ported from PR openai#1792 (open):
- _PE_COEFFS 5-tuple replaces the single fixed (3.4445, -4.775, 2.0315) Newton-Schulz tuple in zeropower_via_newtonschulz5. Originally from PR openai/parameter-golf#1344, "Record: SP4096 + Polar Express + MuonEq-R + Depth Recurrence - 1.0923 BPB (3-seed)" (@orangekame3 et al.).
- VAL_LOSS_EVERY=0: skip mid-training val-loss diagnostic prints. Pure systems optimization.
- MIN_LR=0.10: floors the warmdown LR schedule at 10% of peak.
- GPTQ_RESERVE_SECONDS=1.5: tightened from 4.0s default; 2.5s of reclaimed budget with 1.5s safety cushion.

No upstream PR has this combination on the openai#1790 stack. PR openai#1792 itself targets a different parent (openai#1768 GatedAttn + Alpha-LoRA stack).
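To make the Newton-Schulz change concrete, here is a minimal pure-Python sketch of a quintic Newton-Schulz iteration that takes one (a, b, c) tuple per step instead of a single fixed tuple. The fixed tuple is the one quoted above. The actual _PE_COEFFS values are not given in this PR text, so `PLACEHOLDER_SCHEDULE` simply repeats the fixed tuple five times; the function and helper names are hypothetical, and the shipped zeropower_via_newtonschulz5 presumably operates on tensors rather than nested lists.

```python
FIXED_COEFFS = (3.4445, -4.775, 2.0315)

# Placeholder only: five per-iteration tuples. A real Polar Express schedule
# would use five different tuples; those values are not listed here.
PLACEHOLDER_SCHEDULE = [FIXED_COEFFS] * 5


def matmul(A, B):
    """Plain nested-list matrix product."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def newton_schulz(G, schedule, eps=1e-7):
    """Approximately orthogonalize G, using one (a, b, c) tuple per iteration."""
    # Scale by the Frobenius norm so every singular value is at most 1.
    norm = sum(v * v for row in G for v in row) ** 0.5
    X = [[v / (norm + eps) for v in row] for row in G]
    # Work on the wide orientation so X @ X^T is the smaller Gram matrix.
    tall = len(X) > len(X[0])
    if tall:
        X = transpose(X)
    for a, b, c in schedule:
        A = matmul(X, transpose(X))
        AA = matmul(A, A)
        B = [[b * p + c * q for p, q in zip(ra, rq)] for ra, rq in zip(A, AA)]
        BX = matmul(B, X)
        # X <- a*X + (b*A + c*A^2) @ X
        X = [[a * x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, BX)]
    return transpose(X) if tall else X


if __name__ == "__main__":
    G = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, -1.0]]
    X = newton_schulz(G, PLACEHOLDER_SCHEDULE)
    # Rows of X should be roughly orthonormal, so X @ X^T is near the identity.
    print(matmul(X, transpose(X)))
```

With this structure, swapping in a different schedule only changes the list passed as `schedule`; the loop body stays the same as in the fixed-coefficient version.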
Reproduction
No environment variables are required. The four configuration tweaks are baked into the shipped train_gpt.py defaults.
Compliance
- grep -nE "hash.*target|key.*\btargets?\b|ngram.*label" clean).
- grep -nE "CaseOp|case_op|caseops" clean).
- py_compile passes on shipped train_gpt.py.
Records folder (7 files)
- train_gpt.py (125 KB)
- README.md
- submission.json
- requirements.txt
- train_seed1337.log / train_seed42.log / train_seed2025.log
Credits
VAL_LOSS_EVERY=0 + MIN_LR=0.10 + tight GPTQ_RESERVE_SECONDS recipe.
🤖 Generated with Claude Code