Record: PR #1945 base + 2560 long-context + no_qv TTT mask + TTT LR 0.75 + QK_GAIN 5.25 — val_bpb 1.05855 (3-seed mean) #1953
Open
andrewbaggio1 wants to merge 2 commits into openai:main from
Conversation
…T LR 0.75 + QK_GAIN 5.25 3-seed mean val_bpb 1.05855370 (std 0.00029539). Clears merge bar 1.05893 by -0.00038 BPB. Improves on PR openai#1855 (1.06108) by -0.00253 BPB. All seeds under 16 MB artifact, 600s train cap, 600s eval cap.
Contributor
I played around with long-context at eval time earlier, but tried some more complicated things involving dynamic selection that never made it. It's cool to see that just changing the eval sequence-length hyperparameter was able to improve performance without it ever appearing in training. Really nice result.
TanishGudise
added a commit
to TanishGudise/parameter-golf
that referenced
this pull request
Apr 30, 2026
…enai#1953 territory. ARTIFACT 142KB OVER CAP.
Fija
pushed a commit
to Fija/parameter-golf
that referenced
this pull request
Apr 30, 2026
- Pull PR openai#1953's record dir (train_gpt.py + README) from openai/parameter-golf
- Phase U script combines retokenize SP8192 (no HF dataset cached) + 3-seed train using PR openai#1953's exact stack with one new lever: PREFIX_DOCS 2500 -> 2800
- Goal: clear record bar (1.05914) by stacking on top of openai#1953's 1.05855
- Robust: heartbeat, GPU keepalive, no trap-on-EXIT, per-seed HF upload; apt-installs lrzip for the pergroup compressor
- Aborts seed-1 on BPB > 1.060 OR artifact > 16MB

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
aerosta
pushed a commit
to aerosta/parameter-golf
that referenced
this pull request
Apr 30, 2026
3-seed mean val_bpb = 1.05851479 (std 0.000762, seeds 42/0/1234) on track_10min_16mb.

Stack:
- PR openai#1945 (alertcat) V21 base = PR openai#1908 + AWQ-Lite + AsymLogit Rescale
- PR openai#1953 (andrewbaggio1) TTT/QK env knobs (TTT_LR=0.75, QK_GAIN=5.25, no_qv mask)
- PR openai#1948 (TimS-ml + lijuncheng16) LeakyReLU squared slope 0.3
- PR openai#1145 (AnirudhRahul, valerio-endorsed) closed-form n-gram tilt with Σ P=1 Z renormalization

Compliance: causal hints, single-pass, Σ P=1 by construction, no SLOT, no n-gram cache, no Pre-Quant TTT.
System deps: gcc + lrzip auto-installed by setup.sh; PyTorch 2.9.1 + Triton + Flash Attn 3.
One-command reproduction: bash setup.sh SEED={42,0,1234} bash run.sh
leon2k2k2k
added a commit
to leon2k2k2k/parameter-golf
that referenced
this pull request
Apr 30, 2026
Hypothesis: PR openai#1953 verbatim + LeakyReLU2 slope 0.5->0.3 lands at ~1.0578 (3-seed). Skip 2xH100 mini (5-site numeric flip on validated commit). Ladder = 8xH100 official direct, 3 seeds (42, 0, 1234). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Fija
pushed a commit
to Fija/parameter-golf
that referenced
this pull request
Apr 30, 2026
…d-attempt arms

Phase V: TTT_OPTIMIZER=muon for LoRA SGD, swap from default AdamW. New env vars TTT_MUON_LR_MULT (default 8x adam) and TTT_MUON_BACKEND_STEPS (default 5). Hypothesis: Newton-Schulz orthogonalized momentum better suits low-rank LoRA.

Phase W: TTT_LORA_RANK 80->96, PHASED_TTT_NUM_PHASES 3->4, PHASED_TTT_PREFIX_DOCS 2500->2000 on PR openai#1953 stack. Hypothesis: more LoRA capacity + extra phase boundary captures more cross-doc structure. Artifact-cap risk noted; seed-1 abort guards in script.

Both phases run openai#1953 base + their respective lever changes only. Robust pattern (heartbeat, GPU keepalive, no trap-on-EXIT, per-seed HF upload) preserved from Phase U / Phase S debugging.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
leon2k2k2k
added a commit
to leon2k2k2k/parameter-golf
that referenced
this pull request
Apr 30, 2026
250D: PHASED_TTT_PREFIX_DOCS 2500 -> 3000 (~+80s, 6s cap margin — risky)
250E: TTT_LOCAL_LR_MULT 0.75 -> 0.65 (compute-neutral, sub-step around openai#1953 optimum)

Both eval-only via TTT_EVAL_ONLY=1 + RESUME_FROM_CKPT on spec 250's final_model.pt. ~9-15 USD per spec for 3 seeds. No code change, no retrain.

Family complete: 250B (PREFIX 2750), 250C (PHASES 4), 250D (PREFIX 3000), 250E (LR_MULT 0.65). Independent eval-only sweeps; do not stack.
alertcat
added a commit
to alertcat/parameter-golf
that referenced
this pull request
Apr 30, 2026
Final attempt to overtake PR openai#1953 (1.05855) and PR openai#1967 (1.05851).

Stack:
- V21 base (PR openai#1908 + AWQ-lite + AsymLogit) — your existing record
- + PR openai#1953's 7 verified levers (EVAL=2560, no_qv, TTT_LR_MULT=0.75, QK_GAIN=5.25)
- + EVAL_SEQ_LEN=2816 (intermediate safe value, ~5% eval timing risk)
- All other hparams identical to V21

Safety: EVAL_SEQ_LEN=2816 vs PR openai#1953's 2560 = ~10% eval time penalty. Expected eval times: 470s/485s/564s (PR openai#1953 was 430/441/513). Seed 1234 has thinnest margin (564s of 600s cap = 36s buffer).

Expected V22 BPB: 1.0578-1.0586 (3-seed mean)
P(beat PR openai#1953 1.05855): ~50%
P(beat PR openai#1967 1.05851): ~30-35% (timing-pending PR ahead)
alertcat
added a commit
to alertcat/parameter-golf
that referenced
this pull request
Apr 30, 2026
…mean 1.05877

Layers PR openai#1953 (@andrewbaggio1)'s 7 hparam levers (TTT_MASK=no_qv, TTT_Q_LORA=0, TTT_V_LORA=0, TTT_LOCAL_LR_MULT=0.75, QK_GAIN_INIT=5.25, EVAL_SEQ_LEN, TTT_EVAL_SEQ_LEN) on top of V21 v2 base (PR openai#1908 + AWQ-lite + Asymmetric Logit Rescale + WD=2.0). EVAL_SEQ_LEN raised from PR openai#1953's 2560 to 2816 for longer eval context.

3-seed mean 1.05877 (std 0.00102), all strict <600s train wallclock (596.087-596.152s) and 475-522s eval. Improvement over V21 v2 mean 1.05943 is -0.00066 BPB (matches community 0.0006 floor for a meaningful delta).

Run on Hyperbolic eu-north-4 Iceland VM (8xH100 SXM5 80GB, PyTorch 2.9.1+cu128 with CUDA 13 forward-compat driver 580).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
aquariouseworkman
added a commit
to aquariouseworkman/parameter-golf
that referenced
this pull request
Apr 30, 2026
# Record: PR openai#1953 stack — no_qv TTT + AWQ-lite + AsymLogit + long-context eval

**val_bpb = 1.05847** (3-seed mean, std 0.00063) | **max artifact 15,985,934 bytes** | 8x H100 SXM | strict 600s train + eval

## Results

| Seed | Stop step | Train time | Pre-quant BPB | Quantized BPB | **Post-TTT BPB** | Eval time | Artifact bytes |
|------|-----------|------------|---------------|---------------|------------------|-----------|----------------|
| 42 | 4892 | 595.97s ✅ | 1.06126 | 1.06962 | **1.05788** | 493.2s ✅ | 15,979,342 |
| 0 | 4884 | 595.97s ✅ | 1.06181 | 1.07019 | **1.05840** | 420.5s ✅ | 15,979,187 |
| 1234 | 4894 | 596.14s ✅ | 1.06232 | 1.07093 | **1.05914** | 428.4s ✅ | 15,985,934 |
| **Mean** | **4890** | **596.03s** | **1.06180** | **1.07025** | **1.05847** | **447.4s** | **15,981,488** |

vs merged PR openai#1855 (1.06108): **-0.00261 BPB / -0.00571 nats**

## Stack

Inherits the full PR openai#1855 base (codemath3000) and layers:

1. **AWQ-lite mixed-precision GPTQ** (PR openai#1908, romeerp) — activation-aware salient-group int8 promotion
2. **Asymmetric Logit Rescale** (PR openai#1923, jorge-asenjo) — learnable pos/neg softcap during TTT eval
3. **no_qv TTT mask** (PR openai#1953, himanshudongre) — disable Q/V LoRA in TTT, keep K/MLP/O
4. **TTT_LOCAL_LR_MULT=0.75** — scaled TTT optimizer LR
5. **QK_GAIN_INIT=5.25** — per-head Q-gain initialization
6. **EVAL_SEQ_LEN=2560** — extended eval context
7. **PHASED_TTT_PREFIX_DOCS=3000** — larger global-TTT prefix
8. **TTT_LORA_RANK=56** — reduced LoRA rank (compute reallocation)

## Compliance

- [x] Artifact under 16,000,000 bytes (max 15,985,934)
- [x] Train wallclock under 600s (max 596.14s)
- [x] Eval wallclock under 600s (max 493.2s)
- [x] No PPM, no SLOT, no pre-quant TTT, no n-gram cache
- [x] Single left-to-right pass, score-before-update
- [x] Full normalized softmax distribution

## Reproduction

```bash
apt-get install -y lrzip
pip install sentencepiece brotli huggingface_hub numpy python-minifier
pip install --no-deps flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/

# Dataset
HF_HUB_ENABLE_HF_TRANSFER=1 python3 -c "
from huggingface_hub import snapshot_download
snapshot_download(repo_id='romeerp/parameter-golf-caseops-v1', repo_type='dataset', local_dir='/workspace/caseops_data')
"

# Run
for SEED in 42 0 1234; do
  SEED=$SEED \
  DATA_PATH=/workspace/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved \
  TOKENIZER_PATH=/workspace/caseops_data/datasets/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model \
  CASEOPS_ENABLED=1 VOCAB_SIZE=8192 MAX_WALLCLOCK_SECONDS=600 VAL_LOSS_EVERY=0 \
  SMEAR_GATE_ENABLED=1 GATE_WINDOW=12 SPARSE_ATTN_GATE_ENABLED=1 SPARSE_ATTN_GATE_SCALE=0.5 \
  GATED_ATTN_QUANT_GATE=1 FUSED_CE_ENABLED=1 QK_GAIN_INIT=5.25 \
  EMBED_BITS=7 MATRIX_CLIP_SIGMAS=12.85 ATTN_CLIP_SIGMAS=13.0 MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 \
  GPTQ_RESERVE_SECONDS=4.0 GPTQ_CALIBRATION_BATCHES=16 COMPRESSOR=pergroup \
  LQER_ENABLED=1 LQER_ASYM_ENABLED=1 LQER_RANK=4 LQER_FACTOR_BITS=4 LQER_ASYM_GROUP=64 LQER_TOP_K=3 \
  AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64 \
  ASYM_LOGIT_RESCALE=1 \
  TTT_ENABLED=1 PHASED_TTT_ENABLED=1 PHASED_TTT_NUM_PHASES=3 PHASED_TTT_PREFIX_DOCS=3000 \
  TTT_LORA_RANK=56 TTT_MASK=no_qv TTT_Q_LORA=0 TTT_V_LORA=0 TTT_LOCAL_LR_MULT=0.75 \
  TTT_CHUNK_SIZE=48 TTT_BETA2=0.99 TTT_WEIGHT_DECAY=0.5 \
  EVAL_SEQ_LEN=2560 TTT_EVAL_SEQ_LEN=2560 \
  WARMDOWN_FRAC=0.85 BETA2=0.99 GRAD_CLIP_NORM=0.3 MIN_LR=0.1 MATRIX_LR=0.026 \
  NCCL_NET=Socket GLOBAL_TTT_MOMENTUM=0.9 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py > train_seed${SEED}.log 2>&1
done
```
Fija
pushed a commit
to Fija/parameter-golf
that referenced
this pull request
Apr 30, 2026
Pull PR openai#2014's record dir from openai/parameter-golf and reproduce its 1.05759 3-seed mean. Key new levers vs openai#1953: EVAL_SEQ_LEN=3072, train_seq_schedule 1024->2048->3072, single-phase TTT (NUM_PHASES=1, PREFIX=2500), short-doc score-first chunking (TTT_SHORT_SCORE_FIRST_STEPS=256:8,2000:24). Even with our infra's ~1.5-2 milli-BPB inflation pattern, reproducing openai#2014 should land ~1.0590 — close enough to record bar to potentially clear it. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Fija
pushed a commit
to Fija/parameter-golf
that referenced
this pull request
May 1, 2026
…ean 1.05831 BPB

Clears record bar (1.05914) by 0.83 milli-BPB. Welch t = -6.49 vs PR openai#1855 (1.06108), p < 0.0001. All 3 seeds produce 15.99 MB artifacts under the 16 MB cap, all under the 600s wallclock budget.

Per-seed:
- 42: ttt=1.05793 art=15,986,149 eval=572.6s
- 314: ttt=1.05852 art=15,987,257 eval=553.7s
- 1234: ttt=1.05849 art=15,989,895 eval=574.1s

Submission directory at records/track_10min_16mb/2026-04-30_PR2014_Reproduction_1.0583/ contains PR openai#2014's verbatim train_gpt.py + tokenizer + our seed_results.csv + a detailed README documenting the lineage (openai#1797 -> openai#1851 -> openai#1855 -> openai#1908 -> openai#1923 -> openai#1953 -> openai#2014), the new levers vs each parent, and the full 4-condition C1-C4 legality check. submission.json author/github_id are placeholders pending the user's choice of submitting account.

Reproduction script: runpod/phase_x_pr2014.sh — runs end-to-end on a single 8xH100 SXM pod (~2.5h wall, ~$66 cost).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
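The Welch statistic quoted above compares two 3-seed BPB samples with unequal variances. A minimal self-contained sketch of the computation (the function name is ours; PR openai#1855's per-seed values are not listed in this thread, so the test values below are synthetic):

```python
import math

def welch_t(a, b):
    """Welch's unequal-variance t statistic for two small samples,
    as used for 3-seed BPB comparisons. Negative t means sample `a`
    has the lower mean (here: lower BPB, i.e. better)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)
```

With only three seeds per side the degrees of freedom are small, so the p < 0.0001 claim rests on the Welch-Satterthwaite df as well as the t value itself.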
This was referenced May 1, 2026
leon2k2k2k
added a commit
to leon2k2k2k/parameter-golf
that referenced
this pull request
May 1, 2026
Audits every CaseOps-lineage record-track PR (merged + unmerged) since 2026-04-18 for whether val docs are also in the training set. Working set: 34 PRs (31 from chronological seed list + 3 discovered ancestors: openai#1908, openai#1923, openai#2007). Boundary nodes openai#1493 / openai#1626 (pre-CaseOps).

Verdicts:
- CLEAN (8): openai#1729, openai#1851, openai#1868, openai#1908, openai#2019, openai#2027, openai#2031, openai#2068
- LEAK (25): openai#1736 (our research baseline) → openai#1769 → openai#1787 → openai#1797 → openai#1855 → V21 family (openai#1945, openai#1923, openai#1953, openai#1967) → openai#2018 → openai#2118 (current claimed frontier 1.04350), plus siblings
- INHERIT (1): openai#2050 (eval-only on frozen openai#1915)

Code-level evidence (not README claims):
- Every shipped prepare_caseops_data.py is byte-identical: SHARD_TOKENS=10_000_000, default=10_000 for --val-docs
- NO PR overrides --val-docs (searched all .sh files in all 34 PRs)
- cached_challenge_fineweb.py downloads from the romeerp/parameter-golf-caseops-v1 HF dataset whose manifest pins docs_val=50000, docs_train=8181945, sums match → CLEAN by construction
- PR openai#2018's DATASET_AUDIT.md is the gold-standard explicit leak description
- PR openai#2118's submission.json admits "--val-docs=10000 train shards + 50k val eval"

Three signposts:
- Leak introduced: PR openai#1736 by @dexhunter (Apr 19) — first prepare_caseops_data.py default invocation
- Leak fixed: PR openai#1851 by @aquariouseworkman (Apr 27) — switched to HF dataset
- Leak re-introduced: PR openai#1855 by @codemath3000 (same day) — rebuilt locally

The merged-leaderboard SOTA (openai#1851/openai#1868 at 1.06128/1.06141) is CLEAN. The unmerged frontier (openai#2118 at 1.04350) is LEAK. The 0.018 bpb gap is inflated by val memorization; spec 301 was designed to measure how much remains under clean data.

Files:
- caseops-memory-leakage/README.md — overview, methodology, takeaways
- caseops-memory-leakage/verdicts.md — 34-row master table with evidence
- caseops-memory-leakage/family-tree.md — ASCII trees with [C]/[L] annotations

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
leon2k2k2k
added a commit
to leon2k2k2k/parameter-golf
that referenced
this pull request
May 1, 2026
…ence

After user feedback that LEAK calls relied too heavily on lineage-inheritance and path heuristics, applied a stricter criterion: a LEAK verdict requires at least one of (a) explicit shell-script invocation of prepare_caseops_data.py without --val-docs=50000, (b) README "Data setup" matching the actual train log path, (c) audit/submission.json admission text, (d) train log path with `_caseops/datasets/datasets/<name>` triple-nesting OR a single `<root>/datasets/<name>` level (which only local prep produces; HF always gives double-nesting). Records that previously got LEAK by lineage-inheritance alone are now AMBIGUOUS unless they meet at least one of those tests.

Changes:
- openai#1945 LEAK → CLEAN (finalize_v18.sh has snapshot_download from HF; actual run path matches HF target; README's prepare_caseops_data.py section is stale documentation)
- openai#1953 LEAK → AMBIGUOUS (PR ships only train_gpt.py + logs; no prep evidence; path matches HF target; parent openai#1945 confirmed CLEAN — leans CLEAN but no direct PR evidence)
- openai#2041 LEAK → AMBIGUOUS (no prep invocation; double-nested path consistent with EITHER HF or local prep)
- openai#2075 LEAK → AMBIGUOUS (ships prep file but no explicit invocation; path matches HF target)

Updated tally: CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1 (was 8/25/0/1). Headline impact: realistic clean SOTA is at most ~0.012 bpb below the claimed frontier openai#2118 (1.04350).

Best clean BPB candidates in order:
- openai#2019 1.05847 (HF, confirmed)
- openai#1953 1.05855 (AMBIGUOUS, leans CLEAN)
- openai#1945 1.05943 (HF, confirmed via re-audit)
- openai#2031 1.05985 (HF, confirmed)
- openai#1908 1.06081 (HF, confirmed)
- openai#1851 1.06128 (HF, MERGED SOTA)
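Criterion (d) above is a pure path-shape test, so it can be sketched mechanically. The function below is an illustrative reading of that rule, not the audit's actual code; the segment-counting heuristic and the return labels are our assumptions:

```python
def classify_train_data_path(path: str) -> str:
    """Illustrative sketch of the re-audit's criterion (d): local
    prepare_caseops_data.py runs produce either `_caseops/datasets/datasets/<name>`
    triple-nesting or a single `<root>/datasets/<name>` level, while the HF
    snapshot always yields exactly double-nested `datasets/datasets`.
    The audit applies this alongside criteria (a)-(c), never alone."""
    parts = path.strip("/").split("/")
    n_datasets = parts.count("datasets")
    if any(p.endswith("_caseops") for p in parts) and n_datasets >= 2:
        return "local-prep"      # triple-nesting under a *_caseops root
    if n_datasets == 1:
        return "local-prep"      # single datasets level, only local prep makes this
    if n_datasets == 2:
        return "hf-consistent"   # double-nesting matches the HF snapshot layout
    return "ambiguous"
```

On the DATA_PATH used in this PR's reproduction (`/workspace/caseops_data/datasets/datasets/...`) this heuristic returns the HF-consistent verdict, matching the re-audit's "leans CLEAN" call for openai#1953.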
Record: PR #1945 base + 2560 long-context + no_qv TTT mask + TTT LR 0.75 + QK_GAIN 5.25 (val_bpb 1.05855)
val_bpb = 1.05855370 (3-seed mean, std 0.00029539) | max artifact 15,992,914 B | 8x H100 SXM | 600s train / 600s eval
Stacks four small, individually validated levers on the exact PR #1945 alertcat V21 record source (which is itself PR #1855 + PR #1908 AWQ-lite + PR #1923 Asymmetric Logit Rescale). Each lever was already measured on prior bases. The contribution here is the orthogonal stack and the production verification.
3-seed Results
Population std on final BPB: 0.00029539.
vs current rank 1 (PR #1855 at 1.06108): -0.00253 BPB.
vs PR #1945 reported mean (1.05943381): -0.00088 BPB.
vs merge bar (1.05893): -0.00038 BPB.
All seeds clear the 600s train cap, 600s eval cap, and 16,000,000-byte artifact cap.
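The comparison deltas above follow directly from the quoted means; a few lines of arithmetic confirm them (values are exactly those reported in this PR):

```python
mean_bpb = 1.05855370  # this PR's 3-seed mean

baselines = {
    "rank 1 (PR #1855)": 1.06108,
    "PR #1945 reported mean": 1.05943381,
    "merge bar": 1.05893,
}

# Each delta is simply (our mean - baseline); negative means improvement.
for name, bpb in baselines.items():
    print(f"vs {name}: {mean_bpb - bpb:+.5f} BPB")
```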
What changed vs PR #1945
Four literal-constant additions on top of the exact alertcat V21 source. No new code paths, no new mechanisms, no architectural changes:
Everything else is verbatim PR #1945. AWQ-lite, Asymmetric Logit Rescale, CaseOps tokenizer, Polar Express NS, MIN_LR, fused softcapped CE, LQER asymmetric rank-4, sparse attention gate, BOS-fixed SmearGate, phased TTT (3 phases, 2500 prefix docs), per-group lrzip + brotli compression, GPTQ int6 + int7 embeddings.
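The four levers are plain environment-variable overrides consumed by the inherited train_gpt.py, with the names and values reported in this PR (TTT_EVAL_SEQ_LEN is kept in sync with EVAL_SEQ_LEN, as in the reproduction command). A minimal sketch of the delta vs PR #1945:

```shell
# The four literal-constant overrides on top of the verbatim PR #1945 source.
# Everything not listed here is inherited unchanged.
export EVAL_SEQ_LEN=2560          # long-context eval (lever 1)
export TTT_EVAL_SEQ_LEN=2560      # TTT score-first context, kept in sync
export TTT_MASK=no_qv             # disable Q/V LoRA paths in TTT (lever 2)
export TTT_LOCAL_LR_MULT=0.75     # scaled local LoRA-TTT optimizer LR (lever 3)
export QK_GAIN_INIT=5.25          # per-head Q-gain init, default was 5.0 (lever 4)
```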
Why each lever
Each lever was already publicly measured on a closely related base. None alone clears the merge bar. Combined on the PR #1945 base, they compose into a clearing stack.
- `EVAL_SEQ_LEN=2560` with `TTT_MASK=no_qv`: extends eval and TTT score-first context past 2048. The baseline #1855 measurement reported 2560 + no_qv at val_bpb 1.06109776 in 473.4s, an improvement of about -0.00058 BPB vs the 2048 anchor at 473.4s eval time. Legal under the 600s eval cap.
- `TTT_LOCAL_LR_MULT=0.75`: scales the local LoRA-TTT optimizer LR. The baseline #1855 sweep at 2560 no_qv showed 0.75 was the best multiplier in {0.50, 0.75, 1.00, 1.25, 1.50, 2.00} at val_bpb 1.06104597. The same direction holds here.
- `QK_GAIN_INIT=5.25`: replaces the 5.0 default per-head learnable Q-gain initialization. The baseline #1855 measurement reported QK_GAIN_INIT=5.25 seed-1234 post-TTT at -0.00019364 vs 5.0. Train-time init only.
- Asymmetric Logit Rescale via PR #1945 / PR #1923: replaces the single `logit_softcap=30.0` with two learnable scalars, `softcap_pos` and `softcap_neg`, trained inside Phased TTT global SGD. PR #1945 finds Asym is positive when stacked with AWQ-lite due to better TTT recovery. Initialized at the symmetric value (30.0) so eval is identity at start. Inherited from PR #1945.
- AWQ-lite mixed precision via PR #1945 / PR #1908: during GPTQ calibration, collect activation RMS per layer, select the most-salient 64-column group, and keep that group at int8 inside the GPTQ solve. Inherited from PR #1945.
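The Asymmetric Logit Rescale idea is small enough to sketch. This is a hedged illustration of the mechanism described above (two scalars replacing the single softcap, identity at the symmetric init), not the PR's actual parameterization; in the real stack both scalars are learnable and trained inside Phased TTT global SGD, whereas here they are plain arguments:

```python
import numpy as np

def asym_softcap(logits, softcap_pos=30.0, softcap_neg=30.0):
    """Sketch of Asymmetric Logit Rescale (PR #1923 / #1945): the single
    logit_softcap=30.0 tanh cap becomes two scalars, one applied to
    positive logits and one to negative logits. With pos == neg == 30.0
    this reduces exactly to the symmetric softcap, so eval is identity
    at the start of TTT."""
    logits = np.asarray(logits, dtype=np.float64)
    cap = np.where(logits >= 0.0, softcap_pos, softcap_neg)
    return cap * np.tanh(logits / cap)
```

Because both scalars start at 30.0, the TTT optimizer can only move the output distribution away from the inherited symmetric behavior if doing so lowers loss, which is consistent with the "identity at start" framing above.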
Compliance (Issue #1017)
- `no_qv` mask only zeroes the Q and V LoRA paths; K / MLP / O / lm_head LoRA are still trained, per the PR #1855 / PR #1945 implementation.
- Val data (`fineweb_val_*.bin` + CaseOps byte sidecar) per the PR #1855 base.

Reproduction

Repeat for `SEED=0` and `SEED=1234`.

Lineage
This stands on a long chain of prior submissions. The four added levers and the PR #1945 core are all from public PRs:
The diagnostic context (long-context score gating at 2560, no_qv mask, TTT_LOCAL_LR_MULT sweep, QK_GAIN_INIT sweep) was originally measured on exact #1855 in private experiments before this stack. None alone cleared the merge bar on #1855. The contribution here is recognizing that they compose orthogonally on the PR #1945 base.
Files
- `train_gpt.py`: training and eval script. Verbatim PR #1945 source plus the four literal-constant overrides above.
- `submission.json`: per-seed metadata, BPBs, wallclocks, artifact sizes.
- `train_seed{42,0,1234}.log`: per-seed train and eval logs.