Record candidate: 1.06032 CaseOps + Matrix-LR 0.028 + TTT n=1 #1925
simon-marcus wants to merge 3 commits into openai:main
- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot compound to use openai#1918's ~205s eval-time slack; safe fallback drops GLOBAL_TTT_EPOCHS if wallclock blows.
- new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109).
- new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT discipline; parked as fallback if global-SGD class is ruled out).
- frontier scan: 21 new PRs (openai#1906-openai#1931). Headline: PRs openai#1908+openai#1918 independently confirm AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit Rescale = empirical negative; openai#1929 banned SLOT+prequant-TTT.
- frontier-state.json: 21 PRs added; total 200.
- diary/2026-04-29-frontier-scan.md: full scan report.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ams)

After 4 parallel research agents reviewed 30+ open PRs and compliance issues, two new findings:

1. PR openai#1923 (AsymLogit) flagged "empirical negative" by sunnypatneedi 4-29 frontier-scan, BUT only on PR openai#1855 base with default WD=1.0. Never tested on PR openai#1908 + WD=2.0 combo. V19's specific stack is NOT directly invalidated.
2. PR openai#1925 simon-marcus 1.06049 (3-seed verified, vs PR openai#1855 base 1.06108 = -0.00059 BPB). Just 2 hparam env vars:
   - MATRIX_LR 0.026 -> 0.028
   - PHASED_TTT_PREFIX_DOCS 2500 -> 3500

   Orthogonal axis to AsymLogit (LR/TTT prefix vs logit head).

Adds two new scout scripts:

- run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus + WD=2.0 (full stack, recommended first scout)
- run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0 (ablation if V19c wins partially)

Decision rule (CaseOps val baseline 0.97651, community floor 0.0006):

- V19c < 0.97591 -> CLEAR WIN, run 3-seed
- V19c 0.97591-0.9755 -> borderline, ablate via V19a/V19b
- V19c > 0.9755 -> abandon stack, try Lead B (PR openai#1884)

Other research findings:

- PR openai#1898 SpinQuant flagged regression vs parent openai#1851 (skip)
- PR openai#1929 SLOT banned per openai#1722 precedent
- PR openai#1911 pre-quant TTT chain banned per openai#1735 precedent
- cocohearts 4-28 PR openai#1902 confirmed PR openai#1855 as official openai#1
- regina-openai + Alex Zhao 48h zero activity
- CaseOps de-facto legal (PR openai#1855 merged into chain)
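The two pieces of arithmetic in the commit message above (the quoted -0.00059 BPB delta and the 0.97591 clear-win threshold) can be checked with a short sketch. The function names are hypothetical and the constants are copied from the message; only the clear-win cut is modelled, since it is the one fully determined by the stated baseline and floor.

```python
# Values quoted in the commit message above.
BASE_1855_BPB = 1.06108     # PR #1855 base, as quoted
PR_1925_BPB = 1.06049       # PR #1925, as quoted
CASEOPS_BASELINE = 0.97651  # CaseOps val baseline
COMMUNITY_FLOOR = 0.0006    # smallest improvement treated as a win


def bpb_delta(candidate: float, base: float) -> float:
    """Signed BPB difference (negative means the candidate is better)."""
    return round(candidate - base, 5)


def clear_win_threshold(baseline: float, floor: float) -> float:
    """Score a run must beat to count as a clear win: baseline minus floor."""
    return round(baseline - floor, 5)


def is_clear_win(score: float) -> bool:
    """True if a (hypothetical) scout score beats the clear-win threshold."""
    return score < clear_win_threshold(CASEOPS_BASELINE, COMMUNITY_FLOOR)


if __name__ == "__main__":
    print(bpb_delta(PR_1925_BPB, BASE_1855_BPB))
    print(clear_win_threshold(CASEOPS_BASELINE, COMMUNITY_FLOOR))
```

Rounding to five decimals matches the precision the message reports, so floating-point noise in the subtraction does not leak into the comparison.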
…LoRA LR=8e-5) eval-only on spec 250 outputs
@cocohearts Gently flagging a possible missed row: #1925. I think it may have fallen into a scan-range gap. #1902 says it scanned PRs The chronology is a little hard to see from final PR states because both #1925 and #1945 were updated after opening, but here's how it looks:
I don’t think #1925 affects the final top row. But if technically acceptable, it seems like it would be a chronological frontier/support row after No worries if I’m missing an exclusion rationale; just flagging because the scan ranges appear to skip this PR numerically. Thanks.
Record candidate: CaseOps + Matrix-LR 0.028 + TTT n=1 / LoRA LR 8e-5
val_bpb: 1.06032 (seed-matched 3-seed composite mean vs #1855, std 0.00114) | 15.90 MB max | 8xH100 SXM | 600s train | score-first TTT eval
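The headline figure can be cross-checked against the means and improvements quoted in the seed-matched results of this description. A minimal sketch, assuming only those four quoted numbers (the per-seed values are not used, and the variable names are illustrative):

```python
# Numbers quoted in the seed-matched results of this PR description.
MEAN_1855 = 1.06107587           # matched #1855 3-seed mean
MEAN_1925_ORIGINAL = 1.06049099  # original PR #1925 3-seed mean
GAIN_VS_1855 = 0.00075852        # reported improvement over #1855
GAIN_VS_1925 = 0.00017363        # reported improvement over original #1925

# Both routes should land on the same updated mean, up to the rounding
# of the published 8-decimal figures.
updated_from_1855 = MEAN_1855 - GAIN_VS_1855
updated_from_1925 = MEAN_1925_ORIGINAL - GAIN_VS_1925

assert abs(updated_from_1855 - updated_from_1925) < 5e-8

# Rounded to five decimals, this is the value used in the title.
headline = f"{updated_from_1855:.5f}"
print(headline)
```

The two routes differ by about 1e-8, which is consistent with each quoted figure having been rounded independently.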
This updates the original PR #1925 result by reusing the same trained/quantized artifacts and changing only the legal score-first TTT eval procedure:
- PHASED_TTT_PREFIX_DOCS=3500
- PHASED_TTT_NUM_PHASES=1
- TTT_LORA_LR=0.00008

No retraining or re-quantization is included in the updated headline result. The composite logs make that explicit: each contains the original train/quant section, a COMPOSITE EVAL-ONLY UPDATE FOR PR #1925 marker, and the updated TTT eval continuation.

Seed-Matched 3-Seed Results
Primary score report uses the exact seed set from #1855 (42, 0, 1234) for a direct paired comparison.

Per-seed logs:

- train_seed42_ttt_n1_lora8e5.log
- train_seed0_ttt_n1_lora8e5.log
- train_seed1234_ttt_n1_lora8e5.log

Seed-matched std over the updated TTT BPBs is 0.00114. The matched #1855 mean is 1.06107587, so this improves the paired comparison by 0.00075852 BPB. It also improves the original PR #1925 mean 1.06049099 by 0.00017363 BPB, with all three matched seeds individually better.

What Changed Since Initial PR #1925
- train_gpt.py default TTT_LORA_LR from 0.0001 to 0.00008.
- PHASED_TTT_PREFIX_DOCS=3500 and PHASED_TTT_NUM_PHASES=1 default.
- TTT_EVAL_UPDATE.md with the compact matched eval-only table.
- 0, 42, and 1234.

Compliance / Legality
- 599.45s to 599.64s.
- 367.2s to 370.1s.
- 15,902,776.

Key Techniques
- lrzip + brotli compression.
- MATRIX_LR=0.028, PHASED_TTT_PREFIX_DOCS=3500, PHASED_TTT_NUM_PHASES=1, and TTT_LORA_LR=0.00008.

Reproduction
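As an illustration only, the four settings listed under Key Techniques could be passed to a run as environment overrides along the following lines. This is a sketch under stated assumptions, not the PR's own reproduction command: the `SEED` variable name, the `torchrun` launch, and the helper names `build_env` and `build_command` are all hypothetical; only the four setting names and values, the seed set, the `train_gpt.py` filename, and the 8-GPU count come from this description.

```python
import os

# Settings named under Key Techniques in this PR description.
OVERRIDES = {
    "MATRIX_LR": "0.028",
    "PHASED_TTT_PREFIX_DOCS": "3500",
    "PHASED_TTT_NUM_PHASES": "1",
    "TTT_LORA_LR": "0.00008",
}

SEEDS = (42, 0, 1234)  # seed set matched to #1855


def build_env(seed, base_env=None):
    """Return a copy of the environment with the overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(OVERRIDES)
    env["SEED"] = str(seed)  # ASSUMPTION: the script reads a SEED variable
    return env


def build_command(nproc=8):
    """ASSUMPTION: a torchrun launch of train_gpt.py on one 8-GPU node."""
    return ["torchrun", "--standalone", f"--nproc_per_node={nproc}", "train_gpt.py"]


if __name__ == "__main__":
    # Print what would be launched instead of launching it.
    for seed in SEEDS:
        env = build_env(seed, base_env={})
        settings = " ".join(f"{k}={v}" for k, v in sorted(env.items()))
        print(settings, " ".join(build_command()))
```

Keeping the overrides in one dictionary makes it easy to confirm that all three seeds ran with identical settings and differ only in the seed value.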