Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,20 @@ Happy training!

| Run | Score | Author | Summary | Date | Info |
|-----|------:|--------|---------|------|------|
| BOS-Fixed SmearGate + LQER + SparseAttnGate + 9-Hparam Stack | 1.0608 | codemath3000 | On PR #1855: BOS-fixed #1797-derived stack with LQER, PR #1787 SparseAttnGate/PolarNS/FusedCE base, per-group lrzip compression, and 9 greedy hyperparameter overrides; 6-sample evidence including independent reproduction passes p<0.25 vs #1851/#1868 | 2026-04-27 | [info](https://github.com/openai/parameter-golf/pull/1855), [repro](https://github.com/openai/parameter-golf/pull/1855#issuecomment-4336629746) |
| BOS-Fixed SmearGate + LQER Asymmetric + PR1787 SparseAttn + Phased TTT | 1.0615 | aquariouseworkman | On PR #1851 with 3-seed support from PR #1868: follow-on run of dexhunter's BOS-masked SmearGate + LQER stack from PR #1797, using the PR #1787 SparseAttnGate/PolarNS/FusedCE base plus CaseOps and phased score-first TTT | 2026-04-27 | [info](https://github.com/openai/parameter-golf/pull/1851), [3-seed](https://github.com/openai/parameter-golf/pull/1868) |
| PR1736 + PolarNS + MIN_LR + SparseAttnGate + FusedCE + Warm-A TTT | 1.0634 | nprime06 | On PR #1787: PR #1736 CaseOps stack plus Polar Express Newton-Schulz coefficients, MIN_LR=0.1, SparseAttnGate, fused softcapped CE, and PR #1767-style warm-start-A TTT | 2026-04-23 | [info](https://github.com/openai/parameter-golf/pull/1787) |
| CaseOps + MLPClip12 + SmearGate/LoRA-TTT | 1.0645 | dexhunter | On PR #1769: CaseOps stack with SmearGate/LoRA-TTT refinements and MLPClip12; 5-seed mean improves the accepted CaseOps frontier under p<0.25 | 2026-04-22 | [info](https://github.com/openai/parameter-golf/pull/1769) |
| SP8192 + CaseOps + GatedAttn + QuantGate + Loop45 + Phased TTT | 1.0655 | dexhunter | On PR #1736: adopts romeerp's lossless CaseOps transform from PR #1729 with byte-sidecar BPB accounting, then adds gated attention and quant-gate scaling on the PR #1530 SP8192 phased-TTT stack | 2026-04-19 | [info](https://github.com/openai/parameter-golf/pull/1736) |
| CaseOps Tokenizer + Tapered WD + Phased TTT | 1.0678 | romeerp | On PR #1729: lossless CaseOps bijective case transform with validation byte sidecars, plus mild late Muon weight-decay taper on the PR #1626 legal phased-TTT stack | 2026-04-19 | [info](https://github.com/openai/parameter-golf/pull/1729) |
| SmearGate + Attention Output Gate + Legal TTT | 1.0714 | MarioPaerle | On PR #1667: SmearGate, attention output gate, depth recurrence, parallel residuals, QK-Gain 5.25, quantization, and score-first TTT | 2026-04-16 | [info](https://github.com/openai/parameter-golf/pull/1667) |
| VarLen Attention + Fused MLP + Multi-Phase Global SGD TTT | 1.0719 | dexhunter | On PR #1626: VarLen attention, fused MLP, multi-phase global SGD TTT, trimmed GPTQ, MLR 0.026, int7 embeddings, and adaptive clip | 2026-04-14 | [info](https://github.com/openai/parameter-golf/pull/1626) |
| VarLenAttn + PhasingTTT | 1.0728 | romeerp | On PR #1610: #1530-style VarLen/fused stack plus phased TTT over already-scored validation chunks | 2026-04-13 | [info](https://github.com/openai/parameter-golf/pull/1610) |
| VarLen Attention + Fused MLP + Doc-Independent Legal TTT | 1.0734 | samacqua | On PR #1530: variable-length FA3 attention, fused Triton MLP, grouped small-parameter all-reduces, and doc-independent score-first LoRA TTT | 2026-04-11 | [info](https://github.com/openai/parameter-golf/pull/1530) |
| Asymmetric Two-Lane Parallel Routing + Tap-In V6 + Legal TTT | 1.0739 | Abay Bektursun | On PR #1518: two-lane parallel residual routing, Tap-In V6 prefix n-gram eval nudge, and score-first legal TTT | 2026-04-12 | [info](https://github.com/openai/parameter-golf/pull/1518) |
| Improved Parallel Residuals + Systems Optimization | 1.0752 | codemath3000 | On PR #1584: systems-only speedup of PR #1529's dual-lane parallel-residual stack with fused Muon, batched EMA, and loader prealloc; identical ML, so the statistical-significance requirement is waived | 2026-04-13 | [info](https://github.com/openai/parameter-golf/pull/1584) |
| Improved Parallel Residuals + CUTLASS EVT | 1.0758 | msisovic | On PR #1529: dual-lane parallel residual routing on the PR #1523/#1518-era stack with inlined CUTLASS EVT path and legal score-first TTT; included by score-at-opening chronology before #1518's later score update | 2026-04-11 | [info](https://github.com/openai/parameter-golf/pull/1529) |
| SP8192 + Muon 0.97 + Legal Score-First TTT | 1.0798 | dexhunter | On PR #1514: SP8192 with Muon 0.97 and legal score-first TTT; under the current p<0.25 rule, its 3-seed sweep beats #1493 (p≈0.020) | 2026-04-09 | [info](https://github.com/openai/parameter-golf/pull/1514) |
| SP8192 + 3-Layer Recurrence + Parallel Residuals + Legal TTT | 1.0810 | bigbag | On PR #1493: 3-layer recurrence, parallel residuals, QK-Gain 5.25, and legal score-first TTT on the PR #1394 stack | 2026-04-09 | [info](records/track_10min_16mb/2026-04-09_SP8192_3LayerRecur_ParResid_QK525_LegalTTT/README.md) |
| SP8192 + Parallel Residuals + Score-First TTT | 1.0822 | aryanbhosale | On PR #1477: parallel residuals on the PR #1413 SP8192 + legal score-first TTT stack | 2026-04-08 | [info](records/track_10min_16mb/2026-04-08_SP8192_ParallelResid_ScoreFirstTTT/README.md) |
| SP8192 + QK-Gain 5 + Legal Score-First TTT | 1.0828 | dexhunter | On PR #1413: QK-Gain 5.0 + legal score-first TTT on the PR #1394 SP8192 stack | 2026-04-06 | [info](records/track_10min_16mb/2026-04-06_SP8192_QK5_LegalTTT_1.0828/README.md) |
Expand Down