
Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA#731

Open
pentxayc wants to merge 1 commit into openai:main from pentxayc:submission/hedge-mixer-vrl-1.0410

Conversation

@pentxayc

Summary

  • 1.0400 BPB (seed 42, 2 additional seeds pending)
  • 11L transformer (26.99M params) with Value Residual Learning (VRL), LeakyReLU(0.5)², XSA-4
  • 5-expert Hedge Mixer during eval: neural model + unigram + bigram + trigram (64K hashed) + entropy
  • Hedge algorithm (eta=0.1) with deferred between-chunk weight updates (legal score-first)
  • AdamW TTT (lr=0.0005) + Polyak EMA (decay=0.998) + byte-weighted loss + adaptive cosine LR
  • Freeze first 9/11 blocks during TTT, unfreeze last 2 + norms/scales
  • Int6 mixed quantization + LZMA compression
  • Artifact: 15,999,919 bytes (under 16MB limit)
  • Training: 6104 steps in 600s on 8xH100 SXM
  • Eval (TTT + Hedge): 404s / 600s budget
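The Hedge step in the bullets above can be sketched as a multiplicative-weights update over the five experts. This is a minimal toy illustration with hypothetical names (`hedge_mix`, `hedge_update`, the two-expert setup), not the PR's actual code; it assumes each expert emits a next-byte probability vector and is charged its log loss per chunk:

```python
import math

def hedge_mix(expert_probs, weights):
    """Mix per-expert next-symbol distributions with normalized Hedge weights."""
    total = sum(weights)
    return [sum(w / total * p[i] for w, p in zip(weights, expert_probs))
            for i in range(len(expert_probs[0]))]

def hedge_update(weights, expert_losses, eta=0.1):
    """Multiplicative-weights step: down-weight experts with high loss.
    Applied only BETWEEN chunks (deferred), so chunk N is always scored
    with weights derived from chunks 0..N-1."""
    return [w * math.exp(-eta * l) for w, l in zip(weights, expert_losses)]

# Two toy experts over a 2-symbol alphabet, equal initial weights.
weights = [1.0, 1.0]
expert_probs = [[0.9, 0.1], [0.5, 0.5]]
mixed = hedge_mix(expert_probs, weights)  # equal weights -> plain average
# After scoring a chunk, update with each expert's log loss on it:
losses = [-math.log(0.9), -math.log(0.5)]
weights = hedge_update(weights, losses)   # sharper expert gains weight
```

With eta=0.1 the weights move slowly, which matches the conservative mixing the PR describes.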

Legality

All eval-time adaptations are strictly score-first:

  1. Hedge weights for chunk N computed from chunks 0..N-1 only (deferred update after all windows scored)
  2. N-gram tables updated after chunk scoring completes
  3. Polyak EMA uses fixed decay, no snapshot selection
  4. TTT trains only on already-scored chunks
  5. No validation data during training; no training data during evaluation
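As a point of reference for item 3, a fixed-decay Polyak EMA is just a slow-moving copy of the weights; because the decay never changes and no snapshot is cherry-picked, there is no selection against scored data. A toy sketch (hypothetical names, not the PR's code):

```python
def ema_update(ema_params, live_params, decay=0.998):
    """Fixed-decay Polyak averaging: the EMA copy is what gets evaluated.
    ema <- decay * ema + (1 - decay) * live, applied every step."""
    return [decay * e + (1.0 - decay) * p
            for e, p in zip(ema_params, live_params)]

# Toy example: the live weight jumps to 1.0; the EMA drifts toward it.
ema = [0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0])
# After 3 steps ema[0] == 1 - 0.998**3, i.e. the EMA lags the live weights.
```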

Test plan

  • Seed 42: 1.0400 BPB
  • Seed 1337: pending
  • Seed 2024: pending

🤖 Generated with Claude Code

5-expert Hedge Mixer (neural + unigram + bigram + trigram + entropy) with
deferred between-chunk weight updates, combined with AdamW TTT + Polyak EMA
+ byte-weighted loss + adaptive cosine LR on an 11L VRL + LeakyReLU² + XSA-4
base. Seed 42 = 1.0400 BPB. Two additional seeds pending.
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 4, 2026
… Parallel Residuals path

- PR openai#771 confirmed CLOSED/REJECTED (train-then-score TTT)
- N-gram PRs openai#727/openai#741 CLOSED (illegal); openai#758/openai#731 open but same risk
- Merged SOTA unchanged at 1.1147
- New high-EV targets: PR openai#1351 (Discriminative TTT, 1.0807) and PR openai#1334
  (SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R, 1.0897)
- SLOT still unruled in Issue openai#140 — blocked until @valerio-oai rules
- CLAUDE.md updated to v8.0 with corrected strategy and Session 5 lessons

https://claude.ai/code/session_01X5rVjJpYyqm8DuWTNy2gkt
@MatoTeziTanka

Community Review — Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA

BPB: 1.0400 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1413 dexhunter pattern)

What I found in the code (head SHA 6cff4df0d716, file records/track_10min_16mb/2026-03-25_HedgeMixer_VRL_AdamWTTT_1.0400/train_gpt.py):

The TTT path at line 1017 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape of the current leaderboard's legal frontier (PR #1413 dexhunter, the 1.0828 SP8192 + QK-Gain 5 + Legal TTT entry — verified at its head SHA against the is_last_chunk + torch.no_grad() score-first accumulator pattern).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
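The score-first-per-chunk shape described above reduces to the following toy loop. `ToyModel` and `adapt` are hypothetical stand-ins for the PR's neural model and its SGD step; only the control flow (score before update, `is_last_chunk` guard) mirrors the pattern being reviewed:

```python
class ToyModel:
    """Stand-in for the base model: loss improves with adaptation data."""
    def __init__(self):
        self.adapted_on = []
    def score(self, chunk):
        # Pretend loss shrinks as more chunks have been adapted on.
        return 1.0 / (1 + len(self.adapted_on))

def adapt(model, chunk):
    model.adapted_on.append(chunk)

def eval_score_first(model, chunks):
    """Chunk i is scored under weights adapted only on chunks 0..i-1;
    the is_last_chunk guard means the final chunk is never trained on."""
    losses = []
    for i, chunk in enumerate(chunks):
        losses.append(model.score(chunk))   # score BEFORE any update
        if i < len(chunks) - 1:             # is_last_chunk guard
            adapt(model, chunk)             # legal: chunk already scored
    return losses

model = ToyModel()
losses = eval_score_first(model, ["c0", "c1", "c2"])
```

Each chunk's loss is fixed before the model ever trains on it, which is exactly the property the legality argument rests on.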

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.12s, dim=512, layers=11, vocab=1024, code=94305 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 12, 2026
…1.01710

Merged SOTA changed from 1.1147 to 1.0810 (PR openai#1493, bigbag, 2026-04-09).
Six PRs merged in 5 days (PRs openai#1334, openai#1285, openai#1394, openai#1412, openai#1413, openai#1477, openai#1493).
New target: ≤1.0760 val_bpb. 18 days to deadline.

Key findings:
- GDN-Hybrid (PR openai#1564): 1.01710 BPB, no TTT/SLOT — monitor for organizer review
- VarLen Attention + Doc-TTT (PR openai#1560): 1.07406 BPB — implement next
- TMA Megakernel + Tap-In (PR openai#1555): 1.07636 BPB — add after openai#1560
- PR openai#731 n-gram (dense count + Laplace): reviewer says LOOKS CLEAN, awaiting 3rd seed
- PR openai#758: major legality flags, do not implement

Updated CLAUDE.md: Competition Strategy, Technique Reference, Lessons Learned (Session 9).
Updated logs/daily_research.md: new 2026-04-12 entry prepended.

https://claude.ai/code/session_011WyxjcwdigLhMFQDjLL5ss
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 20, 2026
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 24, 2026
…ai#1787 Polar Express NS new base; PR openai#1795 PPM 1.01252; Issue openai#1604 deadline passed; Session 20

- Merged SOTA 1.0810 confirmed Day 15 (README not updated despite Scylla record commit)
- Scylla 0.9485 committed to track_10min_16mb/ on Apr 23 (PR openai#1184) but byte accounting
  disputed by PR openai#1271 (corrected ~1.1289 bpb); treat merged SOTA as 1.0810
- PR openai#771 CLOSED/REJECTED confirmed; PR openai#727 CLOSED (illegal); PR openai#758 open but dead;
  PR openai#731 still awaiting seeds 1337+2024
- Issue openai#1604 (CaseOps ruling): NO @valerio-oai response in 11 days; self-deadline Apr 24
  passed; proceed with clean legal stack immediately
- NEW: PR openai#1787 (nprime06, 1.06335) — new community-consensus clean base with Polar Express
  Newton-Schulz (arXiv:2505.16932, ICLR 2026) + MIN_LR=0.10 warmdown floor
- NEW: PR openai#1795 (OE-GOD, 1.01252) — byte-level PPM order-4 adaptive mixture; gate legality
  concern fixed; await organizer ruling before implementing
- NEW: PR openai#1797 (dexhunter, 1.06157) — PR openai#1787 + SmearGate + LQER Asym; new dexhunter best
- NEW: PR openai#1802 (aamodbhatt, 1.0771) — Polar Express NS + Multi-Phase Global TTT
- TECHNIQUE: Polar Express NS (arXiv:2505.16932) and Gram NS (Dao-AILab) added to table
- TECHNIQUE: MIN_LR=0.10 warmdown floor added to best-stack approach
- Updated competition strategy: stop waiting for CaseOps, implement clean stack with
  Polar Express NS + MIN_LR immediately (6 days to deadline)

https://claude.ai/code/session_01JZ3FiS937NwLHt3Fv9WHPD
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 26, 2026
…1835 PPM-D 1.00136 new watch; NgramRes stackable; Day 17 plateau; Session 22

- Upstream commit 7427de2 (Alex Zhao, OpenAI Apr 26): Scylla 0.9485 (PR openai#1184) removed as invalid record; PR openai#1813 (djeidy Scylla 0.94166) effectively dead by proxy
- PR openai#1835 (anmarhindi, 1.00136): PPM-D order-5 byte mixture, binary-λ gate, score-first, 15,993,020 bytes — most credible extraordinary claim yet; wait 24h for community BPB check
- PR openai#1834 (ghrua, 1.08034): NgramRes 3-gram MLP +0.6M params + sliding-window attn layers 0-3 — modest, stackable
- PR openai#731 (Hedge Mixer): still OPEN, 2 seeds pending, no merge
- Merged SOTA 1.0810 definitively confirmed; target ≤1.0760; 4 days to deadline

https://claude.ai/code/session_01XbdTRT7zPHoGp3LfQV4yXF
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 30, 2026
… competition closed

- Merged SOTA dropped from 1.0810 → 1.0611 (codemath3000, PR openai#1855) with all
  organizer pending branches now in main (CaseOps + SmearGate BOS fix + lrzip)
- New target was ≤1.0561; competition closes today (April 30)
- PR openai#1967 (ndokutovich, 1.05851): best clean legal open PR, timing question pending
- PR openai#1991 (joshuaswanson, 0.94290): Byte-PPM Mixer; Issue openai#1872 open, no ruling
- PR openai#1992 / openai#1972: ILLEGAL (PreQuantTTT 21ep)
- PR openai#731 (Hedge Mixer, 1.0400): seeds 1337/2024 never filed; competition closing
- Session 25 lessons + final Competition Strategy update added to CLAUDE.md

https://claude.ai/code/session_01QKHz6Vfu2DFZdc7GiuKSBQ