
Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA#731

Open
pentxayc wants to merge 1 commit into openai:main from pentxayc:submission/hedge-mixer-vrl-1.0410

Conversation

@pentxayc

Summary

  • 1.0400 BPB (seed 42, 2 additional seeds pending)
  • 11L transformer (26.99M params) with Value Residual Learning (VRL), LeakyReLU(0.5)², XSA-4
  • 5-expert Hedge Mixer during eval: neural model + unigram + bigram + trigram (64K hashed) + entropy
  • Hedge algorithm (eta=0.1) with deferred between-chunk weight updates (legal score-first)
  • AdamW TTT (lr=0.0005) + Polyak EMA (decay=0.998) + byte-weighted loss + adaptive cosine LR
  • Freeze first 9/11 blocks during TTT, unfreeze last 2 + norms/scales
  • Int6 mixed quantization + LZMA compression
  • Artifact: 15,999,919 bytes (under 16MB limit)
  • Training: 6104 steps in 600s on 8xH100 SXM
  • Eval (TTT + Hedge): 404s / 600s budget
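The Hedge step in the bullets above can be sketched as a multiplicative-weights update over the five experts. This is a minimal toy illustration with hypothetical names (`hedge_mix`, `hedge_update`, the two-expert setup), not the PR's actual code; it assumes each expert emits a next-byte probability vector and is charged its log loss per chunk:

```python
import math

def hedge_mix(expert_probs, weights):
    """Mix per-expert next-symbol distributions with normalized Hedge weights."""
    total = sum(weights)
    return [sum(w / total * p[i] for w, p in zip(weights, expert_probs))
            for i in range(len(expert_probs[0]))]

def hedge_update(weights, expert_losses, eta=0.1):
    """Multiplicative-weights step: down-weight experts with high loss.
    Applied only BETWEEN chunks (deferred), so chunk N is always scored
    with weights derived from chunks 0..N-1."""
    return [w * math.exp(-eta * l) for w, l in zip(weights, expert_losses)]

# Two toy experts over a 2-symbol alphabet, equal initial weights.
weights = [1.0, 1.0]
expert_probs = [[0.9, 0.1], [0.5, 0.5]]
mixed = hedge_mix(expert_probs, weights)  # equal weights -> plain average
# After scoring a chunk, update with each expert's log loss on it:
losses = [-math.log(0.9), -math.log(0.5)]
weights = hedge_update(weights, losses)   # sharper expert gains weight
```

With eta=0.1 the weights move slowly, which matches the conservative mixing the PR describes.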

Legality

All eval-time adaptations are strictly score-first:

  1. Hedge weights for chunk N computed from chunks 0..N-1 only (deferred update after all windows scored)
  2. N-gram tables updated after chunk scoring completes
  3. Polyak EMA uses fixed decay, no snapshot selection
  4. TTT trains only on already-scored chunks
  5. No validation data during training; no training data during evaluation
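As a point of reference for item 3, a fixed-decay Polyak EMA is just a slow-moving copy of the weights; because the decay never changes and no snapshot is cherry-picked, there is no selection against scored data. A toy sketch (hypothetical names, not the PR's code):

```python
def ema_update(ema_params, live_params, decay=0.998):
    """Fixed-decay Polyak averaging: the EMA copy is what gets evaluated.
    ema <- decay * ema + (1 - decay) * live, applied every step."""
    return [decay * e + (1.0 - decay) * p
            for e, p in zip(ema_params, live_params)]

# Toy example: the live weight jumps to 1.0; the EMA drifts toward it.
ema = [0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0])
# After 3 steps ema[0] == 1 - 0.998**3, i.e. the EMA lags the live weights.
```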

Test plan

  • Seed 42: 1.0400 BPB
  • Seed 1337: pending
  • Seed 2024: pending

🤖 Generated with Claude Code

5-expert Hedge Mixer (neural + unigram + bigram + trigram + entropy) with
deferred between-chunk weight updates, combined with AdamW TTT + Polyak EMA
+ byte-weighted loss + adaptive cosine LR on an 11L VRL + LeakyReLU² + XSA-4
base. Seed 42 = 1.0400 BPB. Two additional seeds pending.
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 4, 2026
… Parallel Residuals path

- PR openai#771 confirmed CLOSED/REJECTED (train-then-score TTT)
- N-gram PRs openai#727/openai#741 CLOSED (illegal); openai#758/openai#731 open but same risk
- Merged SOTA unchanged at 1.1147
- New high-EV targets: PR openai#1351 (Discriminative TTT, 1.0807) and PR openai#1334
  (SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R, 1.0897)
- SLOT still unruled in Issue openai#140 — blocked until @valerio-oai rules
- CLAUDE.md updated to v8.0 with corrected strategy and Session 5 lessons

https://claude.ai/code/session_01X5rVjJpYyqm8DuWTNy2gkt
@MatoTeziTanka

Community Review — Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA

BPB: 1.0400 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1413 dexhunter pattern)

What I found in the code (head SHA 6cff4df0d716, file records/track_10min_16mb/2026-03-25_HedgeMixer_VRL_AdamWTTT_1.0400/train_gpt.py):

The TTT path at line 1017 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape of the current leaderboard's legal frontier (PR #1413 dexhunter, the 1.0828 SP8192 + QK-Gain 5 + Legal TTT entry — verified at its head SHA against the is_last_chunk + torch.no_grad() score-first accumulator pattern).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
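The score-first-per-chunk shape described above reduces to the following toy loop. `ToyModel` and `adapt` are hypothetical stand-ins for the PR's neural model and its SGD step; only the control flow (score before update, `is_last_chunk` guard) mirrors the pattern being reviewed:

```python
class ToyModel:
    """Stand-in for the base model: loss improves with adaptation data."""
    def __init__(self):
        self.adapted_on = []
    def score(self, chunk):
        # Pretend loss shrinks as more chunks have been adapted on.
        return 1.0 / (1 + len(self.adapted_on))

def adapt(model, chunk):
    model.adapted_on.append(chunk)

def eval_score_first(model, chunks):
    """Chunk i is scored under weights adapted only on chunks 0..i-1;
    the is_last_chunk guard means the final chunk is never trained on."""
    losses = []
    for i, chunk in enumerate(chunks):
        losses.append(model.score(chunk))   # score BEFORE any update
        if i < len(chunks) - 1:             # is_last_chunk guard
            adapt(model, chunk)             # legal: chunk already scored
    return losses

model = ToyModel()
losses = eval_score_first(model, ["c0", "c1", "c2"])
```

Each chunk's loss is fixed before the model ever trains on it, which is exactly the property the legality argument rests on.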

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.12s, dim=512, layers=11, vocab=1024, code=94305 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 12, 2026
…1.01710

Merged SOTA changed from 1.1147 to 1.0810 (PR openai#1493, bigbag, 2026-04-09).
Six PRs merged in 5 days (PRs openai#1334, openai#1285, openai#1394, openai#1412, openai#1413, openai#1477, openai#1493).
New target: ≤1.0760 val_bpb. 18 days to deadline.

Key findings:
- GDN-Hybrid (PR openai#1564): 1.01710 BPB, no TTT/SLOT — monitor for organizer review
- VarLen Attention + Doc-TTT (PR openai#1560): 1.07406 BPB — implement next
- TMA Megakernel + Tap-In (PR openai#1555): 1.07636 BPB — add after openai#1560
- PR openai#731 n-gram (dense count + Laplace): reviewer says LOOKS CLEAN, awaiting 3rd seed
- PR openai#758: major legality flags, do not implement

Updated CLAUDE.md: Competition Strategy, Technique Reference, Lessons Learned (Session 9).
Updated logs/daily_research.md: new 2026-04-12 entry prepended.

https://claude.ai/code/session_011WyxjcwdigLhMFQDjLL5ss
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 20, 2026
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 24, 2026
…ai#1787 Polar Express NS new base; PR openai#1795 PPM 1.01252; Issue openai#1604 deadline passed; Session 20

- Merged SOTA 1.0810 confirmed Day 15 (README not updated despite Scylla record commit)
- Scylla 0.9485 committed to track_10min_16mb/ on Apr 23 (PR openai#1184) but byte accounting
  disputed by PR openai#1271 (corrected ~1.1289 bpb); treat merged SOTA as 1.0810
- PR openai#771 CLOSED/REJECTED confirmed; PR openai#727 CLOSED (illegal); PR openai#758 open but dead;
  PR openai#731 still awaiting seeds 1337+2024
- Issue openai#1604 (CaseOps ruling): NO @valerio-oai response in 11 days; self-deadline Apr 24
  passed; proceed with clean legal stack immediately
- NEW: PR openai#1787 (nprime06, 1.06335) — new community-consensus clean base with Polar Express
  Newton-Schulz (arXiv:2505.16932, ICLR 2026) + MIN_LR=0.10 warmdown floor
- NEW: PR openai#1795 (OE-GOD, 1.01252) — byte-level PPM order-4 adaptive mixture; gate legality
  concern fixed; await organizer ruling before implementing
- NEW: PR openai#1797 (dexhunter, 1.06157) — PR openai#1787 + SmearGate + LQER Asym; new dexhunter best
- NEW: PR openai#1802 (aamodbhatt, 1.0771) — Polar Express NS + Multi-Phase Global TTT
- TECHNIQUE: Polar Express NS (arXiv:2505.16932) and Gram NS (Dao-AILab) added to table
- TECHNIQUE: MIN_LR=0.10 warmdown floor added to best-stack approach
- Updated competition strategy: stop waiting for CaseOps, implement clean stack with
  Polar Express NS + MIN_LR immediately (6 days to deadline)

https://claude.ai/code/session_01JZ3FiS937NwLHt3Fv9WHPD
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 26, 2026
…1835 PPM-D 1.00136 new watch; NgramRes stackable; Day 17 plateau; Session 22

- Upstream commit 7427de2 (Alex Zhao, OpenAI Apr 26): Scylla 0.9485 (PR openai#1184) removed as invalid record; PR openai#1813 (djeidy Scylla 0.94166) effectively dead by proxy
- PR openai#1835 (anmarhindi, 1.00136): PPM-D order-5 byte mixture, binary-λ gate, score-first, 15,993,020 bytes — most credible extraordinary claim yet; wait 24h for community BPB check
- PR openai#1834 (ghrua, 1.08034): NgramRes 3-gram MLP +0.6M params + sliding-window attn layers 0-3 — modest, stackable
- PR openai#731 (Hedge Mixer): still OPEN, 2 seeds pending, no merge
- Merged SOTA 1.0810 definitively confirmed; target ≤1.0760; 4 days to deadline

https://claude.ai/code/session_01XbdTRT7zPHoGp3LfQV4yXF
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 30, 2026
… competition closed

- Merged SOTA dropped from 1.0810 → 1.0611 (codemath3000, PR openai#1855) with all
  organizer pending branches now in main (CaseOps + SmearGate BOS fix + lrzip)
- New target was ≤1.0561; competition closes today (April 30)
- PR openai#1967 (ndokutovich, 1.05851): best clean legal open PR, timing question pending
- PR openai#1991 (joshuaswanson, 0.94290): Byte-PPM Mixer; Issue openai#1872 open, no ruling
- PR openai#1992 / openai#1972: ILLEGAL (PreQuantTTT 21ep)
- PR openai#731 (Hedge Mixer, 1.0400): seeds 1337/2024 never filed; competition closing
- Session 25 lessons + final Competition Strategy update added to CLAUDE.md

https://claude.ai/code/session_01QKHz6Vfu2DFZdc7GiuKSBQ