
[10min_16mb] 0.9641 BPB: LeakyReLU² + Score-First TTT + N-gram Backoff Cache#1185

Closed
skoustav35 wants to merge 6 commits into openai:main from skoustav35:main

Conversation

@skoustav35

Submitting a new entry for the 10-minute 16MB track that achieves a 3-seed exact mean of 0.9641 BPB (1.6274 nats).

This improves upon the current merged 1.1147 BPB baseline (PR #1019) by 0.1506 BPB (0.2548 nats), which exceeds the required 0.005 nats threshold by ~51× (Welch t = -328.3, p ≪ 0.01).

Techniques Used

  • Architecture: 11 Layers, 512 dim, GQA = 8H/4KV, MLP 3x, LeakyReLU(0.5)², XSA-5 (layers 6-10), Tied embeddings, Value Residual, Gated Attention, VE(128) on layers 8/9/10, MTP-2, BigramHash 2048.
  • Eval-time N-gram Backoff Cache:
    • Multi-order backoff (orders 2–9), picking the highest matching order.
    • Laplace (add-1) smoothing: Ensures the returned probability is a proper normalized distribution over the vocabulary and does not depend on target-oracle knowledge.
    • Entropy-adaptive alpha scaling.
  • Test-Time Training (Legal, Score-First):
    • SGD, 3 epochs, 32K token chunks, stride 64.
    • Tokens are scored strictly before each update, using only backward-looking context.
  • Optimization & Quantization:
    • Muon + Adam split.
    • Int6 per-row quantization with LZMA compression. Late-stage CROWN-Q penalty.
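The eval-time n-gram backoff cache is the headline technique here. The PR does not include the implementation inline, so the following is a minimal sketch under my own naming and assumptions: multi-order contexts (orders 2–9) with the highest matching order winning, and add-1 (Laplace) smoothing so the returned distribution is always normalized over the full vocabulary without any knowledge of the target token.

```python
from collections import defaultdict

class NGramBackoffCache:
    """Hypothetical sketch of a multi-order n-gram backoff cache with
    add-1 (Laplace) smoothing. Every returned distribution is a proper
    normalized distribution over the whole vocabulary; the target token
    is never consulted when building it."""

    def __init__(self, vocab_size, max_order=9):
        self.vocab_size = vocab_size
        self.max_order = max_order
        # counts[order][context_tuple][next_token] -> count
        self.counts = {o: defaultdict(lambda: defaultdict(int))
                       for o in range(2, max_order + 1)}

    def update(self, tokens):
        """Ingest tokens that have already been scored (causal usage:
        call this only after scoring a chunk)."""
        for o in range(2, self.max_order + 1):
            for i in range(len(tokens) - o + 1):
                ctx = tuple(tokens[i:i + o - 1])
                self.counts[o][ctx][tokens[i + o - 1]] += 1

    def probs(self, context):
        """Full-vocab distribution for the next token, using the highest
        order whose context has been seen; uniform fallback otherwise."""
        for o in range(self.max_order, 1, -1):
            ctx = tuple(context[-(o - 1):])
            if len(context) >= o - 1 and ctx in self.counts[o]:
                c = self.counts[o][ctx]
                total = sum(c.values()) + self.vocab_size  # add-1 mass
                return [(c.get(t, 0) + 1) / total
                        for t in range(self.vocab_size)]
        return [1.0 / self.vocab_size] * self.vocab_size
```

The entropy-adaptive alpha scaling mentioned above would then decide how heavily this distribution is blended with the model's own prediction; that blending step is not shown here.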

Compliance & Margins

Reproducibility

The script resolves data paths relative to the repo root automatically.

SEED=1337 RUN_ID=seed_1337 VOCAB_SIZE=1024 \
torchrun --standalone --nproc_per_node=8 \
  records/track_10min_16mb/2026-03-31_LeakyReLU2_LegalTTT_NGramCache_XSA/train_gpt.py

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Mar 31, 2026
- logs/daily_research.md: append 2026-03-31 research section
  - PR openai#771 CLOSED (score-first TTT rule violation)
  - PR openai#727 CLOSED (n-gram illegal — no renormalization)
  - Merged SOTA: 1.1147 (PR openai#1019, 2026-03-25)
  - New PRs: openai#1184 (0.9485 Scylla tokenizer), openai#1185 (0.9641)
  - SLOT eval technique, Full GPTQ, QK-Gain 4.0 documented
- CLAUDE.md: update Competition Strategy + lessons 21-24
  - Merged SOTA updated to 1.1147
  - Current Best Path rewritten for 2026-03-31
  - Lessons openai#21-24: TTT fix, n-gram risk, Scylla, SLOT
  - TTT constraint clarified to score-first protocol
  - Version bumped to v9.0

https://claude.ai/code/session_015z6QKyKzDSYzTniW1GPhAe
…ct-for-golf-challenge

Add opt-in MoD routing, SquareGLU MLP, EMA warmdown distillation, and Grokfast
@valerio-oai
Contributor

valerio-oai commented Apr 2, 2026

Hi! Even though you aren't using the hashed n-gram cache and use Laplace smoothing instead, I think your implementation as currently coded still uses knowledge of the eval token ahead of time to calculate the blended n-gram probability, which is not allowed. You should calculate and renormalize over the whole vocab size, or use some other heuristic that does not rely on oracle knowledge of the eval token. If you did that, I would be more inclined to treat this as legal. Closing for now.
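For concreteness, a minimal sketch (my own naming and parameter choices) of the fix the maintainer is asking for: compute the blended distribution over the whole vocabulary first and renormalize it, and only afterwards read off the probability of the eval token, so the target never influences the distribution itself.

```python
import numpy as np

def blend_full_vocab(model_logits, ngram_probs, alpha=0.3):
    """Blend a model distribution with an n-gram distribution over the
    FULL vocabulary, then renormalize. The eval token is not an input:
    scoring reads blended[target] only after this function returns."""
    model_probs = np.exp(model_logits - model_logits.max())
    model_probs /= model_probs.sum()                  # softmax over full vocab
    blended = (1 - alpha) * model_probs + alpha * np.asarray(ngram_probs)
    return blended / blended.sum()                    # proper distribution
```

The illegal variant, by contrast, would compute or rescale the blend only at the target index, which requires knowing the target before producing the distribution.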

@valerio-oai valerio-oai closed this Apr 2, 2026
MatoTeziTanka pushed a commit to MatoTeziTanka/parameter-golf that referenced this pull request Apr 6, 2026
NEW SECTION 'Maintainer Activity Tracker' immediately above Checklist:
4 cards for the validated OpenAI mods with merge/close authority on
openai/parameter-golf:

  - notapplica   ~13h ago  (closed openai#140)
  - valerio-oai  ~5 days   (closed PR openai#1185)
  - 0hq          ~12 days  (Will DePue, founder)
  - yuzhougu-oai ~17 days

Each card shows days-silent badge color-coded by severity, last action
summary, authority signal (how we validated they have authority), and
account creation date. Methodology disclosed inline.

Validation: author_association on closed PRs (only collaborators can
close other users' PRs), OAI account suffix, repo collaborator status,
cross-checked against /users/<handle>/events for last activity timestamp.

NAV: Added '⚠ Mod Tracker' link in red. Sits between Home and Checklist.

scripts/update_mod_tracker.py: Standalone script that hits the GitHub
events API for each validated handle, computes days-silent and severity,
writes data/mod_tracker.json. Run on every Agora rebuild to keep current.

BANNER FIX: valerio-oai last active April 2 (closed PR openai#1185 + comments)
not April 4 as previously stated. Confirmed via events API. The earlier
date was based on the openai#677 thread comment which was actually March 27.

CHANGELOG: 4 new entries for v0.8.0 covering the mod tracker, validation
methodology, auto-update script, and banner timestamp fix.
himanshudongre added a commit to himanshudongre/parameter-golf that referenced this pull request Apr 15, 2026
…at Scale

Rigorous negative result demonstrating that a legal causal n-gram additive-logit
blend does not scale to strong models, paired with the first clean reference
implementation verified against all three valerio-oai closure rulings (openai#993 hashed
caches, openai#1185 full-vocab renormalization, openai#959 two-pass rescoring).

Includes:
- 8-probe automated legality harness + 4-test integration suite
- Scaling curve across 6 model configurations (2L/4L, 128d/256d, 800-4000 steps,
  sp1024/sp8192) showing peak BPB improvement collapses from 0.0515 (weak baseline)
  to 0.00018 (strongest model) — well below the 0.0072 BPB record threshold
- Localized delta decomposition showing 100% of the gain comes from out-of-
  attention-window cache hits, and why the sp1024 → sp8192 transition erodes even
  that architectural floor
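The "causal n-gram additive-logit blend" studied in that negative result can be sketched as follows (names and the epsilon floor are mine, not from the referenced implementation): scaled log n-gram probabilities are added to the model logits before a full-vocabulary softmax, so the blend stays causal as long as the n-gram counts come only from previously scored tokens.

```python
import numpy as np

def additive_logit_blend(model_logits, ngram_probs, beta=1.0, eps=1e-8):
    """Additive-logit blend: shift model logits by scaled log n-gram
    probabilities, then renormalize over the full vocab. eps floors the
    log so zero-count tokens stay finite."""
    logits = model_logits + beta * np.log(np.asarray(ngram_probs) + eps)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()
```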
