Record: GatedDeltaNet FLA + Score-First TTT + Brotli — val_bpb 1.00980 (3-seed mean)#1711
Closed
aamodbhatt wants to merge 1 commit into openai:main
Conversation
GatedDeltaNet linear attention (FLA) K_KVShare_Wider + legal score-first TTT (SGD 3ep, freeze=2) + brotli-11 compression.
3-seed mean: 1.00980 BPB (std 0.0015). All artifacts under 16 MB.
Seeds: 1337 (1.00803), 42 (1.01069), 2025 (1.01067). TTT gain: ~-0.009 BPB per seed.
Based on PR openai#1687 by @resouer; TTT adapted from PR openai#461.
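For readers unfamiliar with the constraint, here is a minimal sketch of what a score-first TTT loop looks like. It is not the PR's code: `ttt_model.blocks`, the chunk iterator, and the learning rate are assumptions; only "SGD, 3 epochs, freeze=2, score each chunk before updating on it" comes from the description above.

```python
import copy
import torch
import torch.nn.functional as F

def score_first_ttt(model, eval_chunks, lr=1e-3, epochs=3, freeze_blocks=2):
    """Minimal sketch of a score-first TTT pass (names are illustrative).

    Legality hinges on ordering: every chunk is scored with the weights as they
    stand *before* any gradient step is taken on that chunk, in a single
    left-to-right pass over the eval data.
    """
    ttt_model = copy.deepcopy(model)                      # leave the submitted weights untouched
    for block in list(ttt_model.blocks)[:freeze_blocks]:  # freeze=2 in the PR description
        for p in block.parameters():
            p.requires_grad_(False)
    params = [p for p in ttt_model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)

    total_nll, total_tokens = 0.0, 0
    for inputs, targets in eval_chunks:
        with torch.no_grad():                             # 1) score BEFORE updating
            logits = ttt_model(inputs)
            total_nll += F.cross_entropy(
                logits.view(-1, logits.size(-1)), targets.view(-1),
                reduction="sum").item()
            total_tokens += targets.numel()
        for _ in range(epochs):                           # 2) only then adapt (SGD, 3 epochs)
            opt.zero_grad()
            out = ttt_model(inputs)
            F.cross_entropy(out.view(-1, out.size(-1)), targets.view(-1)).backward()
            opt.step()
    return total_nll, total_tokens
```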
resouer added a commit to resouer/parameter-golf that referenced this pull request on Apr 18, 2026
PR openai#1711's records README explicitly enables score-first TTT, but the extracted wrapper still defaulted TTT off. This worker-side patch keeps the reproduction surface aligned with the submitted command instead of silently falling back to the no-TTT path.

Constraint: Round31 is meant to validate the publicly claimed surface, not code-default drift.
Rejected: keep the default-off wrapper | would reproduce the wrong surface.
Confidence: high. Scope-risk: narrow.
Directive: treat W88 results before this commit as code-default/no-TTT evidence, not faithful PR openai#1711 reproduction.
Tested: python3 -m py_compile train_gpt.py train_gdn_7k.py architectures.py configs.py
Not-tested: remote end-to-end score after relaunch.
manfromnowhere143 added a commit to manfromnowhere143/parameter-golf that referenced this pull request on Apr 18, 2026
Aweb's record-attempt submission, building on PR openai#1711 (1.00980 BPB) by adding EMA-Teacher Distillation (Tarvainen & Valpola, NeurIPS 2017, 'Mean teachers are better role models') as the novel contribution.

Loss: L = (1-α)·CE(target) + α·KL(student || teacher.detach())

The teacher is a separate copy of the student model, periodically (every K=16 steps) synchronized from the EMA-smoothed state already maintained by the frontier code. Alpha ramps linearly 0 → 0.3 over the middle 40% of training (steps 30%-70%). Temperature scaling follows the Hinton soft-target convention (KL × T²).

Verified novel via gh search (mean teacher / EMA teacher / distillation / KL soft targets): zero matching open PRs in the competition.

Verified legal under Issue openai#1017 conditions 1-4:
- Causal (teacher uses the same forward pass as the student)
- Full distribution (KL on the full softmax over the vocab)
- Score-before-update (distillation is training-time only; eval unchanged)
- Single L→R pass (no rescoring)

CPU smoke test (8 cases, FLA-independent) passes: CE-only path correct, EMT path differs from CE, gradient routes to the student not the teacher, temperature scaling active, alpha schedule correct, mini-training loss decreases, KL of identical distributions = 0.

Credits: PR openai#1711 (aamodbhatt) GDN+brotli base; PR openai#1687 (resouer) GDN K_KVShare; PR openai#461 (Christopher-Lee-McClendon) Score-First Legal TTT; FLA library (sustcsonglin); Tarvainen & Valpola (NeurIPS 2017) Mean Teacher framework.
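A hedged sketch of how the stated objective and alpha ramp could be written. Function names and tensor shapes are assumptions; the soft-target term uses `F.kl_div` in the usual Hinton direction (teacher distribution as the target), with the T² scaling the commit describes. The every-K=16-steps teacher sync, which copies the EMA-smoothed weights into the teacher, would happen outside this function.

```python
import torch
import torch.nn.functional as F

def alpha_schedule(step, total_steps, alpha_max=0.3):
    """Linear ramp 0 -> alpha_max over the middle 40% of training (steps 30%-70%)."""
    start, end = 0.30 * total_steps, 0.70 * total_steps
    if step <= start:
        return 0.0
    if step >= end:
        return alpha_max
    return alpha_max * (step - start) / (end - start)

def emt_loss(student_logits, teacher_logits, targets, alpha, T=2.0):
    """L = (1 - alpha) * CE(targets) + alpha * soft-target KL, with Hinton T^2 scaling."""
    vocab = student_logits.size(-1)
    ce = F.cross_entropy(student_logits.view(-1, vocab), targets.view(-1))
    # Gradient flows only to the student; the teacher side is detached.
    log_p_student = F.log_softmax(student_logits / T, dim=-1).view(-1, vocab)
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=-1).view(-1, vocab)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    return (1.0 - alpha) * ce + alpha * kl
```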
Author
Closing — byte-counting bug in inherited build_sentencepiece_luts double-credits the leading-space byte, inflating the denominator and deflating reported BPB. Will re-evaluate with the canonical LUT from PR #1019 before resubmitting.
manfromnowhere143 added a commit to manfromnowhere143/parameter-golf that referenced this pull request on Apr 18, 2026
…ing-space byte

CRITICAL correctness fix. Our inherited train_gdn_7k.py from PR openai#1711's base contained a byte-counting bug that double-credits the leading-space byte:

LUT set: base_bytes[i] = len(piece[1:].encode('utf-8')) + 1
Eval adds: tb += (has_leading_space[tgt] & ~is_boundary[prev])

This counts the space twice for every leading-space token (~80% of tokens in SP1024). The denominator is inflated by ~23%, and the reported BPB is deflated by the same factor.

PR openai#1711 self-closed for this same bug (author statement: 'byte-counting bug in inherited build_sentencepiece_luts double-credits the leading-space byte...').

Fix: adopt the canonical LUT from PR openai#549 / PR openai#1019 verbatim. Base bytes hold the UTF-8 length WITHOUT the leading-space marker; eval adds +1 conditionally. Also adds the byte-fallback handling (sp.is_byte) and the is_unused check that were missing.

After this fix, our reported BPBs are on the same scale as the merged leaderboard. v6-v11 results are stale and need re-measurement.
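Spelled out as code, the fix moves the `+ 1` out of the LUT and leaves it only in the conditional eval-side add. Below is a sketch of the corrected LUT build, assuming the standard `sentencepiece` Python API (`is_unused`, `is_byte`, `id_to_piece`); everything else is illustrative rather than the canonical PR openai#549 / openai#1019 code.

```python
import numpy as np

def build_sentencepiece_luts(sp):
    """Corrected LUTs: base_bytes excludes the leading-space byte; eval adds it back once."""
    vocab = sp.get_piece_size()
    base_bytes = np.zeros(vocab, dtype=np.int32)
    has_leading_space = np.zeros(vocab, dtype=bool)
    for i in range(vocab):
        if sp.is_unused(i):
            continue                                   # unused slots contribute nothing
        if sp.is_byte(i):
            base_bytes[i] = 1                          # byte-fallback pieces like <0xE2> are 1 byte
            continue
        piece = sp.id_to_piece(i)
        has_leading_space[i] = piece.startswith("\u2581")   # SentencePiece '▁' marker
        text = piece[1:] if has_leading_space[i] else piece
        base_bytes[i] = len(text.encode("utf-8"))      # no '+ 1' here: that was the double count
    return base_bytes, has_leading_space

# Eval side (per target token tgt following prev), unchanged from the quoted snippet:
#   tb += base_bytes[tgt]
#   tb += (has_leading_space[tgt] & ~is_boundary[prev])   # the space byte is counted once, here
```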
Record Summary
Final submitted score: val_bpb 1.00980 (std 0.0015)
Reference pre-TTT roundtrip: 1.01902 (std 0.0017)
Hardware: 8×H100 80GB SXM | Artifact: ~15.6 MB | Train: 600s wallclock | TTT eval: ~276s
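A small sanity check of the artifact budget, assuming a placeholder packaging helper; only "brotli quality 11" and the 16 MB limit come from the record itself.

```python
import brotli  # pip install brotli

MAX_ARTIFACT_BYTES = 16 * 1024 * 1024   # competition budget: all artifacts under 16 MB

def pack_checkpoint(raw_bytes: bytes, out_path: str) -> int:
    """Compress serialized weights with brotli quality 11 and check the size budget."""
    compressed = brotli.compress(raw_bytes, quality=11)
    with open(out_path, "wb") as f:
        f.write(compressed)
    assert len(compressed) <= MAX_ARTIFACT_BYTES, (
        f"artifact {len(compressed) / 1e6:.1f} MB exceeds the 16 MB budget")
    return len(compressed)
```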
What Changed
3-Seed Results
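The per-seed scores quoted in the PR description (1337: 1.00803, 42: 1.01069, 2025: 1.01067) reproduce the headline mean and, with the sample convention (ddof=1), the quoted std.

```python
import numpy as np

seed_bpb = {1337: 1.00803, 42: 1.01069, 2025: 1.01067}   # from the PR description
vals = np.array(list(seed_bpb.values()))
print(f"mean: {vals.mean():.5f}")               # 1.00980
print(f"std (ddof=1): {vals.std(ddof=1):.4f}")  # 0.0015
```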
Submission Checklist
records/track_10min_16mb/

Metric Verification
final_int6_ttt_exact in each seed log
final_int6_roundtrip_exact in each seed log

Credits