Record: SP8192 + Byte-PPM Mixer with Tuned Order/Gate (O=5) — val_bpb 0.94290 (3-seed mean)#1991
Open
joshuaswanson wants to merge 1 commit intoopenai:mainfrom
Open
Record: SP8192 + Byte-PPM Mixer with Tuned Order/Gate (O=5) — val_bpb 0.94290 (3-seed mean)#1991joshuaswanson wants to merge 1 commit intoopenai:mainfrom
joshuaswanson wants to merge 1 commit intoopenai:mainfrom
Conversation
sunnypatneedi
pushed a commit
to sunnypatneedi/parameter-golf
that referenced
this pull request
Apr 30, 2026
… competition closed - Merged SOTA dropped from 1.0810 → 1.0611 (codemath3000, PR openai#1855) with all organizer pending branches now in main (CaseOps + SmearGate BOS fix + lrzip) - New target was ≤1.0561; competition closes today (April 30) - PR openai#1967 (ndokutovich, 1.05851): best clean legal open PR, timing question pending - PR openai#1991 (joshuaswanson, 0.94290): Byte-PPM Mixer; Issue openai#1872 open, no ruling - PR openai#1992 / openai#1972: ILLEGAL (PreQuantTTT 21ep) - PR openai#731 (Hedge Mixer, 1.0400): seeds 1337/2024 never filed; competition closing - Session 25 lessons + final Competition Strategy update added to CLAUDE.md https://claude.ai/code/session_01QKHz6Vfu2DFZdc7GiuKSBQ
|
Looks promising. I didn't have the time to do the sweep! Good luck :) |
This was referenced May 1, 2026
Collaborator
|
Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The byte-PPM mixer does not define a full normalized distribution over the official next-token alphabet before the realized token is known; it scores the realized byte stream by spreading the realized token log-prob across observed bytes. The submitted PPM order/gate choices also appear selected by offline validation-target sweeps, which is not acceptable record evidence. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
val_bpb = 0.94290 (3-seed mean, std=0.00070, full FineWeb val) | <16 MB artifact | 8×H100 SXM | Causal byte-PPM mixer at eval, no TTT.
Builds on PR #1959 (PR #1493 bigbag + PR #1795 byte-PPM mixer). The neural network and training pipeline are byte-identical to PR #1959. The single change is the PPM mixer's four hyperparameters, found via a systematic offline sweep on the SP8192 NN's per-byte distribution:
PPM_ORDER(context length)PPM_T(gate threshold)PPM_H(high-lambda)PPM_L(low-lambda)PR #1795 originally hand-picked these defaults on top of @clarkkev's SP4096 stack, and PR #1959 inherited them when porting the mixer to PR #1493's SP8192 stack with a different NN distribution. No prior submission ran a systematic sweep on the SP8192 NN's per-byte distribution.
vs verified leader PR #1855 (1.06108): −0.11818 BPB
vs current open sub-1.0 candidate PR #1959 (0.99621): −0.05331 BPB
3-Seed Results
t-stat ≈ 132 on the 0.005-nat bar vs PR #1959, p ≪ 1e-10.
Sweep procedure (offline, on dumped (tga, lpa) from seed 42)
DUMP_PPM_INPUTS=1so the eval loop dumps(target tokens, per-token NN log-probability)at byte-stream order. Same neural pipeline; no changes to training.DUMP_PPM_INPUTS=1; offline sweep runs on standard CPU (no GPU required).Compliance (Track B — legal eval-time adaptation)
Inherits all compliance properties from PR #1959 / PR #1795:
The only change to train_gpt.py vs PR #1959's submitted version is the four PPM env-var defaults. No structural changes; the strict-legal gate machinery is byte-identical.
Test plan