Non-record: NN + byte-level PPM adaptive-λ mixture demonstration by OE-GOD · Pull Request #1782 · openai/parameter-golf

OE-GOD · 2026-04-23T03:53:48Z

Summary

Demonstrates an unexploited axis on the current leaderboard: byte-level PPM-D order-5 mixed with the NN via an adaptive-λ gate in byte-probability space. Current record submissions explicitly declare "no_ngram_cache": true, indicating the mixture has not been attempted in any accepted submission.

Headline

Measured on SP1024 9L baseline with the top-level train_gpt.py (only ~100 lines added to eval_val)
NN-only val_bpb = 1.62394 (5M-token subset)
Mixture val_bpb = 1.41306 (adaptive gate)
Δ = −0.21088 (consistent range −0.208 to −0.260 across all five periodic evals during training)
Artifact: 15.87 MB (under 16MB cap)

Why this is submitted non-record

NN is weaker than a clean baseline. The wallclock budget was partly consumed by periodic mixture evals; the NN stopped at step 5002 vs ~6825 without mixture overhead. Reported val_bpb = 1.41 reflects the weaker NN, not a mixture failure. A record-track integration would set VAL_LOSS_EVERY=0 and run PPM only in the final eval.
Eval exceeds 10-min cap on full val. Pure-Python PPM is ~220 KB/s; this submission subsamples 5M val tokens. Record integration requires a faster PPM (C extension / Numba / suffix array) — ~10× speedup suffices.

Why this is worth acceptance

Empirically unexploited. Every current record marks no_ngram_cache: true.
Mechanism is validated across 4 NN-quality tiers (2.54 → 1.21 BPB), including an SP8192 SOTA-family baseline where the adaptive-mix Δ stayed in −0.12 to −0.14 range. Since adaptive targets byte-level rare-repeat patterns (URLs, code tokens, cross-doc duplicates, tokenization-spanning strings), its gain does not shrink with NN quality.
Composable. Any record submission can adopt the mixture with a single modification to eval_val; the NN stack is unchanged.
Extrapolation to current SOTA (1.06) projects BPB ≈ 0.95–1.02 with adaptive mixture — well below the 0.005 beat threshold. This PR establishes measurements motivating the engineering investment.

Test plan

submission.json is valid JSON with all required fields
train_gpt.py runs end-to-end and produces the reported val_bpb via the [ppm_mix] line
Artifact (int8+zlib) is under 16MB (15,870,887 bytes)
Supporting measurements provided across multiple NN qualities (see README table)
Reviewer can reproduce with 1× training run + documented env vars

Scope

Adds only one folder to records/track_non_record_16mb/. No changes outside the new submission directory.

Credits

Byte-level PPM-D: Cleary & Witten 1984; Moffat 1990 (PPM-D)
Adaptive-λ gate: designed for this submission based on PPM's own confidence signal

Byte-level PPM-D order-5 with confidence-gated adaptive λ mixed with the NN in byte-probability space. Δ=-0.21088 BPB on SP1024 baseline (1.62394 → 1.41306 on a 5M-token val subset). Supporting 4-anchor scaling table in the README shows adaptive-mix Δ remains ≈-0.12 across NN quality from 2.54 BPB down to 1.21 BPB (including the SP8192 SOTA family), indicating the gain targets byte-level rare-repeat patterns independent of NN quality. Non-record: base NN is weaker than a clean baseline (wallclock partly consumed by periodic mixture evals); PPM subsamples 5M tokens since pure-Python PPM exceeds the 10-min eval cap on full val. Both caveats documented in README; record integration path outlined.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-record: NN + byte-level PPM adaptive-λ mixture demonstration#1782

Non-record: NN + byte-level PPM adaptive-λ mixture demonstration#1782
OE-GOD wants to merge 1 commit intoopenai:mainfrom
OE-GOD:byte-ppm-mixture-nonrecord

OE-GOD commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

OE-GOD commented Apr 23, 2026

Summary

Headline

Why this is submitted non-record

Why this is worth acceptance

Test plan

Scope

Credits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant