Non-record: NN + byte-level PPM adaptive-λ mixture demonstration#1782
Open
OE-GOD wants to merge 1 commit into openai:main from
Conversation
Byte-level PPM-D order-5 with a confidence-gated adaptive λ, mixed with the NN in byte-probability space. Δ = -0.21088 BPB against the SP1024 baseline (1.62394 → 1.41306 on a 5M-token val subset). A supporting 4-anchor scaling table in the README shows the adaptive-mix Δ remains ≈ -0.12 across NN quality from 2.54 BPB down to 1.21 BPB (including the SP8192 SOTA family), indicating the gain targets byte-level rare-repeat patterns independently of NN quality. Non-record because: the base NN is weaker than a clean baseline (wallclock partly consumed by periodic mixture evals), and PPM subsamples 5M tokens since pure-Python PPM exceeds the 10-minute eval cap on the full val set. Both caveats are documented in the README, and a record-integration path is outlined.
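The confidence-gated mixture described above can be sketched as follows (a minimal illustration in byte-probability space; `lam_max`, `conf_floor`, and the use of PPM's top-byte mass as the confidence signal are assumptions, not the submission's actual gate):

```python
import numpy as np

def mix_byte_probs(p_nn, p_ppm, ppm_conf, lam_max=0.5, conf_floor=0.2):
    """Confidence-gated adaptive-lambda mixture in byte-probability space.

    p_nn, p_ppm: arrays of shape (256,), next-byte distributions.
    ppm_conf: scalar confidence of the PPM prediction (hypothetical
    gate signal, e.g. the mass PPM puts on its most likely byte).
    """
    # Gate: trust PPM only when it is confident; lambda ramps linearly
    # from 0 at conf_floor up to lam_max at full confidence.
    lam = lam_max * max(0.0, (ppm_conf - conf_floor) / (1.0 - conf_floor))
    p_mix = lam * p_ppm + (1.0 - lam) * p_nn
    return p_mix / p_mix.sum()  # renormalise against float error
```

Below the confidence floor the gate collapses to the pure NN distribution, so a badly mistaken PPM prediction on non-repetitive bytes cannot hurt BPB.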
## Summary
Demonstrates an unexploited axis on the current leaderboard: byte-level PPM-D order-5 mixed with the NN via an adaptive-λ gate in byte-probability space. Current record submissions explicitly declare `"no_ngram_cache": true`, indicating the mixture has not been attempted in any accepted submission.
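For reference, a byte-level PPM with method-D escapes can be sketched as below. This is an interpolated, no-exclusion simplification for illustration only (real PPM-D excludes symbols already tried at higher orders), not the submitted implementation:

```python
from collections import defaultdict

ORDER = 5  # maximum context length in bytes

class PPMD:
    """Minimal byte-level PPM with method-D escapes (no exclusions)."""

    def __init__(self):
        # context (tuple of bytes) -> {next_byte: count}
        self.tables = defaultdict(lambda: defaultdict(int))

    def prob(self, history, byte):
        """P(byte | history): blend escape mass down through the orders."""
        p, escape_mass = 0.0, 1.0
        for k in range(min(ORDER, len(history)), -1, -1):
            counts = self.tables.get(tuple(history[len(history) - k:]))
            if not counts:
                continue  # unseen context: pass all mass to lower order
            n = sum(counts.values())
            d = len(counts)  # distinct bytes seen in this context
            c = counts.get(byte, 0)
            if c:
                p += escape_mass * (c - 0.5) / n  # method-D 1/2 discount
            escape_mass *= d / (2.0 * n)          # method-D escape prob
        return p + escape_mass / 256.0            # order -1: uniform

    def update(self, history, byte):
        for k in range(min(ORDER, len(history)) + 1):
            self.tables[tuple(history[len(history) - k:])][byte] += 1
```

Per context, the discounted symbol probabilities plus the escape mass sum to 1, so the blended distribution over all 256 bytes is properly normalised.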
## Headline

- `train_gpt.py` (only ~100 lines added to `eval_val`)
## Why this is submitted non-record

`val_bpb = 1.41` reflects the weaker NN, not a mixture failure. A record-track integration would set `VAL_LOSS_EVERY=0` and run PPM only in the final eval.
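The record-track integration described here could gate the expensive PPM pass as in the sketch below (`should_run_ppm` and the step-loop shape are hypothetical, shown only to make the wallclock argument concrete):

```python
VAL_LOSS_EVERY = 0  # record track: disable periodic mixture evals

def should_run_ppm(step, total_steps):
    """Run the (slow, pure-Python) PPM mixture only when needed.

    With VAL_LOSS_EVERY=0, the mixture runs once, in the final eval,
    so training wallclock is not consumed by periodic mixture evals.
    """
    periodic = VAL_LOSS_EVERY > 0 and step % VAL_LOSS_EVERY == 0
    final = step == total_steps
    return final or periodic
```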
## Why this is worth acceptance

- No accepted submission has exploited this axis; records declare `no_ngram_cache: true`.
- All changes are confined to `eval_val`; the NN stack is unchanged.
## Test plan

- `submission.json` is valid JSON with all required fields
- `train_gpt.py` runs end-to-end and produces the reported `val_bpb` via the `[ppm_mix]` line
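The first test-plan item can be automated with a small checker (hypothetical; `val_bpb` is the only field named in this PR, so `REQUIRED_FIELDS` is an assumption to be extended with the repo's actual schema):

```python
import json

# Hypothetical required-field list; extend with the repo's real schema.
REQUIRED_FIELDS = ["val_bpb"]

def check_submission(path):
    """Load submission.json, failing loudly on bad JSON or missing fields."""
    with open(path) as f:
        sub = json.load(f)  # raises JSONDecodeError on invalid JSON
    missing = [k for k in REQUIRED_FIELDS if k not in sub]
    if missing:
        raise KeyError(f"submission.json missing fields: {missing}")
    return sub
```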
## Scope

Adds only one folder: `records/track_non_record_16mb/`. No changes outside the new submission directory.

## Credits