Implement moonshot stack for sub-1.0 BPB: EngramLite, SkipGram, Complementary Training, BackoffNgramMixer #3
Merged
kailean merged 2 commits into copilot/brainstorm-novel-approaches on Apr 1, 2026
Conversation
Copilot wasn't able to review any files in this pull request.
Owner: Make sure that all the rules from the official OpenAI parameter golf challenge are followed.
… Complementary Training, BackoffNgramMixer)
Agent-Logs-Url: https://github.com/kailean/parameter-golf/sessions/c8f0fa4c-b6cd-477c-bec4-17b901e9961d
Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Verify sub 1.0 pbp results and ensure functionality" to "Implement moonshot stack for sub-1.0 BPB: EngramLite, SkipGram, Complementary Training, BackoffNgramMixer" on Apr 1, 2026.
The analysis in pg_novel_ideas.md identified a three-component "moonshot stack" projected to reach 0.92–1.05 BPB, but all code existed only as stubs. This PR integrates all four components into train_gpt_mlx_kl.py.

New components
EngramLiteEmbeddingENGRAM_LITE_ENABLED=1BigramHashEmbedding. Multi-head bigram+trigram hash tables; learned gate initialised suppressed (sigmoid(-2)≈0.12) fixes the TrigramHash regression (openai#609).SkipGramHashEmbeddingSKIPGRAM_HASH_SIZE=N[-1,-3], [-1,-5], [-2,-4]), capturing structural/template patterns.COMPLEMENT_ALPHA=0.5build_bigram_stats()pre-computes P(next|prev) vianp.bincount+ Laplace smoothing. Loss down-weights tokens with high bigram predictability, specialising the neural model on hard tokens.BackoffNgramMixer+eval_val_sliding_ngramNGRAM_MIXER_ENABLED=1NGRAM_MAX_ORDER. Integrated into a new sliding-window eval path.Design decisions
- Bigram stats kept out of model.state — captured as a closure constant in _make_compiled_fns(), preventing ~4 MB of stats from appearing in the serialised artifact or QAT recompilation state.
- _make_compiled_fns() helper — consolidates all mx.compile calls (initial, EMA/SWA restore, QAT trigger) so the complementary loss is always consistent.
- GPT._apply_hash_biases() — deduplicates logit-bias application across loss, token_losses, and the new token_logits.
- SplitOptimizers._MANAGED_PREFIXES — extended to ("blocks.", "bigram_hash.", "skipgram_hash.") so SkipGram embedding weights receive Muon/Adam updates.
- gate_bias added to CONTROL_TENSOR_NAME_PATTERNS — keeps EngramLite's gate in float32 and routes it to the Adam scalar optimizer.

Full moonshot run
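The closure-capture decision can be illustrated without MLX. TinyModel and make_loss_fn are hypothetical stand-ins for the PR's model and _make_compiled_fns(); the point is only that an array captured as a closure constant never enters the parameter tree, so it cannot leak into a serialised checkpoint:

```python
import numpy as np

class TinyModel:
    """Stand-in model whose parameters() dict is what gets serialised."""
    def __init__(self):
        self.weight = np.zeros((4, 4))

    def parameters(self):
        return {"weight": self.weight}

def make_loss_fn(model, bigram_probs):
    # bigram_probs is captured as a closure constant: it is never assigned
    # onto the model, so it is absent from model.parameters() and from any
    # artifact built from that dict.
    def loss_fn(tokens):
        p = bigram_probs[tokens[:-1], tokens[1:]]
        weights = 1.0 - 0.5 * p          # complementary down-weighting
        return weights.mean()            # placeholder for the weighted CE loss
    return loss_fn

model = TinyModel()
stats = np.full((3, 3), 1.0 / 3.0)       # uniform bigram table stand-in
loss_fn = make_loss_fn(model, stats)
val = loss_fn(np.array([0, 1, 2]))
```

Rebuilding loss_fn (as the QAT trigger or EMA/SWA restore would) simply re-captures the same stats array, which is why consolidating every compile call in one helper keeps the complementary loss consistent across recompilations.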
Smoke test (M1)
All env vars default to off (0/0.0), so existing runs are unaffected.

Original prompt