Record: K_KVShare_Wider FLA — val_bpb 1.0339 (3-seed mean)#1705
Closed
genji0306 wants to merge 3 commits into openai:main from
Conversation
…ranch

This branch lifts the validated review package onto a clean upstream/main base so the official submission diff stays to one records folder and one commit. The package keeps the faithful multi-file surface because the packed single-file experiments drifted, while a direct smoke test on the current multi-file surface matched the measured candidate within noise.

Constraint: the submission branch must contain only records/ files and must keep the exact measured candidate surface.

Rejected: reuse the existing fork review branch as-is | it carries many exploratory commits and is noisier than a clean submit branch
Rejected: promote the packed single-file variant | it was not fidelity-cleared for this candidate

Confidence: high. Scope-risk: narrow. Reversibility: clean.

Directive: if packaging changes again, rerun at least one packaged smoke test before treating the branch as submission-ready.

Tested: py_compile on packaged Python files; exact folder-size audit (15,991,282 bytes total); packaged multi-file smoke on the PR-head surface at 1.03971272 BPB.
Not tested: re-running the full 3-seed sweep on this rebased records-only branch (package contents unchanged).
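The folder-size audit mentioned above can be reproduced with a short script. This is an illustrative sketch, not the branch's actual audit tooling; the `records` path and the expected byte total are taken from the figures quoted above.

```python
import os

def folder_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, recursively."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip broken symlinks and the like
                total += os.path.getsize(path)
    return total

# For this package the audit should report:
# folder_size_bytes("records") == 15_991_282
```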
Independent 3-seed reproduction of GatedDeltaNet K_KVShare_Wider on 8xH100 SXM. Builds on PR openai#1687 (resouer). No TTT, no SLOT, no n-gram. Seeds: 42 (1.0353), 1337 (1.0333), 2025 (1.0330) Mean: 1.0339 ± 0.0012 | Artifact: 15.88 MB mean
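The reported mean and spread follow directly from the three per-seed values; a minimal check (the quoted ±0.0012 is consistent with the sample standard deviation of these values):

```python
from statistics import mean, stdev

# Per-seed validation BPB from the results above
seed_bpb = {42: 1.0353, 1337: 1.0333, 2025: 1.0330}
vals = list(seed_bpb.values())
m, s = mean(vals), stdev(vals)
print(f"mean={m:.4f}, sample std~{s:.4f}")  # mean rounds to 1.0339
```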
- `is_boundary` defaults to True (was zeros)
- skip control/unknown/unused tokens early
- handle byte tokens as 1 byte explicitly
- strip the SentencePiece space marker before UTF-8 encoding
- use int16 for `base_bytes` (was float32)

Same bug that closed PR openai#1687.
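A hypothetical sketch of the corrected per-token byte accounting. The function name, the `piece_type` strings, and the exact handling are illustrative assumptions, not the repo's actual API; the key point is that the `▁` marker stands for a single space byte, not the 3 bytes of its UTF-8 encoding.

```python
def piece_byte_len(piece: str, piece_type: str) -> int:
    """Bytes a SentencePiece token contributes to the BPB denominator (sketch)."""
    # Control/unknown/unused tokens carry no text bytes: skip them early.
    if piece_type in ("control", "unknown", "unused"):
        return 0
    # Byte-fallback tokens like "<0x41>" encode exactly one raw byte.
    if piece_type == "byte":
        return 1
    # "▁" (U+2581) marks a leading space; count it as the 1-byte space
    # it represents rather than its 3-byte UTF-8 encoding.
    return len(piece.replace("\u2581", " ").encode("utf-8"))
```

Counting the marker as 3 bytes inflates the byte denominator and deflates BPB, which is the kind of artifact described in this thread.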
Author
Closing: corrected SentencePiece LUT scoring (same bug as #1687) gives a 3-seed mean of 1.223 BPB, far behind the SOTA 1.081. The 1.034 claim was a byte-accounting artifact.
Summary
This PR packages an Opensens reproduction of the `K_KVShare_Wider` FLA family on 8xH100 SXM.

Update: while preparing the submission, I confirmed that the scored path inherited the same SentencePiece byte-accounting bug that closed #1687. Specifically, `build_sentencepiece_luts` in `train_gdn_7k.py` did not match the base repo semantics for leading-space, byte, and unused tokens. I have patched that scorer path and am rerunning the 3-seed baseline under the corrected LUT logic. Until those reruns finish, please treat the previously reported `1.0339` mean as invalid.

Technique
- `K_KVShare_Wider`
- `model_dim=544`
- (0.997) + SWA (every 50 steps) + late QAT (Int6 STE) + `zstd-22`

Reproducibility
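A minimal sketch of the Int6 fake-quantization step (forward pass only, plain Python). In training, the straight-through estimator passes gradients through this op as if it were the identity; the symmetric scaling and clipping range here are assumptions, not the PR's exact code.

```python
def fake_quant_int6(ws):
    """Symmetric 6-bit fake quantization: snap each value to one of 2**6 levels."""
    qmax = 2 ** (6 - 1) - 1                          # 31 for signed int6
    scale = max(abs(x) for x in ws) / qmax or 1.0    # guard against all-zero input
    # Round to the nearest level, clip to [-32, 31], then dequantize.
    return [max(-qmax - 1, min(qmax, round(x / scale))) * scale for x in ws]

w = [0.5, -0.31, 0.01]
wq = fake_quant_int6(w)
# Each dequantized value stays within half a quantization step of the original.
```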
```
pip install -r requirements.txt
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192
SEED=$SEED ARCH_MODE=K MAX_WALLCLOCK_SECONDS=600 VAL_LOSS_EVERY=0 EVAL_COMPILE_ENABLED=0 torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Follow-up
I will update this PR with: