Record: K_KVShare_Wider FLA — val_bpb 1.0339 (3-seed mean)#1705

Closed
genji0306 wants to merge 3 commits into openai:main from genji0306:submission/gdn-kv-wider-ttt


genji0306 commented Apr 17, 2026

Summary

This PR packages an Opensens reproduction of the K_KVShare_Wider FLA family on 8xH100 SXM.

Update: while preparing the submission, I confirmed that the scoring path inherited the same SentencePiece byte-accounting bug that closed #1687: build_sentencepiece_luts in train_gdn_7k.py did not match the base repo's semantics for leading-space, byte, and unused tokens.

I have patched the scoring path and am rerunning the 3-seed baseline with the corrected LUT logic. Until those reruns finish, please treat the previously reported 1.0339 mean as invalid.
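For context on why a LUT bug moves the score: bits per byte divides the summed token cross-entropy (in nats) by ln 2 times the number of UTF-8 bytes the scored tokens decode to, so any LUT error that overcounts bytes lowers the reported BPB without the model getting better. A minimal sketch of that arithmetic; the helper name and the numbers in the usage lines are hypothetical illustrations, not values from these runs.

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy (natural log) into bits per byte.

    Hypothetical helper for illustration: BPB = nats / (ln 2 * bytes).
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)

# Hypothetical numbers: the same summed loss scored against two byte counts.
nll = 1_000_000.0           # summed loss over the validation set, in nats
true_bytes = 1_200_000      # bytes the tokens actually decode to
inflated_bytes = 1_400_000  # bytes a buggy LUT might report (overcount)

print(round(bits_per_byte(nll, true_bytes), 4))
print(round(bits_per_byte(nll, inflated_bytes), 4))
```

Only the denominator differs between the two calls, which is why a byte-accounting fix alone can shift BPB substantially.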

Technique

  • GatedDeltaNet / Flash Linear Attention (K_KVShare_Wider)
  • 10 GDN layers, model_dim=544
  • SP8192 tokenizer
  • EMA (0.997) + SWA (every 50 steps) + late QAT (Int6 STE) + zstd-22
  • No TTT, no SLOT, no n-gram overlay in the rescored baseline
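The EMA and late-QAT items can be illustrated with a small pure-Python sketch. It assumes EMA is applied per parameter with decay 0.997, and that "Int6 STE" means symmetric per-tensor fake quantization to integer levels in [-31, 31] whose backward pass is treated as identity (the straight-through estimator). The actual training script may use per-row scales, a different clamp range, or a different schedule; the helper names below are hypothetical.

```python
from typing import List

EMA_DECAY = 0.997  # decay value listed in the Technique section

def ema_update(ema: List[float], weights: List[float],
               decay: float = EMA_DECAY) -> List[float]:
    """Return new shadow weights: ema <- decay * ema + (1 - decay) * w."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]

def fake_quant_int6(weights: List[float]) -> List[float]:
    """Symmetric per-tensor int6 fake quantization (assumed scheme).

    Forward: snap each weight to scale * q with integer q in [-31, 31].
    Under an STE, the backward pass treats this function as identity,
    so gradients reach the full-precision weights unchanged.
    """
    qmax = 31  # 6-bit signed range, kept symmetric around zero
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

# Usage: one EMA step and one fake-quantized view of a tiny weight vector.
w = [0.5, -0.25, 0.1, 0.0]
shadow = ema_update([0.0] * len(w), w)
wq = fake_quant_int6(w)
```

A per-tensor scale keeps the sketch short; per-row scales are a common alternative that usually reduces quantization error for matrices with uneven row magnitudes.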

Reproducibility

pip install -r requirements.txt
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192
SEED=$SEED ARCH_MODE=K MAX_WALLCLOCK_SECONDS=600 VAL_LOSS_EVERY=0 EVAL_COMPILE_ENABLED=0 torchrun --standalone --nproc_per_node=8 train_gpt.py
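To run all three seeds listed in the commit message (42, 1337, 2025) with the same settings, a small Python wrapper around the command above could look like this. It is a hypothetical convenience script, assuming it is launched from the repository root and that SEED is the only variable that changes between runs.

```python
import os
import subprocess
from typing import Dict, List, Tuple

SEEDS = [42, 1337, 2025]  # seeds listed in the commit message

def build_run(seed: int) -> Tuple[List[str], Dict[str, str]]:
    """Build the torchrun argv and environment for one seed."""
    env = dict(os.environ)
    env.update({
        "SEED": str(seed),
        "ARCH_MODE": "K",
        "MAX_WALLCLOCK_SECONDS": "600",
        "VAL_LOSS_EVERY": "0",
        "EVAL_COMPILE_ENABLED": "0",
    })
    cmd = ["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py"]
    return cmd, env

def run_all(seeds: List[int] = SEEDS) -> None:
    """Launch each seed sequentially; raise on the first failing run."""
    for seed in seeds:
        cmd, env = build_run(seed)
        subprocess.run(cmd, env=env, check=True)
```

Calling run_all() starts the sweep; check=True stops it at the first non-zero exit so a crashed seed is not silently skipped.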

Follow-up

I will update this PR with:

  • corrected 3-seed results
  • exact train logs for all seeds
  • updated submission metadata

resouer and others added 2 commits April 16, 2026 22:59
…ranch

This branch lifts the validated review package onto a clean upstream/main base so that the official submission diff is limited to one records folder and one commit. The package keeps the faithful multi-file surface because the packed single-file experiments drifted, whereas a direct smoke test on the current multi-file surface matched the measured candidate within noise.

Constraint: The submission branch must contain only records/ files and must keep the exact measured candidate surface.
Rejected: Reuse the existing fork review branch as-is | it carries many exploratory commits and is noisier than a clean submit branch
Rejected: Promote the packed single-file variant | it was not fidelity-cleared for this candidate
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If packaging changes again, rerun at least one packaged smoke before treating the branch as submission-ready
Tested: py_compile on packaged Python files; exact folder-size audit (15,991,282 bytes total); packaged multi-file smoke on PR-head surface at 1.03971272 BPB
Not-tested: Re-running the full 3-seed sweep on this rebased records-only branch (package contents unchanged)
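The "exact folder-size audit" in the Tested line amounts to summing file sizes under the submission folder. A minimal sketch, assuming the audit counts every regular file recursively; the actual audit script and any exclusions it applies are not shown in this PR, and the function name is hypothetical.

```python
import os

def folder_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, recursively.

    Symlinks are skipped so that no file is counted twice.
    A nonexistent root yields 0 because os.walk ignores errors by default.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Pointing it at the submission's records folder gives a byte total directly comparable to the 15,991,282 figure above.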
Independent 3-seed reproduction of GatedDeltaNet K_KVShare_Wider on
8xH100 SXM. Builds on PR openai#1687 (resouer). No TTT, no SLOT, no n-gram.

Seeds: 42 (1.0353), 1337 (1.0333), 2025 (1.0330)
Mean: 1.0339 ± 0.0012 | Artifact: 15.88 MB mean
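As a quick arithmetic check, the summary statistics can be recomputed from the rounded per-seed values. The mean matches; the spread comes out near 0.00125 with the sample (n - 1) convention and near 0.00102 with the population convention, so the reported ± 0.0012 presumably reflects unrounded per-seed scores (an assumption, since the unrounded values are not in the PR).

```python
import statistics

# Per-seed val_bpb as listed in the commit message (rounded to 4 decimals).
scores = {42: 1.0353, 1337: 1.0333, 2025: 1.0330}

values = list(scores.values())
mean = statistics.mean(values)
sample_sd = statistics.stdev(values)       # n - 1 denominator
population_sd = statistics.pstdev(values)  # n denominator

print(f"mean={mean:.4f} sample_sd={sample_sd:.5f} population_sd={population_sd:.5f}")
```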
genji0306 marked this pull request as draft April 17, 2026 23:30
- is_boundary defaults to True (was zeros)
- skip control/unknown/unused tokens early
- handle byte tokens as 1 byte explicitly
- strip sentencepiece space marker before UTF-8 encoding
- use int16 for base_bytes (was float32)

Same bug that closed PR openai#1687.
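The fixes listed in this commit can be sketched in pure Python. The real function takes a SentencePiece processor; here a hypothetical Piece record stands in for the per-id information it would expose, so the sketch runs without sentencepiece installed. The byte-counting helper assumes the convention that a leading-space token adds one byte unless the previous token is a boundary; treat that rule, and all names here, as assumptions rather than the repo's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

SPACE_MARKER = "\u2581"  # SentencePiece word-boundary marker

@dataclass
class Piece:
    """Hypothetical stand-in for what a SentencePiece processor reports per id."""
    text: str
    kind: str = "normal"  # "normal", "byte", "control", "unknown", or "unused"

Luts = Tuple[List[int], List[bool], List[bool]]

def build_luts(pieces: List[Piece]) -> Luts:
    """Per-token byte LUTs following the fixes listed in the commit message."""
    n = len(pieces)
    base_bytes = [0] * n          # int16 array in the patched code; ints here
    has_leading_space = [False] * n
    is_boundary = [True] * n      # defaults to True (was zeros before the fix)
    for tid, p in enumerate(pieces):
        if p.kind in ("control", "unknown", "unused"):
            continue              # skipped early: 0 bytes, stays a boundary
        is_boundary[tid] = False
        if p.kind == "byte":
            base_bytes[tid] = 1   # a byte token stands for exactly one byte
            continue
        text = p.text
        if text.startswith(SPACE_MARKER):
            has_leading_space[tid] = True
            text = text[len(SPACE_MARKER):]  # strip marker before UTF-8 encoding
        base_bytes[tid] = len(text.encode("utf-8"))
    return base_bytes, has_leading_space, is_boundary

def count_bytes(prev_ids: List[int], target_ids: List[int], luts: Luts) -> int:
    """Bytes for scored targets: base bytes, plus one for a leading space
    unless the previous token is a boundary (assumed convention)."""
    base_bytes, has_leading_space, is_boundary = luts
    total = 0
    for prev, tgt in zip(prev_ids, target_ids):
        total += base_bytes[tgt]
        if has_leading_space[tgt] and not is_boundary[prev]:
            total += 1
    return total

# Usage with a toy vocabulary: control, byte, spaced word, plain word, 2-byte char.
EXAMPLE_VOCAB = [
    Piece("<s>", "control"),
    Piece("<0x41>", "byte"),
    Piece(SPACE_MARKER + "hello"),
    Piece("world"),
    Piece("\u00e9"),
]
luts = build_luts(EXAMPLE_VOCAB)
```

Counting the marker's own UTF-8 bytes (3 for the marker character) instead of stripping it is exactly the kind of overcount that inflates the BPB denominator.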
Closing: with the corrected SentencePiece LUT scoring (same bug as #1687), the 3-seed mean is 1.223 BPB, far behind the SOTA of 1.081. The 1.034 claim was a byte-accounting artifact.

genji0306 closed this Apr 18, 2026