
Record: GatedDeltaNet FLA + Brotli (No TTT) — val_bpb 1.01902 (3-seed mean)#1712

Closed
aamodbhatt wants to merge 1 commit into openai:main from aamodbhatt:gdn-fla-no-ttt

Conversation

@aamodbhatt
Contributor

Record Summary

Final submitted score: val_bpb 1.01902 (std 0.0017)

Hardware: 8×H100 80GB SXM | Artifact: ~15.6 MB | Train: 600s wallclock | Pure fixed predictor (Track A)

What Changed

GatedDeltaNet linear attention (FLA) K_KVShare_Wider + brotli-11 compression. No TTT — pure fixed predictor (Track A). Based on PR openai#1687 by @resouer.

3-Seed Results

Seed   val_bpb   EMA BPB    Artifact bytes
1337   1.01720   0.998608   15,595,190
42     1.02054   1.001194   15,602,610
2025   1.01933   1.001260   15,608,600
Mean   1.01902   1.000354
Std    0.0017
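
The mean and std follow directly from the per-seed numbers; a minimal check, assuming sample standard deviation (n-1):

import statistics

# Per-seed val_bpb from the table above.
val_bpb = {1337: 1.01720, 42: 1.02054, 2025: 1.01933}

print(f"mean={statistics.mean(val_bpb.values()):.5f}")  # 1.01902
print(f"std={statistics.stdev(val_bpb.values()):.4f}")  # 0.0017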

Submission Checklist

  • One folder added under records/track_10min_16mb/
  • README.md, submission.json, train_gpt.py, train_gdn_7k.py, 3 seed logs present
  • Training ≤ 600s wallclock
  • All artifacts < 16,000,000 bytes (max: 15,608,600)
  • Eval < 600s
  • No tokenizer/dataset edits
  • No TTT, no SLOT, no ETLB, no n-gram cache
  • Pure fixed predictor — Track A compliant

Metric Verification

  • Score from final_int6_roundtrip_exact in each seed log
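
For reviewers, a minimal sketch of pulling that metric out of a seed log. The log-line format here (key followed by value) is an assumption, not confirmed from the actual logs:

import re
from pathlib import Path

# ASSUMPTION: the metric appears in each seed log as a line like
#   final_int6_roundtrip_exact: 1.01720
# The real format may differ; adjust the pattern to match.
PATTERN = re.compile(r"final_int6_roundtrip_exact\D*([0-9]+\.[0-9]+)")

def read_score(log_path: str) -> float:
    """Return the last final_int6_roundtrip_exact value found in a log."""
    matches = PATTERN.findall(Path(log_path).read_text())
    if not matches:
        raise ValueError(f"no final_int6_roundtrip_exact in {log_path}")
    return float(matches[-1])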

Credits

Based on PR openai#1687 by @resouer.

Commit: Record: GatedDeltaNet FLA + Brotli (No TTT) — val_bpb 1.01902 (3-seed mean)

GatedDeltaNet linear attention (FLA) K_KVShare_Wider + brotli-11
compression. No TTT — pure fixed predictor (Track A). 3-seed mean:
1.01902 BPB (std 0.0017). All artifacts under 16 MB.

Seeds: 1337 (1.01720), 42 (1.02054), 2025 (1.01933)

Based on PR openai#1687 by @resouer.
@dexhunter
Contributor

Thank you for the submission. I believe there's a byte-accounting bug that invalidates the reported score — flagging in case you weren't aware.

The issue

In train_gdn_7k.py, the leading-space byte for ▁-prefixed SentencePiece tokens is credited twice.

LUT construction at train_gdn_7k.py:204-217:

if piece.startswith("\u2581"):
    has_space[i] = True
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1   # pre-credits +1

Then at eval accumulation at train_gdn_7k.py:373-375:

tb = base_bytes_lut[tgt].to(torch.float64)
tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)  # +1 again

For any ▁-prefixed target after a non-boundary previous token, one byte is counted twice.
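
A toy repro of the double count, using a hypothetical single token rather than the real LUT arrays:

# Hypothetical token "▁the", i.e. " the" = 4 UTF-8 bytes.
piece = "\u2581the"

buggy_base = len(piece[1:].encode("utf-8")) + 1   # 4: space byte pre-credited
canonical_base = len(piece[1:].encode("utf-8"))   # 3: no +1 in the LUT

# The eval accumulator adds +1 for a ▁-prefixed target after a
# non-boundary previous token (identical in both versions).
has_leading_space, prev_is_boundary = True, False
extra = int(has_leading_space and not prev_is_boundary)

print(buggy_base + extra)      # 5 -> one byte counted twice
print(canonical_base + extra)  # 4 -> correct byte count for " the"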

Canonical reference

Merged PR #1019, records/track_10min_16mb/2026-03-25_ValCalib_GPTQ_XSA_BigramHash3072/train_gpt.py:266-290:

if piece.startswith("\u2581"):
    has_leading_space_np[token_id] = True
    piece = piece[1:]                                    # strip ▁ first
base_bytes_np[token_id] = len(piece.encode("utf-8"))     # no +1 in LUT
# +1 applied once in eval via has_leading_space & ~is_boundary_token

Numerical impact (sp8192 val stream, 40,540,160 tokens)

  • Current total bytes: 177,825,759
  • Canonical total bytes: ~151-152M
  • Inflation: ~1.17×
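
The per-token arithmetic behind those numbers, taking ~151.5M as the midpoint of the canonical estimate:

tokens = 40_540_160
buggy_bytes = 177_825_759
canonical_bytes = 151_500_000        # midpoint of the ~151-152M estimate

print(buggy_bytes / tokens)          # ~4.39 bytes/token (inflated)
print(canonical_bytes / tokens)      # ~3.74 bytes/token
print(buggy_bytes / canonical_bytes) # ~1.17x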

Applying the canonical LUT to the val_loss values in the seed logs:

Seed   val_loss (nats)   Reported val_bpb   Canonical val_bpb
42     3.11159           1.02341            ~1.204
0      3.10000           1.01960            ~1.200
1234   3.10645           1.02172            ~1.203

Suggested fix

-    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1
+    base_bytes[i] = len(piece[1:].encode("utf-8"))

After fixing, re-eval with the existing accumulator (which already adds +1 via has_leading_space_lut & ~is_boundary_token_lut). Self-check: the ratio val_loss / val_bpb should be ~2.58 for sp8192 (≈3.73 bytes/token × ln 2); the current ratio of ~3.04 indicates the inflation.
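
As a sketch, the conversion and the ratio check with the same constants as above:

import math

LN2 = math.log(2)
bytes_per_token = 151_500_000 / 40_540_160   # ~3.74, canonical estimate

def to_bpb(val_loss_nats: float) -> float:
    """Per-token nats -> bits per byte."""
    return val_loss_nats / (LN2 * bytes_per_token)

print(to_bpb(3.11159))        # ~1.20, vs reported 1.02341
print(LN2 * bytes_per_token)  # ~2.59: expected val_loss / val_bpb ratio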

Family note: the same LUT pattern appears in several upstream GDN-family PRs (#1576, #1632, #1687, #1698, #1711) via the inherited build_sentencepiece_luts helper. If you're planning follow-ups, it's worth fixing once and re-running the family.

Happy to help verify corrected numbers if useful.

@aamodbhatt
Contributor Author

Closing — same byte-counting bug as PR #1711. The build_sentencepiece_luts LUT double-counts the leading-space byte. Will fix and re-evaluate.
