
Record: GatedDeltaNet FLA + Brotli (No TTT) — val_bpb 1.01902 (3-seed mean)#1712

Closed
aamodbhatt wants to merge 1 commit into openai:main from aamodbhatt:gdn-fla-no-ttt

Conversation

@aamodbhatt
Contributor

Record Summary

Final submitted score: val_bpb 1.01902 (std 0.0017)

Hardware: 8×H100 80GB SXM | Artifact: ~15.6 MB | Train: 600s wallclock | Pure fixed predictor (Track A)

What Changed

GatedDeltaNet linear attention (FLA) K_KVShare_Wider + brotli-11 compression. No TTT — pure fixed predictor (Track A). Based on PR openai#1687 by @resouer.

3-Seed Results

Seed   val_bpb   EMA BPB    Artifact bytes
1337   1.01720   0.998608   15,595,190
42     1.02054   1.001194   15,602,610
2025   1.01933   1.001260   15,608,600
Mean   1.01902   1.000354
Std    0.0017
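
The mean and std follow directly from the per-seed numbers; a minimal check, assuming sample standard deviation (n-1):

import statistics

# Per-seed val_bpb from the table above.
val_bpb = {1337: 1.01720, 42: 1.02054, 2025: 1.01933}

print(f"mean={statistics.mean(val_bpb.values()):.5f}")  # 1.01902
print(f"std={statistics.stdev(val_bpb.values()):.4f}")  # 0.0017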

Submission Checklist

  • One folder added under records/track_10min_16mb/
  • README.md, submission.json, train_gpt.py, train_gdn_7k.py, 3 seed logs present
  • Training ≤ 600s wallclock
  • All artifacts < 16,000,000 bytes (max: 15,608,600)
  • Eval < 600s
  • No tokenizer/dataset edits
  • No TTT, no SLOT, no ETLB, no n-gram cache
  • Pure fixed predictor — Track A compliant

Metric Verification

  • Score from final_int6_roundtrip_exact in each seed log
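
For reviewers, a minimal sketch of pulling that metric out of a seed log. The log-line format here (key followed by value) is an assumption, not confirmed from the actual logs:

import re
from pathlib import Path

# ASSUMPTION: the metric appears in each seed log as a line like
#   final_int6_roundtrip_exact: 1.01720
# The real format may differ; adjust the pattern to match.
PATTERN = re.compile(r"final_int6_roundtrip_exact\D*([0-9]+\.[0-9]+)")

def read_score(log_path: str) -> float:
    """Return the last final_int6_roundtrip_exact value found in a log."""
    matches = PATTERN.findall(Path(log_path).read_text())
    if not matches:
        raise ValueError(f"no final_int6_roundtrip_exact in {log_path}")
    return float(matches[-1])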

Credits

Based on PR openai#1687 by @resouer.

Commit: Record: GatedDeltaNet FLA + Brotli (No TTT) — val_bpb 1.01902 (3-seed mean)

GatedDeltaNet linear attention (FLA) K_KVShare_Wider + brotli-11
compression. No TTT — pure fixed predictor (Track A). 3-seed mean:
1.01902 BPB (std 0.0017). All artifacts under 16 MB.

Seeds: 1337 (1.01720), 42 (1.02054), 2025 (1.01933)

Based on PR openai#1687 by @resouer.
@dexhunter
Contributor

Thank you for the submission. I believe there's a byte-accounting bug that invalidates the reported score — flagging in case you weren't aware.

The issue

In train_gdn_7k.py, the leading-space byte for ▁-prefixed SentencePiece tokens is credited twice.

LUT construction at train_gdn_7k.py:204-217:

if piece.startswith("\u2581"):
    has_space[i] = True
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1   # pre-credits +1

Then at eval accumulation at train_gdn_7k.py:373-375:

tb = base_bytes_lut[tgt].to(torch.float64)
tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)  # +1 again

For any ▁-prefixed target after a non-boundary previous token, one byte is counted twice.
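
A toy repro of the double count, using a hypothetical single token rather than the real LUT arrays:

# Hypothetical token "▁the", i.e. " the" = 4 UTF-8 bytes.
piece = "\u2581the"

buggy_base = len(piece[1:].encode("utf-8")) + 1   # 4: space byte pre-credited
canonical_base = len(piece[1:].encode("utf-8"))   # 3: no +1 in the LUT

# The eval accumulator adds +1 for a ▁-prefixed target after a
# non-boundary previous token (identical in both versions).
has_leading_space, prev_is_boundary = True, False
extra = int(has_leading_space and not prev_is_boundary)

print(buggy_base + extra)      # 5 -> one byte counted twice
print(canonical_base + extra)  # 4 -> correct byte count for " the"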

Canonical reference

Merged PR #1019, records/track_10min_16mb/2026-03-25_ValCalib_GPTQ_XSA_BigramHash3072/train_gpt.py:266-290:

if piece.startswith("\u2581"):
    has_leading_space_np[token_id] = True
    piece = piece[1:]                                    # strip ▁ first
base_bytes_np[token_id] = len(piece.encode("utf-8"))     # no +1 in LUT
# +1 applied once in eval via has_leading_space & ~is_boundary_token

Numerical impact (sp8192 val stream, 40,540,160 tokens)

  • Current total bytes: 177,825,759
  • Canonical total bytes: ~151-152M
  • Inflation: ~1.17×
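
The per-token arithmetic behind those numbers, taking ~151.5M as the midpoint of the canonical estimate:

tokens = 40_540_160
buggy_bytes = 177_825_759
canonical_bytes = 151_500_000        # midpoint of the ~151-152M estimate

print(buggy_bytes / tokens)          # ~4.39 bytes/token (inflated)
print(canonical_bytes / tokens)      # ~3.74 bytes/token
print(buggy_bytes / canonical_bytes) # ~1.17x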

Applying the canonical LUT to the val_loss values in the seed logs:

Seed   val_loss (nats)   Reported val_bpb   Canonical val_bpb
42     3.11159           1.02341            ~1.204
0      3.10000           1.01960            ~1.200
1234   3.10645           1.02172            ~1.203

Suggested fix

-    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1
+    base_bytes[i] = len(piece[1:].encode("utf-8"))

After fixing, re-eval with the existing accumulator (which already adds +1 via has_leading_space_lut & ~is_boundary_token_lut). Self-check: the ratio val_loss / val_bpb should be ~2.58 for sp8192 (≈3.73 bytes/token × ln 2); the current ratio of ~3.04 indicates the inflation.
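
As a sketch, the conversion and the ratio check with the same constants as above:

import math

LN2 = math.log(2)
bytes_per_token = 151_500_000 / 40_540_160   # ~3.74, canonical estimate

def to_bpb(val_loss_nats: float) -> float:
    """Per-token nats -> bits per byte."""
    return val_loss_nats / (LN2 * bytes_per_token)

print(to_bpb(3.11159))        # ~1.20, vs reported 1.02341
print(LN2 * bytes_per_token)  # ~2.59: expected val_loss / val_bpb ratio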

Family note: the same LUT pattern appears in several upstream GDN-family PRs (#1576, #1632, #1687, #1698, #1711) via the inherited build_sentencepiece_luts helper. If you're planning follow-ups, it's worth fixing once and re-running the family.

Happy to help verify corrected numbers if useful.

@aamodbhatt
Contributor Author

Closing — same byte-counting bug as PR #1711. The build_sentencepiece_luts LUT double-counts the leading-space byte. Will fix and re-evaluate.
