
Record: GDN-Hybrid + Sliding Window Attention + compressed-code warmdown1000 - val_bpb 1.01671 (3-seed mean)#1576

Closed
joshkmartinez wants to merge 7 commits into openai:main from joshkmartinez:gdn-hybrid-warmdown

Conversation

@joshkmartinez

Summary

val_bpb = 1.01671233 (3-seed mean, std 0.00134386)
Artifact size: 15.71–15.90 MB across seeds

Improves the GDN-Hybrid fixed-predictor line with a warmdown1000 schedule and compressed-code packaging, without eval-time adaptation.

| Seed | Steps | EMA BPB | val_bpb | XSA BPB | Artifact bytes |
|------|-------|---------|---------|---------|----------------|
| 42   | 2227  | 1.007164 | 1.016200 | 1.021202 | 15,733,879 |
| 1337 | 2242  | 1.007164 | 1.015700 | 1.020105 | 15,903,365 |
| 2024 | 2227  | 1.009032 | 1.018237 | 1.024111 | 15,713,422 |
| Mean |       | 1.007787 | 1.01671233 | 1.021806 | 15,783,555.33 |
| Std  |       |          | 0.00134386 |          |            |

Architecture / Technique Stack

  1. SP1024 tokenizer
  2. GDN-Hybrid backbone: [GDN×5] → SWA → [GDN×5] → SWA_shared
  3. Fixed-predictor evaluation path (no TTT / no SLOT / no eval-time adaptation)
  4. MuonEq-R + AdamW training mix
  5. EMA = 0.997
  6. warmdown = 1000
  7. GPTQ int6 + zstd-22 packaging
  8. Compressed-code packaging for train_gpt.py / architectures.py / configs.py to recover artifact headroom
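The compressed-code packaging in item 8 can be sketched as an LZMA self-extractor: the plain source is compressed, base85-encoded, and wrapped in a tiny stub that decompresses and `exec`s it at load time. This is a minimal hypothetical sketch; the actual packaging script for `train_gpt.py` / `architectures.py` / `configs.py` is not shown in this PR and may differ.

```python
import base64
import lzma


def pack(source: bytes) -> str:
    """Wrap source code in a tiny LZMA self-extracting stub.

    Hypothetical sketch of the packaging idea; artifact bytes are
    recovered whenever the compressed blob plus stub is smaller than
    the plain source.
    """
    blob = base64.b85encode(lzma.compress(source, preset=9)).decode()
    return (
        "import base64, lzma\n"
        f"exec(lzma.decompress(base64.b85decode({blob!r})))\n"
    )


# Executing the stub runs the decompressed original code.
stub = pack(b"X = 41 + 1\n")
ns = {}
exec(stub, ns)
print(ns["X"])  # 42
```

The trade-off, noted in the review below, is that the wrapped source is opaque to standard diff review until someone decompresses it.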

Compliance

  • Fixed-predictor / Track A style submission
  • No TTT
  • No SLOT
  • No RLS
  • No eval-time adaptation
  • All three artifacts under 16,000,000 bytes
  • Training run stays within the 10-minute 8xH100 submission budget

Notes

XSA telemetry is reported for completeness, but the submitted score is the fixed-model quantized_bpb result above.

Credits

@bigbag

bigbag commented Apr 13, 2026

BPB metric bug: space bytes double-counted (inherited from closed parent PR #1545)

The decompressed train_gpt.py in this PR contains the same build_sentencepiece_luts bug that @SPThole identified in PR #1545, and which @Abhishek8108 acknowledged when closing that PR ("The corrected BPB is ~1.18, not 1.028").

Bugged code in this PR (decompressed from the LZMA self-extractor):

# build_sentencepiece_luts, around line 217
if piece.startswith("▁"):
    has_space[i] = True
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1   # +1 adds the space byte

Then the eval loop adds the same space byte again:

tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)

Reference implementation (train_gpt.py at repo root, lines 186–189):

piece = sp.id_to_piece(token_id)
if piece.startswith("▁"):
    has_leading_space_np[token_id] = True
    piece = piece[1:]                       # strip ▁
base_bytes_np[token_id] = len(piece.encode("utf-8"))   # NO +1 here

The reference counts the space byte exactly once (in the eval loop, conditioned on ~is_boundary_token_lut[prev]). The bugged version counts it in both places for every ▁-prefixed token, inflating the byte denominator and deflating the reported BPB.
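The effect is easy to see on a toy token stream. The pieces below and the "no preceding boundary token" assumption are hypothetical; the real LUTs are built from the SentencePiece model in `train_gpt.py`.

```python
pieces = ["▁the", "▁cat", "sat"]


def base_bytes_bugged(piece: str) -> int:
    # Bugged LUT: already folds the space byte into base_bytes
    # for every ▁-prefixed piece.
    if piece.startswith("▁"):
        return len(piece[1:].encode("utf-8")) + 1
    return len(piece.encode("utf-8"))


def base_bytes_reference(piece: str) -> int:
    # Reference LUT: strips ▁ and counts only the piece's own bytes;
    # the space byte is added once, later, in the eval loop.
    if piece.startswith("▁"):
        piece = piece[1:]
    return len(piece.encode("utf-8"))


def denominator(lut) -> int:
    # Eval loop: adds one space byte per leading-space token
    # (assume no boundary tokens, so the condition always fires).
    total = 0
    for p in pieces:
        total += lut(p)
        if p.startswith("▁"):
            total += 1
    return total


print(denominator(base_bytes_bugged))     # 13: both space bytes counted twice
print(denominator(base_bytes_reference))  # 11: each space byte counted once
```

Since BPB divides total bits by this byte count, the inflated denominator directly deflates the reported score.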

Running the parent PR's corrected LUT on the same checkpoint lands in the ~1.16–1.18 range (per @Abhishek8108's own correction on #1545), not 1.01671.

The bug was missed here because the training code is wrapped in an LZMA self-extractor, which hides it from standard review. I suggest the maintainers decompress and re-score before this shifts the leaderboard.
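One way to recover the plain source for review without running it: shadow `exec` in the stub's globals so the decompressed payload is captured instead of executed. This assumes the stub's final step is a top-level `exec(...)` call (an assumption about the packaging format; adjust if the stub differs).

```python
def recover_source(stub_code: str) -> str:
    """Run a self-extracting stub with exec() shadowed, capturing the
    payload it would have executed. Review-helper sketch only."""
    captured = []
    # Globals shadow the exec builtin inside the exec'd stub code.
    ns = {"exec": lambda code, *args: captured.append(code)}
    exec(stub_code, ns)
    payload = captured[-1]
    return payload.decode() if isinstance(payload, bytes) else payload
```

The recovered text can then be diffed against the repo-root `train_gpt.py` to check `build_sentencepiece_luts` directly.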

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 13, 2026
…ai#1586 per-layer GPTQ highest-EV

- PR openai#758 n-gram effectively dead: MatoTeziTanka (Apr 12) flagged XOR hash
  includes target token, same illegality as openai#727/openai#741
- GDN-Hybrid BPB bug confirmed: PR openai#1576 space-token double-count inflates
  denominator ~14%; actual score ~1.16-1.18, not 1.01671
- PR openai#1586 (dexhunter, 1.07493): Per-Layer Adaptive GPTQ MLP=12σ/Attn=13σ +
  int7 Emb (saves 530KB) + MLR=0.026; -0.0127 nats vs SOTA; implement now
- PR openai#1584: systems-only (fused Muon, batched EMA, loader prealloc) ~+20 steps
- Casefold Tokenizer (openai#1578/openai#1585): legality debated; await organizer ruling
- New paper: arXiv:2604.06169 In-Place TTT (Apr 7) NTP-aligned score-first TTT
- Merged SOTA 1.0810 unchanged (4-day stable streak); target ≤1.0760; 17 days

@joshkmartinez
Author

Good catch, thanks!
