Record: GDN-Hybrid + Sliding Window Attention + compressed-code warmdown1000 - val_bpb 1.01671 (3-seed mean)#1576
Conversation
**BPB metric bug: space bytes double-counted (inherited from closed parent PR #1545)**

The bugged code in this PR (decompressed from the LZMA self-extractor):

```python
# build_sentencepiece_luts, around line 217
if piece.startswith("▁"):
    has_space[i] = True
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1  # +1 adds the space byte
```

Then the eval loop adds the same space byte again:

```python
tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)
```

Reference implementation (…):

```python
piece = sp.id_to_piece(token_id)
if piece.startswith("▁"):
    has_leading_space_np[token_id] = True
    piece = piece[1:]  # strip ▁
base_bytes_np[token_id] = len(piece.encode("utf-8"))  # NO +1 here
```

The reference counts the space byte exactly once (in the eval loop, conditioned on …). Running the parent PR's corrected LUT on the same checkpoint lands in the ~1.16–1.18 range (per @Abhishek8108's own correction on #1545), not 1.01671. The bug was missed here because the training code is wrapped in an LZMA self-extractor, which hides it from standard review. Suggest the maintainers decompress and re-score before this shifts the leaderboard.
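A minimal, self-contained sketch of the double-count (toy pieces, no real SentencePiece model; variable names are illustrative, not the PR's): baking the space byte into the LUT *and* adding it again in the eval loop inflates the byte denominator, which deflates the reported BPB.

```python
# Toy SentencePiece-style vocabulary; "▁" marks a leading space.
pieces = ["▁the", "▁cat", "sat", "▁on", "mat"]

# Bugged LUT: space byte baked into base_bytes (the +1) AND flagged for the eval loop.
bug_base = [len(p[1:].encode("utf-8")) + 1 if p.startswith("▁")
            else len(p.encode("utf-8")) for p in pieces]
# Reference LUT: space byte counted only once, via the has_space flag in the eval loop.
ref_base = [len(p[1:].encode("utf-8")) if p.startswith("▁")
            else len(p.encode("utf-8")) for p in pieces]
has_space = [p.startswith("▁") for p in pieces]

# Simplified eval loop: every space-leading token gets +1 from the flag.
ids = [0, 1, 2, 3, 4]  # token stream covering the whole toy vocab
bug_bytes = sum(bug_base[i] + has_space[i] for i in ids)  # space counted twice
ref_bytes = sum(ref_base[i] + has_space[i] for i in ids)  # space counted once

print(bug_bytes, ref_bytes)  # bugged denominator is larger, deflating BPB
```

Here the bugged denominator is 20 vs. the correct 17, an ~18% inflation on this toy stream; the ~14% figure cited above depends on the real token distribution.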
…ai#1586 per-layer GPTQ highest-EV

- PR openai#758 n-gram effectively dead: MatoTeziTanka (Apr 12) flagged XOR hash includes target token, same illegality as openai#727/openai#741
- GDN-Hybrid BPB bug confirmed: PR openai#1576 space-token double-count inflates denominator ~14%; actual score ~1.16–1.18, not 1.01671
- PR openai#1586 (dexhunter, 1.07493): Per-Layer Adaptive GPTQ MLP=12σ/Attn=13σ + int7 Emb (saves 530KB) + MLR=0.026; -0.0127 nats vs SOTA; implement now
- PR openai#1584: systems-only (fused Muon, batched EMA, loader prealloc) ~+20 steps
- Casefold Tokenizer (openai#1578/openai#1585): legality debated; await organizer ruling
- New paper: arXiv:2604.06169 In-Place TTT (Apr 7) NTP-aligned score-first TTT
- Merged SOTA 1.0810 unchanged (4-day stable streak); target ≤1.0760; 17 days

https://claude.ai/code/session_01BE8wc8zxvZAo52QBXSNiL8
Good catch, thanks!
Summary
val_bpb = 1.01671233 (3-seed mean, std 0.00134386)
15.71–15.90 MB
Improves the GDN-Hybrid fixed-predictor line with a warmdown1000 learning-rate schedule and compressed-code packaging, without eval-time adaptation.
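The PR does not spell out the schedule here; a common reading of "warmdown1000" is a constant learning rate followed by a linear decay over the final 1000 steps. A hypothetical sketch under that assumption (`lr_at` and all parameters are illustrative, not the PR's code):

```python
def lr_at(step: int, total_steps: int, base_lr: float,
          warmdown: int = 1000, floor: float = 0.0) -> float:
    """Hypothetical 'warmdown1000': hold base_lr, then decay linearly
    to `floor` over the last `warmdown` steps. Sketch only; the PR's
    actual schedule may differ."""
    start = total_steps - warmdown
    if step < start:
        return base_lr
    frac = (step - start) / warmdown  # 0 -> 1 across the warmdown window
    return base_lr + (floor - base_lr) * frac

# With 5000 total steps and base_lr=0.02:
# lr_at(0, 5000, 0.02) holds at 0.02; lr_at(4500, 5000, 0.02) is halfway down (0.01).
```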
Architecture / Technique Stack
Compliance
Notes
XSA telemetry is reported for completeness, but the submitted score is the fixed-model quantized_bpb result above.
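For reference, bits-per-byte converts summed cross-entropy nats to bits and normalizes by the target byte count, which is why an inflated byte denominator (as flagged in the conversation above) lowers the reported score. A sketch with illustrative numbers (the nat and byte totals below are made up to show the effect, not taken from this run):

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """bpb = total cross-entropy (nats) / (ln 2 * total target bytes)."""
    return total_nats / (math.log(2) * total_bytes)

nats = 1000.0
print(bits_per_byte(nats, 1419))  # ~1.0167 with an inflated byte count
print(bits_per_byte(nats, 1240))  # ~1.1635 with ~14% fewer bytes
```

The same loss divided by ~14% fewer bytes moves the score from the ~1.02 range into the ~1.16 range, consistent with the correction discussed above.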
Credits