Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.0274 (2-seed mean)#1632
Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.0274 (2-seed mean)#1632Hkoyuer wants to merge 1 commit intoopenai:mainfrom
Conversation
2-seed cold-cache runs on 8xH100 SXM: - Seed 1337: 1.026927 BPB (1857 steps, 14.81 MB) - Seed 42: 1.027811 BPB (1856 steps, 14.60 MB) - Mean: 1.027369 BPB Based on GDN-Hybrid architecture from PR openai#1545 by @dexhunter. Includes novel ECN research (see ECN_RESEARCH.md).
|
Two things worth flagging: Attribution — the PR credits PR #1545 to me, but that submission is by @Abhishek8108 (looks like an LLM hallucination). BPB bug — this inherits the same leading-space double-counting bug that led to PR #1545 being closed: if piece.startswith("▁"):
base_bytes[i] = len(piece[1:].encode("utf-8")) + 1 # space byte baked inThe @Abhishek8108 acknowledged this on closing PR #1545:
Independent reproduction with the same GDN-Hybrid architecture lands at ~1.26 BPB across six config variants on SP1024 — consistent with the corrected range, and well above the merged record at 1.0810. Worth re-scoring with the reference LUT before this is treated as a record. |
|
Thanks @dexhunter for catching this. Rescoring with |
|
Hey @Hkoyuer, I think this PR has the same The bug, in this PR
def build_sentencepiece_luts(sp, vocab_size, device):
base_bytes = torch.zeros(vocab_size, dtype=torch.float32, device=device)
has_space = torch.zeros(vocab_size, dtype=torch.bool, device=device)
is_boundary = torch.zeros(vocab_size, dtype=torch.bool, device=device)
for i in range(vocab_size):
piece = sp.id_to_piece(i)
raw = piece.encode("utf-8")
base_bytes[i] = len(raw)
if piece.startswith("\u2581"):
has_space[i] = True
base_bytes[i] = len(piece[1:].encode("utf-8")) + 1 # +1 is the space byte
if sp.is_control(i) or sp.is_unknown(i):
is_boundary[i] = True
return base_bytes, has_space, is_boundaryThen the eval loop at lines 376 to 377 adds the same space byte a second time: tb = base_bytes_lut[tgt].to(torch.float64)
tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)So for any The main-branch reference
def build_sentencepiece_luts(...):
...
for token_id in range(sp_vocab_size):
...
piece = sp.id_to_piece(token_id)
if piece.startswith("\u2581"):
has_leading_space_np[token_id] = True
piece = piece[1:] # strip the underscore
base_bytes_np[token_id] = len(piece.encode("utf-8")) # no +1The Expected magnitudeFor SP1024 on FineWeb a typical Suggested fixDrop the current Concretely:
RequestCould you rescore the submitted Not trying to attack the GDN-Hybrid architecture work, just the inherited accounting. A GDN-Hybrid submission with the correct LUT would still be a useful datapoint on where that architecture actually sits. Related context: #1545 (closed), #1576 (closed), main branch |
Summary
Results
Novel Research
Includes ECN (Error Correction Network) research achieving -0.039 BPB at zero artifact cost. See ECN_RESEARCH.md.
Test plan
Based on GDN-Hybrid architecture from PR #1545 by @dexhunter.
Author: Hamza Koyuer (@Hkoyuer) — Helolinks.com