
Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.0274 (2-seed mean)#1632

Closed
Hkoyuer wants to merge 1 commit into openai:main from Hkoyuer:submission/gdn-hybrid-1.028

Conversation


@Hkoyuer Hkoyuer commented Apr 15, 2026

Summary

  • GDN-Hybrid reproduction: val_bpb 1.0274 (2-seed cold-cache mean)
  • 2 seeds (1337, 42) on fresh 8xH100 SXM pods, 590s training each
  • All artifacts under 16MB (14.6-14.8 MB)
  • No TTT, fixed predictor

Results

Seed    Steps   EMA BPB     Quantized BPB   Artifact (bytes)
1337    1857    1.018060    1.026927        15,524,240
42      1856    1.018499    1.027811        15,305,698
Mean            1.018280    1.027369

Novel Research

Includes ECN (Error Correction Network) research achieving a 0.039 BPB reduction at zero artifact cost. See ECN_RESEARCH.md.

Test plan

  • 2-seed cold-cache training on 8xH100 SXM
  • All artifacts under 16MB
  • All runs under 590s training
  • Round-trip GPTQ quantization verified
  • Judges verify reproducibility

Based on GDN-Hybrid architecture from PR #1545 by @dexhunter.

Author: Hamza Koyuer (@Hkoyuer) — Helolinks.com

2-seed cold-cache runs on 8xH100 SXM:
- Seed 1337: 1.026927 BPB (1857 steps, 14.81 MB)
- Seed 42:   1.027811 BPB (1856 steps, 14.60 MB)
- Mean:      1.027369 BPB

Based on GDN-Hybrid architecture from PR openai#1545 by @dexhunter.
Includes novel ECN research (see ECN_RESEARCH.md).
@dexhunter
Contributor

Two things worth flagging:

Attribution — the PR credits PR #1545 to me, but that submission is by @Abhishek8108 (looks like an LLM hallucination).

BPB bug — this inherits the same leading-space double-counting bug that led to PR #1545 being closed:

if piece.startswith("▁"):
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1  # space byte baked in

The +1 bakes the space byte in, and has_leading_space_lut adds it again in eval. With ~65% of SP1024 tokens carrying a leading \u2581, the byte denominator is inflated by ~14%, deflating the reported BPB by the same factor.
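Rough arithmetic behind those numbers, as a back-of-the-envelope sketch (the ~4.6 average true bytes per token is an assumed figure for illustration, not a measured value):

# Back-of-the-envelope check of the deflation. The 65% leading-space share is
# from this thread; the ~4.6 average true bytes per token is an assumption.
leading_space_share = 0.65         # fraction of target tokens with a leading \u2581
avg_true_bytes_per_token = 4.6     # assumed average: content bytes + space byte

extra_bytes = leading_space_share * 1.0             # one spurious space byte each
inflation = extra_bytes / avg_true_bytes_per_token  # denominator inflation
print(f"inflation ~ {inflation:.1%}")               # ~14%

reported_bpb = 1.0274
corrected_bpb = reported_bpb * (1.0 + inflation)    # undo the inflated denominator
print(f"corrected bpb ~ {corrected_bpb:.3f}")       # ~1.17, in line with the ~1.18 correction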

@Abhishek8108 acknowledged this on closing PR #1545:

The corrected BPB is ~1.18, not 1.028.

Independent reproduction with the same GDN-Hybrid architecture lands at ~1.26 BPB across six config variants on SP1024 — consistent with the corrected range, and well above the merged record at 1.0810.

Worth re-scoring with the reference LUT before this is treated as a record.

@Hkoyuer
Author

Hkoyuer commented Apr 15, 2026

Thanks @dexhunter for catching this. Rescoring with the corrected byte counting. Updated results coming.

@tejasnaladala

Hey @Hkoyuer, I think this PR has the same build_sentencepiece_luts byte-counting bug that led to #1545 and #1576 being closed. Flagging it because the claimed val_bpb 1.0274 comes from the same byte accounting path.

The bug in this PR

records/track_10min_16mb/2026-04-15_GDN_Hybrid_DeltaRule_1.028/train_gpt.py, lines 207 to 220:

def build_sentencepiece_luts(sp, vocab_size, device):
    base_bytes = torch.zeros(vocab_size, dtype=torch.float32, device=device)
    has_space = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    is_boundary = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    for i in range(vocab_size):
        piece = sp.id_to_piece(i)
        raw = piece.encode("utf-8")
        base_bytes[i] = len(raw)
        if piece.startswith("\u2581"):
            has_space[i] = True
            base_bytes[i] = len(piece[1:].encode("utf-8")) + 1   # +1 is the space byte
        if sp.is_control(i) or sp.is_unknown(i):
            is_boundary[i] = True
    return base_bytes, has_space, is_boundary

Then the eval loop at lines 376 to 377 adds the same space byte a second time:

tb = base_bytes_lut[tgt].to(torch.float64)
tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)

So for any (prev, tgt) pair where tgt starts with \u2581 and prev is not a boundary token, the byte count includes the space byte twice. byte_count gets inflated and the reported BPB = loss / byte_count comes out lower than the reference metric would give.
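To make the double count concrete, here is a minimal standalone trace for one such pair (the piece "\u2581the" and a non-boundary prev token are assumptions for illustration; no sentencepiece model needed):

# Byte accounting for a single (prev, tgt) pair where tgt = "\u2581the"
# and prev is not a boundary token. Illustrative only.
piece = "\u2581the"

# This PR's LUT: content bytes plus a baked-in space byte.
base_bytes_pr = len(piece[1:].encode("utf-8")) + 1   # 3 + 1 = 4

# This PR's eval loop then adds the space byte again for a non-boundary prev.
counted = base_bytes_pr + 1                          # 5

# Reference accounting: LUT stores content bytes only, eval adds the space once.
base_bytes_ref = len(piece[1:].encode("utf-8"))      # 3
true = base_bytes_ref + 1                            # 4

print(counted, true)   # 5 vs 4: one extra byte on every such transition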

The main-branch reference

train_gpt.py on main handles this correctly, around lines 180 to 202:

def build_sentencepiece_luts(...):
    ...
    for token_id in range(sp_vocab_size):
        ...
        piece = sp.id_to_piece(token_id)
        if piece.startswith("\u2581"):
            has_leading_space_np[token_id] = True
            piece = piece[1:]                     # strip the underscore
        base_bytes_np[token_id] = len(piece.encode("utf-8"))  # no +1

The piece = piece[1:] strips the underscore, base_bytes stores only the token content, and the eval loop adds the space byte exactly once when the previous token is not a boundary. One path, one addition.

Expected magnitude

For SP1024 on FineWeb a typical \u2581-prefixed token has maybe 3 to 5 content bytes. Adding a second space byte on most non-boundary transitions inflates the denominator by roughly 14 to 18 percent across the validation set, which translates to a reported BPB that is materially lower than the reference would return. @Abhishek8108 wrote on #1545 that the corrected number on that submission was around 1.18 instead of 1.028. Since #1632 inherits the same LUT logic on the same architecture class and tokenizer, I expect the corrected #1632 number to land in a similar range. The exact number has to come from a rescore on the submitted artifacts, not from me, but the direction and rough magnitude should match.

Suggested fix

Drop the current build_sentencepiece_luts and copy the one from main verbatim. Main's version also handles sp.is_byte, sp.is_control, sp.is_unknown, sp.is_unused, and sizes the table defensively, which the current version does not fully cover. The eval loop in #1632 does not need changes once the LUT is fixed.

Concretely (a sketch of the patched builder follows the list):

  1. Strip \u2581 before assigning base_bytes[i].
  2. Remove the + 1.
  3. Let the eval loop's tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]) do the single conditional add.
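For reference, a minimal sketch of what the patched builder could look like. It mirrors the main-branch logic quoted above but keeps only the control/unknown handling already present in this PR; the real fix should still copy main verbatim to pick up is_byte, is_unused, and the defensive table sizing.

import torch

def build_sentencepiece_luts(sp, vocab_size, device):
    # Sketch of the corrected LUT builder: base_bytes stores content bytes only,
    # so the eval loop's conditional add supplies the space byte exactly once.
    base_bytes = torch.zeros(vocab_size, dtype=torch.float32, device=device)
    has_space = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    is_boundary = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    for i in range(vocab_size):
        piece = sp.id_to_piece(i)
        if piece.startswith("\u2581"):
            has_space[i] = True
            piece = piece[1:]                        # strip the underscore marker
        base_bytes[i] = len(piece.encode("utf-8"))   # content bytes only, no +1
        if sp.is_control(i) or sp.is_unknown(i):
            is_boundary[i] = True
    return base_bytes, has_space, is_boundary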

Request

Could you rescore the submitted run_seed_1337 and run_seed_42 artifacts with the main-branch build_sentencepiece_luts, or patch the LUT in this PR and update the claimed BPB? Flagging here because #1545 and #1576 already went through the same cycle and I don't want another closed PR if a patch is possible.

Not trying to attack the GDN-Hybrid architecture work, just the inherited accounting. A GDN-Hybrid submission with the correct LUT would still be a useful datapoint on where that architecture actually sits.

Related context: #1545 (closed), #1576 (closed), main branch train_gpt.py.

