
Record: GDN-Hybrid (Gated DeltaNet + SWA) — val_bpb 1.0274 (2-seed mean)#1632

Closed
Hkoyuer wants to merge 1 commit into openai:main from Hkoyuer:submission/gdn-hybrid-1.028

Conversation


@Hkoyuer Hkoyuer commented Apr 15, 2026

Summary

  • GDN-Hybrid reproduction: val_bpb 1.0274 (2-seed cold-cache mean)
  • 2 seeds (1337, 42) on fresh 8xH100 SXM pods, 590s training each
  • All artifacts under 16MB (14.6-14.8 MB)
  • No TTT, fixed predictor

Results

Seed    Steps   EMA BPB     Quantized BPB   Artifact (bytes)
1337    1857    1.018060    1.026927        15,524,240
42      1856    1.018499    1.027811        15,305,698
Mean            1.018280    1.027369

Novel Research

Includes ECN (Error Correction Network) research achieving a 0.039 BPB reduction at zero artifact cost. See ECN_RESEARCH.md.

Test plan

  • 2-seed cold-cache training on 8xH100 SXM
  • All artifacts under 16MB
  • All runs under 590s training
  • Round-trip GPTQ quantization verified
  • Judges verify reproducibility

Based on GDN-Hybrid architecture from PR #1545 by @dexhunter.

Author: Hamza Koyuer (@Hkoyuer) — Helolinks.com

2-seed cold-cache runs on 8xH100 SXM:
- Seed 1337: 1.026927 BPB (1857 steps, 14.81 MB)
- Seed 42:   1.027811 BPB (1856 steps, 14.60 MB)
- Mean:      1.027369 BPB

Based on GDN-Hybrid architecture from PR openai#1545 by @dexhunter.
Includes novel ECN research (see ECN_RESEARCH.md).
@dexhunter
Contributor

Two things worth flagging:

Attribution — the PR credits PR #1545 to me, but that submission is by @Abhishek8108 (looks like an LLM hallucination).

BPB bug — this inherits the same leading-space double-counting bug that led to PR #1545 being closed:

if piece.startswith("▁"):
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1  # space byte baked in

The +1 bakes the space byte in, and has_leading_space_lut adds it again in eval. With ~65% of SP1024 tokens carrying a leading \u2581, the byte denominator is inflated by ~14%, deflating the reported BPB by the same factor.
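Rough arithmetic behind those numbers, as a back-of-the-envelope sketch (the ~4.6 average true bytes per token is an assumed figure for illustration, not a measured value):

# Back-of-the-envelope check of the deflation. The 65% leading-space share is
# from this thread; the ~4.6 average true bytes per token is an assumption.
leading_space_share = 0.65         # fraction of target tokens with a leading \u2581
avg_true_bytes_per_token = 4.6     # assumed average: content bytes + space byte

extra_bytes = leading_space_share * 1.0             # one spurious space byte each
inflation = extra_bytes / avg_true_bytes_per_token  # denominator inflation
print(f"inflation ~ {inflation:.1%}")               # ~14%

reported_bpb = 1.0274
corrected_bpb = reported_bpb * (1.0 + inflation)    # undo the inflated denominator
print(f"corrected bpb ~ {corrected_bpb:.3f}")       # ~1.17, in line with the ~1.18 correction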

@Abhishek8108 acknowledged this on closing PR #1545:

The corrected BPB is ~1.18, not 1.028.

Independent reproduction with the same GDN-Hybrid architecture lands at ~1.26 BPB across six config variants on SP1024 — consistent with the corrected range, and well above the merged record at 1.0810.

Worth re-scoring with the reference LUT before this is treated as a record.

@Hkoyuer
Author

Hkoyuer commented Apr 15, 2026

Thanks @dexhunter for catching this. Rescoring with the corrected byte counting. Updated results coming.

@tejasnaladala

Hey @Hkoyuer, I think this PR has the same build_sentencepiece_luts byte-counting bug that led to #1545 and #1576 being closed. Flagging it because the claimed val_bpb 1.0274 comes from the same byte accounting path.

The bug in this PR

records/track_10min_16mb/2026-04-15_GDN_Hybrid_DeltaRule_1.028/train_gpt.py, lines 207 to 220:

def build_sentencepiece_luts(sp, vocab_size, device):
    base_bytes = torch.zeros(vocab_size, dtype=torch.float32, device=device)
    has_space = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    is_boundary = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    for i in range(vocab_size):
        piece = sp.id_to_piece(i)
        raw = piece.encode("utf-8")
        base_bytes[i] = len(raw)
        if piece.startswith("\u2581"):
            has_space[i] = True
            base_bytes[i] = len(piece[1:].encode("utf-8")) + 1   # +1 is the space byte
        if sp.is_control(i) or sp.is_unknown(i):
            is_boundary[i] = True
    return base_bytes, has_space, is_boundary

Then the eval loop at lines 376 to 377 adds the same space byte a second time:

tb = base_bytes_lut[tgt].to(torch.float64)
tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)

So for any (prev, tgt) pair where tgt starts with \u2581 and prev is not a boundary token, the byte count includes the space byte twice. byte_count gets inflated and the reported BPB = loss / byte_count comes out lower than the reference metric would give.
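To make the double count concrete, here is a minimal standalone trace for one such pair (the piece "\u2581the" and a non-boundary prev token are assumptions for illustration; no sentencepiece model needed):

# Byte accounting for a single (prev, tgt) pair where tgt = "\u2581the"
# and prev is not a boundary token. Illustrative only.
piece = "\u2581the"

# This PR's LUT: content bytes plus a baked-in space byte.
base_bytes_pr = len(piece[1:].encode("utf-8")) + 1   # 3 + 1 = 4

# This PR's eval loop then adds the space byte again for a non-boundary prev.
counted = base_bytes_pr + 1                          # 5

# Reference accounting: LUT stores content bytes only, eval adds the space once.
base_bytes_ref = len(piece[1:].encode("utf-8"))      # 3
true = base_bytes_ref + 1                            # 4

print(counted, true)   # 5 vs 4: one extra byte on every such transition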

The main-branch reference

train_gpt.py on main handles this correctly, around lines 180 to 202:

def build_sentencepiece_luts(...):
    ...
    for token_id in range(sp_vocab_size):
        ...
        piece = sp.id_to_piece(token_id)
        if piece.startswith("\u2581"):
            has_leading_space_np[token_id] = True
            piece = piece[1:]                     # strip the underscore
        base_bytes_np[token_id] = len(piece.encode("utf-8"))  # no +1

The piece = piece[1:] strips the underscore, base_bytes stores only the token content, and the eval loop adds the space byte exactly once when the previous token is not a boundary. One path, one addition.

Expected magnitude

For SP1024 on FineWeb a typical \u2581-prefixed token has maybe 3 to 5 content bytes. Adding a second space byte on most non-boundary transitions inflates the denominator by roughly 14 to 18 percent across the validation set, which translates to a reported BPB that is materially lower than the reference would return. @Abhishek8108 wrote on #1545 that the corrected number on that submission was around 1.18 instead of 1.028. Since #1632 inherits the same LUT logic on the same architecture class and tokenizer, I expect the corrected #1632 number to land in a similar range. The exact number has to come from a rescore on the submitted artifacts, not from me, but the direction and rough magnitude should match.

Suggested fix

Drop the current build_sentencepiece_luts and copy the one from main verbatim. Main's version also handles sp.is_byte, sp.is_control, sp.is_unknown, sp.is_unused, and sizes the table defensively, which the current version does not fully cover. The eval loop in #1632 does not need changes once the LUT is fixed.

Concretely (a sketch of the patched builder follows the list):

  1. Strip \u2581 before assigning base_bytes[i].
  2. Remove the + 1.
  3. Let the eval loop's tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]) do the single conditional add.
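For reference, a minimal sketch of what the patched builder could look like. It mirrors the main-branch logic quoted above but keeps only the control/unknown handling already present in this PR; the real fix should still copy main verbatim to pick up is_byte, is_unused, and the defensive table sizing.

import torch

def build_sentencepiece_luts(sp, vocab_size, device):
    # Sketch of the corrected LUT builder: base_bytes stores content bytes only,
    # so the eval loop's conditional add supplies the space byte exactly once.
    base_bytes = torch.zeros(vocab_size, dtype=torch.float32, device=device)
    has_space = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    is_boundary = torch.zeros(vocab_size, dtype=torch.bool, device=device)
    for i in range(vocab_size):
        piece = sp.id_to_piece(i)
        if piece.startswith("\u2581"):
            has_space[i] = True
            piece = piece[1:]                        # strip the underscore marker
        base_bytes[i] = len(piece.encode("utf-8"))   # content bytes only, no +1
        if sp.is_control(i) or sp.is_unknown(i):
            is_boundary[i] = True
    return base_bytes, has_space, is_boundary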

Request

Could you rescore the submitted run_seed_1337 and run_seed_42 artifacts with the main-branch build_sentencepiece_luts, or patch the LUT in this PR and update the claimed BPB? Flagging here because #1545 and #1576 already went through the same cycle and I don't want another closed PR if a patch is possible.

Not trying to attack the GDN-Hybrid architecture work, just the inherited accounting. A GDN-Hybrid submission with the correct LUT would still be a useful datapoint on where that architecture actually sits.

Related context: #1545 (closed), #1576 (closed), main branch train_gpt.py.

