
Non-Record: PR #1901 base + LQER Asymmetric + Brotli/Byte-Shuffle Compression #1927

Open
squ11z1 wants to merge 5 commits into openai:main from squ11z1:non-record-pr1901-lqer-brotli

Conversation


@squ11z1 squ11z1 commented Apr 29, 2026

Summary

Non-record submission proposing two orthogonal additions to PR #1901's stack (DualHash + AdaMuon + MoE + SDClip, val_bpb 0.83353 pending):

  1. LQER asymmetric rank-4 post-quantization correction (ported from PR "Record: PR #1787 base + Smear Gate + LQER Asym — val_bpb 1.06157" (#1797) by @dexhunter; first application to a Sigma-Delta-quantized stack). A sketch of the correction appears below the δ-BPB table.
  2. Brotli-11 + stride-2 byte-shuffle replacing LZMA (idea from PR "Record: SP8192 + LQER + Sparse Attn Gate + BOS-Fixed SmearGate + 9-Hparam Greedy Stack — val_bpb 1.06108 (3-seed mean)" (#1855)); see the sketch directly after this list.
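A minimal sketch of the item-2 packaging path, assuming the artifact's weights are serialized as raw fp16 bytes. The names byte_shuffle and pack_artifact are illustrative, not the ones used in train_gpt.py:

```python
import brotli  # pip install Brotli
import numpy as np

def byte_shuffle(data: bytes, stride: int = 2) -> bytes:
    """Interleaved-to-planar: group every stride-th byte together (all low
    bytes of the fp16 values first, then all high bytes) so the compressor
    sees long runs of similar bytes."""
    arr = np.frombuffer(data, dtype=np.uint8)
    assert len(arr) % stride == 0, "pad the payload to a multiple of stride"
    return arr.reshape(-1, stride).T.tobytes()

def pack_artifact(weights: np.ndarray) -> bytes:
    raw = weights.astype(np.float16).tobytes()
    return brotli.compress(byte_shuffle(raw, stride=2), quality=11)

# loading reverses the two steps: brotli.decompress, then reshape(stride, -1).T
```

Stride 2 matches fp16's two bytes per value: low (mostly mantissa) bytes and highly repetitive sign/exponent bytes end up in separate contiguous runs, which is the usual rationale for byte-shuffling ahead of a general-purpose compressor.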

The patched train_gpt.py is shipped LZMA-base85-wrapped (18,204 bytes vs. PR #1901's 53,443 raw, a 65.9% code-byte saving); a sketch of the wrapper follows.
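A minimal sketch of the wrapping scheme, including the round-trip identity check listed in the test plan. The wrapper actually emitted in this PR may differ in detail; wrap and roundtrip_ok are illustrative names:

```python
import base64
import lzma

def wrap(src_path: str, out_path: str) -> bytes:
    """Compress the raw source with LZMA, base85-encode it, and emit a
    self-extracting stub that decompresses and executes the payload."""
    src = open(src_path, "rb").read()
    blob = base64.b85encode(lzma.compress(src, preset=9 | lzma.PRESET_EXTREME))
    with open(out_path, "w") as f:
        f.write("import base64,lzma\n")
        f.write(f"_b={blob!r}\n")
        f.write("exec(lzma.decompress(base64.b85decode(_b)).decode())\n")
    return blob

def roundtrip_ok(src_path: str, blob: bytes) -> bool:
    # decompress-identity check: unwrapped payload must equal the raw source
    return lzma.decompress(base64.b85decode(blob)) == open(src_path, "rb").read()
```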

Status: non-record

A $25 starter grant plus remaining personal balance funded two single-seed bid attempts on 8×H100 SXM. Both were preempted before producing an artifact.

A $500 development grant filed 2026-04-27 did not return a decision before the deadline, so this is submitted as a non-record discussion: implementation plus a theoretical δ-BPB estimate, with no measured val_bpb.

Theoretical δ-BPB estimate

| Contribution | Mechanism | Estimated δ |
| --- | --- | --- |
| LQER asym rank-4, top-K=2 | INT2 A + INT4 B per-group-64 SVD factors recover the Sigma-Delta residual | −0.002 to −0.005 BPB |
| Brotli-11 + byte-shuffle | ~150–280 KB compression saving → larger model | −0.002 to −0.005 BPB |
| Combined | sum of the above | −0.005 to −0.010 BPB |
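A minimal sketch of the LQER asymmetric correction as I read the table row: take a rank-4 SVD of the quantization residual, store the left factor at INT2 and the right factor at INT4 with per-group-64 symmetric scales. The "top-K=2" selection rule is not modeled here; all names are illustrative, and the real code is in train_gpt_unwrapped.py:

```python
import torch

def quant_groups(X: torch.Tensor, bits: int, group: int = 64) -> torch.Tensor:
    """Symmetric per-group fake-quantization; returns the dequantized tensor.
    Assumes X.numel() is divisible by `group`."""
    flat = X.reshape(-1, group)
    qmax = 2 ** (bits - 1) - 1
    scale = (flat.abs().amax(dim=1, keepdim=True) / qmax).clamp_min(1e-8)
    q = (flat / scale).round().clamp(-qmax - 1, qmax)
    return (q * scale).reshape_as(X)

def lqer_asym(W: torch.Tensor, W_q: torch.Tensor, rank: int = 4):
    """Rank-`rank` correction of the quantization residual, with asymmetric
    factor precision: INT2 for the left factor A, INT4 for the right factor B."""
    E = (W - W_q).float()                      # residual left by quantization
    U, S, Vh = torch.linalg.svd(E, full_matrices=False)
    A = U[:, :rank] * S[:rank]                 # (out_features, rank)
    B = Vh[:rank, :]                           # (rank, in_features)
    return quant_groups(A, bits=2), quant_groups(B, bits=4)

# the forward pass then uses W_q + A @ B in place of W
```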

Projected onto the PR #1901 base of 0.83353: 0.8235–0.8285 BPB.

The LQER δ is estimated conservatively below PR #1797's measured −0.009 BPB (obtained on a Hessian-GPTQ stack), because Sigma-Delta error diffusion already compensates within-row quantization error, leaving a smaller residual for LQER to recover; the toy demonstration below makes this concrete.
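A toy comparison, not the stack's actual quantizer: first-order Sigma-Delta (error-feedback) rounding provably keeps every within-row cumulative residual within half a quantization step, whereas plain nearest rounding lets it random-walk, so there is far less smooth structure left for a low-rank correction to pick up:

```python
import numpy as np

def sigma_delta_round(row: np.ndarray, step: float) -> np.ndarray:
    """First-order error feedback: each value is rounded after adding the
    carried error of its predecessors, so within-row errors cancel."""
    out = np.empty_like(row)
    carry = 0.0
    for i, x in enumerate(row):
        v = x + carry
        out[i] = np.round(v / step) * step
        carry = v - out[i]          # equals the row's cumulative residual
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 256))
step = 0.1
plain = np.round(W / step) * step
sd = np.apply_along_axis(sigma_delta_round, 1, W, step)

# max within-row cumulative residual: bounded by step/2 for Sigma-Delta,
# growing like sqrt(n) * step for plain nearest rounding
print(np.abs(np.cumsum(W - sd, axis=1)).max())     # <= 0.05 by construction
print(np.abs(np.cumsum(W - plain, axis=1)).max())  # typically ~10x larger
```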

Test plan

  • Patch applies cleanly to PR "Record: 0.8335 BPB — DualHash + AdaMuon + MoE + SDClip (3-seed mean)" (#1901): function-level replacement, syntax check
  • LZMA-base85 wrapper round-trips (compile + decompress identity verified)
  • Patched code launches on 8×H100 SXM (verified up to data prefetch in partial_run_2026-04-29.log)
  • Pending: 3-seed val_bpb on 8×H100 SXM with full 600s training cap
  • Pending: artifact size verification under 16 MB
  • Pending: ablation over LQER_TOP_K ∈ {1, 2, 3} × LQER_RANK ∈ {2, 4, 8} (grid enumerated after this list)
  • Pending: Brotli vs LZMA artifact size A/B on identical model
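The 3 × 3 ablation grid from the pending item above, spelled out; the launcher invocation is project-specific and omitted:

```python
from itertools import product

# 9 single-seed configurations for the pending LQER ablation
for top_k, rank in product((1, 2, 3), (2, 4, 8)):
    cfg = {"LQER_TOP_K": top_k, "LQER_RANK": rank}
    print(cfg)  # hand each config to the training launcher
```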

If validated post-deadline, I commit to providing 3-seed logs as a follow-up update.

Attribution

Compliance (verifiable from code)

Files

  • README.md — submission documentation
  • submission.json — metadata with theoretical δ estimate (val_bpb fields null pending validation)
  • train_gpt.py — LZMA-wrapped patched code (18,204 bytes)
  • train_gpt_unwrapped.py — raw patched source for review
  • partial_run_2026-04-29.log — data-prefetch log up to preemption
