Record: SP8192 CaseOps + TTT + GPTQ + LRZIP — val_bpb 1.05993 (3-seed mean)#1934
Open
liujshi wants to merge 5 commits into openai:main from
Conversation
alertcat added a commit to alertcat/parameter-golf that referenced this pull request on Apr 29, 2026:
…review

Seed 42 v1: FORCE_STOP_STEP=4920 + GPTQ_RESERVE=0.5 -> wallclock 602.048s (borderline)
Seed 42 v2: GPTQ_RESERVE=4.0, no FORCE_STOP_STEP -> wallclock 596.102s (strict <600s)

v2 results:
- seed 42: val_bpb 1.058675 (was 1.058336 in v1, +0.000339 due to 12 fewer steps)
- seed 0: val_bpb 1.059394 (unchanged)
- seed 1234: val_bpb 1.060243 (unchanged)
- MEAN: 1.059434 (was 1.059324 in v1, +0.000110)
- STD: 0.000642 (was 0.000780 in v1, TIGHTER)

All 3 seeds now strict <600s wallclock (596.045-596.102s). All 3 seeds use IDENTICAL config (GPTQ_RESERVE=4.0, no FSS).

Comparisons:
- vs PR openai#1908 frontier (1.06081): -0.00138 (Welch t=2.18, p=0.045)
- vs PR openai#1855 official openai#1 (1.06108): -0.00165
- vs PR openai#1934 liujshi (1.05993): -0.00050 (Welch t=0.85, p=0.22, edge of p<0.25)
- vs win threshold (1.06021): -0.00078
- vs MERGED SOTA bigbag (1.0810): -0.02157

Compliance: all 3 seeds train+eval strict <600s, artifact <16MB, 3-phase TTT score-first, lossless CaseOps tokenizer, lrzip pergroup.

Files updated:
- V21_README.md: revised results table + revisions note
- submission.json: v2 numbers + revisions field
- train_seed42.log: replaced with strict <600s redo log
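The commit message above reports Welch t-statistics for comparisons between per-seed val_bpb samples. A minimal, pure-Python sketch of that test (the function name `welch_t` and the toy inputs are illustrative, not from the repository):

```python
import math

def welch_t(a, b):
    """Welch's unequal-variance t-statistic and Welch-Satterthwaite
    degrees of freedom for two small samples (e.g. per-seed val_bpb)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb                        # squared standard error
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Toy example with two 3-element samples:
t, df = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With only 3 seeds per side the test is very low-powered, which is why even a -0.00050 BPB gap only reaches p=0.22.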
Summary
11L 512d 8H/4KV transformer with U-Net skips, parallel residuals (start layer 8), partial RoPE (16 dims, base 10000), depth recurrence (loop layers 3–5, NUM_LOOPS=2), Polar-Express Newton-Schulz Muon optimizer, CaseOps bijective case transform (SP8192), LQER asymmetric INT2/INT4 rank-4 quant correction (top-3 tensors, group 64), sparse attention head-output gate, SmearGate (window 12), fused softcapped CE Triton kernel, GPTQ int6 + int7 embed, per-group lrzip + brotli compression pipeline (COMPRESSOR=pergroup, from PR #1855), and phased TTT eval (3 phases, score-first, prefix 2000 docs), with 3 hyperparameter overrides (tightened quant clips + embed weight decay) on 8×H100 SXM.

3-seed mean: 1.05993 BPB (std 0.00059) / 2.31951 nats (std 0.00106), all artifacts under the 16 MB cap.
vs current leaderboard (1.0810 BPB): −0.02107 BPB / −0.04609 nats.
Changes from PR #1797 base
This submission takes PR #1797 (@dexhunter) as its direct base (which itself extends PR #1787 by @nprime06) and applies three targeted changes:
1. Per-group compression (COMPRESSOR=pergroup): replaces PR #1797's default brotli-only compressor with the per-group lrzip + brotli pipeline from PR #1855 (@codemath3000), saving ~280 KB in artifact size.
2. Tightened quant clips: ATTN_CLIP_SIGMAS 13.0 → 12.0, EMBED_CLIP_SIGMAS 15.0 → 12.0, MLP_CLIP_SIGMAS stays at 12.0. These improve GPTQ quantization fidelity by ~0.001 BPB.
3. Embed weight decay: EMBED_WD 0.085 → 0.06, improving post-quant generalization.

All other hyperparameters are unchanged from PR #1797 defaults (beta2=0.95, warmdown_frac=0.75, ttt_lora_rank=96, ttt_beta2=0.999, ttt_weight_decay=1.0, sparse_attn_gate_scale=1.0, phased_ttt_prefix_docs=2000).
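The three changes can be summarized as a small config delta over the PR #1797 defaults. The dict-merge below is a hedged illustration only; the actual scripts may read these as environment variables, and only the names and values come from the text above:

```python
# Relevant PR #1797 defaults (values as stated in this PR's description).
BASE = {
    "COMPRESSOR": "brotli",       # brotli-only artifact compressor
    "ATTN_CLIP_SIGMAS": 13.0,
    "EMBED_CLIP_SIGMAS": 15.0,
    "MLP_CLIP_SIGMAS": 12.0,
    "EMBED_WD": 0.085,
}

# The three targeted overrides in this submission.
OVERRIDES = {
    "COMPRESSOR": "pergroup",     # per-group lrzip + brotli (from PR #1855)
    "ATTN_CLIP_SIGMAS": 12.0,     # tightened quant clip
    "EMBED_CLIP_SIGMAS": 12.0,    # tightened quant clip
    "EMBED_WD": 0.06,             # lower embed weight decay
}

config = {**BASE, **OVERRIDES}    # MLP_CLIP_SIGMAS stays at 12.0
```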
Per-group compression pipeline
PR #1797's base only exposes lzma/brotli compressors. This submission adds a per-group serializer (COMPRESSOR=pergroup) from PR #1855:

- Tensors are grouped by role (qo_bank, kv_bank, mlp_up_bank, mlp_down_bank, etc.) so similarly-distributed weights compress together.
- For selected tensors (_tok_emb, attn.c_q, mlp.fc), the serializer runs an L1 nearest-neighbour similarity sort on rows before transposing — adjacent rows in the serialized stream are now numerically close, giving the entropy coder longer runs of small deltas. Permutation indices are stored as uint16 and brotli-compressed.
- Each group is compressed with lrzip -z -L 9 (ZPAQ context-mixing back-end). lrzip's long-range deduplication catches cross-tensor repetition that brotli's 24-bit window misses.
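The steps above can be sketched as follows. This is a minimal stdlib-only illustration, not PR #1855's actual serializer: all function names are hypothetical, delta-coding and the brotli pass are omitted, and the lrzip invocation mirrors the flags quoted above:

```python
import struct
import subprocess
import tempfile

def l1(a, b):
    # L1 (Manhattan) distance between two rows.
    return sum(abs(x - y) for x, y in zip(a, b))

def greedy_l1_sort(rows):
    """Greedy nearest-neighbour ordering: start at row 0 and repeatedly
    append the unvisited row closest (in L1) to the last one, so adjacent
    rows in the serialized stream are numerically close."""
    if not rows:
        return []
    perm, remaining = [0], set(range(1, len(rows)))
    while remaining:
        last = rows[perm[-1]]
        nxt = min(remaining, key=lambda i: l1(rows[i], last))
        perm.append(nxt)
        remaining.remove(nxt)
    return perm

def serialize_group(rows, perm):
    # uint16 permutation indices, then float32 rows in permuted order.
    blob = struct.pack(f"<{len(perm)}H", *perm)
    for i in perm:
        blob += struct.pack(f"<{len(rows[i])}f", *rows[i])
    return blob

def lrzip_compress(blob):
    # Shell out to `lrzip -z -L 9` (ZPAQ back-end); requires lrzip on PATH.
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
        f.write(blob)
        path = f.name
    subprocess.run(["lrzip", "-z", "-L", "9", "-f", path], check=True)
    with open(path + ".lrz", "rb") as f:
        return f.read()
```

The greedy sort is O(n²) in the number of rows, which is acceptable for a handful of large tensors done once at serialization time.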
The lrzip binary must be present on the system (apt-get install lrzip); the script shells out via subprocess.run.

Hyperparameter overrides (vs PR #1797)
Full hyperparameter snapshot (from seed 42 log)
Lineage
See README.md in this folder for full architecture details, rule compliance, and credits.

Test plan