Record: SP8192 + Improved Parallel Residuals + Muon 0.97 + TTT 5ep + N-gram Tilt + Hessian SDClip, val_bpb 1.07730 (#1557)
Note on AT-RISK flag: legality of the n-gram tilt
Posting this proactively since the community OLYMPUS tracker (@MatoTeziTanka's running audit in issue #140) has this submission flagged AT-RISK under the "N-gram cache (03-27)" category. OLYMPUS is a careful community compliance index, not an official maintainer ruling, but it's a reasonable heuristic that reviewers consult, so I'd rather put my reasoning on the record than let the flag stand unanswered. The right answer may still be that I'm wrong, but I'd rather the decision be informed.
What
Relation to the March 27 sweep. The sweep closed submissions where unnormalized n-gram caches produced apparent gains around 1 BPB (Robby955 on #677, 2026-03-30: "unnormalized apparent gain was ~1 BPB at 1M buckets; after proper normalization the real signal is ~0.003 BPB"). My measured tilt contribution is in the ~0.003 BPB range, matching the "real signal" bound for normalized approaches rather than the bucket-counting artifact that triggered the sweep.
Lineage. The kernel is derived from PR #1420 (ContextMixer). Two normalized tilt PRs, #1145 (collision-free trie) and #1420, remained open after the sweep; I read that as the line being drawn at normalization discipline rather than at any use of hash-backed lookups. My implementation uses #1420's hash mechanism with the same explicit normalization.
4 conditions (Issue #1017). C1: hint at
Where I may be wrong. The honest disagreement point is whether "normalized tilt output built on a collision-prone hash" satisfies the spirit of C2. My reading: collisions only affect which hint is selected, not the validity of the final distribution, which remains exact. If maintainers rule instead that hash-backed hint sources are incompatible with C2 regardless of output normalization, I accept that, and closing this PR is the correct outcome under that reading. Not litigating, just making the reasoning visible.
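To make the normalization argument above concrete, here is a minimal sketch. It is not the kernel from this PR or from #1420. It assumes the tilt multiplies the probability of a single hinted token by exp(beta) and then renormalizes. The function names, the toy distribution, and the value of `beta` are all hypothetical. The point is only that the tilted distribution sums to 1 whether the hint is right or wrong, while skipping the renormalization yields "probabilities" that sum to more than 1 and so inflate the apparent gain.

```python
import math

def tilt(probs, hint, beta):
    """Boost the hinted token by exp(beta), then renormalize (assumed tilt form)."""
    boost = math.exp(beta)
    weights = [p * boost if i == hint else p for i, p in enumerate(probs)]
    z = sum(weights)  # normalizer, equal to 1 + probs[hint] * (exp(beta) - 1)
    return [w / z for w in weights]

def bits(p):
    """Code length in bits for an event of probability p."""
    return -math.log2(p)

# Toy 4-token model distribution (made-up numbers, not from the submission).
probs = [0.5, 0.25, 0.125, 0.125]
target = 1   # token that actually occurs next
beta = 1.0   # hypothetical tilt strength

good = tilt(probs, hint=1, beta=beta)  # hint agrees with the target
bad = tilt(probs, hint=3, beta=beta)   # wrong hint, e.g. after a hash collision

# Unnormalized variant: same boost, but no division by z.
unnormalized = [p * math.exp(beta) if i == 1 else p for i, p in enumerate(probs)]

print("sum(good)         =", sum(good))
print("sum(bad)          =", sum(bad))
print("sum(unnormalized) =", sum(unnormalized))
print("bits, no tilt     =", bits(probs[target]))
print("bits, good hint   =", bits(good[target]))
print("bits, bad hint    =", bits(bad[target]))
```

Under this assumed form, a wrong hint costs bits on the target token rather than producing an invalid score, which is the sense in which a collision changes the hint but not the validity of the distribution.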
Record: SP8192 + Improved Parallel Residuals + Score-First TTT + Causal N-gram Tilt + Hessian SDClip
val_bpb = 1.07730 (3-seed mean, std 0.00040) | ~15.97 MB | 8xH100 SXM
3-Seed Results
Merged SOTA (PR #1493): 1.0810. Delta: -0.00370 BPB.
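The delta is the difference of the two val_bpb figures quoted above, so it comes out in bits per byte. A quick check of the arithmetic follows. The conversion to nats per byte is shown only for comparison and is not a number reported in this submission.

```python
import math

record_bpb = 1.07730  # this submission, 3-seed mean (from the text)
sota_bpb = 1.0810     # merged SOTA, PR #1493 (from the text)

delta_bpb = record_bpb - sota_bpb

# 1 bit = ln(2) nats, so the same gap is smaller when expressed in nats.
delta_nats_per_byte = delta_bpb * math.log(2)

print(f"{delta_bpb:+.5f} bits/byte")
print(f"{delta_nats_per_byte:+.5f} nats/byte")
```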
Techniques
Compliance
Attribution
Compute
Funded by OpenAI Advanced Competitor grant ($500 RunPod credit). 8xH100-SXM, ~3 runs for 3 seeds.