Non-record: Confidence-Adaptive N-gram Boost on PR #2018 stack, val_bpb=1.05874 #2129

Open

okezue wants to merge 1 commit into openai:main from okezue:adaptive-boost-nonrecord

Conversation

okezue commented May 1, 2026

Non-record submission

val_bpb = 1.05874 (seed 42, single-seed, ADAPTIVE_BOOST_GAMMA=1) | artifact 15,990,227 bytes | 8xH100 80GB SXM | strict 600s train + eval

This is a non-record submission per README §"Non-record Submissions". It does not clear the 0.005-nat threshold versus the PR #1855 SOTA (1.06108): the margin is 0.00234 bpb, which converts to about 0.00162 nats/byte (multiply by ln 2), below the 0.005 floor. It documents a clean two-line novel addition to the strict token-only n-gram tilt of PR #2018, with a small but consistent improvement at both gamma=1 and gamma=2.
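
For concreteness, the unit conversion behind that margin (a quick standalone check, not part of the submission code):

```python
import math

SOTA_BPB = 1.06108                      # PR #1855 record, bits/byte
THIS_BPB = 1.05874                      # this submission, bits/byte
margin_bpb = SOTA_BPB - THIS_BPB        # 0.00234 bits/byte
margin_nats = margin_bpb * math.log(2)  # ~0.00162 nats/byte
print(f"{margin_nats:.5f} nats/byte; clears 0.005 floor: {margin_nats >= 0.005}")
# -> 0.00162 nats/byte; clears 0.005 floor: False
```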

What is novel

Confidence-Adaptive N-gram Boost. PR #2018's tilt applies a fixed boost beta whenever the prefix-derived hint counter exceeds threshold. This submission scales beta per-position by the NN's own predictive confidence:

beta_t = TOKEN_BOOST * (1 - q_hint_t)^gamma

where q_hint_t = p(h_t | x_<t) is the prefix-only NN distribution at position t evaluated on the hinted token, and gamma is a tunable exponent (env var ADAPTIVE_BOOST_GAMMA, default 0 = original behavior).
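
A sketch of how the knob might be read (only the env var name ADAPTIVE_BOOST_GAMMA comes from this PR; the surrounding code is an assumption):

```python
import os

# ADAPTIVE_BOOST_GAMMA is the env knob named above. The default of 0 keeps
# PR #2018's fixed-boost behavior, since (1 - q_hint) ** 0 == 1 makes
# beta_t = TOKEN_BOOST at every position.
gamma = float(os.environ.get("ADAPTIVE_BOOST_GAMMA", "0"))
```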

When the NN already places high probability on the hinted token (q_hint -> 1), we apply almost no tilt, since the NN already agrees. When the NN disagrees (q_hint -> 0), we apply the full tilt. This matches the intuition that the n-gram expert is most useful as a corrective signal precisely when the NN is uncertain or wrong.
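
A minimal PyTorch sketch of the adaptive tilt under the definitions above; the function name, shapes, and signature are illustrative assumptions, not the submission's actual code:

```python
import torch

def adaptive_ngram_tilt(log_p: torch.Tensor, hint_ids: torch.Tensor,
                        token_boost: float, gamma: float) -> torch.Tensor:
    """Tilt prefix-only NN log-probs toward n-gram hints, confidence-adaptively.

    log_p:    (T, V) log-probs p(. | x_<t) at each scored position t.
    hint_ids: (T,) hinted token h_t from the prefix-only n-gram counter.
    """
    # q_hint_t = p(h_t | x_<t): the NN's own probability on the hinted token
    q_hint = log_p.gather(-1, hint_ids.unsqueeze(-1)).squeeze(-1).exp()
    # beta_t = TOKEN_BOOST * (1 - q_hint_t)^gamma: full tilt when the NN
    # disagrees with the hint, near-zero tilt when it already concentrates on it
    beta = token_boost * (1.0 - q_hint) ** gamma
    # closed-form normalizer: Z_t = 1 + q_hint_t * (exp(beta_t) - 1)
    log_Z = torch.log1p(q_hint * torch.expm1(beta))
    # add beta_t only at the hinted token, then renormalize analytically
    tilted = log_p.scatter_add(-1, hint_ids.unsqueeze(-1), beta.unsqueeze(-1))
    return tilted - log_Z.unsqueeze(-1)
```

Everything beta_t reads (q_hint_t and h_t) is prefix-derived, which is what keeps the tilt causal per C1 below.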

Compliance

The closed-form tilt p'(a) = p(a) * exp(beta * 1[a==h]) / Z is normalized over the vocab axis for any beta_t >= 0 that depends only on prefix-derived state: summing the numerator over the vocab gives Z = (1-q) + q*exp(beta) = 1 + q*(exp(beta)-1), with q the NN probability on h (a numeric check follows the list below). Since q_hint_t and h_t are both prefix-only:

  • C1 causal: ✓ — q_hint_t is the NN distribution at t conditioned on tokens <t. h_t from prefix-only n-gram counter.
  • C2 normalized: ✓ — Z_t closed-form is the analytic normalizer for the per-position beta_t.
  • C3 score-before-update: ✓ — applied at scoring time only, no parameter updates from val tokens.
  • C4 single pass: ✓ — inherited from PR #2018's single-pass sliding eval.
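
A toy check of the closed-form normalizer against the brute-force vocab sum (vocab size, hint id, and beta are arbitrary; a standalone sanity check, not submission code):

```python
import math
import torch

torch.manual_seed(0)
p = torch.softmax(torch.randn(8192), dim=-1)  # stand-in prefix-only NN distribution
h, beta = 123, 0.7                            # arbitrary hint id and tilt strength
q = p[h]

unnorm = p.clone()
unnorm[h] *= math.exp(beta)            # p(a) * exp(beta * 1[a==h])
Z = 1 + q * math.expm1(beta)           # closed-form normalizer from above
assert torch.isclose(unnorm.sum(), Z)  # matches the brute-force sum over the vocab
assert torch.isclose((unnorm / Z).sum(), torch.tensor(1.0))  # p' sums to 1
```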

Strict token-only path inherited (WITHIN_BOOST=0 WORD_BOOST=0 AGREE_ADD_BOOST=0). Run logs confirm within_gate=0 word_gate=0 agree2plus=0. This addresses the C1 concerns flagged on PR #2118 and resolved on PR #2018 / PR #1514.

Result

| variant | seed | val_bpb | eval_ms | artifact_bytes |
| --- | --- | --- | --- | --- |
| baseline (gamma=0, our reproduction) | 42 | 1.05900 | 479,477 | 15,991,083 |
| adaptive gamma=1 (this submission) | 42 | 1.05874 | 449,899 | 15,990,227 |
| adaptive gamma=2 | 42 | 1.05878 | 450,385 | 15,990,227 |

The improvement over the gamma=0 baseline is positive for both gamma=1 and gamma=2; gamma=1 is the best so far.

Reproduction gap honesty

Our reproduction of PR #2018 lands at val_bpb 1.05900 vs the PR #2018 README's reported 1.04617 for seed 42, a +0.013 bpb gap. The gap is consistent across pre-quant (1.06351 vs 1.04931), quantized (1.07123 vs 1.05773), and post-TTT (1.05900 vs 1.04617), so it is a base-model issue in our reproduction, not a tilt issue. The adaptive-boost layer is reported on top of our reproduction baseline. With a properly reproduced PR #2018 baseline (~1.046), gamma=1 could plausibly land below the record threshold versus PR #1855 (1.06108 - 0.005/ln 2 ≈ 1.054 bpb); we are not confirming that here.

Compliance summary

  • Train: 596,039 ms < 600,000 ms cap.
  • Eval: 449,899 ms < 600,000 ms cap.
  • Artifact: 15,990,227 bytes < 16,000,000 byte cap.
  • Tokenizer: SP8192 lossless caps caseops v1 reserved (md5 b73929616bf6303b953396b767a29b99).
  • 8xH100 80GB SXM.

Credits

cc @cocohearts @valerio-oai @simon-marcus for visibility.

… val_bpb=1.05874

Single-seed non-record submission documenting a two-line novel addition to the strict token-only n-gram tilt path from PR openai#2018: scale the per-position boost beta_t by (1 - q_hint_t)^gamma, where q_hint_t is the prefix-only NN distribution at the scored position evaluated on the hinted token. Down-weights the n-gram tilt when NN already agrees with the hint, up-weights when NN disagrees.

Result on seed 42 with ADAPTIVE_BOOST_GAMMA=1: val_bpb 1.05874 vs the 1.05900 gamma=0 baseline, with a positive delta at both gamma=1 and gamma=2. The margin is below the 0.005-nat record threshold versus the PR openai#1855 SOTA, hence non-record. The closed-form tilt Z_t = 1 + q_hint*(exp(beta_t)-1) preserves C2 normalization for any prefix-derived beta_t, and q_hint comes from the same prefix-only NN distribution used for scoring, so C1 / C2 / C3 / C4 are all satisfied.
