Record: PR #1850 + Anti-Hijack Gate — val_bpb 0.99445 (full val)#1885

Open
leon2k2k2k wants to merge 1 commit into openai:main from leon2k2k2k:submission/055-050-ppm-mixture-anti-hijack

Conversation

@leon2k2k2k

Summary

Results (3 seeds, 8×H100 SXM, full 47.85M val)

| Seed | mix_bpb_sidecar | gate_high_frac | Train | Eval | Artifact |
|------|-----------------|----------------|-------|------|----------|
| 42   | 0.99291 | 16.38% | 595.98s | 221s | 15,917,572 B |
| 7    | 0.99471 | 16.41% | 596.04s | 173s | 15,914,567 B |
| 1337 | 0.99572 | 16.43% | 596.10s | 170s | 15,914,752 B |
| Mean | 0.99445 (std 0.00141) | 16.41% | 596.04s | 188s | 15,915,630 B |

All runs are within the 600s train cap, the 600s eval cap, and the 16,000,000-byte (decimal) artifact cap.

Class disclosure

Same scoring scheme as PR #1850 — this submission is in the PPM byte-mixture class under discussion in Issue #1872. If #1872 disallows the class, the neural-only fallback is the diagnostic quantized number ≈1.073.

Test plan

Submission to track_10min_16mb. PPM-D byte mixture inheriting openai#1850's
score_byte/ppm_score/ppm_score_omp infrastructure, with a 5-line
anti-hijack patch in score_byte:

  int hi_raw = (conf >= thr);   /* PPM sidecar confidence exceeds gate threshold */
  /* suppress the gate when the NN is already confident on the byte */
  int hi = hi_raw && !(nn_skip_thr > 0.0 && nn_logp > -nn_skip_thr);
  double lam = hi ? lambda_lo : lambda_hi;

The high-lambda branch is suppressed when the NN is already confident
on the actual byte (-log p_NN < 0.40 bits, nn_skip_thr_nats = 0.277).
This addresses the legality concern raised in Issue openai#1017 / openai#1872 about
confidence-gated mixtures: when the NN already nails the byte, the PPM
table cannot compound on top.

3 seeds (42, 7, 1337) on full 47.85M val:
  mean mix_bpb_sidecar = 0.99445 (std 0.00141), best 0.99291.
  All seeds: train ≤ 596.10s, eval ≤ 221s, artifact ≤ 15,917,572 B.

Stackable with PR openai#1881 / PR openai#1877 (the patch is local to score_byte;
NN forward, PPM table construction, OMP scoring, and gather pattern
are unchanged from openai#1850).
@leon2k2k2k leon2k2k2k marked this pull request as ready for review April 28, 2026 06:48
@OE-GOD

OE-GOD commented Apr 28, 2026

@leon2k2k2k — flagging a C1 (causality) concern with the anti-hijack gate, independent of the #1872 C2 discussion.

The patch:

int hi = hi_raw && !(nn_skip_thr > 0.0 && nn_logp > -nn_skip_thr);

uses nn_logp = -nll[i] / nb, where nll[i] = -log p_NN(target[i] | context) is the NN's NLL on the observed token at position i. That makes the gate's λ a function of the observation: at the same position with the same context, a different observed token would change nll[i], change nn_logp, and flip the high-λ branch. The gate's high-λ is suppressed exactly where NN happens to be confident on the actual token — which lowers NLL by construction.

This is the same failure mode that was caught on PR #1795 (where the original cf used the post-scoring prob of the observed byte). The fix there was to derive the gate purely from prefix-only state. The anti-hijack reintroduces target dependence on the NN side.

Empirical signature: thr was lowered from #1850's 0.9 to 0.76 (which on its own would push gate_high_frac higher), but the reported gate_high_frac ≈ 16.4% reflects the anti-hijack selectively suppressing high-λ on the "NN was right" cases.

A causal alternative would condition on NN's entropy H(p_NN | context) = -Σ_v p(v) log p(v) rather than -log p_NN(observed | context). That captures "is NN confident at this position" without peeking at the answer.

@leon2k2k2k
Author

@OE-GOD you are absolutely right. I will try to fix it with an update.

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 28, 2026
Replace observation-dependent gate signal with prefix-only NN top-prob:

- Compute log(max_v p_NN(v|prefix)) per token in the forward pass
- Pass through file-based gather + ctypes call
- score_byte gate: nn_top_logp > -nn_skip_thr (prefix-only)
  instead of nn_logp > -nn_skip_thr (peeks at realized byte)

Mixture math (log_mix combining nn_logp and ppm_log) is unchanged — only
the gate's lambda selection switches signal. The full per-position
NN distribution over the 256 byte values is committed before the
observation is revealed.

Addresses C1 (causality) concern raised by OE-GOD on PR openai#1885.
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 28, 2026