Record: PR #1850 + Anti-Hijack Gate — val_bpb 0.99445 (full val) #1885
leon2k2k2k wants to merge 1 commit into openai:main
Conversation
Submission to track_10min_16mb. PPM-D byte mixture inheriting openai#1850's score_byte/ppm_score/ppm_score_omp infrastructure, with a 5-line anti-hijack patch in score_byte:

```c
int hi_raw = (conf >= thr);
int hi = hi_raw && !(nn_skip_thr > 0.0 && nn_logp > -nn_skip_thr);
double lam = hi ? lambda_lo : lambda_hi;
```

The high-lambda branch is suppressed when the NN is already confident on the actual byte (-log p_NN < 0.40 bits, nn_skip_thr_nats = 0.277). This addresses the legality concern raised in Issue openai#1017 / openai#1872 about confidence-gated mixtures: when the NN already nails the byte, the PPM table cannot compound on top.

3 seeds (42, 7, 1337) on the full 47.85M val: mean mix_bpb_sidecar = 0.99445 (std 0.00141), best 0.99291. All seeds: train ≤ 596.10s, eval ≤ 221s, artifact ≤ 15,917,572 B.

Stackable with PR openai#1881 / PR openai#1877: the patch is local to score_byte; NN forward, PPM table construction, OMP scoring, and gather pattern are unchanged from openai#1850.
@leon2k2k2k — flagging a C1 (causality) concern with the anti-hijack gate, independent of the #1872 C2 discussion. The patch:

```c
int hi = hi_raw && !(nn_skip_thr > 0.0 && nn_logp > -nn_skip_thr);
```

uses nn_logp, the NN's log-probability of the byte actually being scored — an observation-dependent signal. The λ-selection therefore depends on the realized byte, not on the prefix alone. This is the same failure mode that was caught on PR #1795. A causal alternative would condition on the NN's entropy, or on another prefix-only statistic of the predictive distribution, committed before the byte is seen.
@OE-GOD you are absolutely right. I will fix it in an update.
Replace the observation-dependent gate signal with the prefix-only NN top-prob:

- Compute log(max_v p_NN(v|prefix)) per token in the forward pass
- Pass it through the file-based gather + ctypes call
- score_byte gate: nn_top_logp > -nn_skip_thr (prefix-only) instead of nn_logp > -nn_skip_thr (peeks at realized byte)

The mixture math (log_mix combining nn_logp and ppm_log) is unchanged — only the gate's λ-selection signal switches. The full per-position NN distribution over the 256 byte-token-conflated values is committed before peeking at the observation. Addresses the C1 (causality) concern raised by OE-GOD on PR openai#1885.
…penai#1885 PPM-D anti-hijack; Day 19 plateau; Session 24 https://claude.ai/code/session_013L9as27k9K4K8JGVwtj8Vw
Summary
Submission to track_10min_16mb, building on PR #1850 ("Record: SP8192 + Strict Full-Val Byte PPM Mixture — 1.00495 BPB (3-seed mean)") and its PPM-D byte-mixture infrastructure. Anti-hijack patch in score_byte: the high-λ branch is suppressed when the NN is already confident on the actual byte (-log p_NN < 0.40 bits, nn_skip_thr_nats = 0.277). The patch is local to score_byte; NN forward, PPM table construction, OMP scoring, and gather pattern are unchanged from #1850.

Results (3 seeds, 8×H100 SXM, full 47.85M val): mean mix_bpb_sidecar = 0.99445 (std 0.00141), best 0.99291.
All within 600s train cap, 600s eval cap, 16,000,000-byte decimal artifact cap.
Class disclosure
Same scoring scheme as PR #1850 — this submission is in the PPM byte-mixture class under discussion in Issue #1872. If #1872 disallows the class, the neural-only fallback scores ≈1.073 bpb (diagnostic, quantized).
Test plan