Legality clarification: byte-level PPM-D mixture submissions (#1835 / #1850 / #1854 cluster) under Issue #1017 C2
hi @cocohearts, there are a few recent record submissions like #1835, #1850, #1854, #1858, #1862, #1833, #1871, and #1865 reporting val_bpb in the 0.85 to 1.00 range using byte-level PPM-D mixtures.
what the submissions do
according to #1835 and ports in #1854 / #1858:
- NN side: standard sliding-window token scoring (unchanged from the base
record). Each token's NLL is then "bit-conservingly spread" across
its bytes: an n-byte token with token probability p assigns
probability p^(1/n) to each of its constituent byte positions.
- PPM side: classical byte-level PPM-D order-5 with Cleary-Witten
escape, state accumulated from already-scored bytes only.
- Mix in probability space:
p_mix = λ · p_NN + (1−λ) · p_PPM,
with λ chosen by a binary gate on PPM's local confidence. Score
−log p_mix(realized_byte_t). Counts are updated AFTER scoring.
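To make the three bullets above concrete, here is a minimal sketch of the spread and the mix. It is not taken from any of the linked submissions: the function names, the gate threshold, and the two λ values are all hypothetical, since the descriptions only say the gate is binary and keyed on PPM's local confidence.

```python
import math

def spread_token_prob(p_token: float, n_bytes: int) -> float:
    """Per-byte value p^(1/n): n_bytes of these multiply back to p_token."""
    return p_token ** (1.0 / n_bytes)

def gate_lambda(ppm_confidence: float, threshold: float = 0.9,
                lam_trust_nn: float = 0.95, lam_trust_ppm: float = 0.05) -> float:
    """Hypothetical binary gate: pick one of two lambda values.

    The threshold and both lambda values are made-up placeholders.
    """
    return lam_trust_ppm if ppm_confidence >= threshold else lam_trust_nn

def mixed_byte_nll(p_nn: float, p_ppm: float, lam: float) -> float:
    """NLL (nats) of the realized byte under lam*p_NN + (1-lam)*p_PPM."""
    return -math.log(lam * p_nn + (1.0 - lam) * p_ppm)

# Illustrative numbers only: a 4-byte token the NN scores at p = 0.01.
p_token, n_bytes = 0.01, 4
p_byte = spread_token_prob(p_token, n_bytes)

# "Bit-conserving": the per-byte NLLs sum back to the token NLL.
spread_total = n_bytes * -math.log(p_byte)

lam = gate_lambda(ppm_confidence=0.97)
nll = mixed_byte_nll(p_byte, p_ppm=0.6, lam=lam)
```

The only property the sketch relies on is that n copies of p^(1/n) multiply back to p, so the NN side on its own contributes the same total bits as token-level scoring.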
I think C1, C3, C4 are clean
- C1 (causality): PPM context at byte t uses bytes <t only. ✅
- C3 (score-before-update): byte counts at t reflect bytes <t; the
count for byte t is incremented only after −log p_mix(t) is
recorded. ✅
- C4 (single pass): one left-to-right traversal of val bytes, no
rescoring, no oracle selection across passes. ✅
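The ordering that C1, C3, and C4 ask for can be sketched as a single loop. The model below is deliberately not PPM-D: it is a toy fixed-order count model with add-one smoothing, used as a stand-in so the score-then-update order is easy to see. The function name and the `order` default are assumptions for illustration.

```python
import math
from collections import defaultdict

def score_stream(data: bytes, order: int = 2) -> float:
    """Total bits for `data` under a toy causal byte model (stand-in for PPM-D)."""
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for t in range(len(data)):          # C4: one left-to-right pass
        ctx = data[max(0, t - order):t]  # C1: context is bytes < t only
        table = counts[ctx]
        seen = sum(table.values())
        # Add-one smoothing keeps this a normalized distribution over 256 bytes.
        p = (table[data[t]] + 1) / (seen + 256)
        total_bits += -math.log2(p)      # C3: score first ...
        table[data[t]] += 1              # ... then update the count for byte t
    return total_bits

bits = score_stream(b"abracadabra abracadabra")
```

Moving the count update above the scoring line would let byte t influence its own probability, which is exactly what C3 rules out.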
C2 is questionable
Issue #1017 III defines C2 as a normalized distribution over "the
official fixed token alphabet Σ". The submissions construct a
distribution over the byte alphabet (256 symbols) at each byte
position, not over the SP8192 token vocab.
Two possible readings:
(a) Σ = the SP8192 token vocab. Then the mixture isn't a
token-level distribution at all. The NN side's per-byte value is a
scalar functional of the neural token distribution, which the
common-violations table in #1017 VI flags directly:
Entropy expert in context mixer (scalar functional of neural dist,
not a distribution over Σ) — Condition 2 violation
Under this reading, every submission in the PPM-D cluster fails C2.
(b) Σ = the byte alphabet. Then p_NN (via bit-conserving spread)
and p_PPM (classical PPM-D) are both normalized over 256 symbols,
their convex combination is also normalized, and C2 is satisfied.
Under this reading the cluster is legal.
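The normalization step in reading (b) is easy to check numerically. This sketch takes both inputs as already-complete 256-way distributions (that part is assumed, not demonstrated) and only shows that their convex combination still sums to 1. The two example distributions are made up.

```python
def mix_distributions(p_nn, p_ppm, lam):
    """Pointwise convex combination of two distributions over the same alphabet."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_nn, p_ppm)]

ALPHABET = 256

# Made-up inputs: a uniform distribution and one peaked on a single byte.
uniform = [1.0 / ALPHABET] * ALPHABET
peaked = [0.9 if b == 0x65 else 0.1 / (ALPHABET - 1) for b in range(ALPHABET)]

mixed = mix_distributions(uniform, peaked, lam=0.3)
total = sum(mixed)  # stays at 1 up to floating-point rounding
```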
The contest's BPB formula in #1017 V (val_bpb = (total_cross_entropy_nats / log 2) × (token_count / byte_count)) is
written assuming token-level scoring converted via the LUT. A
byte-level scoring path is a different evaluation procedure even when
the byte stream and total bits are identical.
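A small worked example of the two paths, using made-up numbers. The token-path function transcribes the formula quoted above literally. One assumption to flag: the result only comes out in bits per byte if the first term is a per-token mean, so the example passes a mean despite the `total_` in the name; how #1017 actually defines that term is not something this sketch settles.

```python
import math

def val_bpb_token_path(total_cross_entropy_nats, token_count, byte_count):
    """Formula from #1017 V as quoted in this issue, transcribed literally."""
    return (total_cross_entropy_nats / math.log(2)) * (token_count / byte_count)

def val_bpb_byte_path(per_byte_nll_nats):
    """Byte-level path: sum per-byte NLLs, convert to bits, divide by byte count."""
    return sum(per_byte_nll_nats) / math.log(2) / len(per_byte_nll_nats)

# Made-up example: two tokens with probabilities 0.25 and 0.5,
# spanning 2 bytes and 1 byte respectively.
token_probs = [0.25, 0.5]
token_lens = [2, 1]

# Assumption: the first argument is the mean NLL per token.
mean_ce = sum(-math.log(p) for p in token_probs) / len(token_probs)
token_bpb = val_bpb_token_path(mean_ce, len(token_probs), sum(token_lens))

# NN-only byte path using the p^(1/n) spread (no PPM mixing here).
byte_nlls = [-math.log(p ** (1.0 / n))
             for p, n in zip(token_probs, token_lens) for _ in range(n)]
byte_bpb = val_bpb_byte_path(byte_nlls)
```

With the NN side alone the two paths agree, so the number only moves once the PPM mixture is applied per byte, which is the step the token-level formula has no slot for.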
whichever way the ruling goes, having it on the record
will save a lot of agent-cycles between now and April 30.
(also, i'm not affiliated with any of the submissions linked above; i'm just opening
this so the queue clears.)