Record: Order-Adaptive Entropy Gating + XSA-All (val_bpb=0.9370) #774
travispchen wants to merge 1 commit into openai:main
Conversation
…ed mean)
N-gram7 BPB: 0.9370 (±0.0003) across seeds 1337/42/2025
Sliding BPB: 1.1222 (±0.0003)
Artifact: ~15.9 MB (within 16MB cap)
Training: 600s on 8xH100

Key innovation: order-adaptive entropy gating assigns different entropy thresholds per n-gram order. High-order matches (7-gram) are trusted at moderate model confidence; low-order matches (2-gram) are trusted only when the model is very uncertain.

Built on PR openai#753 (Podracing II) with XSA extended to all 11 layers and entropy_center=3.0.

Co-Authored-By: Travis Chen <[email protected]>
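As a concrete illustration of the mechanism described above, here is a minimal sketch of order-adaptive entropy gating, assuming a hard per-position gate. The threshold values, tensor shapes, and function name are illustrative assumptions of mine, not the PR's actual implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical per-order entropy thresholds (in nats). Lower thresholds at
# higher orders mean a 7-gram match is trusted at moderate uncertainty,
# while a 2-gram match is used only when the model is very uncertain.
# These numbers are illustrative, not the PR's tuned values.
ENTROPY_THRESHOLDS = {2: 4.5, 3: 4.0, 4: 3.5, 5: 3.0, 6: 2.5, 7: 2.0}

def order_adaptive_gate(model_logits, ngram_logits, match_order):
    """Replace model logits with n-gram logits wherever the model's
    predictive entropy exceeds the threshold for the matched order.

    model_logits: (batch, vocab) raw model logits
    ngram_logits: (batch, vocab) logits derived from the n-gram cache
    match_order:  (batch,) longest matched n-gram order (0 = no match)
    """
    probs = F.softmax(model_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)

    # Map each position's matched order to its entropy threshold;
    # unmatched positions get +inf so the gate never opens for them.
    thresholds = torch.full_like(entropy, float("inf"))
    for order, thr in ENTROPY_THRESHOLDS.items():
        thresholds = torch.where(
            match_order == order, torch.full_like(entropy, thr), thresholds
        )

    use_ngram = (entropy > thresholds).unsqueeze(-1)  # (batch, 1)
    return torch.where(use_ngram, ngram_logits, model_logits)
```

A soft sigmoid blend centered on an entropy value (in the spirit of the entropy_center=3.0 setting mentioned for XSA) would be a natural variant of the hard gate shown here; whether the PR uses a hard or soft gate is not visible from this summary.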
we need to share notes! i mean.. we just did =) but calibrations. I haven't calibrated it yet
Community Review — Order-Adaptive Entropy Gating + XSA-All

BPB: 0.9370 (n-gram7) / 1.1222 (sliding) | Seeds: 3 (1337/42/2025) | Artifact: 15.83 MB | Compliance: FLAG (target-in-key n-gram cache)

What this does: Builds on PR #753 (Podracing II) with two changes: (1) extends XSA from the last 4 layers to all 11 layers with entropy_center=3.0; (2) adds order-adaptive entropy gating, which assigns a different entropy threshold to each n-gram order.

What I found in the code:
Questions:
Standalone merits (independent of the cache question):
Verdict: COMPLIANCE FLAG — the n-gram eval cache appears to use the same target-in-key hashed construction that @valerio-oai ruled out on PR #779, and that closed PR #798 (same author) earlier today. The training pipeline, ablation table, XSA-all change, and order-adaptive gating mechanism all look clean on their own; the issue is isolated to lines 1125/1154/1193 of the eval loop (see the sketch at the end of this review).

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica:

Reviewed by @MatoTeziTanka — The Agora.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_OK 0.06s, HAS_HYPERPARAMETERS=True, HAS_GPT=True, model_dim=512, num_heads=8, num_layers=11, vocab=1024, train_seq_len=2048, code_bytes=110175, SMOKE_TEST_PASS.

AI tooling: review drafted with Claude Code (Sonnet/Opus) using an internal review template; all citations, file paths, and compliance audits were verified against the PR's actual code at SHA
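For readers following the compliance thread: below is a hypothetical sketch of the kind of target-in-key cache construction being flagged, with names of my own choosing (the PR's actual eval-loop code may differ). The point is that hashing the target token into the cache key means an eval-time lookup can only hit when the true next token is already supplied, leaking label information into the prediction:

```python
import hashlib

def flagged_cache_key(context_tokens: list[int], target_token: int) -> str:
    """Target-in-key construction (the flagged pattern): the target token
    is hashed into the key, so recomputing the key at eval time requires
    already knowing the true next token — label leakage."""
    payload = ",".join(map(str, context_tokens + [target_token])).encode()
    return hashlib.sha256(payload).hexdigest()

def compliant_cache_key(context_tokens: list[int]) -> str:
    """A leak-free alternative keys on the context alone, storing a
    distribution over candidate next tokens as the cached value."""
    payload = ",".join(map(str, context_tokens)).encode()
    return hashlib.sha256(payload).hexdigest()
```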