Record: SP8192 CaseOps v13 PPM tuned gate — fresh 3-seed mean 0.94175270 #2083
NewyorkDev wants to merge 5 commits into openai:main
Conversation
Fresh end-to-end rerun evidence is still being collected. Seed 42 has completed cleanly and is included in this PR; seed 314 is currently running; seed 999 is queued to start after seed 314 exits. The current headline score remains based on the existing three-seed evidence until the fresh rerun set is complete.
Fresh rerun status update: seed 42 completed cleanly at `ppm_sliding` val_bpb 0.94182660; seed 314 completed cleanly at `ppm_sliding` val_bpb 0.94146034; seed 999 is now running as `v13_submit_clean_s999_20260501_043637`. I left the headline score unchanged until seed 999 finishes, because the fresh set is still incomplete. Thanks again to the public Parameter Golf contributors credited in REFERENCES.md, Claude for experiment/design help, and Codex for orchestration, implementation, audit, packaging, and PR maintenance.
Fresh clean rerun set is now complete and pushed in commit "Final fresh end-to-end evidence with submitted defaults". All three artifacts are under the strict 16,000,000 byte cap, and eval stays under 600 s. The earlier eval-only mean was 0.94174862; the fresh full rerun set is cleaner and is now used in `submission.json`.
Summary
fresh val_bpb = 0.94175270 (3-seed mean, sample std=0.00026331, full FineWeb val) | strict <16 MB artifact | 8xH100 SXM | causal sidecar-aware byte PPM, no TTT.
This is our strongest v13 lane: SP8192 CaseOps + SmearGate BOS masking + per-group `lrzip` compression + PPM order-5 evaluator. The final delta is a narrow PPM gate retune: `PPM_ORDER=5`, `PPM_H=0.999`, `PPM_L=0.18`, `PPM_T=0.80`. Relative to PR #1991's open 0.94290 three-seed mean, this is about -0.00115 BPB on the same seed set.
Fresh 3-seed results
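To make the gate parameters concrete, here is a toy sketch of an order-5 byte context model with a confidence gate. This is not the PR's actual implementation: the exact semantics of `PPM_H`/`PPM_L`/`PPM_T` in the submission are not shown here, and treating them as a high/low blend weight and a confidence threshold is an assumption for illustration only.

```python
# Toy sketch (NOT the submitted evaluator): order-5 byte context model
# with a hypothetical confidence gate. H/L are treated as high/low blend
# weights and T as a top-probability threshold -- all assumptions.
from collections import Counter, defaultdict

PPM_ORDER, PPM_H, PPM_L, PPM_T = 5, 0.999, 0.18, 0.80

class GatedBytePPM:
    def __init__(self):
        # 5-byte context -> Counter of next-byte occurrences
        self.ctx_counts = defaultdict(Counter)

    def update(self, data: bytes) -> None:
        for i in range(PPM_ORDER, len(data)):
            self.ctx_counts[data[i - PPM_ORDER:i]][data[i]] += 1

    def prob(self, context: bytes, byte: int) -> float:
        counts = self.ctx_counts[context[-PPM_ORDER:]]
        total = sum(counts.values())
        uniform = 1.0 / 256
        if total == 0:
            return uniform  # unseen context: fall back to uniform
        p_ctx = counts[byte] / total
        top = max(counts.values()) / total
        # Gate: trust the context model heavily only when it is confident.
        w = PPM_H if top >= PPM_T else PPM_L
        return w * p_ctx + (1.0 - w) * uniform

model = GatedBytePPM()
model.update(b"the quick brown fox jumps over the lazy dog " * 50)
p_seen = model.prob(b"the q", ord("u"))      # confident context: near 1
p_unseen = model.prob(b"zzzzz", ord("a"))    # unseen context: 1/256
```

The point of the gate shape is that a confident high-order context keeps almost all of its mass (`H` close to 1), while a low-confidence context is smoothed hard toward the fallback (`L` small).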
`ppm_sliding` val_bpb, fresh 3-seed mean: 0.94175270. All three fresh evals finish under the 600 s eval cap. All three artifacts are under the strict decimal 16,000,000 byte cap; the largest measured total is 15,988,348 bytes.
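The headline mean and sample std can be reproduced from the per-seed numbers. Seed 42 and seed 314 are quoted from the rerun status update above; the seed 999 value below is back-solved from the reported mean (an inference, not a logged number, though it is consistent with the reported sample std of 0.00026331):

```python
import statistics

# Seed 42 / 314 val_bpb quoted from the rerun status update.
# Seed 999 is back-solved as 3 * 0.94175270 - s42 - s314 (inferred).
seed_bpb = {
    42: 0.94182660,
    314: 0.94146034,
    999: 0.94197116,  # inferred, not from the run log
}

vals = list(seed_bpb.values())
mean = statistics.mean(vals)   # ≈ 0.94175270
std = statistics.stdev(vals)   # sample std, ≈ 0.00026331
print(f"mean={mean:.8f} sample_std={std:.8f}")
```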
The earlier eval-only three-seed mean was 0.94174862; the fresh end-to-end rerun set is cleaner and is now used in `submission.json`.
Compliance notes
- `TTT_ENABLED=0`: no validation-set gradient update for the submitted score.
- `lrzip` is required as a preinstalled system binary for the per-group compressor; the script does not download packages during training/eval.
Test plan
- `python3 -m py_compile train_gpt.py`
- `python3 -m json.tool submission.json`
Thanks to Claude for late-stage experiment design help, to Codex for implementation/audit/packaging/run coordination, and to the Parameter Golf community for the public SP8192, PPM, SmearGate, compression, and quantization ideas this builds on.
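Alongside those commands, the compliance constraints (TTT off, preinstalled `lrzip`, strict decimal byte cap) could be preflighted with a small check. This is a hypothetical sketch; the function names and layout are illustrative and not part of the submitted scripts.

```python
# Hypothetical preflight sketch of the compliance constraints above;
# names are illustrative, not the submission's actual code.
import os
import shutil

CAP_BYTES = 16_000_000  # strict decimal artifact cap

def artifacts_within_cap(paths, cap=CAP_BYTES):
    """True if the summed artifact sizes stay strictly under the cap."""
    return sum(os.path.getsize(p) for p in paths) < cap

def lrzip_available():
    """lrzip must already be on PATH; nothing is downloaded at run time."""
    return shutil.which("lrzip") is not None

def ttt_disabled():
    """TTT_ENABLED must be 0 for the submitted score."""
    return os.environ.get("TTT_ENABLED", "0") == "0"
```

A run would be rejected early if any of the three checks fails, rather than failing after training completes.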
Attribution
README.md and REFERENCES.md explicitly credit inherited public Parameter Golf components: SP8192/tokenizer and recurrence lineage (PR #1394, #1493, #1855), byte-PPM lineage (PR #1795, #1959, #1991), SmearGate/BOS masking lineage (modded-nanogpt @classiclarryd, PR #1667, #1797, #2014), compression lineage (PR #1586, #1667, #1729), and quantization/optimizer/scoring pieces (PR #1530, #1886, #1923, #1344, #1145, #1967). The v13-specific contribution is the consolidation, sidecar-aware packaging, and final PPM gate retune to H=0.999/L=0.18/T=0.80.