Record: MuonEq-R + Depth Recurrence + N61 Mixed GPTQ — val_bpb 1.0924 (3-seed mean)#1279
dexhunter wants to merge 1 commit into openai:main
Conversation
Improves PR openai#1260 (1.0929) by using N_INT6=61 (one more int6 layer) with a smaller mini runner (21,396 bytes) that creates enough headroom.

- 3-seed mean: 1.0924 BPB / 2.5133 nats (seeds 42, 0, 7)
- All seeds under 16 MB (max: 15,996,591 bytes)
- No TTT, no SLOT, no eval-time adaptation
- Techniques: MuonEq-R optimizer, depth recurrence (layers 4, 5 shared MLP), 61 int6 + 5 int5 Hessian-ranked GPTQ, brotli-11 compression

Built on PR openai#1218 by @clarkkev.
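The PR reports the same result in two units: 1.0924 BPB and 2.5133 nats. Under the usual conversion bpb = loss_nats / (ln 2 · bytes_per_token), those two figures together imply roughly 3.32 bytes per token; that ratio is inferred from the reported numbers, not stated in the PR. A quick sanity check:

```python
import math

loss_nats = 2.5133   # reported 3-seed mean val loss, nats per token
bpb = 1.0924         # reported bits per byte

# bpb = loss_nats / (ln 2 * bytes_per_token)  =>  solve for bytes_per_token
bytes_per_token = loss_nats / (math.log(2) * bpb)
print(round(bytes_per_token, 3))  # ~3.319
```

If the two reported numbers diverged under this conversion, it would suggest they came from different runs or different token/byte accounting.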
New architecture: instead of N independent transformer blocks, use K shared blocks cycled to N virtual layers, with per-layer FiLM conditioning (learned scale vectors for attn/mlp/residual per virtual layer). This saves massive parameters: 3 shared blocks for 9 virtual layers use ~6.5M params vs 17.1M, freeing artifact budget.

This is genuinely novel for parameter-golf: no submission has tried feature-wise linear modulation for depth conditioning. The closest is PR openai#1279's LoRA adapters, but FiLM is much cheaper (1,024 params per virtual layer vs ~8K for LoRA rank-4).

Experiments running: standard 9L vs FiLM 3→9 vs FiLM 3→18 vs FiLM 1→9.

Also includes best_full_run.log: Kitchen Sink seq2048 at 600s reached 1.2698 BPB (1,338 steps, 15.6 MB artifact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
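The cycling-plus-FiLM idea above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the PR's implementation: the blocks are stand-in linear maps rather than real attention/MLP sublayers, the width is arbitrary, and a single FiLM scale vector per virtual layer stands in for the separate attn/mlp/residual scales the PR describes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64          # toy model width (hypothetical)
K, N = 3, 9     # 3 shared blocks cycled to 9 virtual layers, as in the PR

# K shared "blocks": toy linear maps standing in for attn+MLP sublayers
shared = [rng.normal(0, 0.02, (d, d)) for _ in range(K)]

# Per-virtual-layer FiLM scales: one learned vector per virtual layer that
# modulates the residual update (the PR uses separate attn/mlp/residual
# scales; collapsed to one vector here for brevity)
film = np.ones((N, d))

def forward(x):
    for layer in range(N):
        block = shared[layer % K]          # cycle the K shared blocks
        x = x + film[layer] * (x @ block)  # FiLM-modulated residual update
    return x

x = rng.normal(size=(2, d))
y = forward(x)
print(y.shape)
```

The key point is the parameter accounting: the heavy weights exist K times, while depth conditioning costs only N small scale vectors per virtual layer.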
Community Review

Compliance: NEEDS AUTHOR ACTION

What I found: the CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step. A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:
Recommendation: could you run the import step locally and fix the reported SyntaxError? Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet, because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — SyntaxError: f-string: expecting '}' (line 574).
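The reported failure is a SyntaxError raised at import time, which means it can be caught before submission without executing the artifact at all. A minimal pre-flight check using the stdlib `py_compile` module (the file name and broken f-string below are fabricated for illustration):

```python
import os
import py_compile
import tempfile

# Fabricated source reproducing the class of error the audit reported:
# a string literal closes inside an f-string replacement field, so the
# '}' is never found and compilation fails.
bad_src = 'x = f"{1 + "  # unbalanced brace inside an f-string\n'

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(bad_src)
    path = f.name

try:
    # doraise=True turns compile failures into PyCompileError instead
    # of printing to stderr, so the check can gate a submission script
    py_compile.compile(path, doraise=True)
    ok = True
except py_compile.PyCompileError as e:
    ok = False
    print("import would fail:", e.exc_type_name)
finally:
    os.remove(path)
```

Running this kind of check in CI on the generated runner would surface the `f-string: expecting '}'` error before the compliance sweep does.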
Summary
Key Innovation: N_INT6=61
PR #1260 used N_INT6=60. By regenerating a smaller self-extracting mini runner (21,396 bytes vs the 87K standalone), we freed enough artifact budget to fit one additional int6 layer. N_INT6=61 improves BPB by ~0.001 per seed with zero architecture change: a pure quantization precision upgrade.
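The "61 int6 + 5 int5 Hessian-ranked" split can be sketched as a simple precision-assignment step: rank layers by a sensitivity score (the PR says Hessian-ranked; GPTQ-style pipelines typically use a per-layer Hessian-derived statistic) and give the top N_INT6 layers the extra bit. Everything below is a hypothetical illustration with random stand-in scores, not the PR's actual ranking code.

```python
import numpy as np

rng = np.random.default_rng(42)
n_layers = 66                       # 61 int6 + 5 int5, per the PR
N_INT6 = 61                         # layers that get the higher precision

# Stand-in for per-layer Hessian-derived sensitivity scores; the real
# pipeline would compute these from calibration data during GPTQ
sensitivity = rng.random(n_layers)

order = np.argsort(sensitivity)[::-1]   # most sensitive layers first
bits = np.full(n_layers, 5)             # default: int5
bits[order[:N_INT6]] = 6                # top-61 layers upgraded to int6
print(int((bits == 6).sum()), int((bits == 5).sum()))  # 61 5
```

Framed this way, the record's headline change is just `N_INT6 = 60 → 61`: one layer crossing the precision threshold once the smaller runner freed the bytes to pay for it.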
Results (8xH100 80GB SXM, PyTorch 2.9.1+cu128)
Changes from PR #1218
Credits
Test plan