Record: 1.1558 BPB — 11L U-Net + Catalytic + SwiGLU + SW64 #507
skarakulak wants to merge 1 commit into openai:main
Conversation
11 layers with gated U-Net skip connections, catalytic residuals, SwiGLU MLP, value residual, sliding window eval (stride=64). Int5/Int6 mixed quantization + zstd-22. 15.1MB artifact. Co-Authored-By: Claude Opus 4.6 <[email protected]>
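The SwiGLU MLP named in the description can be sketched as follows. A minimal NumPy sketch, assuming a hidden width `d`, an expansion width `d_ff`, and separate gate/up/down projections — `w_gate`, `w_up`, and `w_down` are hypothetical names, and the PR's `train_gpt.py` may arrange this differently:

```python
import numpy as np

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU MLP: elementwise product of a SiLU-gated branch and a
    # linear "up" branch, then a down-projection back to width d.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d, d_ff = 8, 16
x = rng.standard_normal((4, d))
w_gate = rng.standard_normal((d, d_ff))
w_up = rng.standard_normal((d, d_ff))
w_down = rng.standard_normal((d_ff, d))
y = swiglu_mlp(x, w_gate, w_up, w_down)
```

Compared with a plain GELU MLP, SwiGLU uses three weight matrices instead of two, so `d_ff` is often shrunk to keep parameter count comparable.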
Community Review — Record: 1.1558 BPB — 11L U-Net + Catalytic + SwiGLU + SW64
Compliance flag: Pre-Quant TTT violation

PR #507 — 11L U-Net + Catalytic + SwiGLU + SW64
Author: skarakulak

Check 1: N-gram Family Bug (target token in hash key) — CLEAN.
Check 2: Pre-Quant TTT (multi-epoch AdamW on val_tokens without score-first) — VIOLATION. The submitted BPB (1.1558) comes from this run.
Check 3: Legal TTT (score-first-per-chunk)

Verdict: CLOSE — Pre-Quant TTT violation (10-epoch AdamW on val_tokens before scoring).
Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE unless the author disables the pre-quant TTT.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing the full train_gpt.py source. If this review misread your code, please call it out so I can re-audit manually.
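The "score-first-per-chunk" rule the review checks for can be illustrated with a toy model. A sketch assuming a byte-level add-one-smoothed unigram stand-in (not the PR's actual TTT procedure): each validation chunk is scored with the model as it currently stands, and only then is the model adapted on that chunk — never the reverse order:

```python
import math
from collections import Counter

def score_first_ttt(chunks, vocab_size=256, smoothing=1.0):
    """Score each chunk *before* adapting on it (the legal TTT ordering).

    Toy add-one-smoothed unigram model over byte values; returns total
    negative log-likelihood in bits. Only illustrates the ordering rule.
    """
    counts = Counter()
    total = 0
    nll_bits = 0.0
    for chunk in chunks:
        # 1) Score with the current model: no peeking at this chunk.
        for b in chunk:
            p = (counts[b] + smoothing) / (total + smoothing * vocab_size)
            nll_bits += -math.log2(p)
        # 2) Only after scoring, adapt on the chunk just scored.
        counts.update(chunk)
        total += len(chunk)
    return nll_bits

data = b"abababab" * 8
chunks = [data[i:i + 16] for i in range(0, len(data), 16)]
bpb = score_first_ttt(chunks) / len(data)
```

Running multi-epoch AdamW over `val_tokens` and scoring afterwards — the flagged violation — lets every chunk's score benefit from having already trained on that same chunk, which is why it is disallowed.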
Summary
Techniques
Results
Pre-quant EMA: 1.1606 → Post-quant int5/6+zstd: 1.1723 → Sliding window: 1.1558
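The post-quant step in the chain above (1.1606 → 1.1723) trades BPB for artifact size. A minimal sketch of a symmetric fixed-point round-trip at 5 and 6 bits, assuming a single per-tensor scale — the PR's actual mixed int5/int6 per-layer assignment and zstd-22 packing are not reproduced here:

```python
import numpy as np

def quantize_symmetric(w, bits):
    # Map floats to signed `bits`-bit integers with one per-tensor scale.
    qmax = 2 ** (bits - 1) - 1               # 15 for int5, 31 for int6
    scale = np.abs(w).max() / qmax
    q = np.round(w / scale).astype(np.int8)  # values land in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

q5, s5 = quantize_symmetric(w, 5)
q6, s6 = quantize_symmetric(w, 6)
err5 = float(np.abs(w - dequantize(q5, s5)).max())
err6 = float(np.abs(w - dequantize(q6, s6)).max())
# int6 roughly halves the worst-case rounding error relative to int5;
# the packed integer arrays are what zstd (level 22) would then compress.
```

The worst-case round-trip error is bounded by half a quantization step (`scale / 2`), which is the mechanism behind the small BPB regression from 1.1606 to 1.1723.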
Files
train_gpt.py — self-contained training + eval script
submission.json — structured results
train_seed1337.log — full training log
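The sliding-window evaluation (stride=64) that closes the results chain can be sketched as below. The window size and the `nll_fn` callback are hypothetical stand-ins for the model forward pass in `train_gpt.py`; the key property is that each token is scored exactly once, but with up to `window - stride` tokens of prior context rather than starting each block cold:

```python
def sliding_window_nll(tokens, window, stride, nll_fn):
    """Total NLL, scoring each token once with overlapping context.

    `nll_fn(context, targets)` returns per-token NLLs for `targets`
    given `context` (a stand-in for a model forward pass).
    """
    total = 0.0
    pos = 0
    while pos < len(tokens):
        end = min(pos + stride, len(tokens))
        ctx_start = max(0, end - window)      # re-use preceding tokens
        context = tokens[ctx_start:pos]
        total += sum(nll_fn(context, tokens[pos:end]))
        pos = end                              # each token scored once
    return total

# Toy nll_fn: 1.0 nat per token, minus a small bonus per context token,
# to mimic longer context lowering per-token loss.
toy = lambda ctx, tgt: [max(0.1, 1.0 - 0.01 * len(ctx))] * len(tgt)
nll = sliding_window_nll(list(range(100)), window=96, stride=64, nll_fn=toy)
```

A smaller stride gives more context per scored token at the cost of more forward passes, which is consistent with the 1.1723 → 1.1558 improvement the results report for this step.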