Record: SP8192 + PE + SmearGate + AttnOutGate + 4ep TTT — val_bpb 1.0770 (3-seed mean) #1825

Closed
EthanYangTW wants to merge 7 commits into openai:main from EthanYangTW:submission/v2-pe-smeargate-attngate-1.0770

@EthanYangTW

Summary

  • 3-seed mean val_bpb: 1.0770 (std 0.0004) on 8×H100 SXM
  • Improvement: -0.0013 BPB vs. our previous result (1.0783)
  • All 3 seeds: 1.0772, 1.0765, 1.0772
  • Artifact: ~15.98 MB

3-Seed Results

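| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
|------|-------------|---------|------------------|
| 1337 | 1.0785 | 1.0772 | 15,982,989 |
| 42 | 1.0777 | 1.0765 | 15,984,317 |
| 2024 | 1.0784 | 1.0772 | 15,985,404 |
| Mean | 1.0782 | 1.0770 | 15,984,237 |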
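
For anyone re-checking the aggregate numbers, here is a minimal sketch that recomputes the mean and spread from the per-seed TTT values in the table above. It assumes the reported std is the sample standard deviation (n - 1 denominator); the PR does not say which estimator was used, and the variable names are illustrative only.

```python
from statistics import mean, stdev

# Per-seed values copied from the 3-seed table.
ttt_bpb = [1.0772, 1.0765, 1.0772]                   # seeds 1337, 42, 2024
artifact_bytes = [15_982_989, 15_984_317, 15_985_404]

mean_bpb = mean(ttt_bpb)
std_bpb = stdev(ttt_bpb)          # assumption: sample std, n - 1 denominator
mean_bytes = mean(artifact_bytes)

print(f"mean val_bpb  = {mean_bpb:.4f}")
print(f"std val_bpb   = {std_bpb:.4f}")
print(f"mean artifact = {round(mean_bytes):,} bytes")
```

Under that assumption the rounded values agree with the Summary (1.0770, 0.0004) and with the Mean row of the table.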

Innovation

  1. Polar Express NS coefficients (PR #1344, "Record: SP4096 + Polar Express + MuonEq-R + Depth Recurrence — 1.0923 BPB (3-seed)")
  2. MIN_LR=0.10 warmdown floor (PR #1787, "Record: PR #1736 + Polar Express NS + MIN_LR + Sparse Attn Gate + Fused CE + PR #1767 TTT — val_bpb 1.06335")
  3. QK-Gain 5.25 (PR #1493, "Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean)")
  4. SmearGate: causal content-gated residual, zero-init (PR #1667, "RECORD: SmearGate + Attention Output Gate + Legal TTT | val_bpb=1.07139")
  5. Attention Output Gate: per-head sigmoid gate, width=12, zero-init (PR #1667, "RECORD: SmearGate + Attention Output Gate + Legal TTT | val_bpb=1.07139")
  6. 4 TTT epochs (was 3)
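
To illustrate item 2, here is a rough sketch of what a warmdown floor does to the learning-rate multiplier. This is not the code from this PR or PR #1787: it assumes a linear warmdown and that MIN_LR is a fraction of the peak LR, and `lr_multiplier`, `warmdown_frac`, and the step counts are hypothetical names and values chosen for the example.

```python
def lr_multiplier(step: int, total_steps: int,
                  warmdown_frac: float = 0.5, min_lr: float = 0.10) -> float:
    """Multiplier on the peak LR: flat, then a linear decay that ends at min_lr."""
    # Assumption: warmdown covers the last `warmdown_frac` of training.
    warmdown_start = int(total_steps * (1.0 - warmdown_frac))
    if step < warmdown_start:
        return 1.0
    # progress goes from 0.0 at the start of warmdown to 1.0 at the final step
    progress = (step - warmdown_start) / max(1, total_steps - warmdown_start)
    progress = min(progress, 1.0)
    # Interpolate from 1.0 down to min_lr instead of down to 0.0.
    return 1.0 - progress * (1.0 - min_lr)


for s in (0, 500, 750, 1000):
    print(s, round(lr_multiplier(s, total_steps=1000), 3))
```

With the floor in place, the last steps still train at 10% of the peak LR rather than decaying to zero.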

Compliance (Issue #1017, Track B)

  • Train < 600s (599.6s)
  • Eval < 600s (531s)
  • Artifact < 16MB (15.98MB)
  • Score before update (each chunk is scored under no_grad before any TTT update on that chunk)
  • No SLOT, no pre-quant TTT, no n-gram cache, no CaseOps, no global TTT
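
The score-before-update ordering can be shown with a toy stand-in. The sketch below is not the submission's TTT loop: the "model" is a single Bernoulli logit rather than a transformer, and `score_then_adapt`, `chunk_bits`, and the learning rate are hypothetical. Only the ordering (score the chunk first, then adapt on it) and the 4 adaptation epochs mirror the description above.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def chunk_bits(theta: float, chunk: list) -> float:
    """Code length in bits of `chunk` under a Bernoulli(sigmoid(theta)) model."""
    p = sigmoid(theta)
    return sum(-math.log2(p if b == 1 else 1.0 - p) for b in chunk)


def score_then_adapt(chunks, epochs: int = 4, lr: float = 0.5):
    theta = 0.0  # toy model: one logit, so p = 0.5 before any adaptation
    scores = []
    for chunk in chunks:
        # 1) Score first, using parameters that have never seen this chunk.
        scores.append(chunk_bits(theta, chunk))
        # 2) Only afterwards adapt on the same chunk for `epochs` passes.
        for _ in range(epochs):
            p = sigmoid(theta)
            grad = sum(p - b for b in chunk) / len(chunk)  # d(mean NLL)/d(theta)
            theta -= lr * grad
    return scores, theta


chunks = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1]]
scores, theta = score_then_adapt(chunks)
print(scores, theta)
```

The first chunk costs exactly 4 bits because it is scored at the untouched initialization; later chunks only benefit from updates made on earlier, already-scored chunks.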

Attribution

@abaybektursun (PR #1420), @clarkkev (PR #1394), @dexhunter (PR #1331), @aryanbhosale (PR #1477), @resouer (PR #1460), @orangekame3 (PR #1344), @nprime06 (PR #1787), @MarioPaerle (PR #1667), @bigbag (PR #1493)
