
Record: SP8192 + Headwise Gated Attention + LeakyReLU2 + Legal TTT (val_bpb 1.2073) #1799

Open
jamesEmerson112 wants to merge 1 commit into openai:main from
jamesEmerson112:submission/2026-04-24_SP8192_HeadwiseGate_LeakyReLU2_LegalTTT


@jamesEmerson112

Summary

  • val_bpb: 1.2073 (3-seed mean, std 0.0006)
  • Artifact: ~15.34 MB (under the 16 MB budget, about 0.66 MB headroom)
  • Seeds: 1337 (1.20665), 42 (1.20783), 2025 (1.20746)
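
The reported mean and spread can be reproduced from the three per-seed scores listed above. This is a minimal sketch, assuming the quoted std is the sample standard deviation (n - 1 denominator); the PR does not state which estimator was used, and the variable names are illustrative only.

```python
import statistics

# Per-seed val_bpb values, copied from the summary list above.
seed_bpb = {1337: 1.20665, 42: 1.20783, 2025: 1.20746}
scores = list(seed_bpb.values())

mean_bpb = statistics.mean(scores)
# Assumption: sample standard deviation (n - 1 denominator).
std_bpb = statistics.stdev(scores)

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f}")  # mean=1.2073 std=0.0006
```

With the population estimator (`statistics.pstdev`) the spread would round to 0.0005 instead, so the sample estimator appears to be the one that matches the reported figure.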

Key Techniques

See records/track_10min_16mb/2026-04-24_SP8192_HeadwiseGate_LeakyReLU2_LegalTTT/README.md for full details.

3-seed mean 1.2073 BPB (std 0.0006) on 8xH100 SXM.
SP8192 + headwise gated attention (original) + LeakyReLU(0.5)^2 + QK-Gain 5.0 + score-first TTT.
MODEL_DIM=448, 16.4M params, ~15.34 MB artifact (under 16 MB budget).
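
For readers unfamiliar with two of the named components, the sketch below shows one plausible reading of each, in plain Python. It is not taken from the submission: it assumes "LeakyReLU(0.5)^2" means a LeakyReLU with negative slope 0.5 followed by squaring, and that "headwise gated attention" scales each attention head's output by a single sigmoid gate. The function names `leaky_relu_squared` and `headwise_gate` are hypothetical; the README in the records directory is the authoritative description.

```python
import math


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """Apply LeakyReLU, then square the result (assumed reading of LeakyReLU(0.5)^2)."""
    y = x if x >= 0.0 else negative_slope * x
    return y * y


def headwise_gate(head_outputs, gate_logits):
    """Scale each head's output vector by sigmoid(logit), one scalar gate per head.

    Assumption: the gate is a single scalar per head. How the logits are
    produced (learned parameter, projection of the input, ...) is not
    specified here.
    """
    if len(head_outputs) != len(gate_logits):
        raise ValueError("need exactly one gate logit per head")
    gated = []
    for vec, logit in zip(head_outputs, gate_logits):
        gate = 1.0 / (1.0 + math.exp(-logit))  # sigmoid, in (0, 1)
        gated.append([gate * v for v in vec])
    return gated


# Negative inputs are scaled by 0.5 before squaring: (0.5 * -2.0)^2 = 1.0
print(leaky_relu_squared(-2.0))              # 1.0
print(leaky_relu_squared(3.0))               # 9.0
# A logit of 0 gives a gate of 0.5, halving that head's output.
print(headwise_gate([[2.0, 4.0]], [0.0]))    # [[1.0, 2.0]]
```

Under this reading, the squared activation is non-negative everywhere, with the negative side attenuated by a factor of 0.25 relative to a plain square.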

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>