Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + V-Gated — val_bpb 1.0796 (3-seed mean) #1770

Open
liujshi wants to merge 1 commit into openai:main from liujshi:record-smear-vgate

liujshi commented Apr 22, 2026

Summary

This PR adds a new record submission based on the SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT baseline.

Main additions in this submission:

  • final norm scale
  • Smear gate
  • per-head V-Gate
  • improved quantized compression via per-matrix automatic layout selection
  • additional hyperparameter tuning (MUON_BACKEND_STEPS=4, TTT_LR=0.01)

Result

  • val_bpb = 1.0796 (3-seed mean, std 0.00025)
  • artifact size ≈ 15.99 MB

Per-seed TTT val_bpb:

  • seed 42: 1.07985553
  • seed 314: 1.07927887
  • seed 999: 1.07973035
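
For reviewers who want to check the headline number, here is a minimal sketch that recomputes the mean and spread from the three per-seed values above. The PR does not say whether the reported std is a population or a sample standard deviation, so the sketch computes both; the variable names are illustrative and are not taken from train_gpt.py.

```python
import statistics

# Per-seed TTT val_bpb values copied from the list above.
seed_bpb = {42: 1.07985553, 314: 1.07927887, 999: 1.07973035}
values = list(seed_bpb.values())

mean_bpb = statistics.fmean(values)     # arithmetic mean over the 3 seeds
pop_std = statistics.pstdev(values)     # population std (divides by n)
sample_std = statistics.stdev(values)   # sample std (divides by n - 1)

print(f"mean={mean_bpb:.4f} pstdev={pop_std:.5f} stdev={sample_std:.5f}")
```

Under this calculation the mean rounds to 1.0796, and the population form rounds to 0.00025, matching the reported std; the sample form comes out near 0.00030.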

Files Included

  • README.md
  • submission.json
  • train_gpt.py
  • train_seed42.log
  • train_seed314.log
  • train_seed999.log

Notes

The provided train_gpt.py contains the final norm / Smear / V-Gate / compression-layout changes used in this submission.

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
…iculum + MLPClip12

Frontier: openai#1769 (1.06453) and openai#1771 (1.06513) both below baseline.
New ideas: mlp-clip-sigmas-12, v-gate.
Map updated with openai#1769, openai#1771, openai#1770.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>