Record: #1787 + Sparse Gate + Updated Frozen Carry — val_bpb 1.06287 #1800

Closed
leon2k2k2k wants to merge 1 commit into openai:main from leon2k2k2k:submission/036-sparse-updated-carry


Summary

Results (8×H100 80GB SXM, phased LoRA-TTT, 10-min train / 10-min eval)

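| Seed | Steps | Post-EMA (pre-quant) | Quantized | Post-TTT | Artifact (bytes) |
|------|-------|----------------------|-----------|----------|------------------|
| 42   | 4989  | 1.06749              | 1.07678   | 1.06366  | 15,909,254       |
| 0    | 4974  | 1.06685              | 1.07608   | 1.06311  | 15,904,209       |
| 1234 | 4973  | 1.06578              | 1.07509   | 1.06183  | 15,909,401       |
| Mean | 4979  | 1.06671              | 1.07598   | 1.06287  | 15,907,621       |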
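
For reviewers who want to re-derive the Mean row, the following is a minimal sketch that recomputes it from the three per-seed rows. The variable names (`runs`, `quant_penalty`, `ttt_gain`) are illustrative and do not come from `train_gpt.py`; the numbers are copied from the table above.

```python
from statistics import mean

# Per-seed rows copied from the results table:
# seed -> (steps, post-EMA bpb, quantized bpb, post-TTT bpb, artifact bytes)
runs = {
    42:   (4989, 1.06749, 1.07678, 1.06366, 15_909_254),
    0:    (4974, 1.06685, 1.07608, 1.06311, 15_904_209),
    1234: (4973, 1.06578, 1.07509, 1.06183, 15_909_401),
}

# Transpose rows into columns and average each column across seeds.
columns = list(zip(*runs.values()))
steps, post_ema, quantized, post_ttt, artifact = (mean(c) for c in columns)

# Derived quantities (illustrative names): how much quantization costs,
# and how much test-time training recovers from the quantized score.
quant_penalty = quantized - post_ema
ttt_gain = quantized - post_ttt

print(f"mean steps     : {steps:.0f}")
print(f"mean post-EMA  : {post_ema:.5f}")
print(f"mean quantized : {quantized:.5f}")
print(f"mean post-TTT  : {post_ttt:.5f}")
print(f"mean artifact  : {artifact:,.0f} bytes")
print(f"quant penalty  : {quant_penalty:+.5f} bpb")
print(f"TTT gain       : {ttt_gain:+.5f} bpb")
```

The headline val_bpb of 1.06287 is the mean of the Post-TTT column.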

Frozen Recurrent Carry

The recurrent α/β carry coefficients (first introduced in #1779) were learned end-to-end on a full training run with no validation set involvement, then quantized to 2 decimal places before this promotion run:

  • β = [1.56, 1.85, 2.13]
  • α = [[0.23, 0.04, 0.03], [0.13, −0.34, 0.01], [0.06, 0.19, −0.02]]

Full-precision learned values: β = [1.5610, 1.8531, 2.1320], α = [[0.2314, 0.0388, 0.0347], [0.1260, −0.3438, 0.0145], [0.0557, 0.1934, −0.0172]].
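
The frozen values can be checked against the full-precision ones with a short sketch like the one below. It only rounds the numbers quoted above to 2 decimal places; it does not show how the coefficients enter the model, and the variable names are illustrative rather than taken from `train_gpt.py`.

```python
# Full-precision learned values, copied from the text above.
beta_full = [1.5610, 1.8531, 2.1320]
alpha_full = [
    [0.2314, 0.0388, 0.0347],
    [0.1260, -0.3438, 0.0145],
    [0.0557, 0.1934, -0.0172],
]

# Quantize to 2 decimal places, as described for the frozen carry.
beta_frozen = [round(b, 2) for b in beta_full]
alpha_frozen = [[round(a, 2) for a in row] for row in alpha_full]

# Largest absolute change introduced by the rounding, over all 12 scalars.
pairs = list(zip(beta_full, beta_frozen))
for full_row, frozen_row in zip(alpha_full, alpha_frozen):
    pairs.extend(zip(full_row, frozen_row))
max_err = max(abs(full - frozen) for full, frozen in pairs)

print("beta  =", beta_frozen)
print("alpha =", alpha_frozen)
print(f"max rounding error = {max_err:.4f}")
```

Rounding to 2 decimal places bounds the per-coefficient change by 0.005, so the frozen carry amounts to 12 short decimal constants.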

The legality of offline-learned frozen scalars was discussed in #1779 — the data-size budget provides a natural bound on this class of technique.

What this adds over #1779

From #1787 (nprime06):

  • Polar Express Newton-Schulz coefficients
  • MIN_LR=0.10 warmdown floor
  • Fused softcapped CE
  • GPTQ_RESERVE_SECONDS=0.5, VAL_LOSS_EVERY=0
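
As an illustration of what a warmdown floor means, here is a minimal sketch of a linear warmdown that stops at a fraction of the peak learning rate instead of decaying to zero. This is an assumption about the general shape only: the function `lr_scale`, its step-based arguments, and the linear interpolation are hypothetical, and the actual schedule in #1787 (which may be driven by wall-clock time rather than step count) is not reproduced here.

```python
def lr_scale(step: int, total_steps: int, warmdown_steps: int, min_lr: float = 0.10) -> float:
    """Multiplier applied to the peak learning rate (hypothetical helper).

    Stays at 1.0 until the warmdown begins, then decays linearly and
    reaches min_lr (not 0.0) at the final step.
    """
    warmdown_start = total_steps - warmdown_steps
    if step <= warmdown_start:
        return 1.0
    # Fraction of the warmdown still remaining: 1.0 at its start, 0.0 at the end.
    remaining = max(total_steps - step, 0) / warmdown_steps
    return min_lr + (1.0 - min_lr) * remaining


# Example with made-up step counts: the multiplier never drops below the floor.
schedule = [lr_scale(s, total_steps=1000, warmdown_steps=200) for s in range(1001)]
print(min(schedule), max(schedule))
```

With `min_lr=0.0` the same function reduces to a plain linear warmdown to zero.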

New in this PR:

  • Sparse attention-output gate — replaces the dense GatedAttn with a narrow-input sparse gate
  • Updated frozen recurrent carry — α/β re-learned on the sparse-gate stack and frozen to 2 decimal places (values above)

Rule Compliance

Test Plan

  • Reviewer reproduces any single seed with the provided train_gpt.py and env vars
  • Verify artifact size < 16,000,000 bytes in each seed log
  • Verify score-first TTT ordering in code
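
The artifact-size item can be checked mechanically. The sketch below uses the byte counts reported in the results table; in a real review the sizes would come from the seed logs or the artifact files themselves, and the names `LIMIT_BYTES` and `artifact_bytes` are illustrative.

```python
LIMIT_BYTES = 16_000_000  # size limit stated in the test plan

# Artifact sizes per seed, copied from the results table.
artifact_bytes = {42: 15_909_254, 0: 15_904_209, 1234: 15_909_401}

# Remaining headroom under the limit for each seed.
headroom = {seed: LIMIT_BYTES - size for seed, size in artifact_bytes.items()}

for seed, size in artifact_bytes.items():
    status = "OK" if size < LIMIT_BYTES else "OVER LIMIT"
    print(f"seed {seed:>4}: {size:,} bytes, headroom {headroom[seed]:,} ({status})")

all_within_limit = all(size < LIMIT_BYTES for size in artifact_bytes.values())
print("all seeds within limit:", all_within_limit)
```

All three reported artifacts are under the limit, with the tightest margin on seed 1234.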

🤖 Generated with Claude Code

leon2k2k2k force-pushed the submission/036-sparse-updated-carry branch 3 times, most recently from 1a7f817 to 356dc2a (April 24, 2026 01:52)
leon2k2k2k force-pushed the submission/036-sparse-updated-carry branch from 356dc2a to 372c5f1 (April 24, 2026 01:55)
leon2k2k2k closed this on Apr 24, 2026