
Non-record: Kitchen Sink — ACT vs Masked Recurrence, 8192-Bigram (val_bpb 1.4011)#1820

Open

aiejvn wants to merge 3 commits into openai:main from aiejvn:submission/utransformer

Conversation


@aiejvn commented Apr 25, 2026

Summary

An ablation comparing two recurrence control mechanisms, ACT halting and a learned
soft gate, on the kitchen-sink Universal Transformer baseline (UT×22, Echo Training,
Gradient Quilting, Adaptive Density, 8192-bigram, XSA, LeakyReLU², EMA, late QAT,
Brotli-11). Two full 20k-step runs, one per mechanism, on 1×A10G-24GB.
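
For concreteness, here is a minimal sketch of the two control mechanisms being
compared, assuming a Graves-style ACT accumulator for the first and a per-token
sigmoid gate for the second; `UTRecurrence`, `halt_proj`, `gate_proj`, and the
halting threshold are illustrative names, not the submission's actual code.

```python
import torch
import torch.nn as nn

class UTRecurrence(nn.Module):
    """One shared-weight UT block applied num_iters times; `mode` selects
    ACT halting or soft-gated masked recurrence as the control mechanism."""

    def __init__(self, block, d_model, num_iters=22, mode="act", halt_eps=0.01):
        super().__init__()
        self.block, self.num_iters, self.mode = block, num_iters, mode
        self.halt_eps = halt_eps
        self.halt_proj = nn.Linear(d_model, 1)  # ACT: per-token halting probability
        self.gate_proj = nn.Linear(d_model, 1)  # masked recurrence: per-token gate

    def forward(self, h):
        if self.mode == "act":
            B, T, _ = h.shape
            halted = torch.zeros(B, T, device=h.device)  # cumulative halting mass
            ponder = torch.zeros(B, T, device=h.device)  # live-step count for the penalty
            out = torch.zeros_like(h)
            for _ in range(self.num_iters):
                h = self.block(h)
                live = (halted < 1.0 - self.halt_eps).float()
                p = torch.sigmoid(self.halt_proj(h)).squeeze(-1) * live
                out = out + p.unsqueeze(-1) * h      # probability-weighted mix of iterates
                halted = halted + p
                ponder = ponder + live               # each live iteration costs one unit
            out = out + (1.0 - halted).clamp(min=0).unsqueeze(-1) * h  # leftover mass
            return out, ponder.mean()
        # masked recurrence: a soft gate blends the new state into the old one
        for _ in range(self.num_iters):
            new_h = self.block(h)
            g = torch.sigmoid(self.gate_proj(new_h))  # in (0, 1), per token
            h = g * new_h + (1 - g) * h
        return h, h.new_zeros(())                     # no ponder penalty here
```

The ACT run would then minimize task_loss + 0.01 * ponder (the ponder_cost from
the table below), while the soft-gate run carries no extra penalty; in both
variants the shared block and num_iters=22 fix the compute ceiling, which is the
lever the Key Finding points at.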

Results

Run           Recurrence             val_bpb (pre-TTT)   TTT roundtrip (int6+zlib-9)   ms/step
ACT           ACT, ponder_cost=0.01  1.4011              1.26482870                    6087
Masked-Recur  Soft gate              1.4044              1.26742860                    6130

Reported val_bpb is from the pre-TTT checkpoint at step 20000. Both runs were
trained with TTT_ENABLED=1, adapting on val_tokens, the same pattern flagged in
PR #1376 and corrected in PR #1193. TTT roundtrip scores are noted for reference
only and are excluded from the headline metric.
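
To keep the headline number TTT-free, the pre-TTT checkpoint can be scored with
adaptation disabled. A minimal sketch, assuming a loader that also yields the raw
byte count each batch encodes; the loader shape and byte accounting are
illustrative, not the submission's harness.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_val_bpb(model, val_loader, device="cuda"):
    """Bits per byte on the validation stream with weights frozen:
    no TTT updates on val_tokens, so the score reflects the
    step-20000 checkpoint alone."""
    model.eval()
    total_nats, total_bytes = 0.0, 0
    for tokens, targets, n_bytes in val_loader:   # n_bytes = raw bytes in the batch
        logits = model(tokens.to(device))
        total_nats += F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            targets.to(device).view(-1),
            reduction="sum").item()               # summed nats over all target tokens
        total_bytes += n_bytes
    return total_nats / (math.log(2) * total_bytes)  # nats -> bits, per raw byte
```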

Key Finding

ACT and masked recurrence converge to within 0.0033 bpb of each other after 20k
steps (1.4011 vs 1.4044). At this model size and training budget, the recurrence
control mechanism is not a meaningful lever; the UT compute budget (num_iters)
dominates.

Artifact

  • int6+zlib-9 compressed: ~10.3 MB, within the 16 MB track limit (size check sketched below)
  • Code: 91,104 bytes
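
A sketch of the size check, assuming symmetric per-tensor quantization to signed
6-bit codes that are bit-packed and then zlib-compressed at level 9; the packing
scheme is illustrative and omits the scales and metadata a real decoder would
also need.

```python
import zlib
import torch

def int6_zlib9_mib(state_dict):
    """Rough compressed-artifact size: quantize each tensor to 6-bit codes,
    bit-pack them, then run zlib level 9 over the whole payload."""
    payload = bytearray()
    for t in state_dict.values():
        w = t.detach().float().flatten()
        scale = w.abs().max().item() / 31 if w.numel() else 1.0  # signed 6-bit: [-31, 31]
        q = torch.clamp(torch.round(w / max(scale, 1e-12)), -31, 31).long() + 31
        acc, bits = 0, 0
        for code in q.tolist():            # pack six bits per code into the byte stream
            acc, bits = (acc << 6) | code, bits + 6
            while bits >= 8:
                bits -= 8
                payload.append((acc >> bits) & 0xFF)
                acc &= (1 << bits) - 1     # drop the flushed bits
        if bits:                           # flush the final partial byte
            payload.append((acc << (8 - bits)) & 0xFF)
    return len(zlib.compress(bytes(payload), 9)) / 2**20

# e.g. assert int6_zlib9_mib(model.state_dict()) < 16
```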

Track

Non-record research submission.
