SP8192 + CaseOps + Loop345 + Recur-Alpha + PhasedTTT #1766
Open
tashapais wants to merge 2 commits into openai:main from
Adds Recur-Alpha (learned carry scalar per looped block, init=0) to the PR openai#1736 CaseOps+GatedAttn+QuantGate+Loop345+PhasedTTT stack. The only code change is 3 new nn.Parameter(zeros(1)) scalars in Block.__init__ and carry-dict logic in both forward_logits and forward_ttt encoder/decoder loops. Zero initialization preserves the base model at step 0. Results pending on 8xH100 hardware.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rchitecture test
- RECUR_ALPHA_ENABLED=0 disables carry additions for ablation runs without changing the depth recurrence architecture; freezes recur_alpha params
- Logs recur_alpha values at loop activation and end of training so 1xH100 smoke runs can confirm the scalars are learning
- test_architecture.py: CPU-only test (stubs FA3/triton) covering model instantiation, index layout, forward passes, gradient flow, and carry effect
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds Recur-Alpha to the PR #1736 stack (CaseOps + GatedAttn + QuantGate + Loop345 + PhasedTTT).
Recur-Alpha is a learned scalar per looped block (init=0) that adds a weighted copy of each block's first-visit activation to its subsequent recurrence passes — a lightweight GRU-like carry inside the depth recurrence. The idea originates in PR #1714, where it was implemented on the older SP8192 stack but TTT evaluation was never completed. This PR is the first composition of Recur-Alpha with the CaseOps + phased TTT stack.
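The carry described above can be illustrated with a dependency-free toy. This is only a sketch, not the PR's code: plain floats stand in for activation tensors, `run_with_carry`, the six-block layout, and the visit order are hypothetical, and adding the carry to the block's output (rather than its input) on a revisit is an assumption. The `enabled` argument plays the role of an ablation switch such as RECUR_ALPHA_ENABLED=0.

```python
# Toy sketch of a Recur-Alpha style carry (assumed form, not the PR implementation).
# Plain floats stand in for activation tensors so the example needs no dependencies.
from typing import Callable, Dict, List, Sequence

Block = Callable[[float], float]


def run_with_carry(
    x: float,
    blocks: Sequence[Block],
    visit_order: Sequence[int],
    alpha: Dict[int, float],
    enabled: bool = True,
) -> float:
    """Run blocks in visit_order; on a revisit add alpha[i] * first-visit output."""
    carry: Dict[int, float] = {}  # block index -> output of that block's first visit
    for i in visit_order:
        out = blocks[i](x)
        if i in carry:
            # Revisit of a looped block: mix in the stored activation, scaled by alpha.
            if enabled:
                out = out + alpha[i] * carry[i]
        elif i in alpha:
            # First visit of a looped block: remember its activation.
            carry[i] = out
        x = out
    return x


# Hypothetical layout: six toy blocks, with blocks 3, 4 and 5 visited twice.
blocks: List[Block] = [lambda h, k=k: 0.5 * h + k for k in range(6)]
visit_order = [0, 1, 2, 3, 4, 5, 3, 4, 5]
zero_alpha = {3: 0.0, 4: 0.0, 5: 0.0}          # the init=0 case
learned_alpha = {3: 0.1, 4: -0.2, 5: 0.05}     # made-up "trained" values

baseline = run_with_carry(1.0, blocks, visit_order, alpha={})  # no carry at all
at_init = run_with_carry(1.0, blocks, visit_order, zero_alpha)
ablated = run_with_carry(1.0, blocks, visit_order, learned_alpha, enabled=False)
trained = run_with_carry(1.0, blocks, visit_order, learned_alpha)
```

In this toy, `at_init` and `ablated` both equal `baseline`, which mirrors the claim that zero initialization leaves the base model unchanged at step 0; only nonzero scalars with the carry enabled change the output.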
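The only code change from PR #1736: 3 new nn.Parameter(zeros(1)) scalars in Block.__init__ and carry-dict logic in both the forward_logits and forward_ttt encoder/decoder loops.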
Cost: 3 parameters (one per looped block), ndim=1 → scalar AdamW, excluded from GPTQ and Muon. Artifact size impact: negligible.
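The routing implied by "ndim=1 → scalar AdamW" can be sketched as below. Everything here is illustrative: the parameter names, the shapes, and the rule that tensors with two or more dimensions go to Muon while the rest go to a scalar AdamW group are assumptions about a typical setup, not taken from this PR's training script.

```python
# Sketch of ndim-based optimizer routing (assumed convention, hypothetical names).
import math
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def route_params(shapes: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Send matrices (ndim >= 2) to Muon and everything else to scalar AdamW."""
    groups: Dict[str, List[str]] = {"muon": [], "scalar_adamw": []}
    for name, shape in shapes.items():
        key = "muon" if len(shape) >= 2 else "scalar_adamw"
        groups[key].append(name)
    return groups


def count_elements(shapes: Dict[str, Shape], marker: str) -> int:
    """Total number of elements in parameters whose name contains marker."""
    return sum(math.prod(s) for n, s in shapes.items() if marker in n)


# Hypothetical parameter shapes; a zeros(1) parameter has shape (1,), so ndim == 1.
shapes: Dict[str, Shape] = {
    "blocks.3.mlp.weight": (2048, 512),
    "blocks.3.recur_alpha": (1,),
    "blocks.4.mlp.weight": (2048, 512),
    "blocks.4.recur_alpha": (1,),
    "blocks.5.mlp.weight": (2048, 512),
    "blocks.5.recur_alpha": (1,),
}

groups = route_params(shapes)
added = count_elements(shapes, "recur_alpha")  # extra parameters from Recur-Alpha
```

Under this assumed rule the three recur_alpha entries land in the scalar AdamW group and add three elements in total, consistent with the negligible artifact-size impact stated above.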
Full Technique Stack
Reproduction
Results pending on 8xH100 hardware.
Test plan
Credits