
SP8192 + CaseOps + Loop345 + Recur-Alpha + PhasedTTT #1766

Open
tashapais wants to merge 2 commits into openai:main from tashapais:submission/sp8192-caseops-recur-alpha

Conversation

@tashapais

Summary

Adds Recur-Alpha to the PR #1736 stack (CaseOps + GatedAttn + QuantGate + Loop345 + PhasedTTT).

Recur-Alpha is a learned scalar per looped block (init=0) that adds a weighted copy of each block's first-visit activation to its subsequent recurrence passes — a lightweight, GRU-like carry inside the depth recurrence. The idea originates in PR #1714, where it was implemented on the older SP8192 stack, but TTT evaluation was never completed there. This PR is the first composition of Recur-Alpha with the CaseOps + phased TTT stack.

The only code change from PR #1736:

```python
# Block.__init__
self.recur_alpha = nn.Parameter(torch.zeros(1))

# forward_logits + forward_ttt — carry dict in encoder/decoder loops
carry = {}
for i in enc_iter:
    x = block(...)
    if self.looping_active:
        if i in carry:
            x = x + self.blocks[i].recur_alpha.to(dtype=x.dtype) * carry[i]
        carry[i] = x
    skips.append(x)

# decoder non-parallel branch:
x = block(...)
if self.looping_active and i in carry:
    x = x + self.blocks[i].recur_alpha.to(dtype=x.dtype) * carry[i]
```
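A self-contained toy version of the carry logic above (the `ToyBlock` and the loop schedule here are hypothetical stand-ins for the real `Block` and encoder/decoder iteration) illustrates the mechanism and confirms the claim that zero initialization preserves the base model at step 0:

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    # Hypothetical stand-in for the real transformer Block.
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        # Recur-Alpha: one learned scalar per looped block, init=0.
        self.recur_alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return x + torch.tanh(self.lin(x))

def run(blocks, schedule, x, looping_active=True):
    # schedule revisits block indices, e.g. [0, 1, 2, 0, 1, 2] for 2 loops.
    carry = {}
    for i in schedule:
        x = blocks[i](x)
        if looping_active:
            if i in carry:  # revisit: add alpha-weighted earlier activation
                x = x + blocks[i].recur_alpha.to(dtype=x.dtype) * carry[i]
            carry[i] = x
        x = x  # (skips.append(x) elided in this toy version)
    return x

blocks = nn.ModuleList(ToyBlock(8) for _ in range(3))
x = torch.randn(2, 8)
schedule = [0, 1, 2, 0, 1, 2]  # 3 blocks x 2 loops

with torch.no_grad():
    base = run(blocks, schedule, x, looping_active=False)
    out = run(blocks, schedule, x, looping_active=True)

# With alpha=0 the carry term contributes nothing, so the looped
# forward pass matches the baseline exactly at step 0.
print(torch.allclose(base, out))  # True
```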

Cost: 3 parameters (one per looped block). Each scalar is ndim=1, so it falls into the scalar AdamW group and is excluded from both GPTQ quantization and Muon. Artifact size impact: negligible.
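The ndim=1 routing can be sketched as a simple parameter partition (the group construction and optimizer settings below are illustrative assumptions, not the repo's exact optimizer setup):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
# hypothetical Recur-Alpha style scalar registered on the model
model.recur_alpha = nn.Parameter(torch.zeros(1))

# Route by dimensionality: ndim >= 2 tensors (weight matrices) go to the
# matrix optimizer (Muon in this stack) and through GPTQ; ndim <= 1 tensors
# (biases, scalars like recur_alpha) go to plain AdamW and skip both.
matrix_params = [p for p in model.parameters() if p.ndim >= 2]
scalar_params = [p for p in model.parameters() if p.ndim <= 1]

adamw = torch.optim.AdamW(scalar_params, lr=3e-4)
print(len(matrix_params), len(scalar_params))  # 2 5-element? no: 2 matrices, 3 scalars/vectors
```

With two Linear layers this yields 2 matrix parameters and 3 ndim≤1 parameters (two biases plus the new scalar), matching the "scalar AdamW" split described above.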

Full Technique Stack

  1. SP8192 tokenizer
  2. CaseOps — bijective lossless case preprocessing; BPB on original UTF-8 bytes
  3. 3-Layer Depth Recurrence — layers 3, 4, 5 × 2 loops (17 virtual layers), activates at 35%
  4. Recur-Alpha — learned carry scalar per looped block (init=0) (novel)
  5. Gated Attention — per-head sigmoid output gate, init_std=0.01
  6. Quant Gate — int8-per-row quantization of attn_gate_w
  7. Parallel Residuals — GPT-J style from layer 8
  8. QK-Gain 5.0 — learned per-head query scalar
  9. Full-Hessian GPTQ — int6 matrices, int8 embeddings, SDClip
  10. MuonEq-R — row-normalized Muon + AdamW
  11. Phased TTT — score-first LoRA SGD, per-doc reset, cosine LR decay
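As a rough sketch of item 5's per-head sigmoid output gate (the gate-from-residual-stream formulation, shapes, and module name below are assumptions inferred from the one-line description and the `attn_gate_w` identifier, not the repo's implementation):

```python
import torch
import torch.nn as nn

class GatedAttnOutput(nn.Module):
    # Hypothetical per-head sigmoid gate on attention output.
    def __init__(self, dim, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        # std-0.01 init keeps gate logits near 0, so gates start near 0.5
        self.attn_gate_w = nn.Parameter(torch.randn(dim, n_heads) * 0.01)

    def forward(self, x, attn_out):
        # x: (B, T, dim) residual-stream input
        # attn_out: (B, T, n_heads, head_dim) attention output per head
        g = torch.sigmoid(x @ self.attn_gate_w)          # (B, T, n_heads)
        return (attn_out * g.unsqueeze(-1)).reshape(*x.shape)

gate = GatedAttnOutput(dim=64, n_heads=4)
x = torch.randn(2, 10, 64)
attn_out = torch.randn(2, 10, 4, 16)
y = gate(x, attn_out)
print(y.shape)  # torch.Size([2, 10, 64])
```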

Reproduction

```shell
pip install brotli sentencepiece
pip install flash_attn_3 --no-deps --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/
python prepare_caseops_data.py

SEED=42 CASEOPS_ENABLED=1 GATED_ATTN_ENABLED=1 GATED_ATTN_QUANT_GATE=1 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Results pending on 8xH100 hardware.

Test plan

  • Run 3 seeds (42, 0, 1234) on 8xH100s
  • Verify training completes under 600s
  • Verify artifact under 16,000,000 bytes
  • Verify sliding-window + TTT eval under 600s
  • Report val_bpb for each seed

Credits

tashapais and others added 2 commits April 21, 2026 17:14
Adds Recur-Alpha (learned carry scalar per looped block, init=0) to the
PR openai#1736 CaseOps+GatedAttn+QuantGate+Loop345+PhasedTTT stack. The only
code change is 3 new nn.Parameter(zeros(1)) scalars in Block.__init__ and
carry-dict logic in both forward_logits and forward_ttt encoder/decoder
loops. Zero initialization preserves the base model at step 0.

Results pending on 8xH100 hardware.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rchitecture test

- RECUR_ALPHA_ENABLED=0 disables carry additions for ablation runs without
  changing the depth recurrence architecture; freezes recur_alpha params
- Logs recur_alpha values at loop activation and end of training so 1xH100
  smoke runs can confirm the scalars are learning
- test_architecture.py: CPU-only test (stubs FA3/triton) covering model
  instantiation, index layout, forward passes, gradient flow, and carry effect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
