
Record: SP8192 CaseOps + V13 Curriculum + SmearGate + LoRA-TTT — val_bpb 1.06513 (3-seed mean) #1771

Open

bigbag wants to merge 1 commit into openai:main from bigbag:submission/v13-l2-lora-ttt

Conversation

@bigbag bigbag commented Apr 22, 2026

Summary

val_bpb = 1.06513 (3-seed mean, std 0.00055) | ~15.98 MB | 8xH100 SXM

3-Seed Results

| Seed | Sliding BPB | TTT BPB | val_loss (nats/tok) | Artifact (bytes) |
|------|-------------|---------|---------------------|------------------|
| 42   | 1.07767     | 1.06449 | 2.32950             | 15,975,592       |
| 314  | 1.07856     | 1.06543 | 2.33156             | 15,976,709       |
| 999  | 1.07866     | 1.06547 | 2.33162             | 15,976,693       |
| Mean | 1.07830     | 1.06513 | 2.33089             | 15,976,331       |
| Std  | 0.00055     | 0.00055 |                     |                  |

Note: val_bpb computed via standard sentencepiece LUT byte counting (consistent with PR #1769 methodology). Train logs report sidecar-based BPB; val_loss is the ground truth.
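For reference, the nats-per-token to bits-per-byte conversion behind this methodology can be sketched as follows. This is a minimal illustration, not this submission's evaluation code; the mean bytes-per-token value is illustrative (chosen to be roughly consistent with the table above), not read from this run's tokenizer LUT.

```python
import math

def nats_per_token_to_bpb(loss_nats: float, mean_bytes_per_token: float) -> float:
    """Convert cross-entropy in nats/token to bits-per-byte.

    bits/token = loss_nats / ln(2); dividing by the corpus's mean
    bytes-per-token (from the tokenizer's byte-count LUT) gives bpb.
    """
    return loss_nats / math.log(2) / mean_bytes_per_token

# Illustrative bytes-per-token of ~3.157 maps val_loss 2.33089 nats/tok
# to roughly 1.065 bpb, matching the scale of the table above.
bpb = nats_per_token_to_bpb(2.33089, 3.157)
```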

Key Techniques

  1. SP8192 CaseOps — Lossless, reversible case normalization (TITLE/ALLCAPS/CAPNEXT/ESC operators). Pending #1604 ("Clarify which text normalizations are allowed for custom tokenizers").
  2. Recurrence Depth Curriculum (PR #1756) — Phased depth 1→3→4 training, eval at depth 4.
  3. SmearGate (modded-nanogpt, @classiclarryd) — Per-layer smoothing gate; novel combination with GatedAttn.
  4. GatedAttn + QuantGate (PR #1736) — Full-dim attention gate with int8 passthrough.
  5. LoRA-TTT Improvements (PR #1767) — Alpha/rank output scaling, warm-start A, WD 1.0, alpha 144.
  6. Phased Score-First TTT — 3-phase AdamW (lr=1e-4, WD=1.0), 2000 prefix docs.
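A minimal sketch of the alpha/rank-scaled LoRA forward pass named in item 5, using numpy as a stand-in for the actual torch modules. Variable names and shapes are illustrative assumptions; only the alpha/rank scaling, warm-started A, and zero-initialized B come from the techniques listed above.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=144.0):
    """LoRA-augmented linear layer: y = W x + (alpha / r) * B (A x).

    A (r x d_in) is warm-started and trained; B (d_out x r) starts at
    zero, so the adapter is initially a no-op, and the adapter output
    scale is controlled by alpha / rank.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 4
W = rng.normal(size=(d_out, d_in))
A = rng.normal(scale=0.01, size=(r, d_in))   # warm-started A
B = np.zeros((d_out, r))                     # B = 0 at init
x = rng.normal(size=d_in)
assert np.allclose(lora_linear(x, W, A, B), W @ x)  # adapter is a no-op at init
```

With alpha fixed and the output scaled by alpha/r, the adapter's effective learning rate stays comparable as the rank changes.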

Rule Compliance

Test plan

  • A reviewer can reproduce any single seed with the provided train_gpt.py and the env vars from the README
  • Verify the artifact size is < 16,000,000 bytes in each seed's log
  • Verify the score-first TTT ordering in the code

🤖 Generated with Claude Code

…bpb 1.06513 (3-seed mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bigbag commented Apr 22, 2026

Thanks to OpenAI's Advanced Competitor grant ($500 compute credit via RunPod) for making this work possible.

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
TTT_LORA_ALPHA env var (default 96, spec uses 144). Only zero B on reset;
A accumulates feature directions across batches. Output scaled by alpha/rank.
Validated by renqianluo (openai#1767) and bigbag (openai#1771).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
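The reset semantics described in the commit message above can be sketched as follows. Only the env-var name, its default of 96, and the zero-B-keep-A behavior come from the message; the function and array names are illustrative.

```python
import os
import numpy as np

# Default 96 per the commit message; the spec referenced above uses 144.
TTT_LORA_ALPHA = float(os.environ.get("TTT_LORA_ALPHA", "96"))

def reset_adapter(A: np.ndarray, B: np.ndarray) -> None:
    """Per-reset TTT behavior: zero only B, keep A.

    Zeroing B makes the adapter a no-op again, while A retains the
    feature directions accumulated across previous batches.
    """
    B[:] = 0.0
```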
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 22, 2026
…iculum + MLPClip12

Frontier: openai#1769 (1.06453) and openai#1771 (1.06513) both below baseline.
New ideas: mlp-clip-sigmas-12, v-gate.
Map updated with openai#1769, openai#1771, openai#1770.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 22, 2026
…13) strongest legal signal; dexhunter PR openai#1769 (1.06453) new best; LoRA-TTT warm-start A+alpha=144+WD=1.0 appears legal; arXiv:2604.15259 looped transformer outer normalization; Day 13 plateau; Session 19

https://claude.ai/code/session_013agP2MtwGU9MaPNtWx2hib
