
Record: VarLenAttn + PhasingTTT - val_bpb 1.0728 (3-seed mean) #1610

Merged
cocohearts merged 4 commits into openai:main from romeerp:codex/phased-ttt-2000
Apr 29, 2026

Conversation


@romeerp romeerp commented Apr 14, 2026

This builds directly on PR #1530. Training is unchanged; the only change is in evaluation.

Results:

Seed  val_loss    val_bpb     eval_time  artifact_size
0     2.76951521  1.07216564  500.104 s  15,996,697 B
1     2.77167493  1.07300174  515.324 s  15,995,985 B
2     2.77232000  1.07325147  504.949 s  15,988,805 B
avg   2.77117005  1.07280628  506.792 s  15,993,829 B

All 3 seeds are under the 600s eval budget and under the 16 MB artifact cap.
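
As a sanity check on the table, the loss-to-bpb conversion implied by the 3-seed averages can be recovered directly; the bytes-per-token factor below is derived from the reported numbers under an assumed formula, not read from the eval harness.

```python
import math

# Implied loss-to-bpb conversion from the 3-seed averages above, assuming
# val_bpb = val_loss / (ln 2 * bytes_per_token). The factor is derived from
# the reported numbers, not taken from the eval harness.
val_loss, val_bpb = 2.77117005, 1.07280628
bits_per_token = val_loss / math.log(2)     # ~3.9980 bits/token
bytes_per_token = bits_per_token / val_bpb  # ~3.7267 implied bytes/token
print(f"{bits_per_token:.4f} bits/token, {bytes_per_token:.4f} bytes/token")
```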

Compared to the original PR #1530 submission mean:

Metric    PR #1530    This submission  Delta
val_loss  2.77261037  2.77117005       -0.00144032
val_bpb   1.07336388  1.07280628       -0.00055760

Method:

  1. Run the stock PR #1530 (Varlen attention + fused MLP + doc-independent TTT) LoRA TTT evaluator on its single global length-sorted queue.
  2. After the first 2000 queue documents have been fully scored, pause once.
  3. Gather exactly those already-scored documents, in queue order.
  4. Run distributed global SGD on that scored prefix.
  5. Resume the same queue with the updated base model (a minimal sketch of this loop follows below).
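
The sketch below is a hypothetical rendering of steps 1-5, not code from this PR; `score_document`, `make_lora`, and `model_loss` are illustrative placeholders rather than actual symbols in train_gpt.py.

```python
# Hypothetical sketch of the phased, score-first eval loop (steps 1-5 above).
# score_document, make_lora, and model_loss are illustrative placeholders,
# not actual symbols from train_gpt.py.
PREFIX_DOCS = 2000  # PHASED_TTT_PREFIX_DOCS

def phased_eval(model, queue, make_lora, base_optimizer,
                score_document, model_loss):
    scored = []      # (doc, loss_sum, token_count), in queue order
    paused = False
    for i, doc in enumerate(queue):
        lora = make_lora(model)   # fresh per-doc LoRA state, discarded after
        loss_sum, n_tok = score_document(model, lora, doc)  # score first
        scored.append((doc, loss_sum, n_tok))
        if not paused and i + 1 == PREFIX_DOCS:
            # One-time pause: global SGD over the already-scored prefix only.
            for d, _, _ in scored:
                base_optimizer.zero_grad()
                model_loss(model, d).backward()  # plain LM loss, no LoRA
                base_optimizer.step()
            paused = True  # resume scoring future docs with the updated base
    total_loss = sum(l for _, l, _ in scored)
    total_tok = sum(t for _, _, t in scored)
    return total_loss / total_tok  # mean val_loss in nats/token
```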

Legality:

  • LoRA scoring happens before LoRA updates on those chunks.
  • Global SGD only trains on documents that have already been fully scored.
  • After the pause, evaluation resumes on future queue items only.
  • So no token is used for adaptation before its score has already been counted (a toy check of this ordering is sketched below).
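
One way to see the guarantee is to audit an event log and assert that every document used for global SGD was scored at an earlier step. The log format here is illustrative only; train_gpt.py does not emit such a log.

```python
# Hypothetical audit of the score-first invariant argued above.
events = [
    ("score", 0), ("score", 1), ("score", 2),  # ... docs 0..1999 scored
    ("train", 0), ("train", 1), ("train", 2),  # global SGD on that prefix
    ("score", 3),                              # eval resumes on future docs
]

scored_at = {}
for step, (kind, doc) in enumerate(events):
    if kind == "score":
        assert doc not in scored_at, f"doc {doc} scored twice"
        scored_at[doc] = step
    else:  # "train"
        assert doc in scored_at and scored_at[doc] < step, (
            f"doc {doc} used for adaptation before being scored")
print("score-first invariant holds")
```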

Intuition:

  • PR #1530's LoRA TTT is a local adaptation mechanism. It lets the model fit the current document quickly, but that adaptation is discarded when the document ends.
  • The added global SGD phase is meant to improve the shared base model itself on a score-first prefix, so later documents can benefit from a slightly better base model before local LoRA adaptation is applied.
  • In that sense, LoRA handles fast document-local adaptation, while global SGD tries to capture reusable cross-document adaptation.

Implementation note:

  • I initially tried a more continuous hybrid scheme where local and global updates happened throughout eval.
  • That version was hard to run efficiently in distributed form without incurring too much synchronization overhead.
  • I simplified the final implementation into a phased process because it is much easier to reason about, clearly score-first, and still fits within the 600s eval budget.
  • I do not think this implementation is especially optimized yet; the main goal here was to get a clean legal baseline for combining local LoRA TTT with global base-model adaptation.

Run instructions:

Train + quantize + phased eval for one seed:

SEED=0 ARTIFACT_DIR="runs/varlen0" \
PHASED_TTT_ENABLED=1 PHASED_TTT_PREFIX_DOCS=2000 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py

Eval-only on an existing checkpoint:

SEED=0 EVAL_ONLY_PATH="runs/varlen0/final_model.pt" \
PHASED_TTT_ENABLED=1 PHASED_TTT_PREFIX_DOCS=2000 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py
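
For reference, a minimal sketch of how the env flags above could be consumed inside train_gpt.py; this is assumed wiring, and the script's actual parsing may differ.

```python
import os

# Assumed env-var wiring for the flags used in the commands above.
phased_enabled = os.environ.get("PHASED_TTT_ENABLED", "0") == "1"
prefix_docs = int(os.environ.get("PHASED_TTT_PREFIX_DOCS", "2000"))
eval_only_path = os.environ.get("EVAL_ONLY_PATH")  # None => train first

if phased_enabled:
    print(f"phased TTT enabled: global SGD after {prefix_docs} scored docs")
```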

@romeerp romeerp marked this pull request as ready for review April 14, 2026 05:30
@romeerp romeerp changed the title from "Add phased global SGD TTT prefix submission" to "Record: VarLenAttn + PhasingTTT - val_bpb 1.0728 (3-seed mean)" Apr 14, 2026
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 14, 2026
Bring AGENTS.md, AGENT_SYNC.md, project-state.md, decisions.md,
and next-session.md to the openai#1610-direct strategy. Add locked
execution plan (PLAN_PR1610_CORRECTOR.md Rev 3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 14, 2026
Exact copy from PR openai#1610 at SHA ca19195.
MD5: 57cfda2047b2c2a63ec10b99d704bfb0. 3379 lines, 139831 bytes.
This is the unmodified source base; corrector will be added in later commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 14, 2026
Setup, seed-0 (Gate A), seed-1/2 (Gate B) subcommands with
published BPB verification targets and kill criteria.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 14, 2026
…; PRISM + Ouroboros papers; Session 13

- Merged SOTA 1.0810 unchanged (5-day plateau; 16 days to deadline)
- PR openai#1610 (romeerp, 1.0728): VarLenAttn + PhasingTTT — legal, score-first compliant, but low EV (-0.0006 bpb)
- PR openai#1619 flagged likely illegal (AdamW TTT — same pattern as rejected PR openai#771)
- PRISM (arXiv:2602.10796, Feb 2026): Parallel Residual Iterative Sequence Model, 174x throughput — read before next recurrence architecture decision
- Ouroboros (arXiv:2604.02051, Apr 2026): input-conditioned LoRA modulation for recursive transformers — watch
- Session 13 added to CLAUDE.md; no strategy change (PR openai#1586 per-layer GPTQ still #1 priority)
- daily_research.md Apr 14 entry added at top

https://claude.ai/code/session_01GLn4VtS8D1uehRZnfb4dRe
dexhunter added a commit to dexhunter/parameter-golf that referenced this pull request Apr 14, 2026
…al_bpb 1.07193 (3-seed mean)

Novel multi-phase global SGD during phased TTT evaluation.
Builds on PR openai#1530 (@samacqua) + PR openai#1610 (@romeerp) phased TTT concept.
3-seed mean: 1.07193 BPB (2.76890 nats), std 0.00063.
Seeds: 42, 0, 1234. All artifacts <16 MB.
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 17, 2026
Commit the current posterior-corrector working-tree state for PR openai#1610:
- train_gpt.py corrector path plus warmup legality fix
- LEGALITY_SPEC.md, DEPENDENCY_GATE.md, requirements.txt
- test_corrector.py and bench_corrector_cpu.py
- AGENT_SYNC.md closeout with audit measurements

The warmup path previously touched val_data.val_tokens before the official
eval timer. It now uses a device-local torch.Generator + torch.randint
synthetic tokens. 9/9 tests pass and the CPU bench projects 26.1s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 17, 2026
… A/B

Zero-intervention 8xH100 pipeline: pod verify, SP8192 download,
Gate A seed-0 baseline, corrector ablations, 3-way decision point,
Gate B 3-seed corrector mean, fallback requant, artifact preservation.

Fixes applied (Codex review): checkpoint persisted before log parse
(Fix D), 3-way ablation decision fork with hold band (Fix G), fail-closed
fallback parse (Fix H), removed malformed S3 backend (Fix J), Gate B
rewired to coherent 3-seed corrector mean (Fix I — seed-0 re-eval added
so all three seeds use same corrector config, mean vs published 1.07280628).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 18, 2026
009: add logit_bias warmup pass (dummy bf16 tensor) so Dynamo traces
     the Tensor branch before the 600s eval timer starts; gated on
     h.corrector_alpha > 0

002: pass BEST_ALPHA/BEST_ORDERS as argv to Gate B summary heredoc;
     corrector_alpha/corrector_orders now populated in gate_b_summary.json

003: update 02_gate_a.sh header comment to show actual ceiling 1.07516564

004: drop hash() wrapper in PrefixNgramCorrector — use ctx tuple directly
     as dict key; Python dicts handle collision disambiguation natively

001: rewrite test_single_pass to actually exercise chunk-boundary
     invariance: same corrector fed tokens[:10] then tokens[10:] must
     match a fresh single-pass corrector fed all 20 tokens

All 9 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 18, 2026
…t-SHA pin, align README + run_all

- 05_preserve_artifacts.sh: write commit_sha.txt, hardware_info.txt,
  env_fingerprint.txt before tarball; fix repo_type=model to match
  the amay01/parameter-golf-session3-artifacts repo type
- 00_verify_pod.sh: add optional EXPECTED_SHA exact-pin check on top
  of existing ancestry-only guard
- run_all.sh: parameterize banner SHA; warn when EXPECTED_SHA unset
  so operator knows the orchestrator is running ancestry-only
- README.md: align Gate A kill threshold (1.078 → 1.07516564); update
  Block 1 operator commands to include git checkout + EXPECTED_SHA;
  separate ancestry anchor from session launch SHA in header

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 18, 2026
…verlay

Supersedes the 2026-04-18 post-Ultrareview pin (876bb36). Rev 5 adds
provenance auto-capture, repo-type=model fix, exact-SHA env-var pin, and
run_all.sh/README alignment; new pin reflects the pipeline-patch commit.

Also records the live-guidance absolute-BPB overlay and 04b deprecation
driven by open-PR competitive intel (openai#1700 / openai#1716 / openai#1707 / openai#1693).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 19, 2026
…ative result + quantized-eval fix

Non-record evidence package for PR openai#1610. Three separable contributions:

1. Faithful seed-0 reproduction of PR openai#1610 on independent infrastructure
   (8xH100 HBM3 SXM5, RunPod): our BPB 1.07218477 vs published seed-0 BPB
   1.07216564 -> delta +1.913e-5.

2. Bounded negative result for a score-first n-gram posterior corrector
   layered on PR openai#1610's phased LoRA TTT eval path. All three tested
   (alpha, orders) configs degrade BPB monotonically with alpha. The
   corrector and TTT-LoRA are both deterministic functions of the scored
   prefix, so additively combining them over-counts the prefix evidence.
   Claim is bounded to the tested grid on this stack; does not generalize
   to all posterior correctors or non-TTT eval pipelines.

3. Fix for the quantized-eval-only branch of train_gpt.py (two guards at
   lines 3204 and 3259) that previously crashed on None-model dereference
   when EVAL_ONLY_QUANTIZED_PATH was set. Surfaced while running the
   ablations in contribution 2.

Artifact: 15,999,394 bytes (606 bytes of competition-cap headroom).
Single-seed scope, acknowledged. Compliance with Issue openai#1017 Section III
walked line-by-line in README.

Also updates three internal docs to reference the renamed HF artifact
repo (amay01/parameter-golf-pr1610-reproduction-artifacts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 19, 2026
…ative result + quantized-eval-only path fix
@cocohearts cocohearts merged commit 96d3c34 into openai:main Apr 29, 2026
hilbertmeng pushed a commit to hilbertmeng/parameter-golf that referenced this pull request Apr 30, 2026
…al_bpb 1.07193 (3-seed mean)

Novel multi-phase global SGD during phased TTT evaluation.
Builds on PR openai#1530 (@samacqua) + PR openai#1610 (@romeerp) phased TTT concept.
3-seed mean: 1.07193 BPB (2.76890 nats), std 0.00063.
Seeds: 42, 0, 1234. All artifacts <16 MB.
