
Record: VarLenAttn + PhasingTTT - val_bpb 1.0728 (3-seed mean) #1610

Merged
cocohearts merged 4 commits into openai:main from romeerp:codex/phased-ttt-2000
Apr 29, 2026

Conversation


@romeerp romeerp commented Apr 14, 2026

This builds directly on PR #1530. Training is unchanged; the only change is in evaluation.

Results:

Seed  val_loss    val_bpb     eval_time  artifact_size
0     2.76951521  1.07216564  500.104 s  15,996,697 B
1     2.77167493  1.07300174  515.324 s  15,995,985 B
2     2.77232000  1.07325147  504.949 s  15,988,805 B
avg   2.77117005  1.07280628  506.792 s  15,993,829 B

All 3 seeds are under the 600s eval budget and under the 16 MB artifact cap.
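
As a sanity check on the table, the loss-to-bpb conversion implied by the 3-seed averages can be recovered directly; the bytes-per-token factor below is derived from the reported numbers under an assumed formula, not read from the eval harness.

```python
import math

# Implied loss-to-bpb conversion from the 3-seed averages above, assuming
# val_bpb = val_loss / (ln 2 * bytes_per_token). The factor is derived from
# the reported numbers, not taken from the eval harness.
val_loss, val_bpb = 2.77117005, 1.07280628
bits_per_token = val_loss / math.log(2)     # ~3.9980 bits/token
bytes_per_token = bits_per_token / val_bpb  # ~3.7267 implied bytes/token
print(f"{bits_per_token:.4f} bits/token, {bytes_per_token:.4f} bytes/token")
```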

Compared to the original PR #1530 submission mean:

Metric    PR #1530    This submission  Delta
val_loss  2.77261037  2.77117005       -0.00144032
val_bpb   1.07336388  1.07280628       -0.00055760

Method:

  1. Run the stock PR #1530 (Varlen attention + fused MLP + doc-independent TTT) LoRA TTT evaluator on its single global length-sorted queue.
  2. After the first 2000 queue documents have been fully scored, pause once.
  3. Gather exactly those already-scored documents, in queue order.
  4. Run distributed global SGD on that scored prefix.
  5. Resume the same queue with the updated base model (a minimal sketch of this loop follows below).
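
The sketch below is a hypothetical rendering of steps 1-5, not code from this PR; `score_document`, `make_lora`, and `model_loss` are illustrative placeholders rather than actual symbols in train_gpt.py.

```python
# Hypothetical sketch of the phased, score-first eval loop (steps 1-5 above).
# score_document, make_lora, and model_loss are illustrative placeholders,
# not actual symbols from train_gpt.py.
PREFIX_DOCS = 2000  # PHASED_TTT_PREFIX_DOCS

def phased_eval(model, queue, make_lora, base_optimizer,
                score_document, model_loss):
    scored = []      # (doc, loss_sum, token_count), in queue order
    paused = False
    for i, doc in enumerate(queue):
        lora = make_lora(model)   # fresh per-doc LoRA state, discarded after
        loss_sum, n_tok = score_document(model, lora, doc)  # score first
        scored.append((doc, loss_sum, n_tok))
        if not paused and i + 1 == PREFIX_DOCS:
            # One-time pause: global SGD over the already-scored prefix only.
            for d, _, _ in scored:
                base_optimizer.zero_grad()
                model_loss(model, d).backward()  # plain LM loss, no LoRA
                base_optimizer.step()
            paused = True  # resume scoring future docs with the updated base
    total_loss = sum(l for _, l, _ in scored)
    total_tok = sum(t for _, _, t in scored)
    return total_loss / total_tok  # mean val_loss in nats/token
```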

Legality:

  • LoRA scoring happens before LoRA updates on those chunks.
  • Global SGD only trains on documents that have already been fully scored.
  • After the pause, evaluation resumes on future queue items only.
  • So no token is used for adaptation before its score has already been counted (a toy check of this ordering is sketched below).
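
One way to see the guarantee is to audit an event log and assert that every document used for global SGD was scored at an earlier step. The log format here is illustrative only; train_gpt.py does not emit such a log.

```python
# Hypothetical audit of the score-first invariant argued above.
events = [
    ("score", 0), ("score", 1), ("score", 2),  # ... docs 0..1999 scored
    ("train", 0), ("train", 1), ("train", 2),  # global SGD on that prefix
    ("score", 3),                              # eval resumes on future docs
]

scored_at = {}
for step, (kind, doc) in enumerate(events):
    if kind == "score":
        assert doc not in scored_at, f"doc {doc} scored twice"
        scored_at[doc] = step
    else:  # "train"
        assert doc in scored_at and scored_at[doc] < step, (
            f"doc {doc} used for adaptation before being scored")
print("score-first invariant holds")
```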

Intuition:

  • PR #1530's LoRA TTT is a local adaptation mechanism. It lets the model fit the current document quickly, but that adaptation is discarded when the document ends.
  • The added global SGD phase is meant to improve the shared base model itself on a score-first prefix, so later documents can benefit from a slightly better base model before local LoRA adaptation is applied.
  • In that sense, LoRA handles fast document-local adaptation, while global SGD tries to capture reusable cross-document adaptation.

Implementation note:

  • I initially tried a more continuous hybrid scheme where local and global updates happened throughout eval.
  • That version was hard to run efficiently in distributed form without incurring too much synchronization overhead.
  • I simplified the final implementation into a phased process because it is much easier to reason about, clearly score-first, and still fits within the 600s eval budget.
  • I do not think this implementation is especially optimized yet; the main goal here was to get a clean legal baseline for combining local LoRA TTT with global base-model adaptation.

Run instructions:

Train + quantize + phased eval for one seed:

SEED=0 ARTIFACT_DIR="runs/varlen0" \
PHASED_TTT_ENABLED=1 PHASED_TTT_PREFIX_DOCS=2000 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py

Eval-only on an existing checkpoint:

SEED=0 EVAL_ONLY_PATH="runs/varlen0/final_model.pt" \
PHASED_TTT_ENABLED=1 PHASED_TTT_PREFIX_DOCS=2000 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py
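
For reference, a minimal sketch of how the env flags above could be consumed inside train_gpt.py; this is assumed wiring, and the script's actual parsing may differ.

```python
import os

# Assumed env-var wiring for the flags used in the commands above.
phased_enabled = os.environ.get("PHASED_TTT_ENABLED", "0") == "1"
prefix_docs = int(os.environ.get("PHASED_TTT_PREFIX_DOCS", "2000"))
eval_only_path = os.environ.get("EVAL_ONLY_PATH")  # None => train first

if phased_enabled:
    print(f"phased TTT enabled: global SGD after {prefix_docs} scored docs")
```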

@romeerp romeerp marked this pull request as ready for review April 14, 2026 05:30
@romeerp romeerp changed the title from "Add phased global SGD TTT prefix submission" to "Record: VarLenAttn + PhasingTTT - val_bpb 1.0728 (3-seed mean)" Apr 14, 2026
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 14, 2026
Bring AGENTS.md, AGENT_SYNC.md, project-state.md, decisions.md,
and next-session.md to the openai#1610-direct strategy. Add locked
execution plan (PLAN_PR1610_CORRECTOR.md Rev 3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 14, 2026
Exact copy from PR openai#1610 at SHA ca19195.
MD5: 57cfda2047b2c2a63ec10b99d704bfb0. 3379 lines, 139831 bytes.
This is the unmodified source base; corrector will be added in later commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 14, 2026
Setup, seed-0 (Gate A), seed-1/2 (Gate B) subcommands with
published BPB verification targets and kill criteria.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 14, 2026
…; PRISM + Ouroboros papers; Session 13

- Merged SOTA 1.0810 unchanged (5-day plateau; 16 days to deadline)
- PR openai#1610 (romeerp, 1.0728): VarLenAttn + PhasingTTT — legal, score-first compliant, but low EV (-0.0006 bpb)
- PR openai#1619 flagged likely illegal (AdamW TTT — same pattern as rejected PR openai#771)
- PRISM (arXiv:2602.10796, Feb 2026): Parallel Residual Iterative Sequence Model, 174x throughput — read before next recurrence architecture decision
- Ouroboros (arXiv:2604.02051, Apr 2026): input-conditioned LoRA modulation for recursive transformers — watch
- Session 13 added to CLAUDE.md; no strategy change (PR openai#1586 per-layer GPTQ still #1 priority)
- daily_research.md Apr 14 entry added at top

https://claude.ai/code/session_01GLn4VtS8D1uehRZnfb4dRe
dexhunter added a commit to dexhunter/parameter-golf that referenced this pull request Apr 14, 2026
…al_bpb 1.07193 (3-seed mean)

Novel multi-phase global SGD during phased TTT evaluation.
Builds on PR openai#1530 (@samacqua) + PR openai#1610 (@romeerp) phased TTT concept.
3-seed mean: 1.07193 BPB (2.76890 nats), std 0.00063.
Seeds: 42, 0, 1234. All artifacts <16 MB.
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 17, 2026
Commit the current posterior-corrector working-tree state for PR openai#1610:
- train_gpt.py corrector path plus warmup legality fix
- LEGALITY_SPEC.md, DEPENDENCY_GATE.md, requirements.txt
- test_corrector.py and bench_corrector_cpu.py
- AGENT_SYNC.md closeout with audit measurements

The warmup path previously touched val_data.val_tokens before the official
eval timer. It now uses a device-local torch.Generator + torch.randint
synthetic tokens. 9/9 tests pass and the CPU bench projects 26.1s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 17, 2026
… A/B

Zero-intervention 8xH100 pipeline: pod verify, SP8192 download,
Gate A seed-0 baseline, corrector ablations, 3-way decision point,
Gate B 3-seed corrector mean, fallback requant, artifact preservation.

Fixes applied (Codex review): checkpoint persisted before log parse
(Fix D), 3-way ablation decision fork with hold band (Fix G), fail-closed
fallback parse (Fix H), removed malformed S3 backend (Fix J), Gate B
rewired to coherent 3-seed corrector mean (Fix I — seed-0 re-eval added
so all three seeds use same corrector config, mean vs published 1.07280628).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 18, 2026
009: add logit_bias warmup pass (dummy bf16 tensor) so Dynamo traces
     the Tensor branch before the 600s eval timer starts; gated on
     h.corrector_alpha > 0

002: pass BEST_ALPHA/BEST_ORDERS as argv to Gate B summary heredoc;
     corrector_alpha/corrector_orders now populated in gate_b_summary.json

003: update 02_gate_a.sh header comment to show actual ceiling 1.07516564

004: drop hash() wrapper in PrefixNgramCorrector — use ctx tuple directly
     as dict key; Python dicts handle collision disambiguation natively

001: rewrite test_single_pass to actually exercise chunk-boundary
     invariance: same corrector fed tokens[:10] then tokens[10:] must
     match a fresh single-pass corrector fed all 20 tokens

All 9 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 18, 2026
…t-SHA pin, align README + run_all

- 05_preserve_artifacts.sh: write commit_sha.txt, hardware_info.txt,
  env_fingerprint.txt before tarball; fix repo_type=model to match
  the amay01/parameter-golf-session3-artifacts repo type
- 00_verify_pod.sh: add optional EXPECTED_SHA exact-pin check on top
  of existing ancestry-only guard
- run_all.sh: parameterize banner SHA; warn when EXPECTED_SHA unset
  so operator knows the orchestrator is running ancestry-only
- README.md: align Gate A kill threshold (1.078 → 1.07516564); update
  Block 1 operator commands to include git checkout + EXPECTED_SHA;
  separate ancestry anchor from session launch SHA in header

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 18, 2026
…verlay

Supersedes the 2026-04-18 post-Ultrareview pin (876bb36). Rev 5 adds
provenance auto-capture, repo-type=model fix, exact-SHA env-var pin, and
run_all.sh/README alignment; new pin reflects the pipeline-patch commit.

Also records the live-guidance absolute-BPB overlay and 04b deprecation
driven by open-PR competitive intel (openai#1700 / openai#1716 / openai#1707 / openai#1693).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 19, 2026
…ative result + quantized-eval fix

Non-record evidence package for PR openai#1610. Three separable contributions:

1. Faithful seed-0 reproduction of PR openai#1610 on independent infrastructure
   (8xH100 HBM3 SXM5, RunPod): our BPB 1.07218477 vs published seed-0 BPB
   1.07216564 -> delta +1.913e-5.

2. Bounded negative result for a score-first n-gram posterior corrector
   layered on PR openai#1610's phased LoRA TTT eval path. All three tested
   (alpha, orders) configs degrade BPB monotonically with alpha. The
   corrector and TTT-LoRA are both deterministic functions of the scored
   prefix, so additively combining them over-counts the prefix evidence.
   Claim is bounded to the tested grid on this stack; does not generalize
   to all posterior correctors or non-TTT eval pipelines.

3. Fix for the quantized-eval-only branch of train_gpt.py (two guards at
   lines 3204 and 3259) that previously crashed on None-model dereference
   when EVAL_ONLY_QUANTIZED_PATH was set. Surfaced while running the
   ablations in contribution 2.

Artifact: 15,999,394 bytes (606 bytes of competition-cap headroom).
Single-seed scope, acknowledged. Compliance with Issue openai#1017 Section III
walked line-by-line in README.

Also updates three internal docs to reference the renamed HF artifact
repo (amay01/parameter-golf-pr1610-reproduction-artifacts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
amrayach added a commit to amrayach/parameter-golf that referenced this pull request Apr 19, 2026
…ative result + quantized-eval-only path fix
@cocohearts cocohearts merged commit 96d3c34 into openai:main Apr 29, 2026
hilbertmeng pushed a commit to hilbertmeng/parameter-golf that referenced this pull request Apr 30, 2026
…al_bpb 1.07193 (3-seed mean)

Novel multi-phase global SGD during phased TTT evaluation.
Builds on PR openai#1530 (@samacqua) + PR openai#1610 (@romeerp) phased TTT concept.
3-seed mean: 1.07193 BPB (2.76890 nats), std 0.00063.
Seeds: 42, 0, 1234. All artifacts <16 MB.
