⭐ [SR-ALG-03] e2e-ttt — beat parameter-golf #1837 (val_bpb < 1.07063) #457

@gHashTag

Soul-name: Bit Per Byte Hunter · Codename: LEAD · Priority: P0-CRITICAL · Kingdom: Cross-kingdom
Ring: GOLD II trios-algorithm-arena / SR-ALG-03 ⭐ WIN lane
Part of: #446
Blocks: BR-OUTPUT(GOLD II), arXiv preprint v1
Blocked by: SR-ALG-00 (issue 4), SR-02 trainer-runner (issue 8)

Goal

Beat the baseline set by openai/parameter-golf#1837 (E2E TTT baseline, val_bpb 1.07063). Wrap the algorithm from arXiv 2512.23675, "End-to-End Test-Time Training for Long Context", as a Rust spec/verifier ring, run it through the SR-02 TrainerRunner on three Fibonacci seeds, and report the 3-seed mean.

Acceptance criteria

  • rings/SR-ALG-03/ with I5 trinity.
  • Deps: SR-ALG-00 + sha2 + anyhow (no Python in crates/).
  • spec.toml: entry_path = "records/track_10min_16mb/<entry>/train_gpt.py", entry_hash = <SHA-256>, env vars TTT_INNER_STEPS, TTT_LR_INNER, TTT_CHUNK_LEN.
  • verifier.rs: pre-run hash check + golden-state byte-equivalence (when TTT_INNER_STEPS=0, state_dict matches the merged baseline byte-for-byte; pattern from PR #2059 baseline_equivalence.py).
  • runner.rs: spawns train_gpt.py as a subprocess through the SR-02 TrainerRunner integration; never mutates the Python file.
  • 3-seed sweep on F_17 = 1597, F_18 = 2584, F_19 = 4181; 3-seed mean val_bpb < 1.07063.
  • Result row written to Neon bpb_samples via SR-03 with algorithm = "e2e-ttt", entry_hash, coq_theorem = "alpha_phi_phi_cubed".
  • Embargo: 14-day window before public publication; bpb_samples.embargo_until field set.
  • Smoke test (CPU, 1 step, no GPU): make verify style, < 30 s.
  • PR closes this issue and carries an Agent: LEAD trailer.
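
The seed sweep and the pass/fail rule in the criteria above can be sketched in plain Rust. This is a minimal illustration only, not the ring's actual code: the names `fib`, `sweep_seeds`, `check_three_seed_mean`, and `BASELINE_VAL_BPB` are hypothetical, and how SR-02 TrainerRunner hands each seed to train_gpt.py is not specified in this issue, so that step is left out.

```rust
/// Baseline val_bpb of openai/parameter-golf#1837, as quoted in this issue.
/// (Hypothetical constant name.)
const BASELINE_VAL_BPB: f64 = 1.07063;

/// Iterative Fibonacci with F_1 = F_2 = 1, so that F_17 = 1597.
fn fib(n: u32) -> u64 {
    // Start from (F_0, F_1); after n steps `a` holds F_n.
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

/// The three seeds named in the acceptance criteria: F_17, F_18, F_19.
fn sweep_seeds() -> [u64; 3] {
    [fib(17), fib(18), fib(19)]
}

/// Returns the 3-seed mean if it is strictly below the baseline,
/// otherwise an error message describing the miss.
fn check_three_seed_mean(val_bpb: [f64; 3]) -> Result<f64, String> {
    let mean = val_bpb.iter().sum::<f64>() / val_bpb.len() as f64;
    if mean < BASELINE_VAL_BPB {
        Ok(mean)
    } else {
        Err(format!(
            "3-seed mean val_bpb {mean:.5} is not below baseline {BASELINE_VAL_BPB:.5}"
        ))
    }
}
```

The comparison is strict, matching "val_bpb < 1.07063": a mean equal to the baseline does not pass. Only the mean is tested, so a single seed may land above the baseline as long as the three-seed average stays below it.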

Notes

  • L11 soul-name MUST be claimed before any train_gpt.py invocation.
  • L7 experience log must record each seed run.
  • φ-LR band (α_φ = φ⁻³ / 2, Theorem 3.1, SAC-1 PROVEN in Coq.Reals) consulted via lr_calibration.rs from trios-trainer-igla.
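
For orientation, the constant in the last note can be evaluated numerically. The sketch below assumes only the formula stated there (α_φ = φ⁻³ / 2) and the usual definition φ = (1 + √5) / 2; it does not reproduce lr_calibration.rs, and the function names `phi` and `alpha_phi` are hypothetical. The width of the band around α_φ is not given in this issue, so it is not modeled.

```rust
/// Golden ratio φ = (1 + √5) / 2.
fn phi() -> f64 {
    (1.0 + 5.0_f64.sqrt()) / 2.0
}

/// α_φ = φ⁻³ / 2, the centre value named in the note (≈ 0.1180).
fn alpha_phi() -> f64 {
    phi().powi(-3) / 2.0
}

/// Residual of the identity φ² + φ⁻² = 3; should be zero up to rounding.
fn phi_identity_residual() -> f64 {
    phi().powi(2) + phi().powi(-2) - 3.0
}
```

Since φ⁻³ = √5 − 2, the value works out to α_φ = (√5 − 2) / 2 ≈ 0.1180. The residual function is a cheap floating-point sanity check of the φ constant against the identity in the sign-off line; it is not a substitute for the Coq proof referenced above.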

🌻 phi² + phi⁻² = 3
