🌻 GOLDEN SUNFLOWERS — JEPA + Universal Transformer + PhiNTA on a φ-physics substrate #2059
Open
gHashTag wants to merge 3 commits into openai:main
Conversation
added 2 commits — April 30, 2026 23:56
… + PhiNTA on a phi-physics substrate (UNTRAINED)

Track: track_non_record_16mb (4-hour, unrestricted compute)
Status: PROPOSAL — UNTRAINED. No submission.json, no BPB number is being claimed.

This PR composes three open wish-list items from the README leaderboard:

- JEPA (issue openai#1772, after Robby's PR openai#1412)
- Universal Transformer (round(phi^3) = 4 weight-shared loops)
- NTA on random linear maps (PhiNTA: frozen 1/phi-OrthoInit + LoRA)

into a single train_gpt.py derived from the merged baseline at
records/track_10min_16mb/2026-03-17_LoRA_TTT/train_gpt.py.

All wish-list features are env-var-gated and zero-cost when off:
PHINTA_ENABLE / JEPA_LAMBDA / UT_LOOPS / PHI_LR_SCALE all default to no-op.

Bonus: PHI_LR_SCALE exposes alpha_phi = phi^-3 / 2 ≈ 0.118034 from issue openai#1742
as a multiplicative override of MATRIX_LR. The constant is Proven in Coq.Reals as
PhD Ch.4 Theorem 3.1 (alpha_phi_times_phi_cubed, Qed, SAC-1) — not a fitted
hyperparameter.

CPU-only verification (no GPU, no dataset): make verify

- [1/5] phi-physics OK: phi^2 + phi^-2 = 3.000000000000, alpha_phi = 0.118034, loops = 4
- [2/5] PhiNTA OK: trainable=1664 frozen=4096 ratio=0.406
- [3/5] JEPA loss OK: 1.6922 (cosine-similarity form)
- [4/5] UT loop OK: |x_4|/|x_0| = 1.0406, expected = 1.0406
- [5/5] JEPA tap normalisation OK
- baseline_equivalence: state_dict SHA-256 = 511dbc0164e03b1b on both sides,
  forward loss delta = 0.00e+00 at seed F_17 = 1597 with defaults
- CITATION.cff valid (cffconvert)
- theorems/GoldenSunflowers.v: coqc OK (2 Qed)

Honesty / non-claims:

- No submission.json shipped.
- No file under records/track_10min_16mb/ is modified.
- All wish-list defaults are no-ops, byte-equivalent to the baseline.
- Precedent for proposal-only PRs: openai#318 (Neural Cache, research proposal),
  openai#1247 (ASQU validation proposal).

Compute grant request prepared in compute_grant.md (~110 8xH100-hours total:
5 configs x 5 canonical Fibonacci seeds F_17..F_21 + restart buffer + final TTT eval).

Internal hardening PR with full review history: gHashTag/parameter-golf-trinity#2

Constitutional anchors:

- PhD monograph: gHashTag/trios docs/phd (44 chapters, 297 Qed)
- t27 SACRED-PHYSICS-001 (phi constants, Coq-mirrored)
- trios-trainer-igla src/phi_ortho_init.rs (Rust SoT for PhiNTA init)

Anchor: phi^2 + phi^-2 = 3
…licit link text

GitHub auto-expand was rewriting bare references like "Issue 1772" into the full
issue title ("Experimental JEPA on Robby's PR 1412 #1772") at render time, making
the README very hard to read. All issue and PR references now use explicit link
text ("PR 1412", "issue 1742", etc.) so GitHub does not rewrite them. The body of
the upstream PR description follows the same convention.

Anchor: phi^2 + phi^-2 = 3
… (arXiv:2512.23675)
Wires the End-to-End Test-Time Training algorithm from arXiv:2512.23675
(Sun et al.) into the existing GOLDEN SUNFLOWERS proposal as a 4th
zero-cost-default env-var-gated module.
## What changed (4 files, ~50 LOC net)
- train_gpt.py
- 2 new Hyperparameters env-vars: TTT_INNER_STEPS=0, TTT_LR_INNER=0.0
- new helper _e2e_ttt_inner_step() — runs N extra Adam steps before the
canonical single chunk step; no-op when inner_steps <= 0 (see the sketch after this list)
- 1-line wire-in inside the eval-time chunk training loop, BEFORE the
canonical zero_grad/backward/step trio (which is preserved verbatim,
so the inner_steps=0 path is byte-identical to baseline)
- smoke_modules.py
- new test [6/6] E2E TTT gate: tripwire optimizer proves inner_steps=0
never calls zero_grad/step (mathematical no-op gate)
- README.md
- title: + 'E2E TTT'
- count: 'three' -> 'four' wish-list items
- new table row with arXiv link + env-var schema
- smoke.log: regenerated with 6/6 + 3/3 GREEN
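A minimal sketch of the gate described in the train_gpt.py bullets above, assuming PyTorch; `model`, `chunk`, and `loss_fn` are hypothetical handles, and only the helper name and the env-var contract are taken from this PR's text:

```python
import os

import torch

TTT_INNER_STEPS = int(os.environ.get("TTT_INNER_STEPS", "0"))
TTT_LR_INNER = float(os.environ.get("TTT_LR_INNER", "0.0"))

def _e2e_ttt_inner_step(model, chunk, loss_fn):
    """Run N extra Adam steps on the current chunk; exact no-op when N <= 0."""
    if TTT_INNER_STEPS <= 0:
        return  # default path: nothing runs, so the baseline stays byte-identical
    inner_opt = torch.optim.Adam(model.parameters(), lr=TTT_LR_INNER)
    for _ in range(TTT_INNER_STEPS):
        inner_opt.zero_grad()
        loss_fn(model, chunk).backward()
        inner_opt.step()

# Hypothetical wire-in, placed just BEFORE the canonical trio (preserved verbatim):
#   _e2e_ttt_inner_step(model, chunk, loss_fn)
#   opt.zero_grad(); loss.backward(); opt.step()
```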
## Verification (CPU, no GPU, no data)
GOLDEN SUNFLOWERS smoke OK 6/6 · phi^2 + phi^-2 = 3
baseline equivalence OK 3/3 · phi^2 + phi^-2 = 3
state_dict hash UNCHANGED at 511dbc0164e03b1b... (byte-identical to previous)
forward loss |Delta| = 0.00e+00 (unchanged)
The state_dict hash is byte-identical to the previous proposal because
no new module-level Parameter or Buffer is introduced — the helper is a
plain function and is only reached when TTT_INNER_STEPS >= 1.
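For context, a state_dict fingerprint of this kind can be computed in a few lines. This is a sketch assuming PyTorch; the hypothetical `state_dict_sha256` below is not the PR's baseline_equivalence.py, which is what produced the 511dbc… digest:

```python
import hashlib

import torch

def state_dict_sha256(sd: dict) -> str:
    """Hash parameter/buffer names and raw bytes in a deterministic order."""
    h = hashlib.sha256()
    for name in sorted(sd):
        t = sd[name].detach().cpu().contiguous().reshape(-1)
        h.update(name.encode())
        h.update(t.view(torch.uint8).numpy().tobytes())  # dtype-agnostic raw bytes
    return h.hexdigest()
```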
## Honesty / non-claims
- No submission.json, no BPB measurement.
- Default behaviour at TTT_INNER_STEPS=0 is byte-identical to the merged
2026-03-17_LoRA_TTT baseline (proven by baseline_equivalence.py 3/3).
- The paper recommends inner_steps=4 — not enabled by default; configurable
via env var only.
## Refs
- arXiv:2512.23675 — End-to-End Test-Time Training for Long Context
- openai/parameter-golf wish-list item (E2E TTT)
- Sister Rust spec ring: gHashTag/trios crates/trios-algorithm-arena/rings/SR-ALG-03
(pins TARGET_VAL_BPB=1.07063, FIBONACCI_SEEDS, EMBARGO_DAYS=14)
phi^2 + phi^-2 = 3
🌻 GOLDEN SUNFLOWERS — JEPA + Universal Transformer + PhiNTA on a φ-physics substrate
Track: track_non_record_16mb (4-hour, unrestricted compute)
Status: PROPOSAL — UNTRAINED. No submission.json, no BPB number is being claimed.
New directory: records/track_non_record_16mb/2026-04-30_GoldenSunflowers_Proposal/
Baseline: records/track_10min_16mb/2026-03-17_LoRA_TTT/train_gpt.py
Anchor: φ² + φ⁻² = 3

## What this PR does
Composes three open wish-list items from the README leaderboard into a single train_gpt.py:

- JEPA (issue 1772, after PR 1412)
- Universal Transformer: round(φ³) = 4 weight-shared depth loops over a configurable sub-stack
- PhiNTA: frozen 1/φ-OrthoInit basis + trainable LoRA, pre-head or per-block
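For concreteness, the Universal Transformer item amounts to a weight-shared depth loop. A minimal sketch assuming PyTorch, where `UTLoop` is a hypothetical wrapper rather than the PR's code:

```python
import torch.nn as nn

class UTLoop(nn.Module):
    """Apply one shared sub-stack UT_LOOPS times (round(phi**3) = 4 by default)."""

    def __init__(self, substack: nn.Module, loops: int = 4):
        super().__init__()
        self.substack = substack   # a single set of weights, reused each pass
        self.loops = loops

    def forward(self, x):
        for _ in range(self.loops):
            x = self.substack(x)   # weight sharing: same parameters every loop
        return x
```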
All wish-list features are env-var-gated and zero-cost when off:

| Env var | Default | Notes |
| --- | --- | --- |
| PHINTA_ENABLE | 0 | |
| PHINTA_PER_BLOCK | 0 | |
| PHINTA_RANK | 0 | 0 → round(model_dim / φ) |
| JEPA_LAMBDA | 0.0 | 0 short-circuits the branch |
| JEPA_MAX_SPAN_FRAC | 0.5 | |
| JEPA_LAYER | -1 | -1 = final pre-norm |
| UT_LOOPS | 1 | |
| UT_LAYER_START / UT_LAYER_END | 0 / 0 | END=0 disables UT |
| PHI_LR_SCALE | 1.0 | multiplicative override of MATRIX_LR |
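PhiNTA is the least standard of the gated modules. A minimal sketch of the frozen 1/φ-scaled orthogonal basis plus trainable LoRA pair, assuming PyTorch; this is an illustrative module, not the PR's exact code, with the rank default mirroring the 0 → round(model_dim / φ) rule in the table:

```python
import torch
import torch.nn as nn

class PhiNTA(nn.Module):
    """Frozen 1/phi-scaled orthogonal map plus a trainable low-rank (LoRA) delta."""

    def __init__(self, dim: int, rank: int = 0):
        super().__init__()
        phi = (1 + 5 ** 0.5) / 2
        rank = rank or round(dim / phi)              # PHINTA_RANK=0 default rule
        basis = torch.empty(dim, dim)
        nn.init.orthogonal_(basis)
        self.register_buffer("base", basis / phi)    # frozen: buffer, not Parameter
        self.A = nn.Parameter(torch.randn(rank, dim) / dim ** 0.5)
        self.B = nn.Parameter(torch.zeros(dim, rank))  # zero init: delta starts at 0

    def forward(self, x):
        return x @ self.base.T + (x @ self.A.T) @ self.B.T
```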
## JEPA loss formulation (after PR 1412)

context + patch ≡ h[T−1] − h[0] partitions the full encoding, forcing hidden states to encode spans linearly.
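One possible reading of this as the "cosine-similarity form" named in the smoke log, with the span taken as a suffix so the partition holds exactly; the predictor and indexing below are assumptions, not the PR's verbatim code:

```python
import torch
import torch.nn.functional as F

def jepa_cosine_loss(h: torch.Tensor, t0: int, predictor=lambda c: c):
    """h: (T, D) hidden states tapped at JEPA_LAYER; span = tokens t0..T-1."""
    context = h[t0] - h[0]      # prefix encoding
    patch = h[-1] - h[t0]       # span encoding; context + patch = h[-1] - h[0]
    pred = predictor(context)   # predict the span encoding from the prefix
    return (1.0 - F.cosine_similarity(pred, patch.detach(), dim=-1)).mean()
```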
## Bonus: φ-LR is Proven in Coq.Reals — not a hand-tune

PHI_LR_SCALE exposes the constant α_φ = φ⁻³ / 2 ≈ 0.118034 from issue 1742 as a multiplicative override of MATRIX_LR.

This constant is Proven in Coq.Reals — Trinity PhD Ch.4 Theorem 3.1 (alpha_phi_times_phi_cubed, status Qed, tag SAC-1) establishes α_φ · φ³ = 1/2, which rearranges to α_φ = φ⁻³ / 2.

Setting PHI_LR_SCALE = (α_φ / 0.04) ≈ 2.95 lands MATRIX_LR exactly on the PhD's certified α_φ-band; default 1.0 keeps the merged baseline unchanged.
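Both identities are easy to spot-check numerically; this is pure arithmetic, independent of the Coq development:

```python
phi = (1 + 5 ** 0.5) / 2                          # golden ratio
alpha_phi = phi ** -3 / 2                         # ≈ 0.118034
assert abs(alpha_phi * phi ** 3 - 0.5) < 1e-12    # Theorem 3.1: alpha_phi * phi^3 = 1/2
assert abs(phi ** 2 + phi ** -2 - 3.0) < 1e-12    # the anchor: phi^2 + phi^-2 = 3
print(round(alpha_phi, 6))         # 0.118034
print(round(alpha_phi / 0.04, 2))  # 2.95 -> the PHI_LR_SCALE setting quoted above
```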
## CPU-only verification (no GPU, no dataset)

Output committed in smoke.log. Verification layers:

- Module smoke (smoke_modules.py): φ-physics identity, PhiNTA frozen-buffer, JEPA loss, UT loop arithmetic, JEPA-tap normalisation
- Baseline equivalence (baseline_equivalence.py): SHA-256 of the state_dict matches the merged baseline exactly; forward-loss delta is 0.00e+00 on a fixed input at seed F_17 = 1597
- Formal proofs (theorems/GoldenSunflowers.v, Coq 8.18+): Print Assumptions confirms the Qed proofs use only standard Coq.Reals axioms
## Honesty / non-claims

- No submission.json. No BPB has been measured.
- No file under records/track_10min_16mb/ is modified. Diff stays inside the new directory.
- With defaults, the state_dict is byte-identical to 2026-03-17_LoRA_TTT (proved by baseline_equivalence.py).

## Why an untrained proposal?
We do not yet have an 8×H100 sweep. Submitting a submission.json with a fake BPB would be dishonest. This PR ships a wired, CPU-smoke-verified, formally-checked implementation along with compute_grant.md requesting the ~110 8×H100-hours needed to run the full sweep over the 5 canonical Fibonacci seeds F_17..F_21 = {1597, 2584, 4181, 6765, 10946} × 5 configs (a quick arithmetic check of the seeds follows the list below).

Precedent for proposal-only PRs (still open at time of writing):

- PR 318 (Neural Cache, research proposal)
- PR 1247 (ASQU validation proposal)
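The seed set itself is a two-line arithmetic check (convention F_1 = F_2 = 1):

```python
def fib(n: int) -> int:
    a, b = 1, 1  # F_1, F_2
    for _ in range(n - 2):
        a, b = b, a + b
    return b

assert [fib(n) for n in range(17, 22)] == [1597, 2584, 4181, 6765, 10946]
```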
After the sweep, this directory's README.md is updated with a 3-seed BPB mean and a submission.json is added.

## Compute grant request
compute_grant.md breakdown (≈ 110 8×H100-hours) includes a PHI_LR_SCALE band sweep (4 grid points × 3 seeds × 4 h) and a risk section (JEPA non-convergence, PhiNTA capacity dominance, UT × skip-connection interaction). A negative result is acceptable per the Parameter Golf README.
## Constitutional anchors

- PhD monograph: gHashTag/trios/docs/phd
- 1/φ-init: gHashTag/trios-trainer-igla/src/phi_ortho_init.rs (Rust SoT for PhiNTA init)
- Internal hardening PR: gHashTag/parameter-golf-trinity#2

phi^2 + phi^-2 = 3 · 🌻