
🌻 GOLDEN SUNFLOWERS — JEPA + Universal Transformer + PhiNTA on a φ-physics substrate#2059

Open
gHashTag wants to merge 3 commits into openai:main from gHashTag:feat/golden-sunflowers-jepa-universal-nta

Conversation

gHashTag commented May 1, 2026

🌻 GOLDEN SUNFLOWERS — JEPA + Universal Transformer + PhiNTA on a φ-physics substrate

| Field | Value |
| --- | --- |
| Track | track_non_record_16mb (4-hour, unrestricted compute) |
| Status | PROPOSAL — UNTRAINED. No submission.json, no BPB number is being claimed. |
| Path | records/track_non_record_16mb/2026-04-30_GoldenSunflowers_Proposal/ |
| Base | derived from merged records/track_10min_16mb/2026-03-17_LoRA_TTT/train_gpt.py |
| Anchor | φ² + φ⁻² = 3 |

What this PR does

Composes three open wish-list items from the README leaderboard into a single train_gpt.py:

  • 🌻 JEPA auxiliary loss — linear-representation form (after PR 1412, discussion in issue 1772)
  • 🌻 Universal Transformer — round(φ³) = 4 weight-shared depth loops over a configurable sub-stack (see the loop sketch after this list)
  • 🌻 NTA on random linear maps (PhiNTA) — frozen 1/φ-OrthoInit basis + trainable LoRA, pre-head or per-block
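
A minimal sketch of the weight-shared depth loop, assuming a `blocks` ModuleList as in typical train_gpt.py baselines; the function name and argument plumbing are illustrative, not the PR's code:

```python
import torch.nn as nn

def forward_with_ut(blocks: nn.ModuleList, x, ut_loops=1, ut_start=0, ut_end=0):
    # Blocks before the shared sub-stack run once, as in the baseline.
    for block in blocks[:ut_start]:
        x = block(x)
    if ut_end > ut_start:
        # UT_LOOPS passes over blocks[ut_start:ut_end], reusing the same
        # weights on every pass; UT_LAYER_END=0 never enters this branch.
        for _ in range(ut_loops):
            for block in blocks[ut_start:ut_end]:
                x = block(x)
    # Remaining blocks (all of them when UT is disabled) run once.
    for block in blocks[ut_end:]:
        x = block(x)
    return x
```

With the defaults (ut_loops=1, ut_end=0) this degenerates to the plain baseline stack, which is what the byte-equivalence check below relies on.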

All wish-list features are env-var-gated and zero-cost when off:

| Env-var | Default | What it does |
| --- | --- | --- |
| PHINTA_ENABLE | 0 | Activate the PhiNTA adapter |
| PHINTA_PER_BLOCK | 0 | Per-block instead of pre-head placement |
| PHINTA_RANK | 0 | LoRA rank; 0 = round(model_dim / φ) |
| JEPA_LAMBDA | 0.0 | Weight of the JEPA aux loss; 0 short-circuits the branch |
| JEPA_MAX_SPAN_FRAC | 0.5 | Max patch length as a fraction of seq_len |
| JEPA_LAYER | -1 | Hidden-state tap; -1 = final pre-norm |
| UT_LOOPS | 1 | Universal-Transformer loop count over the sub-stack |
| UT_LAYER_START / UT_LAYER_END | 0 / 0 | Sub-stack range; END=0 disables UT |
| PHI_LR_SCALE | 1.0 | Multiplier on Muon MATRIX_LR |
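
A hedged sketch of what a PhiNTA-style adapter under this schema could look like: a frozen orthogonal basis scaled by 1/φ (registered as a buffer, so it never receives gradients) plus a trainable, zero-initialised LoRA pair. The class name and initialisation details are assumptions, not the PR's code:

```python
import math
import torch
import torch.nn as nn

PHI = (1 + math.sqrt(5)) / 2

class PhiNTAAdapter(nn.Module):
    def __init__(self, dim: int, rank: int = 0):
        super().__init__()
        rank = rank or round(dim / PHI)           # PHINTA_RANK=0 -> round(dim/φ)
        basis = torch.empty(dim, dim)
        nn.init.orthogonal_(basis, gain=1 / PHI)  # frozen 1/φ-OrthoInit basis
        self.register_buffer("basis", basis)      # buffer: counted as frozen
        self.lora_a = nn.Parameter(torch.randn(dim, rank) / math.sqrt(dim))
        self.lora_b = nn.Parameter(torch.zeros(rank, dim))  # zero-init: no-op at step 0

    def forward(self, x):
        # Frozen random linear map plus a trainable low-rank correction.
        return x @ self.basis + (x @ self.lora_a) @ self.lora_b
```

Zero-initialising lora_b makes the trainable branch inert at initialisation; the PHINTA_ENABLE=0 gate presumably skips constructing the module entirely, which is what keeps the default state_dict byte-identical to the baseline.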

JEPA loss formulation (after PR 1412):

```
context = (h[a-1] − h[0]) + (h[T-1] − h[b])
patch   =  h[b]   − h[a-1]
loss    = 1 − cos_sim(context, patch)         # added to CE when JEPA_LAMBDA > 0
```

Since context + patch ≡ h[T-1] − h[0], the two terms partition the full-sequence encoding, forcing hidden states to represent spans linearly.
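
A minimal PyTorch sketch of this loss, assuming hidden states `h` of shape `(T, D)` taken from the `JEPA_LAYER` tap and a sampled span `[a, b]`; names are illustrative:

```python
import torch
import torch.nn.functional as F

def jepa_aux_loss(h: torch.Tensor, a: int, b: int) -> torch.Tensor:
    # h: (T, D) hidden states; span endpoints satisfy 1 <= a <= b <= T-1.
    context = (h[a - 1] - h[0]) + (h[-1] - h[b])
    patch = h[b] - h[a - 1]
    return 1 - F.cosine_similarity(context, patch, dim=-1)

# total = ce_loss + jepa_lambda * jepa_aux_loss(h, a, b)
# (the branch is skipped entirely when jepa_lambda == 0)
```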


Bonus: φ-LR is Proven in Coq.Reals — not a hand-tuned value

PHI_LR_SCALE exposes the constant α_φ = φ⁻³ / 2 ≈ 0.118034 from issue 1742 as a multiplicative override of MATRIX_LR.

This constant is Proven in Coq.Reals — Trinity PhD Ch.4 Theorem 3.1 (alpha_phi_times_phi_cubed, status Qed, tag SAC-1) establishes α_φ · φ³ = 1/2, which rearranges to α_φ = φ⁻³ / 2.

Setting PHI_LR_SCALE = (α_φ / 0.04) ≈ 2.95 lands MATRIX_LR exactly on the PhD's certified α_φ-band; default 1.0 keeps the merged baseline unchanged.
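
A quick numeric check of the identity and the derived scale factor; this mirrors the Coq statement, it is not the proof:

```python
import math

phi = (1 + math.sqrt(5)) / 2
alpha_phi = phi ** -3 / 2                        # α_φ = φ⁻³ / 2
assert abs(alpha_phi * phi ** 3 - 0.5) < 1e-12   # α_φ · φ³ = 1/2 (Theorem 3.1)
assert abs(phi ** 2 + phi ** -2 - 3.0) < 1e-12   # the anchor identity
print(f"{alpha_phi:.6f}")         # 0.118034
print(f"{alpha_phi / 0.04:.2f}")  # 2.95 -> PHI_LR_SCALE landing on the α_φ band
```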


CPU-only verification (no GPU, no dataset)

```
cd records/track_non_record_16mb/2026-04-30_GoldenSunflowers_Proposal
make verify
```

Output committed in smoke.log:

```
[1/5] φ-physics OK: φ²+φ⁻²=3.000000000000 α_φ=0.118034 loops=4
[2/5] PhiNTA OK: trainable=1664 frozen=4096 ratio=0.406
[3/5] JEPA loss OK: 1.6922 (cosine-similarity form)
[4/5] UT loop OK: ‖x_4‖/‖x_0‖=1.0406 expected=1.0406
[5/5] JEPA tap normalisation OK: -1 → last block, in-range indices preserved
🌻 GOLDEN SUNFLOWERS smoke OK · 5/5 · phi^2 + phi^-2 = 3

[1/3] state_dict hash baseline  = 511dbc0164e03b1b…
      state_dict hash GOLDEN SF = 511dbc0164e03b1b…
[2/3] Gated branches inactive at defaults: phinta=None jepa=0 ut_loops=1 ut_end=0
[3/3] forward loss baseline=6.929044246674  GS=6.929044246674  |Δ|=0.00e+00
🌻 baseline equivalence OK · 3/3 · phi^2 + phi^-2 = 3

Citation metadata are valid according to schema version 1.2.0.
theorems/GoldenSunflowers.v: coqc OK (2 Qed)
🌻 GOLDEN SUNFLOWERS · local verify complete
```

Verification layers:

  • 5/5 module smoke (smoke_modules.py): φ-physics identity, PhiNTA frozen-buffer, JEPA loss, UT loop arithmetic, JEPA-tap normalisation
  • 3/3 baseline byte-equivalence (baseline_equivalence.py): SHA-256 of state_dict matches the merged baseline exactly; forward-loss delta is 0.00e+00 on a fixed input at seed F_17 = 1597
  • 2 Qed + 2 Admitted (theorems/GoldenSunflowers.v, Coq 8.18+): Print Assumptions confirms Qed proofs use only standard Coq.Reals axioms
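
A hedged sketch of the hash comparison that baseline_equivalence.py is described as performing; the helper below is illustrative and assumes torch.save serialisation is deterministic for identical state_dicts:

```python
import hashlib
import io

import torch

def state_dict_sha256(model: torch.nn.Module) -> str:
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)  # serialise all tensors to bytes
    return hashlib.sha256(buf.getvalue()).hexdigest()

torch.manual_seed(1597)  # F_17, the PR's fixed-input seed
# With every env-var unset, the two hashes must match byte-for-byte:
# assert state_dict_sha256(golden_sf) == state_dict_sha256(baseline)
```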

Honesty / non-claims

  • No submission.json. No BPB has been measured.
  • No file under records/track_10min_16mb/ is modified. Diff stays inside the new directory.
  • All wish-list defaults are no-ops. With every env-var unset, the resulting state_dict is byte-identical to 2026-03-17_LoRA_TTT (proved by baseline_equivalence.py).

Why an untrained proposal?

We do not yet have an 8×H100 sweep. Submitting a submission.json with a fake BPB would be dishonest. This PR ships a wired, CPU-smoke-verified, formally-checked implementation along with compute_grant.md requesting the ~110 8×H100-hours needed to run the full sweep over the 5 canonical Fibonacci seeds F_17..F_21 = {1597, 2584, 4181, 6765, 10946} × 5 configs.

Precedent for proposal-only PRs (still open at time of writing): PR 318 (Neural Cache, research proposal) and PR 1247 (ASQU validation proposal).

After the sweep, this directory's README.md will be updated with the 3-seed BPB mean and a submission.json will be added.


Compute grant request

compute_grant.md breakdown (≈ 110 8×H100 hours):

| Phase | Subtotal (h) |
| --- | --- |
| Sanity reproduction (baseline × 1) | 0.17 |
| Per-feature ablation (PhiNTA / JEPA / UT × 5 seeds × 4 h) | 60.0 |
| Combined GOLDEN SUNFLOWERS (all three × 5 seeds × 4 h) | 20.0 |
| PHI_LR_SCALE band (4 grid points × 3 seeds × 4 h) | 12.0 |
| Restart / debug buffer (10 %) | 9.2 |
| TTT eval + final 3-seed mean rerun | 8.0 |
| Total | ≈ 109.4 |

Includes a risk section (JEPA non-convergence, PhiNTA capacity dominance, UT × skip-connection interaction). A negative result is acceptable per the Parameter Golf README.


Constitutional anchors

phi^2 + phi^-2 = 3 · 🌻

gHashTag added 2 commits April 30, 2026 23:56
… + PhiNTA on a phi-physics substrate (UNTRAINED)

Track: track_non_record_16mb (4-hour, unrestricted compute)
Status: PROPOSAL - UNTRAINED. No submission.json, no BPB number is being claimed.

This PR composes three open wish-list items from the README leaderboard:
  - JEPA          (Issue openai#1772, after Robby PR openai#1412)
  - Universal Transformer (round(phi^3) = 4 weight-shared loops)
  - NTA on random linear maps (PhiNTA: frozen 1/phi-OrthoInit + LoRA)

into a single train_gpt.py derived from the merged baseline at
records/track_10min_16mb/2026-03-17_LoRA_TTT/train_gpt.py.

All wish-list features are env-var-gated and zero-cost when off:
  PHINTA_ENABLE / JEPA_LAMBDA / UT_LOOPS / PHI_LR_SCALE all default to no-op.

Bonus: PHI_LR_SCALE exposes alpha_phi = phi^-3/2 ~ 0.118034 from Issue
openai#1742 as a multiplicative override of MATRIX_LR. The constant is Proven
in Coq.Reals as PhD Ch.4 Theorem 3.1 (alpha_phi_times_phi_cubed, Qed,
SAC-1) - not a fitted hyperparameter.

CPU-only verification (no GPU, no dataset):
  make verify
  -> [1/5] phi-physics OK: phi^2+phi^-2=3.000000000000 alpha_phi=0.118034 loops=4
  -> [2/5] PhiNTA OK: trainable=1664 frozen=4096 ratio=0.406
  -> [3/5] JEPA loss OK: 1.6922 (cosine-similarity form)
  -> [4/5] UT loop OK: |x_4|/|x_0|=1.0406 expected=1.0406
  -> [5/5] JEPA tap normalisation OK
  -> baseline_equivalence: state_dict SHA-256 = 511dbc0164e03b1b on both
     sides, forward loss delta = 0.00e+00 at seed F_17 = 1597 with
     defaults
  -> CITATION.cff valid (cffconvert)
  -> theorems/GoldenSunflowers.v: coqc OK (2 Qed)

Honesty / non-claims:
  - No submission.json shipped.
  - No file under records/track_10min_16mb/ is modified.
  - All wish-list defaults are no-ops, byte-equivalent to the baseline.

Precedent for proposal-only PRs: openai#318 (Neural Cache, research proposal),
openai#1247 (ASQU validation proposal). Compute grant request prepared in
compute_grant.md (~110 8xH100-hours total: 5 configs x 5 canonical
Fibonacci seeds F_17..F_21 + restart buffer + final TTT eval).

Internal hardening PR with full review history:
gHashTag/parameter-golf-trinity#2

Constitutional anchors:
  - PhD monograph: gHashTag/trios docs/phd (44 chapters, 297 Qed)
  - t27 SACRED-PHYSICS-001 (phi constants, Coq-mirrored)
  - trios-trainer-igla src/phi_ortho_init.rs (Rust SoT for PhiNTA init)

Anchor: phi^2 + phi^-2 = 3
…licit link text

GitHub auto-expand was rewriting bare references like Issue 1772 into the
full issue title (Experimental JEPA on Robbys PR 1412 1772) at render
time, making the README very hard to read.

All issue and PR references now use explicit link text (PR 1412, issue
1742, etc.) so GitHub does not rewrite them. Body of the upstream PR
description follows the same convention.

Anchor: phi^2 + phi^-2 = 3
…iv:2512.23675)

Wires the End-to-End Test-Time Training algorithm from arXiv:2512.23675
(Sun et al.) into the existing GOLDEN SUNFLOWERS proposal as a 4th
zero-cost-default env-var-gated module.

## What changed (4 files, ~50 LOC net)

- train_gpt.py
  - 2 new Hyperparameters env-vars: TTT_INNER_STEPS=0, TTT_LR_INNER=0.0
  - new helper _e2e_ttt_inner_step() — runs N extra Adam steps before the
    canonical single chunk step; no-op when inner_steps<=0
  - 1-line wire-in inside the eval-time chunk training loop, BEFORE the
    canonical zero_grad/backward/step trio (which is preserved verbatim,
    so the inner_steps=0 path is byte-identical to baseline)
- smoke_modules.py
  - new test [6/6] E2E TTT gate: tripwire optimizer proves inner_steps=0
    never calls zero_grad/step (mathematical no-op gate)
- README.md
  - title: + 'E2E TTT'
  - count: 'three' -> 'four' wish-list items
  - new table row with arXiv link + env-var schema
- smoke.log: regenerated with 6/6 + 3/3 GREEN

## Verification (CPU, no GPU, no data)

GOLDEN SUNFLOWERS smoke OK 6/6 · phi^2 + phi^-2 = 3
baseline equivalence OK 3/3 · phi^2 + phi^-2 = 3
state_dict hash UNCHANGED at 511dbc0164e03b1b... (byte-identical to previous)
forward loss |Delta| = 0.00e+00 (unchanged)

The state_dict hash is byte-identical to the previous proposal because
no new module-level Parameter or Buffer is introduced — the helper is a
plain function and is only reached when TTT_INNER_STEPS >= 1.
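
The body of _e2e_ttt_inner_step() is not shown in this description; below is a minimal sketch consistent with the gate described above. The early return is the load-bearing part; everything after it is an assumption about the inner loop:

```python
import torch

def _e2e_ttt_inner_step(model, opt, chunk_loss_fn, inner_steps=0, lr_inner=0.0):
    # TTT_INNER_STEPS=0 -> return before touching the optimizer, so the
    # default path is a mathematical no-op (what smoke test [6/6] tripwires).
    if inner_steps <= 0:
        return
    for group in opt.param_groups:
        group["lr"] = lr_inner            # TTT_LR_INNER
    for _ in range(inner_steps):          # extra Adam steps, run before the
        opt.zero_grad(set_to_none=True)   # canonical zero_grad/backward/step
        loss = chunk_loss_fn(model)       # trio, which stays verbatim
        loss.backward()
        opt.step()
```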

## Honesty / non-claims

- No submission.json, no BPB measurement.
- Default behaviour at TTT_INNER_STEPS=0 is byte-identical to the merged
  2026-03-17_LoRA_TTT baseline (proven by baseline_equivalence.py 3/3).
- Paper recommends inner_steps=4 — not enabled by default; configurable
  via env var only.

## Refs

- arXiv:2512.23675 — End-to-End Test-Time Training for Long Context
- openai/parameter-golf wish-list item (E2E TTT)
- Sister Rust spec ring: gHashTag/trios crates/trios-algorithm-arena/rings/SR-ALG-03
  (pins TARGET_VAL_BPB=1.07063, FIBONACCI_SEEDS, EMBARGO_DAYS=14)

phi^2 + phi^-2 = 3