Non-record: GolfParty — composable scaffolding for every Requests-for-PRs item by EthanYangTW · Pull Request #1978 · openai/parameter-golf

EthanYangTW · 2026-04-30T11:50:52Z

Summary

Single non-record submission that addresses all currently-unchecked items on OpenAI's Requests-for-PRs list (Universal Transformer, megakernels, SSM, E2E TTT, super long context, RLA, JEPA, text diffusion, H-net tokenization) as toggleable env vars on the PR #1953 base. Default config is byte-identical to PR #1953; toggles compose additively.

3-seed mean post-TTT val_bpb 1.07776 (std 0.00126) on 8×H100 SXM. All seeds within the 600s training cap.

Position: NOT a SOTA bid. This is a composability ablation + scaffolding for future record submissions in the directions OpenAI explicitly invited. Aligned with the README's "we strongly encourage participants to submit implementations for weird or out-of-the-box ideas, in-progress or unoptimized solutions, so long as they run successfully, or even interesting negative results."

What's in the box

Request-for-PRs item	Env var	Status
Universal transformer	`KS_UT_DEPTH`	Real — extends PR #1344 Loop4-5 by K extra cycles
Megakernels	`KS_MEGAKERNEL`	Real (already shipping) — surfaces fused LeakyReLU² MLP + softcapped CE Triton kernels
Super long context	`KS_LONG_CONTEXT` + `EVAL_SEQ_LEN=3072`	Real
Random linear adapters	`TTT_RLA_ENABLED`	Real — frozen orthonormal A, only B learnable
Text diffusion	`KS_DIFFUSION_FRAC`	Real — training-time embedding-noise auxiliary
E2E TTT	`KS_E2E_TTT`	Wired but disabled — OOM at EVAL_SEQ_LEN=3072 + UT depth
JEPA	`KS_JEPA_WEIGHT`	Wired but disabled — GPTQ Hessian KeyError on aux head
State-space models	`KS_SSM_LAST_K`	Stub — Python-loop scan compile-toxic
H-net tokenization	`KS_HNET_CHUNK`	Stub — dynamic-shape padding compile-toxic

5 active in shipped 3-seed config; 2 wired-with-blocker; 2 stubs. All 9 documented in notes/.

Per-seed results

Seed	Pre-quant	Quant	Post-TTT	Eval s	Artifact bytes
42	1.07594	1.08396	1.07631	359.6	16,008,464
1234	1.07726	1.08531	1.07860	353.2	16,003,972
0	1.07717	1.08508	1.07838	359.7	16,000,415
Mean	1.07679	1.08478	1.07776	357.5	16,004,284
Std	0.00073	0.00073	0.00126	3.7	4,030

vs current rank-1 PR #1855 (1.06108): +0.01668 BPB (regression — non-record).

Artifact size note: all 3 seeds came in 415-8464 bytes above the 16,000,000-byte cap. Driven by ~6KB compressed kitchen-sink scaffolding + ~5KB bf16 run-to-run variance. Trivially fixable (strip toy classes / bump WD); kept as-shipped to preserve full scaffolding visibility for review.

Test plan

All 3 seeds train within 600s wallclock on 8×H100 SXM (max 596.176s)
All 3 seeds eval within 600s (max 359.7s)
Per-seed train + eval logs included
Default config (all KS_*=0, TTT_RLA_ENABLED=0) is byte-identical to PR Record: PR #1945 base + 2560 long-context + no_qv TTT mask + TTT LR 0.75 + QK_GAIN 5.25 — val_bpb 1.05855 (3-seed mean) #1953
Each technique labeled real / blocked-with-reason / stub in notes/
Artifact size 415-8464 bytes over cap — would require a small fix-up commit before any record bid

Lineage

PR #1953 → #1945 → #1923 → #1908 → #1855 → #1797 → #1787 → #1729 → #1667 → #1530 → #1394 → #1344. Toy implementations of SSM, JEPA, diffusion, H-net introduced in this submission.

…770 (3-seed mean)

…-PRs item Single non-record submission that addresses all currently-unchecked items on OpenAI's Requests-for-PRs list (Universal Transformer, megakernels, SSM, E2E TTT, super long context, RLA, JEPA, text diffusion, H-net tokenization) as toggleable env vars on the PR openai#1953 base. Default config is byte-identical to PR openai#1953; toggles compose additively. 3-seed mean post-TTT val_bpb 1.07776 (std 0.00126), all seeds within 600s training cap on 8xH100 SXM. Each technique honestly labeled real / wired-but-disabled-with-reason / stub-for-future-work in notes/. Real wired toggles in shipped run: KS_UT_DEPTH, KS_LONG_CONTEXT (via EVAL_SEQ_LEN=3072), KS_DIFFUSION_FRAC, TTT_RLA_ENABLED, KS_MEGAKERNEL. Wired-but-disabled (with documented blocker): KS_E2E_TTT (OOM), KS_JEPA_WEIGHT (GPTQ Hessian KeyError on aux head). Stubs (compile-toxic): KS_SSM_LAST_K, KS_HNET_CHUNK. Includes: per-seed train+eval logs, per-feature notes/, 3-seed launcher, CaseOps SP8192 tokenizer. Position: not a SOTA bid. Composability ablation + scaffolding for future record submissions in the directions OpenAI explicitly invited.

Copilot

Pull request overview

Adds a new non-record submission (“GolfParty”) under records/track_10min_16mb/ that scaffolds multiple “Requests-for-PRs” techniques behind env-var toggles on top of the PR #1953 lineage, along with full 3-seed logs and writeups. The PR also includes an additional 2026-04-26_V2_PE_MinLR_AttnGate/ record directory, which appears unrelated and incomplete.

Changes:

Add 2026-04-30_GolfParty_AllChecks/ with a modified train_gpt.py, 3-seed logs, submission.json, reproduction script, tokenizer model, and per-feature notes.
Document 9 technique toggles in README.md and notes/*.md.
Add 2026-04-26_V2_PE_MinLR_AttnGate/ with a README and a wrapper train_gpt.py (but missing other required submission artifacts).

Reviewed changes

Copilot reviewed 14 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/train_gpt.py	Implements technique toggles (UT depth recurrence, RLA, diffusion noise, JEPA aux loss wiring, etc.) on the PR #1953-style codebase.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/README.md	Describes the submission, toggles, results, and reproduction guidance.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/submission.json	Submission metadata and per-seed metrics.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/run_kitchen_3seed.sh	3-seed launcher script intended to reproduce the run.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/train_seed42.log	Seed 42 training/quant/TTT log.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/train_seed1234.log	Seed 1234 training/quant/TTT log.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/train_seed0.log	Seed 0 training/quant/TTT log.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model	Tokenizer model file referenced by the submission.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/universal.md	Explains UT-depth toggle intent and limitations.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/megakernel.md	Documents “megakernel” claim as surfacing existing fused kernels.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/long_context.md	Documents long-context evaluation toggle.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/e2e_ttt.md	Documents E2E TTT toggle and limitations.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/rla.md	Documents Random Linear Adapter (RLA) toggle behavior.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/ssm.md	Documents SSM stub and why it’s not compiled/wired.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/jepa.md	Documents JEPA aux-loss wiring and blockers.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/diffusion.md	Documents diffusion-inspired embedding noise feature.
records/track_10min_16mb/2026-04-30_GolfParty_AllChecks/notes/hnet.md	Documents H-net pooling stub.
records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/train_gpt.py	Adds a decompression wrapper script for another record folder.
records/track_10min_16mb/2026-04-26_V2_PE_MinLR_AttnGate/README.md	Describes a separate “record” submission that appears incomplete in this PR.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    {"seed": 42,   "post_ttt_val_bpb": 1.07631, "pre_quant_val_bpb": 1.07594, "quantized_val_bpb": 1.08396, "eval_seconds": 359.6, "artifact_bytes": 16008464, "stop_step": 4538},
+    {"seed": 1234, "post_ttt_val_bpb": 1.07860, "pre_quant_val_bpb": 1.07726, "quantized_val_bpb": 1.08531, "eval_seconds": 353.2, "artifact_bytes": 16003972, "stop_step": 4534},
+    {"seed": 0,    "post_ttt_val_bpb": 1.07838, "pre_quant_val_bpb": 1.07717, "quantized_val_bpb": 1.08508, "eval_seconds": 359.7, "artifact_bytes": 16000415, "stop_step": 4533}


+> **Position: not a SOTA bid.** This submission addresses every currently-
+> unchecked item on OpenAI's "Requests for PRs" list as a *single composable
+> recipe*, with each technique behind an env-var toggle. Default config is
+> byte-identical to the parent **PR #1953** stack; toggles compose
+> additively.


+# Record: SP8192 + PE + MIN_LR + SmearGate + AttnOutGate + 4ep TTT — val_bpb 1.0770 (3-seed mean)
+
+**val_bpb = 1.0770** (3-seed mean, std 0.0004) | **~15.98 MB** | 8xH100 SXM
+
+## 3-Seed Results
+
+| Seed | Steps | Sliding BPB | **TTT BPB** | Artifact (bytes) |
+|------|-------|-------------|-------------|-------------------|
+| 1337 | 4631 | 1.0785 | **1.0772** | 15,982,989 |
+| 42 | 4637 | 1.0777 | **1.0765** | 15,984,317 |
+| 2024 | 4633 | 1.0784 | **1.0772** | 15,985,404 |
+| **Mean** | **4634** | **1.0782** | **1.0770** | **15,984,237** |
+| **Std** | | 0.0004 | **0.0004** | |
+


@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+set -euo pipefail
+cd /workspace/parameter-golf/records/track_10min_16mb/2026-04-30_ParamGolfKitchen_AllChecks


+**Note on artifact size:** all three seeds came in slightly above the
+16,000,000-byte cap (max 16,008,464, min 16,000,415). The overage is
+~0.05% of the cap and is driven by (a) the kitchen-sink scaffolding
+adding ~6 KB compressed code over the parent PR #1953 baseline, and
+(b) bf16 non-determinism shifting model compressibility by ±5 KB
+run-to-run. A trivial fix (strip the ToySSMBlock / ToyJEPAHead class
+defs before serialization, or bump weight decay slightly) brings the
+artifact comfortably under cap. *Not* applied in the as-shipped run
+because we wanted to preserve the full kitchen-sink scaffolding visible
+to anyone reading the train_gpt.py for review.


+        # KS_DIFFUSION_FRAC: training-time embedding-noise auxiliary. Replace
+        # `frac` of token embeddings with Gaussian noise. Toy 1-step denoising
+        # signal — only fires when self.training and ks_diffusion_frac > 0.
+        if self.training and getattr(self, "ks_diffusion_frac", 0.0) > 0.0:
+            x, _diff_mask = ks_diffusion_perturb(x, self.ks_diffusion_frac)


+    """
+    B, T, D = emb.shape
+    mask = (torch.rand(B, T, 1, device=emb.device, generator=generator) < frac).to(emb.dtype)
+    noise = torch.randn_like(emb) * emb.std()


+def ks_hnet_pool(h, chunk):
+    """H-net hierarchical chunk pooling: mean-pool every `chunk` tokens
+    so a coarse-grained downstream attention pass can run cheaply over
+    summaries. Returns (coarse, gather_index) — coarse[b, t//chunk] is
+    the summary for the chunk containing t.
+    """
+    B, T, D = h.shape
+    pad = (chunk - T % chunk) % chunk
+    if pad:
+        h = F.pad(h, (0, 0, 0, pad))
+    h2 = h.reshape(B, (T + pad) // chunk, chunk, D).mean(dim=2)
+    return h2  # (B, T_coarse, D)


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

EthanYangTW added 3 commits April 26, 2026 16:15

Record: SP8192 + PE + SmearGate + AttnOutGate + 4ep TTT — val_bpb 1.0…

c0ec37a

…770 (3-seed mean)

Fix: default GLOBAL_TTT_ENABLED=0 to match reported results

8f3c65a

EthanYangTW marked this pull request as ready for review April 30, 2026 11:51

Copilot AI review requested due to automatic review settings April 30, 2026 11:51

Copilot started reviewing on behalf of EthanYangTW April 30, 2026 11:52 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Potential fix for pull request finding

22b101e

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-record: GolfParty — composable scaffolding for every Requests-for-PRs item#1978

Non-record: GolfParty — composable scaffolding for every Requests-for-PRs item#1978
EthanYangTW wants to merge 4 commits intoopenai:mainfrom
EthanYangTW:submission/golfparty-allchecks

EthanYangTW commented Apr 30, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

EthanYangTW commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in the box

Per-seed results

Test plan

Lineage

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EthanYangTW commented Apr 30, 2026 •

edited

Loading