Closed
Changes from all commits
405 commits
d31cd54
Bandit_Wagon: clean HYPOTHESIS.md — remove stale oracle/proxy refs, f…
Mar 30, 2026
c8a2468
Bandit_Wagon: strip dead code from train_gpt.py (2378 → 1860 lines)
Mar 30, 2026
78a4e47
Bandit_Wagon: fix banner title
Mar 30, 2026
2e3d5bf
pod_setup.sh: switch branch to TEST_LAB, remove dead FLA/DeltaNet ins…
Mar 30, 2026
5417530
Bandit_Wagon: add ad-hoc winddown A/B suite
Mar 30, 2026
4ce945f
Add clean Rascal A/B lab for baseline, turbomuon, engramlite, combo
Mar 30, 2026
4401ff8
BW-00 anchor: 1.18616 int6 SW BPB (seed 444, 8×H100, 600s)
Mar 30, 2026
d02bb2c
Bandit_Wagon: add run_ablations.sh — BW-01..04 back-to-back at 350s, …
Mar 30, 2026
e135eb9
Bandit_Wagon: run_ablations.sh default NPROC=1 (single GPU signal)
Mar 30, 2026
9e8b69f
Bandit_Wagon: fix run_ablations.sh env var passing (use env)
Mar 30, 2026
56e3ff3
Bandit_Wagon: add 1-GPU winddown wrapper
Mar 30, 2026
656622a
pod_setup.sh: download tokenizer (fineweb_1024_bpe.model) in step 6
Mar 30, 2026
b64efeb
Bandit_Wagon: run_ablations.sh step-based stopping (ABLATION_STEPS=500)
Mar 30, 2026
33742df
Add fresh pod bootstrap and single-H100 signal runners
Mar 30, 2026
b04652b
Make Rascal runners use portable torchrun default
Mar 30, 2026
e2b3ec0
Add Bandit_wagon_5f_ablations: 4F vs 5F direct proxy comparison
Mar 30, 2026
7a36bf8
Add Rascal_Turbo race-ready TurboMuon-only variant
Mar 30, 2026
3d675e0
Bandit_wagon_5f_ablations: 4F+1C confirmed optimal, 5F hypothesis denied
Mar 30, 2026
f3ecde9
Add bandit_wagon_XSA: XSA coverage sweep on confirmed 4F+1C config
Mar 30, 2026
550edbf
Bandit_Wagon: add 8xH100 launcher and checkpoint arch autodetect
Mar 30, 2026
6a46f87
Add bandit_wagon_crawler_mlp: crawler MLP leaky slope sweep
Mar 30, 2026
be8459a
bandit_wagon_XSA: XSA=15 (full coverage) wins on BPB AND speed
Mar 30, 2026
07eb836
Rascal_Turbo: single run.py launcher
Mar 30, 2026
7f773de
bandit_wagon_choke: per-loop bottleneck choke sweep for crawler MLP
Mar 30, 2026
aeab681
bandit_wagon_smear: LoopSmearGate — depth error damping between crawl…
Mar 30, 2026
0df2921
Add single-H100 Rascal ablation matrix runner
Mar 30, 2026
2958dca
bandit_wagon_tap: per-loop gated encoder tap sweep (7 arms)
Mar 30, 2026
37f1dcf
Add sparse skip-gram ngram ablation for single-H100 Rascal
Mar 30, 2026
1b674cf
Revert unvalidated sparse skip-gram integration from Rascal runner path
Mar 30, 2026
fe3b7d7
bandit_wagon_battery: per-loop RoPE scale sweep + mega ablation runner
Mar 30, 2026
38c8826
Add isolated sparse skip-gram ablation (2200-step single-GPU)
Mar 30, 2026
c3d3b8f
bandit_wagon_crawler_mlp: log BW3 results — slope insensitive, stay a…
Mar 31, 2026
bac88a6
bandit_wagon_battery: fix MLP.forward to accept optional loop_idx
Mar 31, 2026
6b7f205
Remove ngram sparse ablation files; keep Rascal path ngram-free
Mar 31, 2026
f7f301a
Add stripped Rascal skip-gram 2200-step calibration runner
Mar 31, 2026
5ccd09c
Add bandit_wagon_choke_shaped experiment (BWCS series)
Mar 31, 2026
34ce3e4
Log 2026-03-31 single-H100 RASCAL ablation matrix results
Mar 31, 2026
cb25a92
Add next calibrated single-GPU RASCAL ablation pack
Mar 31, 2026
362b220
Crawler Leg3 README: add architecture philosophy
Mar 31, 2026
6b81bb0
Crawler Leg3 README: add active ablation work section
Mar 31, 2026
bb5b3d4
Log skip-gram calibration seed444 results
Mar 31, 2026
f7edb50
Add BWCB ablation: battery scales on pyramid-512 choke
Mar 31, 2026
66b94eb
Add loader-refine single-GPU ablation pack and notes
Mar 31, 2026
ed2ec71
Add BWCD ablation: descending battery on pyramid-512
Mar 31, 2026
dda1f5b
BWCD: add BWCD-03 wide-medium-wide bracket (9,3,9)
Mar 31, 2026
1108f46
Record mega ablation results in BWB HYPOTHESIS.md
Mar 31, 2026
84f8fbe
Add race-ready Rascal final submission package with loader_cache4 lau…
Mar 31, 2026
d4f8e74
BWCB results: ascending battery hurts pyramid, all configs worse
Mar 31, 2026
3c0ca4c
BWCB Run B (4 shards): 1,2,4+pyramid beats pyramid alone by -0.00210
Mar 31, 2026
ffa8b17
Enforce FA3 preflight and CUDA runtime path in final Rascal launcher
Mar 31, 2026
361114a
BWCD results: 9,1,1+pyramid wins at -0.01193 vs pyramid alone
Mar 31, 2026
3d229bc
BWCD complete: BWCD-03 (9,3,9) final — quant_gap +0.0062, worst of group
Mar 31, 2026
b8f371b
Add Bandit_Wagon_III: pyramid-512 + 9,1,1 battery production runner
Mar 31, 2026
46fb4bd
Rascal: record 2026-03-31 TTT sweep regression (seed 444)
Mar 31, 2026
f962265
Add bandit_wagon_cannon (BWE): per-loop output calibration ablation
Mar 31, 2026
f3cacec
Log Run C results: 1-shard pod with different val data — not directly…
Mar 31, 2026
249f3ba
Rascal: add rascal_master config copies
Mar 31, 2026
352d774
BW3 run.sh: clean competition runner with preflight guards
Mar 31, 2026
aeee4b4
Rascal_Master: SOTA-exact race script — fix COPRIME_MAX_LOADED_SHARDS…
Mar 31, 2026
b9fa53b
BW3 seed=444: 1.20684 int6_sw_bpb, 10.07MB — pyramid-512 + 9,1,1 results
Mar 31, 2026
fa04306
BW3 run.sh: auto-save checkpoint after run
Mar 31, 2026
17be781
Add Bandit_Wagon_IV: 9,1,1 battery without pyramid choke
Mar 31, 2026
8208f50
BW4 seed=444: 1.18731 int6_sw_bpb — beats Leg 3 SOTA with battery only
Mar 31, 2026
a80c8cc
Bandit cannon: log seed444 proxy results (TTT bust)
Mar 31, 2026
972dcf3
BW4: add gate_fullgraph.sh — Tier 1 COMPILE_FULLGRAPH=1 test
Mar 31, 2026
872a159
BW4 gate_fullgraph: fix broken ablation — revert TORCHDYNAMO_OPTIMIZE…
Mar 31, 2026
d4fc252
Add Bandit_Wagon_V: BW4 + COMPILE_FULLGRAPH=1 (Tier 1 speed win)
Mar 31, 2026
1bcab0b
Add master progress checklist
Mar 31, 2026
7f55f3d
BW5 seed=444 results: 1.18672 int6_sw_bpb
Mar 31, 2026
c18ad76
Bandit_Wagon_V_Cannon: single GPU cannon gate on BW5 base
Mar 31, 2026
86af1f3
Add QK_SLOT_Ablation: single-GPU cross-correlation harness
Mar 31, 2026
0b511a3
Lab cleanup: archive old BW experiments, add LAB_PROTOCOL.md
Mar 31, 2026
25b18fa
BW5 seed=300: 1.18758 — does not individually confirm
Mar 31, 2026
6001a70
Add one-shot 8x quick AB runner for Rascal GPTQ stream vs insta
Mar 31, 2026
ee004c9
Make 8x quick AB runner self-contained with FA3 preflight and no rg d…
Mar 31, 2026
7b2e280
Lock Rascal baseline launcher to record trainer and add one-shot base…
Mar 31, 2026
2f151c1
Bandit_Wagon_V_Cannon: cannon gate results — does not promote
Mar 31, 2026
c6caaad
Add one-shot cu124 baseline runner that reuses custom FA3 module
Mar 31, 2026
9f982e0
Make cu124+custom-FA3 runner auto-detect non-venv base python
Mar 31, 2026
e132ccd
Harden cu124 custom-FA3 runner with python-path and filesystem auto-d…
Mar 31, 2026
d4f108b
Extend custom FA3 detection to conda env pythons and .so module files
Mar 31, 2026
1b16d62
Add --no-deps FA3 wheel fallback to cu124 baseline runner
Mar 31, 2026
3f3fa94
Add pyramid/cannon gate scripts — 1gpu + 8gpu for each hypothesis
Mar 31, 2026
f396059
Add QK_SLOT_Ablation STATUS.md — current position log
Mar 31, 2026
c61a16a
Fix gate scripts: stop swallowing output with command substitution
Mar 31, 2026
73446af
Add sota_now.sh — clean single-file cu124 baseline runner
Mar 31, 2026
3a1bdc1
Fix sota_now.sh to use real submission file; vault the correct source
Mar 31, 2026
0e7c317
Relax CUDA check to 12.x — cu128 pod is valid
Mar 31, 2026
bbd4d8a
Fix pod_setup.sh: auto-detect WORKSPACE from script location
Mar 31, 2026
47f450f
Fix inverted awk stack parity check in sota_now.sh
Mar 31, 2026
82c3d26
Quarantine racecar lab confusion; fix records/ with vault file
Mar 31, 2026
fa91fda
Fix pod_setup.sh: dataset shard count crash under set -euo pipefail
Mar 31, 2026
1d9edb8
Add two-track lab structure: neural/ and crawler/
Mar 31, 2026
525cd34
BWVC 8GPU gate results: scalar cannon passes speed gate
Mar 31, 2026
90475fc
Add RESULTS.md stub for Bandit_Wagon_V_Pyramid
Mar 31, 2026
9e5fef2
Junkyard: move all legacy/inactive dirs off root surface
Mar 31, 2026
c688b8b
Add folder-based CLAUDE.md agent protocols
Mar 31, 2026
645857b
BWVP 1GPU gate results: pyramid STRONG PASS on quality
Mar 31, 2026
2f4a7ed
Move active BW5 gates into crawler/ track; fix train_gpt.py symlink
Mar 31, 2026
5756670
Fix PyramidCannon gate_1gpu.sh usage comment path
Mar 31, 2026
29caede
Add H→A→R cycle to protocol; scaffold all 3 files in new_leg.sh
Mar 31, 2026
c48e759
Add QK_GAIN_SLOT_Gate ablation experiment
Mar 31, 2026
35f212e
Add per-track science boards + auto-stub in new_leg.sh
Mar 31, 2026
006d1ee
Add submissions/ PR zone — validate script, protocol, templates
Mar 31, 2026
1ea0901
Move pod_setup.sh to scripts/ — accessible from repo root
Mar 31, 2026
85d11b1
BWVPC 1GPU gate: pyramid+cannon PASSES
Mar 31, 2026
cecb7b1
Fix smoke test threshold to scale with nproc
Mar 31, 2026
b20b82d
Fix gate_8gpu.sh: update path comment and step_avg pass criterion
Mar 31, 2026
310d5d1
Record smoke test result: 739ms/step is healthy on 1xH100
Mar 31, 2026
dfb8459
BWVPC 8GPU gate: DOES NOT PROMOTE — pyramid+cannon fails
Mar 31, 2026
50517e3
Update SCIENCE.md: close pyramid/cannon screw, reflect actual complet…
Mar 31, 2026
79d13ad
Two crawler legs: BW5_Cannon full run + BW6_Skipgram gate
Mar 31, 2026
f8fba27
Add BW6_Skipgram gate_8gpu.sh — 2000-step 8xH100 A/B gate
Mar 31, 2026
d28a2fb
Add PIPELINE.md — full ranked hypothesis queue, both tracks
Mar 31, 2026
c4ddf54
BW5_Cannon full run: DOES NOT PROMOTE — +0.00020 vs BW5 at 600s/8034 …
Mar 31, 2026
3c52f3a
Housekeeping: archive closed crawler legs, move QK_Gain_SLOT to neura…
Mar 31, 2026
e699814
BW6_Skipgram gate: null result — trigram neutral on crawler, −140KB s…
Mar 31, 2026
1293058
Archive BW6_Skipgram — null result, closed
Mar 31, 2026
9f84d6c
Update PIPELINE.md: close cannon/skipgram, fix paths, update neural t…
Mar 31, 2026
ba322b0
Fix SLOT backward crash + record run 1 results
Mar 31, 2026
06b4c2d
Add BW7 MegaGate: 8-arm ablation on 4xGPU
Mar 31, 2026
64992a0
Add BW7 MegaGate pod_setup.sh — fresh pod one-shot launcher
Mar 31, 2026
cc36ca6
Relax flash_attn preflight in BW7 MegaGate — warn not abort
Mar 31, 2026
0752174
Fix torchrun path: fallback to python3 -m torch.distributed.run
Mar 31, 2026
f225709
Update neural/SCIENCE.md with competitive intelligence + hypothesis r…
Mar 31, 2026
e2867c2
Add Arch+Sched Sweep: 6-case 4×GPU ablation (rope_32, bigram_4096, qa…
Mar 31, 2026
963b440
Expand sweep to 9 cases: add gptq, bigram_3072, warmdown_4k
Mar 31, 2026
1dc3a32
Sweep: gptq case reuses baseline checkpoint (SKIP_TRAIN=1 + LOAD_CHEC…
Apr 1, 2026
4c6ef06
Add SLOT legality analysis to neural/SCIENCE.md
Apr 1, 2026
479484a
Add BW8_Tap: shared encoder tap dim=32 — strongest MegaGate signal
Apr 1, 2026
3a93e20
Add QK_Gain_SLOT_Legal: context-only SLOT (legal causality-safe variant)
Apr 1, 2026
ef750ce
Add BW9_Anchor gate + update SCIENCE.md with MegaGate results
Apr 1, 2026
89a321e
Fix gate.sh: remove quotes from inline env var assignments
Apr 1, 2026
75840c7
Add BW10_GPTQ — loop-aware GPTQ gate on BW8 baseline
Apr 1, 2026
33a8144
Add gptq_full case: full training run with SKIP_GPTQ=0 (not post_only)
Apr 1, 2026
52cd457
Log Arch+Sched sweep results (seed 444, 4×GPU): all 9 cases dead
Apr 1, 2026
c0ceacd
Add Rascal_III_SLOT leg: context-only Legal SLOT on Rascal II base
Apr 1, 2026
b5e9e7c
Add BW11_5Flat — 5F+1C depth revisit on BW8 baseline
Apr 1, 2026
93ef50a
Rascal_III_SLOT run.sh: minimal racer, exact SOTA env + SLOT_ENABLED=1
Apr 1, 2026
5efb22b
BW10_GPTQ: gate PASS — −0.00486 int6_sw, step time clean
Apr 1, 2026
cb4f1c2
BW10_GPTQ: add production run.sh (8×H100, 600s, LOOP_AWARE_GPTQ=1)
Apr 1, 2026
dd2e06a
BW11_5Flat: add production run.sh (8×H100, 600s, NUM_FLAT_LAYERS=5)
Apr 1, 2026
0e428cd
Add RASCAL_WINDOWN_TESTING — 4-arm legal window strategy suite
Apr 1, 2026
2338fee
BW10_GPTQ: full run PROMOTES — 1.18292670 BPB, new champion
Apr 1, 2026
a70185a
Rascal_III_SLOT: surgical SLOT via hook, no model class changes
Apr 1, 2026
ef2c932
BW11_5Flat: full run PROMOTES — 1.17651313 BPB, new champion
Apr 1, 2026
6c864c1
Scaffold Crawler II submission — seed=444 pre-filled, seed=300 pending
Apr 1, 2026
385d704
Rename submission: Crawler II → Nightcrawler
Apr 1, 2026
7b9a11b
Nightcrawler: fill seed=300 results — 1.17490448 BPB, mean 1.1757
Apr 1, 2026
6f8e093
Nightcrawler: add seed logs; fix validate.sh set-e/arithmetic bug
Apr 1, 2026
bcd26f7
Nightcrawler: add seed=4 (1.17676091 BPB), update mean to 1.1761
Apr 1, 2026
e0b05ab
Rascal_III_SLOT: log both full runs — signal confirmed, size blocked
Apr 1, 2026
9332369
Rascal_III_SLOT: enforce cu124 — reject cu128 at launch
Apr 1, 2026
587534e
Enforce strict Neural SOTA stack parity on pod setup/run
Apr 1, 2026
2e445b7
Accept custom hopper FA3 path while keeping strict SOTA gating
Apr 1, 2026
0ed1ea3
Prefer working FA3 provider and fail clearly on ABI mismatch
Apr 1, 2026
580bf08
Disable FA3 wheel installs by default for custom-head pods
Apr 1, 2026
3203ed9
Restore custom FA3 discovery in pod_setup.sh
Apr 1, 2026
03698c1
Lock pod_setup.sh snapshot to vault (2026-04-01, cu124, custom FA3)
Apr 1, 2026
e7a5944
Re-enable FA3 wheel install by default (restore cu124 Hopper FA3)
Apr 1, 2026
0ae805f
Update vault pod_setup snapshot to e7a5944 (FA3 wheel default restored)
Apr 1, 2026
49c544f
Fix FA3 install: replace dead pytorch WHL URL with GitHub releases
Apr 1, 2026
f825d02
Add FRESHPOD_OF_BELAIR.sh — original working pod setup (fa91fda)
Apr 1, 2026
4fd4c94
Restore vault pod_setup to fa91fda original (FRESHPOD_OF_BELAIR)
Apr 1, 2026
c81ca06
Rascal_III_SLOT: strip agent-injected torch/CUDA version gate
Apr 1, 2026
cc1fc25
Add install_fa3_cu124.sh — FA3 wheel installer for cu124/torch2.4
Apr 1, 2026
6bc1d59
Add setup_pod_cu130.sh — install torch cu130, verify FA3
Apr 1, 2026
da6c41c
Harden pod_setup.sh + restore run.sh version gate
Apr 1, 2026
bc8a34b
Vault-lock pod_setup (cu124, sympy pin, hard assertions)
Apr 1, 2026
4ffaf67
pod_setup: add source-build fallback for FA3 when wheel 404s
Apr 1, 2026
bffbd34
pod_setup: add wheel to pip deps (needed for FA3 source build)
Apr 1, 2026
ad809f2
pod_setup: add 2.8.3 wheels (cp312 support), fix cross-device fallback
Apr 1, 2026
8743330
Add fresh pod wrapper for Rascal III SLOT
Apr 1, 2026
0622c32
Rascal III SLOT copy: fix SLOT batch gating and pack int6 to 6-bit
Apr 1, 2026
7c98509
Restore pod_setup.sh to pre-storm version (1ea0901)
Apr 1, 2026
1f0ab87
Restore all run scripts to pre-storm versions (9332369)
Apr 1, 2026
9e0905b
Remove storm debris scripts
Apr 1, 2026
f435825
Relax CUDA gate in Rascal III SLOT run.sh: 12.4* → 12.*
Apr 1, 2026
fa69361
Add RASCAL_III_SLOT_F: SLOT + true 6-bit int6 packing
Apr 1, 2026
cd8825d
Rascal III SLOT: add seed300 run log (2026-04-01 17:20)
Apr 1, 2026
5372432
pod_setup: smart FA3 install — Dao-AILab wheels + system discovery
Apr 1, 2026
aa1a24c
Add setup_cu124_fa3_venv.sh — automated cu124+FA3 bridge setup
Apr 1, 2026
dd48eac
setup_cu124_fa3_venv: symlink .so instead of copy, add disk cleanup
Apr 1, 2026
abd7f31
crawler: add BW12 2k interaction ablation sequence
Apr 1, 2026
8d806f9
crawler: record BW12 results and add BW13 4x interaction series
Apr 1, 2026
fabad3a
pod_setup: enforce true FA3 runtime (remove FA2 wheel fallback)
Apr 1, 2026
6929756
scripts: add fast trimmed-kernel FA3 installer; remove broken cu124 b…
Apr 1, 2026
822bf2f
setup: vault fast FA3 installer; remove best-effort FA3 bootstrap path
Apr 1, 2026
41fe618
fa3: canonical quick build uses --no-build-isolation
Apr 1, 2026
1fcaa69
rascal_slot: disable torch._dynamo optimize_ddp by default to avoid i…
Apr 1, 2026
9807abe
revert: restore Rascal III SLOT train_gpt_slot.py exactly (no agent e…
Apr 1, 2026
192b22f
hotfix: add sitecustomize to disable dynamo optimize_ddp and suppress…
Apr 1, 2026
4e18a40
crawler: record BW13 interaction ablation results
Apr 1, 2026
210c59c
fa3: revert installer flags to historical 4-flag baseline
Apr 1, 2026
a0dda13
crawler: add BW14 big-swing 2k ablation sequence
Apr 1, 2026
5b32740
fa3: add portable cu124 wheel build/install scripts
Apr 1, 2026
ed9222b
neural: add 2k same-checkpoint SLOT H2H probe
Apr 2, 2026
f0edf59
neural: relax h2h runner stack gate
Apr 2, 2026
fae52c1
crawler: unify all ablations into single 4x runner
Apr 2, 2026
b96ad3a
neural: add rascal ii 2k comparison runner
Apr 2, 2026
f029757
scripts: add locked cu124 fa3 pod installer
Apr 2, 2026
dbd0b07
scripts: add clean pre-storm pod_setup copy (1ea0901)
Apr 2, 2026
3736faa
crawler: add BW16 depth sweep for NUM_FLAT_LAYERS 6-11
Apr 2, 2026
833a9bc
vault: extract user-tested SOTA file from 99b790d (d70ec518, 2468 lines)
Apr 2, 2026
7cc5770
crawler: add BWX latest run (8F + quant sweep)
Apr 2, 2026
5a3c552
crawler: switch BWX latest baseline to 9F
Apr 2, 2026
e76f9df
neural: add Lucky — SOTA sequential + SLOT eval
Apr 2, 2026
154fe81
neural: bake Lucky defaults — seed=444, SLOT=1, SKIP_GPTQ=1
Apr 2, 2026
02c7beb
neural: Lucky — switch to coprime loader (matches March 30 SOTA)
Apr 2, 2026
db59a3a
crawler: harden BWX contender runner with window-first selection
Apr 2, 2026
e39955d
neural: Lucky sequential+SLOT seed444 log — 1.10514 BPB (size busted)
Apr 2, 2026
52332f7
crawler: add submission-ready Bandit Wagon X 9F full-run pack
Apr 2, 2026
bb33e48
pod_setup: switch to competition dataset source (willdepueoai/paramet…
Apr 2, 2026
513c1e8
crawler: add BW17 DGX-spark cadence longform interaction suite
Apr 2, 2026
35658d7
crawler: drop local BW17 smoke artifacts
Apr 2, 2026
689beaf
Lucky: bypass torch.save for int6 artifact — raw bytes + JSON header
Apr 2, 2026
6fa9f32
Lucky: remove "embed" from int6 categories — restore original quantiz…
Apr 2, 2026
21079da
bw17: harden cadence sweep resume + stride controls
Apr 2, 2026
3df3735
bw17: add single-command resume launcher from failed arm
Apr 2, 2026
2edf0a9
Add LC4 8x launcher script
Apr 2, 2026
adca6e1
Fix LC4 launcher pod paths
Apr 2, 2026
c5a2c82
Make LC4 launcher use pod env fallbacks
Apr 2, 2026
78812b9
Probe torch-capable python in LC4 launcher
Apr 2, 2026
b0d01f9
Add RECOVERY_TEST_1 LC4 trainer copy
Apr 2, 2026
3509647
Add lucky_slot recovery trainer
Apr 2, 2026
8e53b6f
crawler: add 24h audit table from BW12-14, BWX9F, isolated 10F
Apr 2, 2026
64395b5
Revert lucky_slot export to safe serializer
Apr 2, 2026
3634758
Replace lucky_slot trainer with Lucky
Apr 2, 2026
f8e3e8b
Add lucky_slot seed300 SLOT run log
Apr 2, 2026
50a5910
Make Lucky SLOT post-export and window-isolated
Apr 2, 2026
65fdc7a
Bake 30 percent SLOT defaults into Lucky
Apr 2, 2026
def0fdc
Allow partial dataset pulls in pod setup
Apr 2, 2026
fb7c598
Add 2k rascal signal hunt experiments
Apr 2, 2026
794cf99
Log 4gpu 2k build sweep results
Apr 2, 2026
f175bb4
Add Spark SLOT size hunt runners
Apr 2, 2026
89fd9f5
Add Lucky II contender with legal SLOT
Apr 2, 2026
ddcc281
Split QK4 contender from warmdown kit
Apr 2, 2026
3418f02
Add Lucky III: Lucky II + brotli byte-shuffle compression
Apr 2, 2026
6c13590
Add brotli to pod setup
Apr 2, 2026
7fac1b4
Lucky III: revert QK4 to baseline 1.5, keep brotli+SLOT
Apr 3, 2026
876d0bd
Add BW20_Brotli_2k: crawler compression gate (zstd → brotli)
Apr 3, 2026
6d51b78
Add SLOT_brotli: baseline + SLOT + brotli byte-shuffle
Apr 3, 2026
c609f04
Add BW XI: BWX 9F + brotli + GPTQ production run
Apr 3, 2026
f052d7c
Fix coprime_shards_per_batch default to 1 (match safepoint)
Apr 3, 2026
0d40c15
Fix loader_mode default to coprime (match safepoint)
Apr 3, 2026
f972331
Enable SLOT by default
Apr 3, 2026
33a03d3
Log SLOT_brotli seed 300 breakthrough result
Apr 3, 2026
6a7608a
BW XI: stack all confirmed gains — loop-aware GPTQ, QK4, loops=2, brotli
Apr 3, 2026
181ce55
Add Lucky IV: per-sample SLOT delta + 24 steps
Apr 3, 2026
af4700c
Add Ouroboros submission — 1.1364 BPB, 15.05MB
Apr 3, 2026
12 changes: 11 additions & 1 deletion .gitignore
@@ -8,4 +8,14 @@ data/manifest.json
data/docs_selected.jsonl
.mypy_cache/
.venv
logs/
logs/
experiments/archive/checkpoints/

# Large binaries — never commit
*.pt
*.ptz
junkyard/results/
junkyard/checkpoints/
junkyard/experiments/archive/checkpoints/
junkyard/experiments/GreenRod_X_1/lab_protocol_20260327/research_hub_*/
junkyard/experiments/GreenRod_X_1/lab_protocol_20260327/vast_tests/
10 changes: 10 additions & 0 deletions .hotfix/sitecustomize.py
@@ -0,0 +1,10 @@
import os

try:
    import torch._dynamo as d
    # Keep compile enabled, but avoid known DDP graph optimizer crash path.
    d.config.optimize_ddp = False
    # If a graph still fails, fall back instead of killing the entire run.
    d.config.suppress_errors = True
except Exception:
    pass
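For context on how this shim takes effect: Python auto-imports a `sitecustomize` module found on `sys.path` at interpreter startup, so pointing `PYTHONPATH` at `.hotfix/` activates the dynamo settings without editing any training code. A minimal sketch of the mechanism, using a throwaway module (the real launch would prefix `torchrun` the same way, e.g. `PYTHONPATH="$PWD/.hotfix" torchrun ... train_gpt.py`):

```shell
#!/usr/bin/env bash
set -eu
# Throwaway stand-in for .hotfix/sitecustomize.py.
demo=$(mktemp -d)
printf 'print("sitecustomize loaded")\n' > "$demo/sitecustomize.py"
# Any python started with this PYTHONPATH imports the shim before user code.
PYTHONPATH="$demo" python3 -c 'pass'   # prints: sitecustomize loaded
rm -rf "$demo"
```

Note this only works when `site` processing is enabled (i.e. the interpreter is not launched with `-S`).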
77 changes: 77 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,77 @@
# Parameter Golf Lab — Agent Protocol

## Orient first
```
cat neural/LEADER.md # current neural SOTA
cat crawler/LEADER.md # current crawler SOTA
```
These two files tell you where the lab stands. Read them before doing anything.

## Repo structure
```
neural/ ← Neural SOTA track (Rascal lineage) — leaderboard #1 focus
crawler/ ← Crawler track (Bandit_Wagon lineage) — compression/quality focus
submissions/ ← Competition PR zone. Read submissions/PROTOCOL.md before touching.
vault/ ← Immutable locked sources. Never modify.
records/ ← Leaderboard submission records. Never modify.
scripts/ ← Shared runners. sota_now.sh is the neural baseline runner.
data/ ← Dataset. Never modify.
junkyard/ ← Legacy experiments. Read-only reference only.
```

## Hard rules

**NEVER overwrite a test file.** Always create a new file. If you need to modify
a training script, copy it first, work on the copy, name it clearly.

**Confirm names before creating.** Ask the user what to name a new leg, script,
or directory before creating it. Never invent names silently.

**ONE variable per test.** If a run changes more than one thing vs the baseline,
the result is uninterpretable and the money is gone.

**Gate before 8x.** Every hypothesis runs a 1-GPU 2000-step gate (~$0.50) before
an 8×H100 full run (~$3-4). Never skip the gate.

**Never submit from TEST_LAB.** Submissions go through the `submissions/` zone only.
Read `submissions/PROTOCOL.md`. Run `bash submissions/validate.sh <records_dir>` first.
Branch flow: `submission/<name>` → push `fork1` → PR to `openai/parameter-golf`.

## RunPod workflow
1. Pod always pulls from `TEST_LAB` branch
2. Commit and push scripts BEFORE launching the pod
3. On pod: `git pull && bash <script>`
4. Never push FROM the pod
5. Pod gets destroyed after the run — save checkpoints before destroying

## Test cycle: Hypothesis → Ablation → Results

Every leg follows this sequence. No skipping steps.

```
hypothesis.md ← write FIRST. ONE variable. Why. Gate target.
train_gpt.py ← copy from leader, make the ONE change
gate.sh ← commit+push → pod pulls TEST_LAB → run (1-GPU, 2000 steps)
ablation.md ← fill gate result. Pass? Proceed. Fail? Stop.
run.sh ← commit+push → pod pulls TEST_LAB → run (8×H100, 600s, seed=444)
ablation.md ← fill full run result. Beats leader? Run confirmation.
confirmation run (8×H100, 600s, seed=300)
RESULTS.md ← verdict (PROMOTES / DOES NOT PROMOTE), what we learned, next hyp
```

New legs are scaffolded with all three files pre-created:
```bash
bash scripts/new_leg.sh neural <name>
bash scripts/new_leg.sh crawler <name>
```

## Seeds
- Primary: 444
- Confirmation: 300
- Never use 1337

## Cost
- 8×H100 SXM: ~$13.36/hr
- Full 10-min run: ~$3-4
- Gate (1-GPU, 2000 steps): ~$0.50
- Do not suggest a run without a validated gate or clear hypothesis
121 changes: 121 additions & 0 deletions LAB_PROTOCOL.md
@@ -0,0 +1,121 @@
# Lab Protocol — Parameter Golf

_We are competing for #1. Every pod dollar is a decision._

---

## The One Rule

**ONE variable changes per test. If you change two, the result is meaningless and the money is gone.**

Before committing any gate script: diff it against the baseline. Count the differences. If it's more than one, stop.
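The count can be automated. A sketch of such a pre-commit guard, using throwaway files as stand-ins for the baseline and candidate `train_gpt.py` (paths and contents are illustrative, not the lab's actual layout):

```shell
#!/usr/bin/env bash
set -eu
# Stand-ins for the baseline and candidate training scripts.
BASE=$(mktemp); CAND=$(mktemp)
printf 'lr=0.01\nsteps=2000\n' > "$BASE"
printf 'lr=0.02\nsteps=2000\n' > "$CAND"   # ONE change: lr only
# Count changed hunks, not raw lines: one logical change can touch several
# adjacent lines yet still appear as a single @@ hunk in unified diff output.
HUNKS=$(diff -u "$BASE" "$CAND" | grep -c '^@@') || HUNKS=0
echo "changed hunks: $HUNKS"   # prints: changed hunks: 1
[ "$HUNKS" -le 1 ] || { echo "STOP: more than one variable changed" >&2; exit 1; }
rm -f "$BASE" "$CAND"
```

Counting hunks rather than lines is a judgment call: a single renamed variable touches many lines but is still one change.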

---

## Pipeline: Gate → Full → Submit

```
Hypothesis
Single GPU gate (2000 steps)
↓ passes?
8×H100 full run (600s, seed=444)
↓ beats baseline?
8×H100 confirmation (seed=300)
↓ both seeds confirm?
Submission branch → PR
```

**Never skip the gate.** A 2000-step single GPU run costs ~$0.50. A full 8×H100 run costs ~$3-4. Skipping the gate to save 10 minutes has cost us runs.

**Never submit on one seed.** Seed variance is real. Two seeds confirming = it's real.

---

## Cost Discipline

- 8×H100 SXM: ~$1.67/hr per GPU = **$13.36/hr for 8×**
- Full 10-min run (with pod overhead): **~$3-4**
- Per-race budget: **~$15**
- Do not suggest a run without a validated gate result or a clear hypothesis

**Reproducing a score we already own = no.** Never re-run a baseline we control unless the architecture changed.

---

## Checkpoints

After every full run, `final_model.pt` gets copied to a unique name immediately:

```bash
cp final_model.pt checkpoints/EXP_s${SEED}_$(date +%Y%m%d_%H%M%S)_bpb${BPB}.pt
```

The pod gets destroyed. If the checkpoint isn't saved before that, it's gone.

---

## Script Standards

- Every experiment lives in `experiments/<Name>/`
- Every experiment has: `run.sh`, `gate.sh` or `gate_1gpu.sh`, `RESULTS.md`
- `run.sh` uses `train_gpt.py` from the same directory (symlink or copy)
- Scripts are committed and pushed before the pod fires
- Never paste raw commands. Always a `.sh` file.
- Log files go to `experiments/<Name>/results/` or `logs/`

---

## Naming

- Confirm experiment names before creating directories
- Active series: `Bandit_Wagon_V`, `Bandit_Wagon_V_Cannon`, etc.
- Superseded experiments → `experiments/archive/`
- Never reuse a name from a previous run

---

## SOTA Garage

Three active models:

| Track | Model | BPB | Size |
|-------|-------|-----|------|
| Neural | Rascal II | 1.10987 | 15.44MB |
| Crawler | BW5 seed=444 | 1.18672 | 8.61MB |
| Compression | FX_WING_DELTA | 0.2233 | — (model lost) |

**Submission branch protocol:**
1. Never submit from TEST_LAB
2. Create dedicated branch → push to Open-parameter-golf-1 fork → PR to openai/parameter-golf
3. Every PR needs: `submission.json`, logs, README with reproduce instructions

---

## Experimental Design

- Proxy deltas (500 steps, 1 GPU) inflate **5–15×** vs full run. Never promote from proxy alone.
- Gate (2000 steps, 1 GPU) is the minimum signal to trust.
- SWA kicks in at step ~7650. Results before that step are pre-SWA.
- Wallclock budget is 600s. Extra parameters cost convergence speed — account for this.
- `COMPILE_FULLGRAPH=1` is now baseline for all BW5+ experiments.
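To make the proxy-inflation bullet concrete, a back-of-envelope deflation with assumed numbers (a hypothetical proxy delta, deflated by the mid-range 10× of the observed 5–15× factor):

```shell
# Assumed numbers: a -0.0050 BPB proxy improvement seen at 500 steps / 1 GPU,
# deflated by a 10x inflation factor to estimate the full-run effect.
awk 'BEGIN { proxy = -0.0050; inflation = 10;
             printf "expected full-run delta: %+.5f BPB\n", proxy / inflation }'
# prints: expected full-run delta: -0.00050 BPB
```

A delta that small is within seed noise, which is why proxy results alone never promote.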

---

## Seeds

- Primary: **444**
- Confirmation: **300**
- Never use 1337 for new experiments.

---

## Submission Checklist

- [ ] Two seeds confirmed, both beat baseline
- [ ] `submission.json` present
- [ ] Logs committed
- [ ] README with reproduce instructions
- [ ] File size ≤ 16MB
- [ ] Score-first always (no training on val before scoring)
- [ ] Branch is NOT TEST_LAB