openai · amrayach · Mar 28, 2026
diff --git a/.gitignore b/.gitignore
@@ -8,4 +8,12 @@ data/manifest.json
 data/docs_selected.jsonl
 .mypy_cache/
 .venv
-logs/
+logs/
+diagnostics/*/
+!diagnostics/README.md
+# Agent tooling (Antigravity, Claude Code, Codex)
+.agents/skills/
+.agents/workflows/
+.serena/
+*.pyc
+SESSION.md
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,43 @@
+# Shared Agent Entry Point
+
+Start here for Claude Code, Codex, and Antigravity.
+
+## Read First
+
+1. `docs/campaign/AGENT_SYNC.md`
+2. `CLAUDE.md`
+
+For deep context (campaign strategy, prior experiments, hardware state):
+
+3. `docs/codex-memory/BOOTSTRAP.md` — full bootstrap reading list
+
+## Purpose
+
+`docs/campaign/AGENT_SYNC.md` is the mutable source of truth for:
+
+- current objective
+- current scope
+- latest measured results
+- next commands to run
+
+`CLAUDE.md` contains the standing coordination rules for sessions, updates, and disagreement handling.
+
+## Tool-Specific Config
+
+| Tool | Config | Skills | Workflows |
+|------|--------|--------|-----------|
+| Claude Code | `~/.claude/settings.json` + `.claude/settings.local.json` | `~/.claude/skills/` | `~/.claude/commands/` |
+| Codex | `~/.codex/config.toml` | `~/.codex/skills/` | N/A |
+| Antigravity | `~/.gemini/antigravity/mcp_config.json` | `.agents/skills/` (project-level) | `.agents/workflows/` |
+
+## Current Working Mode
+
+- Active goal: reproduce PR `#1610` directly and layer a full-vocab posterior corrector to push below 1.070 BPB
+- Execution plan: `docs/campaign/PLAN_PR1610_CORRECTOR.md` (locked Revision 3)
+- Source base: `#1610` `train_gpt.py` at SHA `ca191953` (NOT patched D variant)
+- Non-record PR `#1598` remains open and frozen; do not edit unless reviewers request changes
+- Best measured result: canonical D 5-seed mean TTT BPB `1.08129` (sigma = 0.00059)
+- Target: <= 1.070 BPB via #1610 reproduction + posterior corrector
+- Budget: $212 RunPod (~35 runs), deadline Apr 30
+- Fallback cascade (defined in the plan) applies if the corrector yields < 0.001 BPB gain
+- Out of scope: more Pegasus resubmissions, paid OWC salvage on D stack, SLOT, pre-quant validation TTT, casefold tokenizers, D-variant patching
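
Reviewer note on the working-mode figures in `AGENTS.md`: the arithmetic behind the budget and target lines can be sketched as below. The numbers are copied from the list above; the even split of the budget across runs and all variable names are assumptions made for illustration, not part of the plan.

```python
# Figures copied from the "Current Working Mode" list in AGENTS.md.
BUDGET_USD = 212.0     # RunPod budget
PLANNED_RUNS = 35      # "~35 runs" (approximate in the source)
BEST_BPB = 1.08129     # canonical D 5-seed mean TTT BPB
SIGMA = 0.00059        # reported sigma for that mean
TARGET_BPB = 1.070     # campaign target

# Assumption: the budget is spread evenly over the planned runs.
cost_per_run = BUDGET_USD / PLANNED_RUNS

# Improvement still needed to move from the best measured result to the target.
required_gain = BEST_BPB - TARGET_BPB

# The same gap expressed in units of the reported seed-to-seed sigma.
gain_in_sigmas = required_gain / SIGMA

print(f"cost per run: ${cost_per_run:.2f}")
print(f"required gain: {required_gain:.5f} BPB")
print(f"gain / sigma: {gain_in_sigmas:.1f}")
```

Under these assumptions each run costs roughly $6, and the remaining gap of about 0.0113 BPB is large compared with both the reported sigma and the 0.001 BPB fallback threshold.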
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,75 @@
+# Coordination Rules
+
+This repo uses one shared handoff protocol for Claude Code, Codex, and Antigravity.
+
+## Entry Point
+
+Every new session must read in this order:
+1. `AGENTS.md` — shared entry point, current working mode
+2. `docs/campaign/AGENT_SYNC.md` — mutable source of truth for objectives, results, next steps
+3. This file (`CLAUDE.md`) — standing rules and operational constraints
+
+This file (`CLAUDE.md`) contains **stable standing rules only**. Do not duplicate mutable campaign state (current objective, latest metrics, next commands) that is already tracked in `AGENT_SYNC.md`.
+
+## Session Rules
+
+1. Treat `docs/campaign/AGENT_SYNC.md` as the current source of truth for:
+   - active objective
+   - current scope
+   - next commands to run
+   - latest measured results
+2. Before starting new work, check `docs/campaign/artifacts/` and `records/` to avoid duplicating completed work.
+3. If you change the objective, next step, or interpretation of results, update `docs/campaign/AGENT_SYNC.md`.
+4. If you make a campaign-level decision or disagree with an earlier recommendation, record it in `docs/codex-memory/decisions.md`.
+5. If a run produces a measured result, append one JSON record to `docs/campaign/results_log.jsonl`. Do not rewrite prior lines.
+6. If you finish a meaningful session, update `docs/codex-memory/project-state.md` and `docs/codex-memory/next-session.md`.
+7. If the task touches campaign strategy or prior experiments, also read:
+   - `docs/codex-memory/project-state.md`
+   - `docs/codex-memory/next-session.md`
+   - `docs/codex-memory/decisions.md`
+8. For competition re-implementations, use source priority:
+   - `openai/parameter-golf` PR code first
+   - local repo code second
+   - papers and generic web sources only to resolve ambiguous math or API details
+9. If a post-training export path looks wrong, debug it on the same checkpoint before spending more H100 time on retraining.
+
+## Working Agreement
+
+- Pegasus `8xH100` is the active development target.
+- Pegasus `A100-80GB` is fallback or grant-supporting evidence, not the mainline path.
+- RunPod is reserved for final validation only.
+- `git clone` and `git pull` are the default sync path for remote workspaces.
+- Use `rsync` only to push local uncommitted changes quickly.
+
+## Challenge Submission Rules
+
+These are stable public rules from the challenge README and should not be rediscovered every session.
+
+- Official leaderboard entry is **record-gated**, not top-5-open-entry.
+- A record submission must beat the current official SOTA by at least `0.005` nats and provide enough logs for `p < 0.01`.
+- Train and eval must each run under `10 minutes` on `8xH100`.
+- Total artifact size is `16,000,000` bytes decimal for code plus compressed model.
+- If a submission does not beat the current record bar, it is a non-record submission, not an official leaderboard entry.
+
+## Pegasus Operational Rules
+
+These are stable constraints learned from operational experience. They apply to all Pegasus jobs.
+
+### Launcher
+- **Never use `torchrun --standalone`** on Pegasus multi-GPU. It hangs at rendezvous.
+- Use Slurm-native `srun` with manual rank env vars: `LOCAL_RANK=$SLURM_LOCALID`, `RANK=$SLURM_PROCID`, `WORLD_SIZE=$SLURM_NTASKS`.
+
+### Job output
+- **Never use `| tail -1`** on Pegasus training or install commands. It hides errors and progress.
+- Always set `PYTHONUNBUFFERED=1` or use `python -u` to prevent output buffering.
+
+### Allocation shape
+- **Always include `--nodes=1`** for challenge-shaped `8xH100` runs. Without it, Slurm may split the job across nodes, breaking NVSwitch locality.
+- Use `--ntasks=8 --gpus-per-task=1 --gpu-bind=none` (not `--gpus=8`).
+- If a job lands on multiple nodes, cancel and relaunch with `--nodes=1`.
+
+### FA3 container path
+- Saved FA3 container: `/netscratch/$USER/containers/pytorch_25.02_fa3.sqsh`
+- **Do not use `--no-deps`** for the FA3 wheel on stock NGC 25.02. The container's torch 2.7.0 is ABI-incompatible with FA3 (`undefined symbol: aoti_torch_abi_version`).
+- Do not do ad hoc per-job `pip install` of FA3 once the saved container exists.
+- See `docs/campaign/PEGASUS_H100_RUNBOOK.md` for full container build and benchmark commands.
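
Reviewer note on the launcher and allocation rules in `CLAUDE.md`: a minimal sketch of the Slurm-to-rank mapping, assuming each process is started by `srun` and wants to confirm its environment before initializing distributed training. The function names are hypothetical. `SLURM_JOB_NUM_NODES` is a standard Slurm variable, but using it for the single-node check is an assumption of this sketch, not something the rules prescribe. `MASTER_ADDR` and `MASTER_PORT` are not covered by the rules and are not handled here.

```python
def slurm_to_rank_env(environ):
    """Return the rank variables named in the launcher rule, derived from Slurm's."""
    mapping = {
        "LOCAL_RANK": "SLURM_LOCALID",
        "RANK": "SLURM_PROCID",
        "WORLD_SIZE": "SLURM_NTASKS",
    }
    missing = [src for src in mapping.values() if src not in environ]
    if missing:
        # Most likely the process was not started through srun.
        raise KeyError("missing Slurm variables: " + ", ".join(missing))
    return {dst: environ[src] for dst, src in mapping.items()}


def is_single_node(environ):
    """True when the allocation spans exactly one node (the --nodes=1 shape)."""
    # Assumption: an unset variable is treated as a single-node allocation.
    return int(environ.get("SLURM_JOB_NUM_NODES", "1")) == 1


if __name__ == "__main__":
    import os

    ranks = slurm_to_rank_env(os.environ)
    if not is_single_node(os.environ):
        raise SystemExit("job landed on multiple nodes; cancel and relaunch with --nodes=1")
    os.environ.update(ranks)
    print(ranks)
```

Passing the environment in as a plain mapping keeps both helpers testable without a Slurm allocation.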
diff --git a/diagnostics/README.md b/diagnostics/README.md
@@ -0,0 +1,34 @@
+# Diagnostics Index
+
+This directory keeps local copies of analysis outputs pulled from Pegasus so
+diagnostic state survives future training runs.
+
+## Current contents
+
+- `2026-03-31_05c_plus/`
+  - float and float-vs-int6 reports for the best measured branch
+- `2026-03-31_05f/`
+  - cross-run comparison reports against `05c-plus`
+
+## Canonical utilities
+
+- `scripts/diagnostics/diagnose_weights.py`
+  - single-checkpoint weight statistics
+  - float-vs-int6 comparison on the same checkpoint
+- `scripts/diagnostics/compress_probe.py`
+  - export-path feasibility probe for saved `.int6.ptz` artifacts
+
+## Typical commands
+
+From the repo root:
+
+```bash
+python scripts/diagnostics/diagnose_weights.py final_model.pt
+python scripts/diagnostics/diagnose_weights.py final_model.pt final_model.int6.ptz
+python scripts/diagnostics/compress_probe.py diagnostics/2026-03-31_05c_plus/final_model.int6.ptz
+```
+
+## Notes
+
+- The authoritative preserved artifacts live on Pegasus under `/netscratch/$USER/parameter-golf/diagnostics/`.
+- This directory is for pulled reports and local interpretation, not for live training outputs.
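
Reviewer note on the float-vs-int6 comparison listed in `diagnostics/README.md`: the diff does not show what `diagnose_weights.py` computes, so the following is only a hypothetical illustration of one such metric. It assumes symmetric per-tensor quantization to integers in [-31, 31] with a single float scale; the real export path may use a different scheme (per-row scales, different clipping), and every function name here is invented for the example.

```python
import math


def quantize_int6(weights):
    """Symmetric per-tensor quantization to integers in [-31, 31] (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 31 if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the signed 6-bit range used here.
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to floats."""
    return [v * scale for v in q]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy weights standing in for one tensor of a checkpoint.
weights = [0.0, 0.31, -0.31, 0.1, -0.047]
q, scale = quantize_int6(weights)
err = rms_error(weights, dequantize(q, scale))
print(f"scale={scale:.5f} rms_error={err:.6f}")
```

With round-to-nearest, each element is off by at most half a quantization step, so the RMS error is bounded by `scale / 2`. That bound is a quick sanity check when a float-vs-int6 report looks suspicious.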
diff --git a/docker/runpod-pr1610/Dockerfile b/docker/runpod-pr1610/Dockerfile
@@ -0,0 +1,78 @@
+ARG BASE_IMAGE=runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2204
+FROM ${BASE_IMAGE}
+
+ARG VENV_DIR=/opt/pg-venv
+ARG TORCH_VERSION=2.9.1+cu128
+ARG TORCH_INDEX_URL=https://download.pytorch.org/whl/cu128
+ARG TORCH_CUDA_VERSION=12.8
+ARG FLASH_ATTN_SPEC=flash-attn>=2.8,<2.9
+ARG FLASH_ATTN_WHEEL_URL=
+ARG TORCH_CUDA_ARCH_LIST=9.0
+ARG MAX_JOBS=2
+
+ENV DEBIAN_FRONTEND=noninteractive \
+    VENV_DIR=${VENV_DIR} \
+    PATH=${VENV_DIR}/bin:${PATH} \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1 \
+    PIP_DISABLE_PIP_VERSION_CHECK=1
+
+SHELL ["/bin/bash", "-lc"]
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN python3 -m venv "${VENV_DIR}"
+
+RUN python -m pip install --upgrade pip setuptools wheel && \
+    python -m pip install --no-cache-dir --upgrade --force-reinstall \
+        "torch==${TORCH_VERSION}" \
+        --extra-index-url "${TORCH_INDEX_URL}" && \
+    python -m pip install --no-cache-dir --upgrade \
+        numpy \
+        tqdm \
+        huggingface-hub \
+        setuptools \
+        typing-extensions==4.15.0 \
+        datasets \
+        "fsspec>=2023.1.0,<=2026.2.0" \
+        tiktoken \
+        sentencepiece \
+        zstandard \
+        brotli \
+        python-minifier \
+        psutil
+
+# Prefer a prebuilt wheel when one is provided. Otherwise do the one-time
+# source build here so fresh RunPod pods never rebuild flash-attn again.
+RUN if [[ -n "${FLASH_ATTN_WHEEL_URL}" ]]; then \
+        python -m pip install --no-cache-dir --force-reinstall --no-deps "${FLASH_ATTN_WHEEL_URL}"; \
+    else \
+        TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}" MAX_JOBS="${MAX_JOBS}" \
+        python -m pip install --no-cache-dir --force-reinstall --no-build-isolation --no-deps "${FLASH_ATTN_SPEC}"; \
+    fi
+
+RUN python - <<PY
+import brotli
+import sentencepiece
+import torch
+import triton
+from flash_attn_interface import flash_attn_func, flash_attn_varlen_func
+
+expected_torch = "${TORCH_VERSION}"
+expected_cuda = "${TORCH_CUDA_VERSION}"
+if torch.__version__ != expected_torch:
+    raise SystemExit(f"torch version mismatch: expected {expected_torch}, got {torch.__version__}")
+if torch.version.cuda != expected_cuda:
+    raise SystemExit(f"CUDA runtime mismatch: expected {expected_cuda}, got {torch.version.cuda}")
+
+print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")
+print(f"triton {triton.__version__}")
+print("flash_attn_interface: OK")
+print("python runtime deps: OK")
+PY
+
+WORKDIR /workspace
+# Rebuild and repin the published image digest before the next RunPod session.
+CMD ["sleep", "infinity"]