Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix#1740
Closed
amrayach wants to merge 74 commits into openai:main from
Conversation
- 5-run A100 evidence package (baseline, seed42, LowerLR, warmdown, smoke)
- Campaign sessions 01-07 with templates and runbook
- Pre-TTT anchor diff analysis and root port-gap audit
- V100/fp16 compatibility via AMP_DTYPE auto-detection
- Agent coordination files (CLAUDE.md, AGENTS.md, AGENT_SYNC.md)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…te, pre-run Self-contained anchor script porting the 2026-03-21 donor features onto the root train_gpt.py skeleton. Features: SmearGate, BigramHash, XSA-last-4, partial RoPE (16/64) with NTK scaling, LN scale, EMA, Muon/Adam weight decay, mixed int6+zstd export, stride-64 sliding eval. SDPA attention backend (no flash_attn_3 dependency). All 15 code review checks passed. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…constructors The parameter was defined in Hyperparameters but hardcoded as 1024 in CausalSelfAttention.__init__ instead of being passed through GPT → Block → CausalSelfAttention. This would have caused a TypeError on model construction. Found by Codex local validation (CPU forward pass). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ning NGC PyTorch containers don't ship sentencepiece. Install via Pegasus PyPI cache (with fallback to public PyPI) at container launch time. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The oom_kill was a Slurm memory limit (not GPU OOM). Default Slurm memory allocation is too low for the anchor model's grad accumulation buffer on a single GPU. Request 64G explicitly. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
8 tasks simultaneously pip-installing inside NGC container caused Slurm OOM kill. Fix: only rank 0 installs, others wait 5s. Also request all available node memory with --mem=0 in salloc mode. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The math backend is a slow fallback that PyTorch may pick for certain shapes. The donor disables it. This was flagged as observation I1 in the code review but left as-is for safety. Now that the smoke test confirms flash SDP works, disable math for better throughput. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
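The toggle described in this commit is a one-liner against PyTorch's SDP backend flags. A minimal sketch (process-wide flags; the surrounding anchor code is not shown):

```python
import torch

# Disable the slow math fallback so PyTorch cannot silently pick it for
# awkward shapes; keep the fused kernels eligible. These are the standard
# torch.backends.cuda toggles, applied once at startup.
torch.backends.cuda.enable_math_sdp(False)
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
```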
Sliding s64 val_bpb: 1.12904446 (target 1.123-1.128, landed at edge)
Steps: 6564, step_avg: 91.37ms, artifact: 15,751,324 bytes
Gap from donor (1.1248): +0.0042 BPB — entirely throughput (487 fewer steps)
Bottleneck: SDPA (FA2) vs donor's FA3, not model fidelity
Environment: 8xH100 SXM5, serv-3342, NGC 26.03 container, /fscratch data

All coordination docs updated for Session 04 handoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…dates Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…thesis + doc sync

Delta 1 (GPTQ-lite percentile clip search):
- Sliding s64: 1.12941 (worse than anchor 1.12904 by +0.0004)
- Artifact: 16,219,752 bytes — OVER 16MB cap by 219KB
- Conclusion: clip search hurts zstd compressibility; export gap not clip-related

Novel approach synthesis (04b):
- 50 ideas from 8 parallel AI research queries (ChatGPT/Claude/Gemini/Perplexity)
- Top convergence signal: n-gram cache (4+ models)
- RFN thesis connection: low-rank residual correction (W ≈ Q + UV^T)

All coordination docs updated for Delta 2 (LeakyReLU²) pivot.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…re-run Single change vs measured Session 03 anchor: MLP activation from relu^2 to leaky_relu(0.5)^2. enable_math_sdp restored to True to match anchor measured state (pre-563700f). All other code identical to anchor. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Delta 2 measured on 8xH100: sliding s64 val_bpb 1.12904123 (effectively identical to anchor 1.12904446). Slightly better quantization metrics and 168KB smaller artifact, but +0.72ms/step slower throughput cancels gains. Not a standalone graduating delta — keepable as a stack component. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…audit Session 04 micro-delta sweep closed at 1 failed (GPTQ-lite) + 1 neutral (LeakyReLU²). Session 05 opens as a three-part audit: throughput gap decomposition (FA3 portability), pre-TTT stack-gap analysis vs the local 1.1194 record, and TTT correctness/integration planning. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Comprehensive prompt covering throughput audit, pre-TTT stack-gap audit, and TTT correctness audit. Includes git, Pegasus, documentation, and memory update conventions for session continuity. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ify FA3 scope Make tool/MCP section "prefer if available" instead of hard requirements. Reframe FA3 check as verify-presence not install-package. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Concrete implementation prompt for FW-1 (Flash Attention 3) as isolated delta on anchor. Includes pre-implementation FA3 verification check, exact code change sites, isolation constraints, and full convention set. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Phase 1 of revised Session 05: port anchor attention from SDPA to direct flash_attn_interface (FA3) on NGC 25.02 container. Microbenchmark shows 11.44x kernel speedup. AGENT_SYNC updated with competitive landscape analysis and 3-phase plan (FA3 → Full Hessian GPTQ → Novelty). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…pdate CLAUDE.md

- 05b: Full Hessian GPTQ isolated delta with strict source priority, preserved artifact format, calibration budget (target ≤30s)
- 05c: Training bundle (XSA-all + VE128 + SWA + warmdown3500) with bisect plan
- CLAUDE.md: add entry point protocol, Pegasus operational rules, FA3 container path
- AGENT_SYNC: updated with FA3 port results (neutral/slower), revised strategy
- FA3 README: updated with smoke results and saved container path

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Isolated quantization-only delta: replaces naive int6 per-row rounding
with Cholesky error-compensated GPTQ (block_size=128, actorder, percdamp=0.01).
Training code is identical to anchor. Post-training calibration collects
H=X^TX via 128 forward passes, then GPTQ quantizes column-by-column with
error propagation weighted by H^{-1}.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
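The column-by-column error compensation described above can be sketched in a few lines. This is a hedged illustration with hypothetical names (`gptq_quantize_int6`, a simple per-row symmetric scale), not the PR's exact code; the real path adds block_size=128 processing, actorder, percdamp, and the mixed int6+zstd export format.

```python
import torch

def gptq_quantize_int6(W, Hinv, qmax=31):
    """Sketch: quantize W column-by-column to int6 and push each column's
    rounding error into the not-yet-quantized columns, weighted by the
    corresponding row of H^{-1} (classic GPTQ error propagation)."""
    W = W.clone().float()
    Q = torch.zeros_like(W)
    scale = W.abs().amax(dim=1) / qmax                  # per-row scale
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    for c in range(W.shape[1]):
        w = W[:, c]
        q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
        Q[:, c] = q
        err = (w - q * scale) / Hinv[c, c]              # normalized residual
        W[:, c + 1:] -= torch.outer(err, Hinv[c, c + 1:])  # compensate rest
    return Q, scale
```

With `Hinv` set to the identity, the compensation term vanishes and this degenerates to naive per-row rounding, which is exactly the baseline the commit replaces.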
… handoff docs 1xH100 smoke revealed 0.212 BPB roundtrip gap (27x worse than anchor). GPTQ pipeline mechanics work (66 layers, 0 fallbacks, 4.2s) but quantized weights reconstruct catastrophically. Must debug before 8xH100. Updated: AGENT_SYNC, next-session, decisions, project-state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…n path

The export-only replay showed 66/66 layers worse than naive regardless of actorder or block_size, pointing to the upstream Hessian path as the root cause. This patch aligns Hessian collection with PR openai#1019/openai#634 semantics:

- Divide accumulated H by num_batches (was raw sum — caused scale blowup)
- Add 1% diagonal damping in _finalize_hessians before quantization
- Run calibration forward pass under torch.autocast(bf16) to match training
- Accumulate Hessians on CPU to avoid GPU memory pressure

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
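The normalization and damping fixes amount to the following accumulation discipline, sketched here with a hypothetical `HessianCollector` helper (the real code hooks layer inputs per parameter and runs under bf16 autocast):

```python
import torch

class HessianCollector:
    """Sketch of the fixed Hessian path: accumulate H = X^T X on CPU,
    then normalize by the batch count (mean, not raw sum) and add 1%
    diagonal damping before any inversion/Cholesky."""
    def __init__(self, dim):
        self.H = torch.zeros(dim, dim)   # CPU accumulator
        self.num_batches = 0

    def update(self, X):                 # X: [tokens, dim] activations
        self.H += (X.T @ X).float().cpu()
        self.num_batches += 1

    def finalize(self, percdamp=0.01):
        H = self.H / max(self.num_batches, 1)       # the scale-blowup fix
        damp = percdamp * torch.diagonal(H).mean()  # 1% diagonal damping
        H += damp * torch.eye(H.shape[0])
        return H
```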
Previous replay_ref_hfix still showed 66/66 worse layers with the Hessian normalization fix. Rather than continuing to debug from symptoms, this transplants PR openai#1019's complete GPTQ slice verbatim:

- collect_hessians: PR openai#1019 hook pattern with pre-init and param_name keys
- quantize_int6_gptq: verbatim from PR openai#1019 lines 1171-1224
- gptq_mixed_quantize_int6: direct param_name key lookup, PR openai#1019 quantizer

Source: pr-1019-gptq:records/track_10min_16mb/2026-03-25_ValCalib_GPTQ_XSA_BigramHash3072/train_gpt.py

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Set GPTQ_AR_CALIB=1 to generate 64 autoregressive sequences (temp=0.8) from the model itself instead of using training data for Hessian collection. This matches PR openai#1019's actual calibration strategy. Both paths available — training data (default) and AR self-gen (opt-in). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
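The opt-in AR self-generation path can be sketched as plain temperature sampling from the model. Names, the `model_step` callable, and the sequence length are illustrative assumptions; the commit only specifies 64 sequences at temperature 0.8:

```python
import torch

def ar_calibration_batch(model_step, vocab=8192, n_seqs=64, seq_len=128,
                         temperature=0.8):
    """Sketch: generate calibration sequences autoregressively from the
    model itself (GPTQ_AR_CALIB=1 path) instead of sampling training data.
    `model_step` maps a token prefix [B, t] to next-token logits [B, V]."""
    toks = torch.randint(vocab, (n_seqs, 1))        # random seed tokens
    for _ in range(seq_len - 1):
        logits = model_step(toks) / temperature     # temperature 0.8
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, 1)           # sample one token [B, 1]
        toks = torch.cat([toks, nxt], dim=1)
    return toks
```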
…sbatch

- Brotli is now a hard import (not try/except). The lzma fallback caused cross-rank decompress failures when pip install raced across srun tasks. 2/12 grid jobs (seed 42, ttt_qk5) crashed with LZMAError on brotli blobs.
- Removed dead lzma compress/decompress branches from the model export path. Code-wrapper self-compression (line 186) still uses lzma intentionally.
- New sbatch files with fixes:
  - 07c1_ttt_s1337_fixed: TTT seed 1337, 35min wallclock, forced pip install
  - 07c1_ttt_s2025_fixed: TTT seed 2025, 35min wallclock, forced pip install
  - 07c1_base_s42_fixed: baseline seed 42 rerun, forced pip install

All use unconditional `pip install` instead of a conditional import guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Bring AGENTS.md, AGENT_SYNC.md, project-state.md, decisions.md, and next-session.md to the openai#1610-direct strategy. Add locked execution plan (PLAN_PR1610_CORRECTOR.md Rev 3). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Exact copy from PR openai#1610 at SHA ca19195. MD5: 57cfda2047b2c2a63ec10b99d704bfb0. 3379 lines, 139831 bytes. This is the unmodified source base; corrector will be added in later commits. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Setup, seed-0 (Gate A), seed-1/2 (Gate B) subcommands with published BPB verification targets and kill criteria. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Commit the current posterior-corrector working-tree state for PR openai#1610: - train_gpt.py corrector path plus warmup legality fix - LEGALITY_SPEC.md, DEPENDENCY_GATE.md, requirements.txt - test_corrector.py and bench_corrector_cpu.py - AGENT_SYNC.md closeout with audit measurements The warmup path previously touched val_data.val_tokens before the official eval timer. It now uses a device-local torch.Generator + torch.randint synthetic tokens. 9/9 tests pass and the CPU bench projects 26.1s. Co-Authored-By: Claude Opus 4.7 <[email protected]>
… A/B

Zero-intervention 8xH100 pipeline: pod verify, SP8192 download, Gate A seed-0 baseline, corrector ablations, 3-way decision point, Gate B 3-seed corrector mean, fallback requant, artifact preservation.

Fixes applied (Codex review):
- checkpoint persisted before log parse (Fix D)
- 3-way ablation decision fork with hold band (Fix G)
- fail-closed fallback parse (Fix H)
- removed malformed S3 backend (Fix J)
- Gate B rewired to coherent 3-seed corrector mean (Fix I — seed-0 re-eval added so all three seeds use the same corrector config, mean vs published 1.07280628)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
009: add logit_bias warmup pass (dummy bf16 tensor) so Dynamo traces
the Tensor branch before the 600s eval timer starts; gated on
h.corrector_alpha > 0
002: pass BEST_ALPHA/BEST_ORDERS as argv to Gate B summary heredoc;
corrector_alpha/corrector_orders now populated in gate_b_summary.json
003: update 02_gate_a.sh header comment to show actual ceiling 1.07516564
004: drop hash() wrapper in PrefixNgramCorrector — use ctx tuple directly
as dict key; Python dicts handle collision disambiguation natively
001: rewrite test_single_pass to actually exercise chunk-boundary
invariance: same corrector fed tokens[:10] then tokens[10:] must
match a fresh single-pass corrector fed all 20 tokens
All 9 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
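The chunk-boundary invariance that fix 001 now exercises can be demonstrated with a toy prefix counter. `TinyPrefixCounter` is a hypothetical stand-in for the real corrector state, reduced to the one property under test: feeding tokens in two chunks must leave the state identical to a single pass.

```python
from collections import defaultdict

class TinyPrefixCounter:
    """Minimal stand-in for corrector state: counts of (context, token)
    pairs, keyed by the raw ctx tuple (no hash() wrapper needed — dicts
    disambiguate collisions natively, per fix 004)."""
    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(int)
        self.ctx = ()

    def update(self, tok):
        self.counts[self.ctx + (tok,)] += 1
        self.ctx = (self.ctx + (tok,))[-self.order:]  # state carries over chunks

def feed(counter, toks):
    for t in toks:
        counter.update(t)
    return counter

tokens = [1, 2, 1, 2, 3, 1, 2, 1, 3, 2] * 2          # 20 tokens
chunked = feed(feed(TinyPrefixCounter(), tokens[:10]), tokens[10:])
single = feed(TinyPrefixCounter(), tokens)
assert chunked.counts == single.counts                # boundary-invariant
```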
…t-SHA pin, align README + run_all

- 05_preserve_artifacts.sh: write commit_sha.txt, hardware_info.txt, env_fingerprint.txt before tarball; fix repo_type=model to match the amay01/parameter-golf-session3-artifacts repo type
- 00_verify_pod.sh: add optional EXPECTED_SHA exact-pin check on top of existing ancestry-only guard
- run_all.sh: parameterize banner SHA; warn when EXPECTED_SHA unset so the operator knows the orchestrator is running ancestry-only
- README.md: align Gate A kill threshold (1.078 → 1.07516564); update Block 1 operator commands to include git checkout + EXPECTED_SHA; separate ancestry anchor from session launch SHA in header

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…verlay Supersedes the 2026-04-18 post-Ultrareview pin (876bb36). Rev 5 adds provenance auto-capture, repo-type=model fix, exact-SHA env-var pin, and run_all.sh/README alignment; new pin reflects the pipeline-patch commit. Also records the live-guidance absolute-BPB overlay and 04b deprecation driven by open-PR competitive intel (openai#1700 / openai#1716 / openai#1707 / openai#1693). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ative result + quantized-eval fix

Non-record evidence package for PR openai#1610. Three separable contributions:

1. Faithful seed-0 reproduction of PR openai#1610 on independent infrastructure (8xH100 HBM3 SXM5, RunPod): our BPB 1.07218477 vs published seed-0 BPB 1.07216564 -> delta +1.913e-5.
2. Bounded negative result for a score-first n-gram posterior corrector layered on PR openai#1610's phased LoRA TTT eval path. All three tested (alpha, orders) configs degrade BPB monotonically with alpha. The corrector and TTT-LoRA are both deterministic functions of the scored prefix, so additively combining them over-counts the prefix evidence. The claim is bounded to the tested grid on this stack; it does not generalize to all posterior correctors or non-TTT eval pipelines.
3. Fix for the quantized-eval-only branch of train_gpt.py (two guards at lines 3204 and 3259), which previously crashed on a None-model dereference when EVAL_ONLY_QUANTIZED_PATH was set. Surfaced while running the ablations in contribution 2.

Artifact: 15,999,394 bytes (606 bytes of competition-cap headroom). Single-seed scope, acknowledged. Compliance with Issue openai#1017 Section III walked line-by-line in README.

Also updates three internal docs to reference the renamed HF artifact repo (amay01/parameter-golf-pr1610-reproduction-artifacts).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- decisions.md: new locked decision explaining non-record framing, scope bounds, and post-submission discipline (no self-comments for 48h)
- AGENT_SYNC.md: current objective now records PR4 submitted upstream
- next-session.md: phase updated to post-submission state; Fallback 1A framed as secondary task unblocked from PR status
- results_log.jsonl: appended four records (reproduction + three ablations) with pre/post quantization BPBs, eval times, and bounded negative-result outcomes

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Author

Closing to reopen against upstream/main with a clean diff scoped to the submission folder only.
This package is intentionally narrow: it does not remix multiple frontier submissions into a new record claim. Instead, it reproduces one current frontier line to near-exact fidelity, tests one new adaptive corrector path against that reproduced baseline, and reports both the measured negative result and the eval-only fix required to obtain it.
Prior context
Previous submissions in this line: #1101 (pre-TTT anchor, 1.1290 BPB), #1307 (07c1 strict base proof vs merged #1019), #1598 (SP8192-D 5-seed evidence package).
Contributions
1. Faithful seed-0 reproduction of PR #1610 (Record: VarLenAttn + PhasingTTT - val_bpb 1.0728 (3-seed mean)), pinned by commit `1765afc` at upstream `ca19195`.
2. Bounded negative result for the n-gram posterior corrector: all tested `(alpha, orders)` configs degrade BPB, monotonically in `alpha`. Multi-order backoff provides no measurable benefit over single-order at the same blend weight.
3. Fix for `train_gpt.py`'s quantized-eval-only branch (two guards at lines 3204 and 3259). Without these, `EVAL_ONLY_QUANTIZED_PATH` crashes on a `None`-model dereference. Surfaced while running the ablations in Contribution 2.

The reproduction is a credibility prerequisite for the negative-result claim, not a contribution in itself. The corrector formulation and its Section-III-compliance engineering are the only novel content. The bug fix is incidental.
Reproduction result
Training stopped at step 4,879 of 20,000 due to `MAX_WALLCLOCK_SECONDS=600 - GPTQ_RESERVE_SECONDS=13` (by design in #1610). The training log's `GATE_A: FAIL` line reflects our internal pipeline's 15,997,520-byte safety threshold (intended to absorb code-size drift); the artifact passes the competition rule.

Corrector ablation
All three ablations run in eval-only mode against the reproduced seed-0 checkpoint — no retraining.
The effect at α=0.1 is ~1/8 of the effect at α=0.3 — first-order linear in α, no inflection toward improvement. Structurally, TTT-LoRA and the n-gram corrector are both deterministic functions of the scored prefix `x_{1..t-1}`; adding `alpha * log(q_prefix_ngram(v))` on top of logits that already encode `P(x_t | x_{1..t-1})` under TTT adaptation over-counts the prefix evidence. This predicts the monotonic-in-α result, and predicts that a non-TTT eval pipeline might behave differently; the latter was not tested.

This PR rules out one tested posterior-corrector path on a reproduced #1610-class phased-TTT stack; it does not claim that all n-gram or posterior correctors are ineffective.
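The blend under discussion has a simple shape. A self-contained sketch of a score-before-update prefix n-gram corrector follows; the class name, defaults, and structure are illustrative, not the PR's exact code:

```python
import torch
from collections import defaultdict

class PrefixNgramCorrectorSketch:
    """Sketch of the score-first corrector: a Laplace-smoothed prefix
    n-gram posterior q_t, blended as logits + alpha * log(q_t) over the
    full vocab, with update() called only AFTER the token is scored."""
    def __init__(self, vocab, order=3, alpha=0.1):
        self.vocab, self.order, self.alpha = vocab, order, alpha
        self.counts = defaultdict(lambda: torch.ones(vocab))  # Laplace: q_t(v) > 0
        self.ctx = ()

    def bias(self):
        q = self.counts[self.ctx]
        q = q / q.sum()                   # posterior over all v
        return self.alpha * q.log()       # full [V] tensor add, never gathered

    def update(self, tok):                # runs after scoring tok
        self.counts[self.ctx][tok] += 1
        self.ctx = (self.ctx + (tok,))[-(self.order - 1):]

corr = PrefixNgramCorrectorSketch(vocab=8)
biased = torch.zeros(8) + corr.bias()     # score first...
corr.update(3)                            # ...then fold the token into state
```

The over-counting argument applies because the logits this bias is added to were themselves produced by TTT adaptation over the same prefix that populated `counts`.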
Eval-only bug fix
In `EVAL_ONLY_QUANTIZED_PATH` mode, `base_model`, `compiled_model`, and `compiled_forward_logits` are all `None` (line 3188), but two downstream paths dereferenced them:

- `timed_eval("diagnostic pre-quantization post-ema", ...)` dereferenced `compiled_model.forward_logits` → `AttributeError`.
- `del eval_model, compiled_model` cleanup referenced `eval_model`, which was never bound in this mode → `UnboundLocalError`.

Fix: an `if not quantized_eval_only:` guard on the diagnostic (line 3204), and an extension of the existing cleanup guard to cover this branch (line 3259). The post-quantization diagnostic still runs because it calls `deserialize(h, device)` directly and does not touch the `None` locals.

Compliance with Issue #1017 Section III
Walked line-by-line in the folder README under "Compliance with Issue #1017 Section III". Summary:
- `PrefixNgramCorrector` state (lines 15-58) is populated only via `update(x_t)`, which runs after scoring.
- The bias is `logits + alpha * log(q_t)` over the full V=8192 (line 1122). Laplace init (line 23) guarantees `q_t(v) > 0` for all v. Full `[V]` tensor add, not a gathered single index.
- Scored tokens feed `update(_tok)` (line 2591). Explicit inline comment at line 2583: `# Corrector: update state with scored tokens (score-before-update)`.
- Warmup uses synthetic tokens only, via a device-local RNG generator (lines 3324-3365). The timer starts at `torch.cuda.synchronize(); t_ttt = time.perf_counter()` (lines 3370-3371) after warmup closes.
- The chunk-static bias approximation is a deliberate engineering choice (a per-position bias would cost 32× more GPU forwards or a ~2 GB `[B, S, V]` dense tensor per batch per rank, both breaking the time/memory budget). It satisfies score-before-update at chunk granularity rather than per-position: the bias inside chunk `c` uses only tokens from chunks `[0, c)`. Explicit in the corrector's docstring.

Scope
Single-seed (seed 0). Reproduction is compared against #1610's published seed-0 number (1.07216564), not their 3-seed mean. Multi-seed validation was descoped: given a +1.9×10⁻⁵ BPB delta against the matched seed and monotonic +0.002 to +0.017 degradation across the corrector grid, additional seeds would refine variance but are unlikely to flip either direction. The negative-result claim is bounded to seed 0 of the reproduced checkpoint.
Out of scope in this package: α < 0.1, orders > 12, logistic-domain blends, non-TTT eval pipelines.
Artifacts
Self-contained in `records/track_non_record_16mb/2026-04-19_pr1610_reproduction_corrector_negative/`: `train_gpt.py`, `submission.json`, `requirements.txt`, raw `train_seed0.log` + three `ablation_1[abc].log`, machine-readable `reproduction_summary.json` and `ablation_summary.json`, plus `provenance/` (commit SHA, env fingerprint, nvidia-smi). Training logs are raw; the training script writes compact metrics-only output by design.

Supplementary external archive: https://huggingface.co/amay01/parameter-golf-pr1610-reproduction-artifacts (141 MB tarball, MD5 `caf8adf63d8c80965f6671beba95d7aa`). Contains preserved checkpoints (`final_model.int6.ptz`, `final_model.pt`) and full intermediate artifacts. Not required to reproduce the headline number.