Non-record: #1610 reproduction (Δ=+1.9e-5 BPB), n-gram posterior corrector negative result, quantized-eval-only path fix #1741
Open

amrayach wants to merge 2 commits into openai:main
This package is intentionally narrow: it does not remix multiple frontier submissions into a new record claim. Instead, it reproduces one current frontier line to near-exact fidelity, tests one new adaptive corrector path against that reproduced baseline, and reports both the measured negative result and the eval-only fix required to obtain it.
## Prior context
Previous submissions in this line: #1101 (pre-TTT anchor, 1.1290 BPB), #1307 (07c1 strict base proof vs merged #1019), #1598 (SP8192-D 5-seed evidence package).
## Contributions
1. **Reproduction.** Commit `1765afc` pins "Record: VarLenAttn + PhasingTTT - val_bpb 1.0728 (3-seed mean)" (#1610) at upstream `ca19195` and reproduces its published seed-0 number to Δ=+1.9e-5 BPB.
2. **Corrector negative result.** All tested `(alpha, orders)` configs of the n-gram posterior corrector degrade BPB, monotonically in `alpha`. Multi-order backoff provides no measurable benefit over single-order at the same blend weight.
3. **Eval-only path fix.** Two guards in `train_gpt.py`'s quantized-eval-only branch (lines 3204 and 3259). Without these, `EVAL_ONLY_QUANTIZED_PATH` crashes on a `None`-model dereference. Surfaced while running the ablations in Contribution 2.

The reproduction is a credibility prerequisite for the negative-result claim, not a contribution in itself. The corrector formulation and its Section-III compliance engineering are the only novel content. The bug fix is incidental.
## Reproduction result
Training stopped at step 4,879 of 20,000 on the wallclock budget `MAX_WALLCLOCK_SECONDS=600 - GPTQ_RESERVE_SECONDS=13` (by design in #1610). The training log's `GATE_A: FAIL` line is tripped by our internal pipeline's 15,997,520-byte safety threshold (stricter than the competition limit, to absorb code-size drift); the artifact itself passes the competition rule.
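For concreteness, a minimal sketch of that stop condition (constant names from the log above; the loop scaffolding is our assumption, not code lifted from `train_gpt.py`):

```python
import time

# Budget constants as they appear in the run configuration (see log above).
MAX_WALLCLOCK_SECONDS = 600   # total wallclock budget, by design in #1610
GPTQ_RESERVE_SECONDS = 13     # tail reserved for GPTQ quantization

t_start = time.perf_counter()
train_budget = MAX_WALLCLOCK_SECONDS - GPTQ_RESERVE_SECONDS  # 587 s

for step in range(20_000):
    # ... one optimizer step ...
    if time.perf_counter() - t_start >= train_budget:
        # This run exited here at step 4,879; quantization then ran
        # inside the reserved 13 s tail.
        break
```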
## Corrector ablation

All three ablation configs run in eval-only mode against the reproduced seed-0 checkpoint; no retraining.
The effect at α=0.1 is ~1/8 of the effect at α=0.3: monotone in α, with no inflection toward improvement. Structurally, TTT-LoRA and the n-gram corrector are both deterministic functions of the scored prefix x_{1..t-1}; adding `alpha * log(q_prefix_ngram(v))` on top of logits that already encode P(x_t | x_{1..t-1}) under TTT adaptation over-counts the prefix evidence. This predicts the monotonic-in-α result, and it predicts that a non-TTT eval pipeline might behave differently. The latter was not tested.

This PR rules out one tested posterior-corrector path on a reproduced #1610-class phased-TTT stack; it does not claim that all n-gram or posterior correctors are ineffective.
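One way to make the over-counting reading above precise (our gloss, not text from the submission): in probability space the blend is a product of experts over the same conditional,

$$
\tilde p(v \mid x_{1..t-1}) \;\propto\; p_\theta(v \mid x_{1..t-1})\, \cdot\, q_{\text{ngram}}(v \mid x_{1..t-1})^{\alpha},
$$

where both factors condition on the identical prefix. If p_θ is already near-calibrated for P(x_t | x_{1..t-1}) after TTT adaptation, any α > 0 tilts the blend toward the weaker n-gram estimate rather than contributing independent evidence, consistent with degradation that grows with α.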
## Eval-only bug fix
In `EVAL_ONLY_QUANTIZED_PATH` mode, `base_model`, `compiled_model`, and `compiled_forward_logits` are all `None` (line 3188), but two downstream paths dereferenced them:

- `timed_eval("diagnostic pre-quantization post-ema", ...)` dereferenced `compiled_model.forward_logits` → `AttributeError`.
- The `del eval_model, compiled_model` cleanup referenced `eval_model`, which is never bound in this mode → `UnboundLocalError`.

Fix: an `if not quantized_eval_only:` guard on the diagnostic (line 3204), plus extending the existing cleanup guard to cover this branch (line 3259). The post-quantization diagnostic still runs because it calls `deserialize(h, device)` directly and does not touch the `None` locals.
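A self-contained toy of the two failure modes and the guards; variable names follow the description above, while the stubs are ours, purely for illustration:

```python
# Toy reproduction of the quantized-eval-only crashes and their guards.

class _StubCompiled:
    def forward_logits(self):
        return "logits"  # stand-in for the real compiled forward

def timed_eval(tag, fn):
    print(tag, fn())

def eval_path(quantized_eval_only: bool) -> None:
    # As at line 3188: these locals stay None in quantized-eval-only mode.
    base_model = compiled_model = compiled_forward_logits = None
    if not quantized_eval_only:
        compiled_model = _StubCompiled()
        eval_model = object()

    # Guard 1 (line 3204): unguarded, the pre-quantization diagnostic
    # dereferences compiled_model.forward_logits -> AttributeError on None.
    if not quantized_eval_only:
        timed_eval("diagnostic pre-quantization post-ema",
                   compiled_model.forward_logits)

    # Guard 2 (line 3259): unguarded, `del eval_model, compiled_model`
    # raises UnboundLocalError because eval_model is never bound here.
    if not quantized_eval_only:
        del eval_model, compiled_model

eval_path(quantized_eval_only=True)   # runs clean with the guards in place
eval_path(quantized_eval_only=False)  # normal path, diagnostic still fires
```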
## Compliance with Issue #1017 Section III

The compliance case is walked line-by-line in the folder README under "Compliance with Issue #1017 Section III". Summary:
- `PrefixNgramCorrector` state (lines 15-58) is populated only via `update(x_t)`, which runs after scoring (see the sketch after this list).
- The blend is `logits + alpha * log(q_t)` over the full V=8192 (line 1122). Laplace init (line 23) guarantees `q_t(v) > 0` for all v. It is a full-`[V]` tensor add, not a gathered single index.
- Scored tokens are folded in via `update(_tok)` (line 2591), with an explicit inline comment at line 2583: `# Corrector: update state with scored tokens (score-before-update)`.
- Warmup uses synthetic tokens only, via a device-local RNG generator (lines 3324-3365). The timer starts at `torch.cuda.synchronize(); t_ttt = time.perf_counter()` (lines 3370-3371) after warmup closes.
- The chunk-static bias approximation is a deliberate engineering choice: a per-position bias would cost 32× more GPU forwards or a ~2 GB `[B, S, V]` dense tensor per batch per rank, both breaking the time/memory budget. It satisfies score-before-update at chunk granularity rather than per-position; the bias inside chunk c uses only tokens from chunks [0, c). This is explicit in the corrector's docstring.
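A minimal sketch of the score-before-update contract, assuming a single-order corrector with dense Laplace-smoothed counts; class and method names follow the README summary, everything else is illustrative rather than the submitted implementation:

```python
import torch

class PrefixNgramCorrector:
    """Prefix n-gram posterior q_t over the scored prefix x_{1..t-1}.

    Invariant: update(x_t) runs only AFTER x_t has been scored, so q_t
    never conditions on the token currently being predicted.
    """

    def __init__(self, vocab_size: int = 8192, order: int = 4, laplace: float = 1.0):
        self.V = vocab_size
        self.ctx_len = order - 1   # context tokens per n-gram
        self.laplace = laplace     # keeps q_t(v) > 0 for every v
        self.counts: dict[tuple, torch.Tensor] = {}
        self.context: tuple = ()

    def q(self) -> torch.Tensor:
        # Laplace-smoothed next-token distribution for the current context.
        c = self.counts.get(self.context, torch.zeros(self.V))
        return (c + self.laplace) / (c.sum() + self.laplace * self.V)

    def blend(self, logits: torch.Tensor, alpha: float) -> torch.Tensor:
        # Full-[V] tensor add of alpha * log(q_t), as in the compliance note.
        return logits + alpha * torch.log(self.q())

    def update(self, x_t: int) -> None:
        # Called only after x_t has been scored: fold it into the counts,
        # then advance the context window.
        if self.context not in self.counts:
            self.counts[self.context] = torch.zeros(self.V)
        self.counts[self.context][x_t] += 1
        self.context = (self.context + (x_t,))[-self.ctx_len:]
```

Per scored position the shape is: `blend` the model logits, score, then `update(x_t)`. The submitted implementation applies the bias at chunk granularity instead, per the last bullet above.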
## Scope

Single-seed (seed 0). Reproduction is compared against #1610's published seed-0 number (1.07216564), not their 3-seed mean. Multi-seed validation was descoped: given a +1.9×10⁻⁵ BPB delta against the matched seed and monotonic +0.002 to +0.017 degradation across the corrector grid, additional seeds would refine variance but are unlikely to flip either direction. The negative-result claim is bounded to seed 0 of the reproduced checkpoint.
Out of scope in this package: α < 0.1, orders > 12, logistic-domain blends, non-TTT eval pipelines.
## Artifacts
Self-contained in `records/track_non_record_16mb/2026-04-19_pr1610_reproduction_corrector_negative/`: `train_gpt.py`, `submission.json`, `requirements.txt`, raw `train_seed0.log` plus three `ablation_1[abc].log`, machine-readable `reproduction_summary.json` and `ablation_summary.json`, plus `provenance/` (commit SHA, env fingerprint, nvidia-smi). Training logs are raw; the training script writes compact metrics-only output by design.

Supplementary external archive: https://huggingface.co/amay01/parameter-golf-pr1610-reproduction-artifacts (141 MB tarball, MD5 `caf8adf63d8c80965f6671beba95d7aa`). Contains preserved checkpoints (`final_model.int6.ptz`, `final_model.pt`) and full intermediate artifacts. Not required to reproduce the headline number.
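To verify the tarball against the published digest (the local filename below is hypothetical):

```python
import hashlib

EXPECTED_MD5 = "caf8adf63d8c80965f6671beba95d7aa"

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks to keep memory flat on the 141 MB archive.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

assert md5sum("pr1610-reproduction-artifacts.tar.gz") == EXPECTED_MD5
```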