Record: SP8192 + SLOT scored-position + cross-batch EMA warmup: val_bpb=0.94569 #1929
Open
davie2009kh wants to merge 2 commits into openai:main from
…on (SDPA-friendly) — val_bpb 1.07037 (3-seed mean)
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on Apr 29, 2026:
- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot compound to use openai#1918's ~205s eval-time slack; safe fallback drops GLOBAL_TTT_EPOCHS if wallclock blows.
- new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109).
- new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT discipline; parked as fallback if global-SGD class is ruled out).
- frontier scan: 21 new PRs (openai#1906-openai#1931). Headline: PRs openai#1908+openai#1918 independently confirm AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit Rescale = empirical negative; openai#1929 banned SLOT+prequant-TTT.
- frontier-state.json: 21 PRs added; total 200.
- diary/2026-04-29-frontier-scan.md: full scan report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alertcat added a commit to alertcat/parameter-golf that referenced this pull request on Apr 29, 2026:
…ams)

After 4 parallel research agents reviewed 30+ open PRs and compliance issues, two new findings:

1. PR openai#1923 (AsymLogit) flagged "empirical negative" by sunnypatneedi 4-29 frontier-scan, BUT only on PR openai#1855 base with default WD=1.0. Never tested on PR openai#1908 + WD=2.0 combo. V19's specific stack is NOT directly invalidated.
2. PR openai#1925 simon-marcus 1.06049 (3-seed verified, vs PR openai#1855 base 1.06108 = -0.00059 BPB). Just 2 hparam env vars:
   MATRIX_LR 0.026 -> 0.028
   PHASED_TTT_PREFIX_DOCS 2500 -> 3500
   Orthogonal axis to AsymLogit (LR/TTT prefix vs logit head).

Adds two new scout scripts:
- run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus + WD=2.0 (full stack, recommended first scout)
- run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0 (ablation if V19c wins partially)

Decision rule (CaseOps val baseline 0.97651, community floor 0.0006):
- V19c < 0.97591 -> CLEAR WIN, run 3-seed
- V19c 0.97591-0.9755 -> borderline, ablate via V19a/V19b
- V19c > 0.9755 -> abandon stack, try Lead B (PR openai#1884)

Other research findings:
- PR openai#1898 SpinQuant flagged regression vs parent openai#1851 (skip)
- PR openai#1929 SLOT banned per openai#1722 precedent
- PR openai#1911 pre-quant TTT chain banned per openai#1735 precedent
- cocohearts 4-28 PR openai#1902 confirmed PR openai#1855 as official openai#1
- regina-openai + Alex Zhao 48h zero activity
- CaseOps de-facto legal (PR openai#1855 merged into chain)
This appears to violate the score-before-update rule. As described, pre_quant_adamw_ttt performs AdamW adaptation over the full validation stream before scoring that same stream, so the model used to score token x_t has already been updated on x_t. Under the rules, the predictive distribution for x_t must be fixed before observing or updating on x_t; only after its score is recorded may x_t be used to update state for future tokens. If the 28-epoch validation adaptation happens before scoring, this is score-after-adapt TTT, not legal score-first adaptation.
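To make the ordering concrete, here is a toy sketch contrasting the two patterns. It is an illustration only, not the submission's code: add-one unigram counts stand in for AdamW weight updates, and the function names are hypothetical. The legal loop scores each chunk with the state frozen before that chunk, then updates; the leaky loop adapts on the whole stream first and then scores it, so every x_t has already influenced its own prediction.

```python
import math
from collections import Counter


def score_first_bits(tokens, vocab_size, chunk_size):
    """Legal score-first TTT (toy): score a chunk, then let it update state."""
    counts = Counter()  # toy "model state": add-one smoothed unigram counts
    total_bits = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        seen = sum(counts.values())
        # 1) Score: the distribution is fixed before this chunk is observed.
        for tok in chunk:
            p = (counts[tok] + 1) / (seen + vocab_size)
            total_bits += -math.log2(p)
        # 2) Update: only now may the chunk adapt state for *later* chunks.
        counts.update(chunk)
    return total_bits


def adapt_then_score_bits(tokens, vocab_size):
    """Illegal score-after-adapt (toy): adapt on the full stream, then score it."""
    counts = Counter(tokens)  # state has already seen every x_t
    seen = len(tokens)
    return sum(-math.log2((counts[t] + 1) / (seen + vocab_size)) for t in tokens)
```

On a repetitive stream such as [0, 1, 0, 0, 2, 0, 1, 0] * 4 with a vocabulary of 3, the adapt-then-score total comes out lower than the score-first total; that gap is leaked information, not better modeling.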
SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Pre-Quant TTT + Scored-Position SLOT
val_bpb: 0.94569 (3-seed mean: seeds 1337, 42, 2025)
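For readers unfamiliar with the metric, a minimal sketch of how a bits-per-byte figure and a seed mean are typically computed, assuming val_bpb is the summed next-token NLL over scored tokens converted to bits and divided by the UTF-8 byte count of the text those tokens cover. The function names are hypothetical; the exact byte-accounting in the evaluation script may differ.

```python
import math


def bits_per_byte(token_nll_nats, token_byte_lengths):
    """Convert per-token NLL (nats) into bits per byte of underlying text.

    Normalizing by bytes rather than tokens makes scores comparable across
    tokenizers with different vocabulary sizes (e.g. SP8192 vs others).
    """
    total_bits = sum(token_nll_nats) / math.log(2)
    total_bytes = sum(token_byte_lengths)
    return total_bits / total_bytes


def seed_mean(per_seed_scores):
    """Plain arithmetic mean over independent seeds."""
    return sum(per_seed_scores) / len(per_seed_scores)
```

For example, four tokens that each cost exactly one bit (ln 2 nats) and each cover two bytes give 4 bits / 8 bytes = 0.5 bpb.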
What this submission does
This builds on the kilojoules/alertcat PR #1738 stack and adds Scored-Position SLOT at evaluation time:
- Per-sample delta [bsz, 1, d_model] and logit_bias [bsz, 1, vocab], both in fp32
- Runs inside eval_val_sliding on the quantized model; training is unmodified

Training wall-clock: ~588s (within 600s budget).
Eval wall-clock: ~1405s (SLOT optimization over full val set).
Artifact size: ~15.87MB (within 16MB budget).
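The SLOT parameters listed above can be illustrated with a small pure-Python sketch for a single sample, under stated assumptions: the frozen model is reduced to precomputed final hidden states plus a frozen output projection W, the delta is added to every position's hidden state (the "1" in [bsz, 1, d_model]), and logit_bias is added to every position's logits. Plain gradient descent on masked cross-entropy stands in for whatever optimizer and step count the submission actually uses; all names here are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def _logits(h_t, delta, W, bias):
    # Frozen projection applied to (hidden + delta), plus a per-sample bias.
    h = [x + dx for x, dx in zip(h_t, delta)]
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W, bias)]


def masked_nll(hidden, targets, W, delta, bias, mask):
    """Mean next-token NLL (nats) over positions where mask is True."""
    total, n = 0.0, 0
    for t, keep in enumerate(mask):
        if keep:
            total -= math.log(softmax(_logits(hidden[t], delta, W, bias))[targets[t]])
            n += 1
    return total / max(n, 1)


def slot_optimize(hidden, targets, W, mask, steps=50, lr=0.1):
    """Fit one sample's delta [d_model] and logit_bias [vocab]; model stays frozen."""
    d, V = len(W[0]), len(W)
    delta, bias = [0.0] * d, [0.0] * V
    n = sum(mask)
    if n == 0:
        return delta, bias
    for _ in range(steps):
        g_delta, g_bias = [0.0] * d, [0.0] * V
        for t, keep in enumerate(mask):
            if not keep:
                continue  # unmasked positions contribute no gradient
            p = softmax(_logits(hidden[t], delta, W, bias))
            for v in range(V):
                err = (p[v] - (1.0 if v == targets[t] else 0.0)) / n  # dL/dlogit_v
                g_bias[v] += err
                for k in range(d):
                    g_delta[k] += err * W[v][k]  # chain rule through W
        delta = [x - lr * g for x, g in zip(delta, g_delta)]
        bias = [b - lr * g for b, g in zip(bias, g_bias)]
    return delta, bias
```

Because only delta and logit_bias receive gradients, the quantized weights are never touched, which is consistent with training being unmodified. The per-position loop is also why this kind of eval-time optimization costs far more wall-clock than plain scoring.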
Base stack
SLOT lineage
Scored-Position SLOT is inspired by @resouer PR #1229. Key differences: per-sample (not shared) delta, cross-batch EMA prior warmup, and restriction to scored positions only via a boolean mask.
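The two differences named above, the scored-position mask and the cross-batch EMA prior, might look roughly like the following sketch. Both parts are assumptions, not the submission's code: the mask assumes a common sliding-window convention where the first window scores every position and later windows score only their final stride tokens, and the prior assumes "warmup" means initializing each new batch's per-sample deltas from an EMA of the batch-mean deltas optimized on earlier batches. Names and the decay value are hypothetical.

```python
def scored_mask(window_len, stride, is_first_window):
    """True where a token in this sliding-eval window counts toward val_bpb."""
    if is_first_window:
        return [True] * window_len
    # Later windows re-read overlapping context; only the new tail is scored.
    return [t >= window_len - stride for t in range(window_len)]


class EmaPrior:
    """Cross-batch EMA of optimized per-sample deltas, used as a warm start."""

    def __init__(self, d_model, decay=0.9):
        self.value = [0.0] * d_model
        self.decay = decay
        self.initialized = False

    def init_batch(self, bsz):
        # Independent copies so per-sample optimization never mutates the prior.
        return [list(self.value) for _ in range(bsz)]

    def update(self, batch_deltas):
        d = len(self.value)
        batch_mean = [sum(row[k] for row in batch_deltas) / len(batch_deltas)
                      for k in range(d)]
        if not self.initialized:
            # First batch: adopt its mean directly instead of decaying from zero.
            self.value = batch_mean
            self.initialized = True
        else:
            self.value = [self.decay * p + (1.0 - self.decay) * m
                          for p, m in zip(self.value, batch_mean)]
```

With a window of 4 and a stride of 2, a non-first window yields [False, False, True, True], so the loss that drives the delta matches exactly the tokens that are scored. The EMA keeps each sample's delta per-sample while starting it from a prior that already reflects earlier batches.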