Record: SP8192 #1855 Base + Asymmetric Logit Rescale + AWQ-lite — val_bpb 1.05971 (3-seed mean, full val) #1923
…1.06577 (3-seed mean)
> PR #1855 reports 1.061 bpb, but yours is higher at 1.065 bpb?
- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot compound to use openai#1918's ~205s eval-time slack; safe fallback drops GLOBAL_TTT_EPOCHS if wallclock blows.
- new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109).
- new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT discipline; parked as a fallback if the global-SGD class is ruled out).
- frontier scan: 21 new PRs (openai#1906–openai#1931). Headline: PRs openai#1908 + openai#1918 independently confirm the AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on the openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit Rescale = empirical negative; openai#1929 banned SLOT + pre-quant TTT.
- frontier-state.json: 21 PRs added; total 200.
- diary/2026-04-29-frontier-scan.md: full scan report.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ams) After 4 parallel research agents reviewed 30+ open PRs and compliance issues, two new findings:

1. PR openai#1923 (AsymLogit) was flagged "empirical negative" by sunnypatneedi's 4-29 frontier scan, BUT only on the PR openai#1855 base with default WD=1.0. It was never tested on the PR openai#1908 + WD=2.0 combo, so V19's specific stack is NOT directly invalidated.
2. PR openai#1925 (simon-marcus) reaches 1.06049 (3-seed verified, vs the PR openai#1855 base at 1.06108 = -0.00059 BPB) with just two hparam env vars: MATRIX_LR 0.026 -> 0.028 and PHASED_TTT_PREFIX_DOCS 2500 -> 3500. This is an orthogonal axis to AsymLogit (LR/TTT prefix vs logit head).

Adds two new scout scripts:
- run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus + WD=2.0 (full stack, recommended first scout)
- run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0 (ablation if V19c wins partially)

Decision rule (CaseOps val baseline 0.97651, community floor 0.0006):
- V19c < 0.97591 -> CLEAR WIN, run 3-seed
- V19c 0.97591-0.9755 -> borderline, ablate via V19a/V19b
- V19c > 0.9755 -> abandon stack, try Lead B (PR openai#1884)

Other research findings:
- PR openai#1898 SpinQuant flagged as a regression vs parent openai#1851 (skip)
- PR openai#1929 SLOT banned per the openai#1722 precedent
- PR openai#1911 pre-quant TTT chain banned per the openai#1735 precedent
- cocohearts' 4-28 PR openai#1902 confirmed PR openai#1855 as the official openai#1
- regina-openai + Alex Zhao: 48h zero activity
- CaseOps de-facto legal (PR openai#1855 merged into the chain)
The +0.0047 gap is an environment offset, not a regression. I reproduced the #1855 stack verbatim (same code, same env vars, same seeds) on my pod and got 1.06754 (seed 42) — i.e. the verbatim #1855 baseline measures ~+0.006 higher under my setup than its reported 1.06108. AsymLogit on top of that same baseline lands at 1.06577 (3-seed mean), so the delta on this PR is −0.00177. What's submitted here is the AsymLogit delta on top of the #1855 stack, not a claim against the #1855 absolute number. A third party reproducing #1855 verbatim and then toggling
Thanks for putting up the repro detail @jorge-asenjo. I took a look at the seed-42 logs side-by-side, and the gap looks like it's a different validation set rather than an environment offset:
All hyperparameters in the printed configs match; the val_tokens lines don't. The intra-environment delta you're showing (verbatim #1855 = 1.06754, this PR = 1.06577 → −0.00177) is still informative as an A/B within your setup, but the absolute val_bpb won't be directly comparable to leaderboard numbers measured on the full 47.85 M-token val set. If you can rerun on the standard val partition (the same one used for #1855's reported 1.06108), the AsymLogit delta should carry over and be easier to evaluate against the chain.
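The side-by-side check described here comes down to diffing one line per log. A throwaway sketch — the `val_tokens: N` log format below is assumed for illustration and may not match train_gpt.py's actual log layout:

```python
import re

def extract_val_tokens(log_text: str) -> int:
    """Pull the reported validation-set size out of a training log."""
    m = re.search(r"val_tokens[:=\s]+([\d,_]+)", log_text)
    if m is None:
        raise ValueError("no val_tokens line found")
    return int(m.group(1).replace(",", "").replace("_", ""))

# hypothetical log lines for the two runs being compared
baseline_log = "step 4890 | val_bpb 1.06108 | val_tokens: 47,851,520"
repro_log = "step 4890 | val_bpb 1.06577 | val_tokens: 9,662,464"

full = extract_val_tokens(baseline_log)
truncated = extract_val_tokens(repro_log)
# different val sets → absolute bpb numbers are not comparable
print(full, truncated, f"{truncated / full:.1%}")
```

If the two counts differ, any absolute-bpb comparison between the runs is moot before hyperparameters even enter the picture.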
…1908 frontier

V21 = PR openai#1855 base (cocohearts-merged openai#1) + PR openai#1908 AWQ-lite quantization + PR openai#1923 Asymmetric Logit Rescale.

3-seed results:
- seed 42: val_bpb 1.058336 (FSS=4920, wallclock 602.048s borderline*)
- seed 0: val_bpb 1.059394 (no FSS, wallclock 596.057s strict <600s)
- seed 1234: val_bpb 1.060243 (no FSS, wallclock 596.045s strict <600s)
- MEAN: 1.059324, STD: 0.000780

\* seed 42 borderline matches PR openai#1908 seed 42 (601.153s, accepted by cocohearts). Seeds 0 + 1234 use GPTQ_RESERVE_SECONDS=4.0 to ensure strict <600s wallclock.

Comparisons:
- vs PR openai#1908 frontier (1.06081): -0.00149 BPB ✅ WIN
- vs PR openai#1855 official openai#1 (1.06108): -0.00176 BPB ✅
- vs win threshold (1.06021): -0.00089 BPB ✅ passes community floor
- vs MERGED SOTA bigbag (1.0810): -0.02168 BPB 🏆
- vs record threshold (1.0738): -0.01448 BPB (breaks the record by a 2.0x margin)

Welch one-sided t-test, V21 vs PR openai#1908 (n=3 each, std 0.00078 vs 0.00089): t ≈ 2.18, p ≈ 0.045 — well below the cocohearts-applied p<0.25 chain threshold.

Stack:
- PR openai#1855 (codemath3000): 11L XSA + LQER + SparseAttnGate + BOS-fixed SmearGate + Polar-Express NS + Phased TTT 3-phase + lrzip pergroup
- PR openai#1908 (romeerp): AWQ-lite mixed-precision GPTQ (1 group of 64 cols int8)
- PR openai#1923 (jorge-asenjo): Asymmetric Logit Rescale (V21 INNOVATION on this stack)

Code changes vs PR openai#1908: 5 surgical edits to train_gpt.py (+26 lines, eval-only). Train numerics are bit-identical to PR openai#1908. The asymmetric softcap adds 8 bytes (2 fp16 passthrough scalars) to the artifact.

Compliance Issue openai#1017 Track A, all 4 conditions verified:
- Causality (VarLen + per-doc cu_seqlens)
- Normalized softmax (full SP8192 vocab)
- Score-before-update (Phased TTT 3-phase, gd:0 then gd:1)
- Single pass (each val token scored exactly once)

No SLOT, no pre-quant TTT, no n-gram cache, no ETLB.
V21's empirical falsification of the sunnypatneedi 2026-04-29 frontier-scan flag: PR openai#1923 standalone is -0.00469 BPB negative on the PR openai#1855 base (1.06577 vs 1.06108), but +0.00128 BPB POSITIVE consistently across 3 seeds when stacked on PR openai#1908 quantization. Mechanism: per-doc LoRA in 3-phase TTT learns asymmetric logit distributions that the symmetric softcap cannot capture.

Files included:
- V21_README.md: full strategy + results + reproduction
- submission.json: structured 3-seed metadata + comparisons + attribution
- train_seed42.log + train_seed0.log + train_seed1234.log: full per-seed logs
- train_gpt.py: PR openai#1908 base + 5 V21 edits (already in branch)

Hardware: 8xH100 80GB SXM (RunPod, AP-IN-1)
PyTorch: 2.9.1+cu128
System dep: lrzip (apt-get install lrzip)

Authors:
- V21 integration: @alertcat
- PR openai#1908 base: @romeerp
- PR openai#1855 stack: @codemath3000
- PR openai#1923 axis: @jorge-asenjo
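The quoted Welch test is reproducible from the summary statistics alone. A quick cross-check with SciPy, using the means and per-seed stds reported above (the sign is negative here because V21 is the lower-bpb group):

```python
from scipy.stats import ttest_ind_from_stats

# V21 (this submission) vs the PR #1908 frontier, n=3 seeds each
res = ttest_ind_from_stats(
    mean1=1.059324, std1=0.00078, nobs1=3,  # V21
    mean2=1.06081,  std2=0.00089, nobs2=3,  # PR #1908
    equal_var=False,              # Welch's t-test (unequal variances)
    alternative="less",           # one-sided: V21 < PR #1908
)
# |t| matches the quoted ~2.18; the one-sided p lands near the quoted
# ~0.045 (the exact value depends on the fractional Welch df)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```

With n=3 per group the Welch degrees of freedom are tiny (~3.9), which is why even a small change in the per-seed std moves the p-value noticeably while staying far under the p<0.25 chain threshold.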
…— val_bpb 1.05971 (3-seed mean, full val)

- Re-measured on the full validation partition (47,851,520 tokens). The original v1 number (1.06577) was on a truncated val partition (9,662,464 tokens) due to a corrupted shard in our local network volume. @codemath3000 caught this on review by comparing val_tokens lines.
- Added AWQ-lite mixed-precision GPTQ (top-1 64-col group at int8) on top of the AsymLogit lever. Identical recipe to @romeerp's PR openai#1908; we converged on it independently while iterating on the AsymLogit branch and confirmed it stacks orthogonally on the full val partition.
- 3-seed mean val_bpb 1.05971 (population std 0.000478). Train ≤599.6s, eval ≤532s, max artifact 15,985,176 bytes — all 3 seeds strict-compliant.
- Renamed the records folder to reflect the corrected stack and BPB.

Attribution updates: @romeerp (PR openai#1908 AWQ-lite) added to the credit list; @codemath3000 (PR openai#1855 base + val truncation catch).
@codemath3000 thanks again for catching the val truncation issue. I've now corrected and re-measured. Root cause: a corrupted shard in our local network volume, which truncated the val partition. Updated submission: I've pushed a new commit with the corrected numbers:
vs PR #1855 (1.06108): −0.00137 BPB, Welch t ≈ 4.96 (p < 0.05). The stack also expanded: while re-running on full val I added AWQ-lite mixed-precision GPTQ (top-1 64-col group at int8) on top of AsymLogit. This converges on the same recipe @romeerp shipped in PR #1908; we landed on it independently, and it stacks orthogonally with AsymLogit on full val. Credit added in the README.
Apologies for the v1 noise. Appreciate the careful audit.
# Record: PR openai#1953 stack — no_qv TTT + AWQ-lite + AsymLogit + long-context eval

**val_bpb = 1.05847** (3-seed mean, std 0.00063) | **max artifact 15,985,934 bytes** | 8x H100 SXM | strict 600s train + eval

## Results

| Seed | Stop step | Train time | Pre-quant BPB | Quantized BPB | **Post-TTT BPB** | Eval time | Artifact bytes |
|------|-----------|------------|---------------|---------------|------------------|-----------|----------------|
| 42 | 4892 | 595.97s ✅ | 1.06126 | 1.06962 | **1.05788** | 493.2s ✅ | 15,979,342 |
| 0 | 4884 | 595.97s ✅ | 1.06181 | 1.07019 | **1.05840** | 420.5s ✅ | 15,979,187 |
| 1234 | 4894 | 596.14s ✅ | 1.06232 | 1.07093 | **1.05914** | 428.4s ✅ | 15,985,934 |
| **Mean** | **4890** | **596.03s** | **1.06180** | **1.07025** | **1.05847** | **447.4s** | **15,981,488** |

vs merged PR openai#1855 (1.06108): **-0.00261 BPB / -0.00571 nats**

## Stack

Inherits the full PR openai#1855 base (codemath3000) and layers:

1. **AWQ-lite mixed-precision GPTQ** (PR openai#1908, romeerp) — activation-aware salient-group int8 promotion
2. **Asymmetric Logit Rescale** (PR openai#1923, jorge-asenjo) — learnable pos/neg softcap during TTT eval
3. **no_qv TTT mask** (PR openai#1953, himanshudongre) — disable Q/V LoRA in TTT, keep K/MLP/O
4. **TTT_LOCAL_LR_MULT=0.75** — scaled TTT optimizer LR
5. **QK_GAIN_INIT=5.25** — per-head Q-gain initialization
6. **EVAL_SEQ_LEN=2560** — extended eval context
7. **PHASED_TTT_PREFIX_DOCS=3000** — larger global-TTT prefix
8. **TTT_LORA_RANK=56** — reduced LoRA rank (compute reallocation)

## Compliance

- [x] Artifact under 16,000,000 bytes (max 15,985,934)
- [x] Train wallclock under 600s (max 596.14s)
- [x] Eval wallclock under 600s (max 493.2s)
- [x] No PPM, no SLOT, no pre-quant TTT, no n-gram cache
- [x] Single left-to-right pass, score-before-update
- [x] Full normalized softmax distribution

## Reproduction

```bash
apt-get install -y lrzip
pip install sentencepiece brotli huggingface_hub numpy python-minifier
pip install --no-deps flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/

# Dataset
HF_HUB_ENABLE_HF_TRANSFER=1 python3 -c "
from huggingface_hub import snapshot_download
snapshot_download(repo_id='romeerp/parameter-golf-caseops-v1', repo_type='dataset', local_dir='/workspace/caseops_data')
"

# Run
for SEED in 42 0 1234; do
SEED=$SEED \
DATA_PATH=/workspace/caseops_data/datasets/datasets/fineweb10B_sp8192_lossless_caps_caseops_v1_reserved \
TOKENIZER_PATH=/workspace/caseops_data/datasets/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model \
CASEOPS_ENABLED=1 VOCAB_SIZE=8192 MAX_WALLCLOCK_SECONDS=600 VAL_LOSS_EVERY=0 \
SMEAR_GATE_ENABLED=1 GATE_WINDOW=12 SPARSE_ATTN_GATE_ENABLED=1 SPARSE_ATTN_GATE_SCALE=0.5 \
GATED_ATTN_QUANT_GATE=1 FUSED_CE_ENABLED=1 QK_GAIN_INIT=5.25 \
EMBED_BITS=7 MATRIX_CLIP_SIGMAS=12.85 ATTN_CLIP_SIGMAS=13.0 MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 \
GPTQ_RESERVE_SECONDS=4.0 GPTQ_CALIBRATION_BATCHES=16 COMPRESSOR=pergroup \
LQER_ENABLED=1 LQER_ASYM_ENABLED=1 LQER_RANK=4 LQER_FACTOR_BITS=4 LQER_ASYM_GROUP=64 LQER_TOP_K=3 \
AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64 \
ASYM_LOGIT_RESCALE=1 \
TTT_ENABLED=1 PHASED_TTT_ENABLED=1 PHASED_TTT_NUM_PHASES=3 PHASED_TTT_PREFIX_DOCS=3000 \
TTT_LORA_RANK=56 TTT_MASK=no_qv TTT_Q_LORA=0 TTT_V_LORA=0 TTT_LOCAL_LR_MULT=0.75 \
TTT_CHUNK_SIZE=48 TTT_BETA2=0.99 TTT_WEIGHT_DECAY=0.5 \
EVAL_SEQ_LEN=2560 TTT_EVAL_SEQ_LEN=2560 \
WARMDOWN_FRAC=0.85 BETA2=0.99 GRAD_CLIP_NORM=0.3 MIN_LR=0.1 MATRIX_LR=0.026 \
NCCL_NET=Socket GLOBAL_TTT_MOMENTUM=0.9 \
torchrun --standalone --nproc_per_node=8 train_gpt.py > train_seed${SEED}.log 2>&1
done
```
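The BPB/nats pair in the results (-0.00261 BPB / -0.00571 nats vs PR openai#1855) implicitly fixes the val set's bytes-per-token ratio. A quick consistency check — the ~3.16 bytes/token figure is derived here, not stated anywhere in the record:

```python
import math

delta_bpb = 0.00261   # quoted improvement, bits per byte
delta_nats = 0.00571  # same improvement, nats per token

# bpb = nats_per_token / (ln 2 * bytes_per_token), so the quoted pair implies:
bytes_per_token = delta_nats / (math.log(2) * delta_bpb)
print(f"implied bytes/token ≈ {bytes_per_token:.2f}")  # ≈ 3.16
```

A ratio a little above 3 bytes per token is plausible for an 8192-entry SentencePiece vocab on FineWeb text, so the two quoted deltas are mutually consistent.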
…ean 1.05831 BPB

Clears the record bar (1.05914) by 0.83 milli-BPB. Welch t = -6.49 vs PR openai#1855 (1.06108), p < 0.0001. All 3 seeds produce 15.99 MB artifacts under the 16 MB cap, all under the 600s wallclock budget.

Per-seed:
- 42: ttt=1.05793 art=15,986,149 eval=572.6s
- 314: ttt=1.05852 art=15,987,257 eval=553.7s
- 1234: ttt=1.05849 art=15,989,895 eval=574.1s

The submission directory at records/track_10min_16mb/2026-04-30_PR2014_Reproduction_1.0583/ contains PR openai#2014's verbatim train_gpt.py + tokenizer + our seed_results.csv + a detailed README documenting the lineage (openai#1797 -> openai#1851 -> openai#1855 -> openai#1908 -> openai#1923 -> openai#1953 -> openai#2014), the new levers vs each parent, and the full 4-condition C1-C4 legality check. submission.json author/github_id are placeholders pending the user's choice of submitting account.

Reproduction script: runpod/phase_x_pr2014.sh — runs end-to-end on a single 8xH100 SXM pod (~2.5h wall, ~$66 cost).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
AsymLogit Rescale (PR openai#1923) ported as 2 TTT-adaptable scalar params (softcap_pos, softcap_neg). Pre-quant 1.06160 (slightly worse than S55's 1.06058 — AsymLogit hurts the un-adapted model). TTT recovery -0.01267 (much better than S55's -0.01103) — AsymLogit adds substantial adaptive capacity. Final 1.05759 = -0.00055 vs S55. Single-seed result matches PR openai#2014's 3-seed mean. Eval 521.7s (under the 600s cap), size 15,946,610 bytes. softcap_pos and softcap_neg init to logit_softcap=30.0 and are adapted per-doc via the TTT-LoRA optimizer.
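The two-scalar rescale described above can be sketched in PyTorch. This is a minimal standalone sketch of the `where(logits>0, sp*tanh(logits/sp), sn*tanh(logits/sn))` formula quoted in this PR's summary; the module and variable names are illustrative, not the actual train_gpt.py code:

```python
import torch
import torch.nn as nn

class AsymLogitRescale(nn.Module):
    """Asymmetric logit softcap: separate learnable caps for positive
    and negative logits. Initializing both at the symmetric softcap
    value makes the module an exact drop-in at the start of TTT."""

    def __init__(self, logit_softcap: float = 30.0):
        super().__init__()
        self.softcap_pos = nn.Parameter(torch.tensor(logit_softcap))
        self.softcap_neg = nn.Parameter(torch.tensor(logit_softcap))

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        sp, sn = self.softcap_pos, self.softcap_neg
        return torch.where(
            logits > 0,
            sp * torch.tanh(logits / sp),   # cap for positive logits
            sn * torch.tanh(logits / sn),   # cap for negative logits
        )

cap = AsymLogitRescale()
x = torch.randn(4, 8192) * 10.0
y = cap(x)
# at init both caps equal 30.0, so this reduces to the symmetric softcap
assert torch.allclose(y, 30.0 * torch.tanh(x / 30.0))
```

Because both parameters are plain `nn.Parameter` scalars, the per-doc TTT optimizer can move `softcap_pos` and `softcap_neg` apart, which is exactly the asymmetry the symmetric cap cannot express.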
Audits every CaseOps-lineage record-track PR (merged + unmerged) since 2026-04-18 for whether val docs are also in the training set. Working set: 34 PRs (31 from the chronological seed list + 3 discovered ancestors: openai#1908, openai#1923, openai#2007). Boundary nodes openai#1493 / openai#1626 (pre-CaseOps).

Verdicts:
- CLEAN (8): openai#1729, openai#1851, openai#1868, openai#1908, openai#2019, openai#2027, openai#2031, openai#2068
- LEAK (25): openai#1736 (our research baseline) → openai#1769 → openai#1787 → openai#1797 → openai#1855 → V21 family (openai#1945, openai#1923, openai#1953, openai#1967) → openai#2018 → openai#2118 (current claimed frontier 1.04350), plus siblings
- INHERIT (1): openai#2050 (eval-only on frozen openai#1915)

Code-level evidence (not README claims):
- Every shipped prepare_caseops_data.py is byte-identical: SHARD_TOKENS=10_000_000, default=10_000 for --val-docs
- NO PR overrides --val-docs (searched all .sh files in all 34 PRs)
- cached_challenge_fineweb.py downloads from the romeerp/parameter-golf-caseops-v1 HF dataset, whose manifest pins docs_val=50000, docs_train=8181945; the sums match → CLEAN by construction
- PR openai#2018's DATASET_AUDIT.md is a gold-standard explicit leak description
- PR openai#2118's submission.json admits "--val-docs=10000 train shards + 50k val eval"

Three signposts:
- Leak introduced: PR openai#1736 by @dexhunter (Apr 19) — first prepare_caseops_data.py default invocation
- Leak fixed: PR openai#1851 by @aquariouseworkman (Apr 27) — switched to the HF dataset
- Leak re-introduced: PR openai#1855 by @codemath3000 (same day) — rebuilt locally

The merged-leaderboard SOTA (openai#1851/openai#1868 at 1.06128/1.06141) is CLEAN. The unmerged frontier (openai#2118 at 1.04350) is LEAK. The 0.018 bpb gap is inflated by val memorization; spec 301 was designed to measure how much remains under clean data.
Files:
- caseops-memory-leakage/README.md — overview, methodology, takeaways
- caseops-memory-leakage/verdicts.md — 34-row master table with evidence
- caseops-memory-leakage/family-tree.md — ASCII trees with [C]/[L] annotations
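The audit's core question — do val docs also appear in the train shards — reduces to a hash-set intersection over exact document text. A minimal sketch; the doc iteration, normalization, and toy data below are hypothetical illustrations, not the audit's actual tooling:

```python
import hashlib

def doc_fingerprint(text: str) -> str:
    # normalize whitespace lightly so trivial formatting diffs don't hide a leak
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()

def find_leaked_docs(train_docs, val_docs):
    """Return the val docs whose (normalized) text also occurs in train."""
    train_hashes = {doc_fingerprint(d) for d in train_docs}
    return [d for d in val_docs if doc_fingerprint(d) in train_hashes]

# toy example: one val doc overlaps train modulo whitespace
train = ["the cat sat on the mat", "a b c", "hello world"]
val = ["hello  world", "unseen document"]
print(find_leaked_docs(train, val))  # → ['hello  world']
```

Hashing keeps the check O(train + val) in memory and time, which matters when the train side is ~8.2 M docs; a real audit would stream shard files through `doc_fingerprint` rather than hold lists in memory.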
## Summary
Verbatim PR #1855 stack with one orthogonal eval-path addition: asymmetric logit rescale (modded-nanogpt @classiclarryd PR #181). Two learnable scalars (`softcap_pos`, `softcap_neg`) replace the single `logit_softcap` in `forward_logits` and `forward_ttt`, trained inside Phased TTT via the global SGD loop. Init at `logit_softcap=30.0` so eval is unchanged at the start; the train-time fused-CE kernel is left untouched.

## 3-seed Results
Reproduced on 8× H100 80GB SXM, torch 2.9.1+cu128, FA3, fused softcapped CE, lrzip pergroup compression.
## What changed vs #1855
In `class GPT`:

- `asym_logit_enabled` gated by the `ASYM_LOGIT_RESCALE` env var
- `nn.Parameter` scalars `softcap_pos`, `softcap_neg` (init 30.0, fp32)
- `_apply_asym_softcap(logits)` doing `where(logits > 0, sp*tanh(logits/sp), sn*tanh(logits/sn))`
- `forward_logits` and `forward_ttt` call the helper when the flag is on; the training path is unchanged

The two new scalars land in the passthrough `float16` list at serialize time (8 bytes total, lossless under per-group lrzip).

## Test plan
## Attribution