Non-record: final frontier autopsy #2110
himanshudongre wants to merge 1 commit into openai:main from
Non-record: Final Frontier Autopsy
Track: non-record / methodology
Author: Himanshu Dongre
Date: 2026-05-01
Leaderboard claim: none
This is a non-record submission. It records my final attempt to improve the
late PR #2018 frontier, the logs behind that attempt, and the stop rule that
fell out of it.
The short version: every serious branch failed before quantization. On this
frontier, the trained model had to be competitive before GPTQ and TTT. Mine
was not.
Result Summary
The useful observation is where the runs failed. The gap was already around
+0.013 BPB before quantization. That was too large for quantization choices
or score-first TTT to rescue.
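The stop rule that fell out of this can be stated concretely. A minimal sketch, with an illustrative recoverable margin (the 0.005 threshold is my assumption for this sketch, not a measured quantity; only the +0.013 BPB gap comes from the runs above):

```python
def should_stop(candidate_bpb: float, frontier_bpb: float,
                recoverable_margin: float = 0.005) -> bool:
    """Abandon a branch when the pre-quantization BPB gap to the frontier
    is already larger than later stages (GPTQ, TTT) can plausibly recover.
    The margin is illustrative."""
    gap = candidate_bpb - frontier_bpb
    return gap > recoverable_margin

# With the roughly +0.013 BPB pre-quant gap observed in these runs,
# the rule fires and the branch is stopped before quantization.
print(should_stop(3.013, 3.000))  # True
print(should_stop(3.002, 3.000))  # True only past the margin -> False
```

The point of phrasing it this way is that the check happens strictly before quantization, which is where all three plans below were cut.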
Files
Plan C is terminal-observed only. The pod connection dropped before I could
copy the full remote output folder, so I do not treat it as a complete logged
artifact.
Plan A: Gate32 + q-aware token-only n-gram tilt
Command shape:
Key log lines:
Budget checks passed:
15,972,854 bytes, 595.974 s, 515.283 s.
The n-gram path was token-only and timed inside eval. The failure was not
timing or artifact size. The trained model was weak before quantization.
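For completeness, a hedged sketch of what the budget checks above verify. The cap values here are placeholders I chose so the logged figures pass; the actual limits come from the record rules, which this folder does not restate:

```python
def within_budget(artifact_bytes: int, train_seconds: float, eval_seconds: float,
                  max_bytes: int = 16_000_000,      # placeholder cap
                  max_train_s: float = 600.0,       # placeholder cap
                  max_eval_s: float = 600.0) -> bool:
    """Check the three logged quantities against assumed record caps."""
    return (artifact_bytes <= max_bytes
            and train_seconds <= max_train_s
            and eval_seconds <= max_eval_s)

# The Plan A log lines: 15,972,854 bytes, 595.974 s, 515.283 s.
print(within_budget(15_972_854, 595.974, 515.283))  # True
```

Under these assumed caps the Plan A run clears all three checks, which is why the failure is attributed to training quality rather than budget.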
Plan B: Gate32 + native PR #2018 n-gram
Plan B removed the q-aware patch to test whether the stricter n-gram logic was
responsible for the regression.
Command shape:
Key log lines:
That result isolated the main issue. Gate32 itself did not transfer to this
stack. The q-aware n-gram patch was not the cause of the pre-quant regression.
Plan C: exact #2018 gates + tiny BigramHash
The last branch removed Gate32 and tested a small causal input feature from my
earlier work.
Command shape:
Terminal-observed lines:
This was also stopped at pre-quant. A tiny BigramHash branch did not recover
training quality on the #2018 frontier.
Interpretation
The final runs support three narrow conclusions.
1. Gate32 did not transfer here
Gate widening had public evidence on nearby stacks, but it damaged this one.
The failure appeared before quantization and before TTT, so this was a training
dynamics problem rather than an eval-time issue.
2. The q-aware n-gram patch was not the root cause
Plan B removed that patch and still produced an even worse pre-quant result.
3. A tiny BigramHash branch was too small or mismatched
BigramHash helped my earlier PR #1716 in a different base. The 512x4 version
tested here did not transfer to #2018.
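To make "512x4" concrete, here is a minimal sketch of a BigramHash input feature as I understand it: the previous and current token ids are hashed into 512 buckets, each carrying a 4-dimensional embedding (learned in the real model, random here). The hash and all names are illustrative, not the code from PR #1716 or #2018:

```python
import numpy as np

N_BUCKETS, DIM = 512, 4
rng = np.random.default_rng(0)
# Stand-in for a learned 512x4 embedding table.
bigram_table = rng.normal(scale=0.02, size=(N_BUCKETS, DIM))

def bigram_bucket(prev_id: int, cur_id: int) -> int:
    # Cheap mixing hash over the (prev, cur) pair; illustrative constants.
    return ((prev_id * 1000003) ^ cur_id) % N_BUCKETS

def bigram_feature(token_ids):
    # Causal by construction: position t only sees tokens t-1 and t.
    # Position 0 has no previous token; a sentinel id of 0 is used.
    feats, prev = [], 0
    for cur in token_ids:
        feats.append(bigram_table[bigram_bucket(prev, cur)])
        prev = cur
    return np.stack(feats)  # shape: (seq_len, DIM)

print(bigram_feature([5, 17, 17, 9]).shape)  # (4, 4)
```

A feature this small adds almost no capacity, which is consistent with it being "too small or mismatched" to move training quality on this frontier.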
Compliance Notes
This is a non-record package, but the executed Plan A run still satisfies the
ordinary record constraints: the artifact-size and timing budget checks logged
under Plan A all passed.
Closing Note
This folder is intentionally narrow. It is not a competition-wide synthesis.
It is the evidence package for one failed final transfer attempt.
The broader research notes are in PR #2111.