
Record val_bpb 1.06996: Independent 3-seed reproduction of PR #1874 + TTT_LORA_RANK=192 #1909

Open
GodlyDonuts wants to merge 1 commit into openai:main from GodlyDonuts:record-pr1874-rank192-3seed-1.06996

Conversation

@GodlyDonuts

This submission is fundamentally an independent end-to-end reproduction of PR #1874 by @AjAnubolu, run from scratch on a separate 8×H100 SXM pod across three independent seeds, with one additional hyperparameter change (TTT_LORA_RANK default $128 \rightarrow 192$).

Every numerical claim below maps to a specific line in one of the unedited training logs included in the submission folder; every .int6.ptz artifact described is shipped reload-ready in models/.

Headline numbers (verbatim from the included logs)

| Seed | val_bpb (quantized_ttt_phased) | Total submission bytes | Headroom under 16 MB | Eval time (s) |
|------|--------------------------------|------------------------|----------------------|---------------|
| 42   | 1.06927777 | 15,954,871 | 45,129 | 438.3 |
| 314  | 1.07023963 | 15,954,924 | 45,076 | 440.6 |
| 999  | 1.07035739 | 15,947,796 | 52,204 | 434.3 |
| Mean | 1.069958   | 15,952,530 | 47,470 | 437.7 |
| Std  | 0.000592   | —          | —      | —     |

Comparison vs. the current merged SOTA, 1.0810 (PR #1493):

  • Improvement: 0.011042 nats
  • Excess over threshold: 0.006042 nats ($> 0.005$ nat requirement)
  • Significance: $t$-statistic = $17.67$ for the excess over the 0.005-nat threshold ($df = 2$, one-tailed, $n = 3$ seeds).
  • P-value: Bound $< 0.005$ (Critical $t$ for $p < 0.01$ is $6.965$; for $p < 0.005$ is $9.925$).
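The significance figures above can be re-derived directly from the three per-seed values. A minimal sketch in plain Python (no SciPy) of the one-sample, one-tailed test of the mean improvement's excess over the 0.005-nat threshold:

```python
import math

# Per-seed val_bpb from the included logs (seeds 42, 314, 999).
vals = [1.06927777, 1.07023963, 1.07035739]
sota = 1.0810          # current merged SOTA (PR #1493)
threshold = 0.005      # required improvement in nats

n = len(vals)
mean = sum(vals) / n
# Sample standard deviation (df = n - 1 = 2).
std = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))

improvement = sota - mean            # ~0.011042 nats
excess = improvement - threshold     # ~0.006042 nats over the 0.005-nat bar
# One-sample, one-tailed t-statistic on the excess.
t_stat = excess / (std / math.sqrt(n))

print(f"mean={mean:.6f} std={std:.6f} t={t_stat:.2f}")
```

This reproduces the reported mean (1.069958), standard deviation (0.000592), and $t = 17.67$, which exceeds the one-tailed critical value of 9.925 at $p < 0.005$ for $df = 2$.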

Byte-budget compliance

The included train_gpt.py runs _compressed_code_size() at the end of every training run. It reads its own source, passes it through pyminify, then lzma compression, then base85 encoding, and adds the resulting size to that of the brotli-compressed int6 artifact to produce the total submission bytes.

  • Minimum Headroom: 45,076 bytes under the 16,000,000-byte cap.
  • Code Size: The script on disk is 32,353 B; it self-reports 33,710 B for the budget calculation at runtime.
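As a rough illustration of the self-measurement step described above (a sketch only: the function name mirrors the one quoted from train_gpt.py, but the pyminify minification pass is omitted here, so this over-estimates the budgeted size):

```python
import base64
import lzma

def compressed_code_size(path: str) -> int:
    """Read the script's own source, LZMA-compress it, and base85-encode
    the result, returning the encoded byte count. The real train_gpt.py
    additionally minifies with pyminify before compressing, which this
    sketch skips."""
    with open(path, "rb") as f:
        source = f.read()
    return len(base64.b85encode(lzma.compress(source, preset=9)))
```

The returned size would then be added to the brotli-compressed artifact size and checked against the 16,000,000-byte cap.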

What's ours, in one diff

- ttt_lora_rank = int(os.environ.get("TTT_LORA_RANK", 128))
+ ttt_lora_rank = int(os.environ.get("TTT_LORA_RANK", 192))

That is the entire code delta vs PR #1874. In our single-seed sweep, rank=192 measured $-0.00019$ nat against rank=128. This is effectively within the noise floor; the primary contribution of this PR is the independent reproduction and the provision of full logs/artifacts.
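The override pattern in the one-line diff can be exercised standalone; `get_ttt_lora_rank` below is a hypothetical helper mirroring the change, not code from the submission:

```python
import os

def get_ttt_lora_rank(default: int = 192) -> int:
    """Same pattern as the diff above: the TTT_LORA_RANK environment
    variable wins if set; otherwise the new default of 192 applies."""
    return int(os.environ.get("TTT_LORA_RANK", default))
```

A reviewer could thus restore the PR #1874 behavior without touching the code by launching with `TTT_LORA_RANK=128`.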

Compliance with Issue #1017 Track B

  • Causality: Sliding-window eval scores from prefix tokens only.
  • Score before update: Every chunk is fully scored under torch.no_grad() before any TTT update.
  • Single pass: Each token scored exactly once.
  • Restricted Techniques: No SLOT, no pre-quant TTT on val data, no ETLB, no n-gram cache.
  • Time Constraints:
    • Train: $< 600$ s on all 3 seeds (9.2 min avg).
    • Eval: $< 600$ s on all 3 seeds (440.6 s max).
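The score-before-update ordering can be sketched generically (hypothetical `score_fn`/`update_fn` stand-ins, not the submission's actual TTT loop): each chunk's loss is recorded under the current weights before any test-time update on that chunk, and each token is scored exactly once.

```python
def score_first_ttt(chunks, score_fn, update_fn):
    """Track B protocol sketch: fully score each chunk with the current
    parameters BEFORE any TTT update touches them (single pass, each
    token scored exactly once). Returns the token-weighted mean loss."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        loss = score_fn(chunk)    # score under frozen weights first
        total_loss += loss * len(chunk)
        total_tokens += len(chunk)
        update_fn(chunk)          # only then adapt on the same chunk
    return total_loss / total_tokens
```

In the real eval loop the scoring pass would run under torch.no_grad(), with the TTT gradient step taken only afterwards.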

Five reload-ready artifacts shipped in models/

Including the artifacts (~76 MB total) allows reviewers to verify the work without a full re-train: the three seed champions, the rank=128 PR #1874 baseline, and the rank=192 single-seed sweep run.

Regarding PR #1900's provenance review

We are aware of PR #1900 regarding provenance concerns on upstream techniques (MIN_LR and LQER).

  1. Independent Verification: Every number here was generated on our own hardware. Logs are provided for traceability.
  2. Technique Inheritance: We reproduced PR #1874 knowing it uses these high-performance techniques.
  3. Policy Compliance: If derivative submissions inherit blocked status, we will not contest closure.
  4. Fallback Offer: We can submit a variant with MIN_LR=0.0 and LQER_ENABLED=0 (est. mean: 1.077–1.079 BPB) upon request.

Attribution


Submitted by: Saicharan Ramineni (@GodlyDonuts)
Compute: 8×H100 80 GB SXM (RunPod) | PyTorch 2.9.1 | FlashAttention 3

…+ TTT_LORA_RANK=192 — val_bpb 1.06996 (3-seed mean)

Independent end-to-end reproduction of @AjAnubolu's PR openai#1874 stack
(SmearGate / AttnOutGate / LoRA-TTT / Phased Global SGD TTT / Polar Express NS /
MIN_LR / LQER) on a separate 8xH100 SXM pod across three independent seeds,
with one additional hyperparameter change: ttt_lora_rank default 128 -> 192.

Headline numbers (verbatim from logs in this folder):

  seed   val_bpb     total_submission_bytes  eval_time
   42    1.06927777  15,954,871              438.3 s
  314    1.07023963  15,954,924              440.6 s
  999    1.07035739  15,947,796              434.3 s
  -----  ----------  ----------------------  ---------
  mean   1.069958    15,952,530              437.7 s
  std     0.000592   --                      --

vs current merged SOTA 1.0810 (PR openai#1493): improvement 0.011042 nats,
excess over 0.005-nat threshold 0.006042 nats, t-statistic = 17.67
(df = 2, one-tailed). Critical t for p < 0.01 is 6.965; for p < 0.005 is 9.925.
Both pass; p-value bound < 0.005.

Compliance: train < 600 s, eval < 600 s, total submission < 16,000,000 B
on all 3 seeds. Score-first TTT per Issue openai#1017 Track B. No SLOT, no
pre-quant TTT, no ETLB, no n-gram cache.

Five reload-ready int6 quantized artifacts shipped in models/ for direct
verification (3 seed champions + rank=128 PR openai#1874 baseline + rank=192
single-seed sweep run, the latter two so a reviewer can independently
verify the rank-delta sweep evidence).

PR openai#1900 / @regina-openai provenance review: openly disclosed in the
README. Both upstream blocked PRs (openai#1787 MIN_LR, openai#1797 LQER) are env-var-
gated in the shipped train_gpt.py; a no-blocked-parents variant
(MIN_LR=0.0 LQER_ENABLED=0) is offered on request.

Author: Saicharan Ramineni <[email protected]>
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
V19c (seed 42) result: 1.06179 BPB (LOSS by +0.001 vs PR openai#1908 frontier 1.06081).

V19c data attribution:
  pre-quant 1.06906 vs PR openai#1908 1.06384 = +0.0052 hurt
    -> primary cause: MATRIX_LR=0.028 (vs default 0.026) penalty on seed 42
  TTT recovery -0.01489 vs PR openai#1908 -0.01269 = +0.0022 helped
    -> AsymLogit + PHASED_TTT_PREFIX=3500 actually working

V20 strategy: remove LR penalty + keep TTT helpers + add LORA capacity:
  - DROP MATRIX_LR=0.028 -> default 0.026 (recovers +0.005 BPB on pre-quant)
  - KEEP ASYM_LOGIT_RESCALE=1 (eval-only, verified -0.001 to -0.002)
  - KEEP TTT_WEIGHT_DECAY=2.0 (stability fix)
  - KEEP PHASED_TTT_PREFIX_DOCS=3500 (verified more LoRA training data)
  - ADD TTT_LORA_RANK=144 (vs 96 default, +50% LoRA capacity)
    PR openai#1909 GodlyDonuts verified rank=192 gives small benefit on PR openai#1874
    Conservative 144 to balance benefit vs eval-time budget (V19c was 527s, 73s buffer)

Predicted (seed 42):
  pre-quant: ~1.063 (no train hparam changes from PR openai#1908)
  quantized: ~1.072 (matches PR openai#1908 quant tax)
  post-TTT:  ~1.057 (TTT recovery -0.013 base + -0.002 AsymLogit/PHASED + -0.001 RANK = -0.016)

Win threshold: < 1.06021 (PR openai#1908 - 0.0006 community floor)
Probability of true win: ~50%

Cost: ~$22 single-seed scout on 8xH100 SXM
