
Record val_bpb 1.06996: Independent 3-seed reproduction of PR #1874 + TTT_LORA_RANK=192 #1909

Open
GodlyDonuts wants to merge 1 commit into openai:main from GodlyDonuts:record-pr1874-rank192-3seed-1.06996

Conversation

@GodlyDonuts

This submission is fundamentally an independent end-to-end reproduction of PR #1874 by @AjAnubolu, run from scratch on a separate 8×H100 SXM pod across three independent seeds, with one additional hyperparameter change (TTT_LORA_RANK default $128 \rightarrow 192$).

Every numerical claim below maps to a specific line in one of the unedited training logs included in the submission folder; every .int6.ptz artifact described is shipped reload-ready in models/.

Headline numbers (verbatim from the included logs)

| Seed | val_bpb (quantized_ttt_phased) | Total submission bytes | Headroom under 16 MB | Eval time (s) |
|------|--------------------------------|------------------------|----------------------|---------------|
| 42   | 1.06927777 | 15,954,871 | 45,129 | 438.3 |
| 314  | 1.07023963 | 15,954,924 | 45,076 | 440.6 |
| 999  | 1.07035739 | 15,947,796 | 52,204 | 434.3 |
| Mean | 1.069958   | 15,952,530 | 47,470 | 437.7 |
| Std  | 0.000592   | —          | —      | —     |

Comparison vs. the current merged SOTA, 1.0810 (PR #1493):

  • Improvement: 0.011042 nats
  • Excess over threshold: 0.006042 nats ($> 0.005$ nat requirement)
  • Significance: $t$-statistic = $17.67$ for the excess over the 0.005-nat threshold ($df = 2$, one-tailed, $n = 3$ seeds).
  • P-value: Bound $< 0.005$ (Critical $t$ for $p < 0.01$ is $6.965$; for $p < 0.005$ is $9.925$).
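The significance figures above can be re-derived directly from the three per-seed values. A minimal sketch in plain Python (no SciPy) of the one-sample, one-tailed test of the mean improvement's excess over the 0.005-nat threshold:

```python
import math

# Per-seed val_bpb from the included logs (seeds 42, 314, 999).
vals = [1.06927777, 1.07023963, 1.07035739]
sota = 1.0810          # current merged SOTA (PR #1493)
threshold = 0.005      # required improvement in nats

n = len(vals)
mean = sum(vals) / n
# Sample standard deviation (df = n - 1 = 2).
std = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))

improvement = sota - mean            # ~0.011042 nats
excess = improvement - threshold     # ~0.006042 nats over the 0.005-nat bar
# One-sample, one-tailed t-statistic on the excess.
t_stat = excess / (std / math.sqrt(n))

print(f"mean={mean:.6f} std={std:.6f} t={t_stat:.2f}")
```

This reproduces the reported mean (1.069958), standard deviation (0.000592), and $t = 17.67$, which exceeds the one-tailed critical value of 9.925 at $p < 0.005$ for $df = 2$.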

Byte-budget compliance

The included train_gpt.py runs _compressed_code_size() at the end of every training run. It reads its own source, passes it through pyminify, then lzma compression, then base85 encoding, and adds the resulting size to that of the brotli-compressed int6 artifact to produce the total submission bytes.

  • Minimum Headroom: 45,076 bytes under the 16,000,000-byte cap.
  • Code Size: The script on disk is 32,353 B; it self-reports 33,710 B for the budget calculation at runtime.
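As a rough illustration of the self-measurement step described above (a sketch only: the function name mirrors the one quoted from train_gpt.py, but the pyminify minification pass is omitted here, so this over-estimates the budgeted size):

```python
import base64
import lzma

def compressed_code_size(path: str) -> int:
    """Read the script's own source, LZMA-compress it, and base85-encode
    the result, returning the encoded byte count. The real train_gpt.py
    additionally minifies with pyminify before compressing, which this
    sketch skips."""
    with open(path, "rb") as f:
        source = f.read()
    return len(base64.b85encode(lzma.compress(source, preset=9)))
```

The returned size would then be added to the brotli-compressed artifact size and checked against the 16,000,000-byte cap.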

What's ours, in one diff

- ttt_lora_rank = int(os.environ.get("TTT_LORA_RANK", 128))
+ ttt_lora_rank = int(os.environ.get("TTT_LORA_RANK", 192))

That is the entire code delta vs PR #1874. In our single-seed sweep, rank=192 measured $-0.00019$ nat against rank=128. This is effectively within the noise floor; the primary contribution of this PR is the independent reproduction and the provision of full logs/artifacts.
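The override pattern in the one-line diff can be exercised standalone; `get_ttt_lora_rank` below is a hypothetical helper mirroring the change, not code from the submission:

```python
import os

def get_ttt_lora_rank(default: int = 192) -> int:
    """Same pattern as the diff above: the TTT_LORA_RANK environment
    variable wins if set; otherwise the new default of 192 applies."""
    return int(os.environ.get("TTT_LORA_RANK", default))
```

A reviewer could thus restore the PR #1874 behavior without touching the code by launching with `TTT_LORA_RANK=128`.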

Compliance with Issue #1017 Track B

  • Causality: Sliding-window eval scores from prefix tokens only.
  • Score before update: Every chunk is fully scored under torch.no_grad() before any TTT update.
  • Single pass: Each token scored exactly once.
  • Restricted Techniques: No SLOT, no pre-quant TTT on val data, no ETLB, no n-gram cache.
  • Time Constraints:
    • Train: $< 600$ s on all 3 seeds (9.2 min avg).
    • Eval: $< 600$ s on all 3 seeds (440.6 s max).
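The score-before-update ordering can be sketched generically (hypothetical `score_fn`/`update_fn` stand-ins, not the submission's actual TTT loop): each chunk's loss is recorded under the current weights before any test-time update on that chunk, and each token is scored exactly once.

```python
def score_first_ttt(chunks, score_fn, update_fn):
    """Track B protocol sketch: fully score each chunk with the current
    parameters BEFORE any TTT update touches them (single pass, each
    token scored exactly once). Returns the token-weighted mean loss."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        loss = score_fn(chunk)    # score under frozen weights first
        total_loss += loss * len(chunk)
        total_tokens += len(chunk)
        update_fn(chunk)          # only then adapt on the same chunk
    return total_loss / total_tokens
```

In the real eval loop the scoring pass would run under torch.no_grad(), with the TTT gradient step taken only afterwards.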

Five reload-ready artifacts shipped in models/

Including the artifacts (~76 MB total) allows reviewers to verify the work without a full re-train: the three seed champions, the rank=128 PR #1874 baseline, and the rank=192 single-seed sweep run.

Regarding PR #1900's provenance review

We are aware of PR #1900 regarding provenance concerns on upstream techniques (MIN_LR and LQER).

  1. Independent Verification: Every number here was generated on our own hardware. Logs are provided for traceability.
  2. Technique Inheritance: We reproduced PR #1874 knowing it uses these high-performance techniques.
  3. Policy Compliance: If derivative submissions inherit blocked status, we will not contest closure.
  4. Fallback Offer: We can submit a variant with MIN_LR=0.0 and LQER_ENABLED=0 (est. mean: 1.077–1.079 BPB) upon request.

Attribution


Submitted by: Saicharan Ramineni (@GodlyDonuts)
Compute: 8×H100 80 GB SXM (RunPod) | PyTorch 2.9.1 | FlashAttention 3

…+ TTT_LORA_RANK=192 — val_bpb 1.06996 (3-seed mean)

Independent end-to-end reproduction of @AjAnubolu's PR openai#1874 stack
(SmearGate / AttnOutGate / LoRA-TTT / Phased Global SGD TTT / Polar Express NS /
MIN_LR / LQER) on a separate 8xH100 SXM pod across three independent seeds,
with one additional hyperparameter change: ttt_lora_rank default 128 -> 192.

Headline numbers (verbatim from logs in this folder):

  seed   val_bpb     total_submission_bytes  eval_time
   42    1.06927777  15,954,871              438.3 s
  314    1.07023963  15,954,924              440.6 s
  999    1.07035739  15,947,796              434.3 s
  -----  ----------  ----------------------  ---------
  mean   1.069958    15,952,530              437.7 s
  std     0.000592   --                      --

vs current merged SOTA 1.0810 (PR openai#1493): improvement 0.011042 nats,
excess over 0.005-nat threshold 0.006042 nats, t-statistic = 17.67
(df = 2, one-tailed). Critical t for p < 0.01 is 6.965; for p < 0.005 is 9.925.
Both pass; p-value bound < 0.005.

Compliance: train < 600 s, eval < 600 s, total submission < 16,000,000 B
on all 3 seeds. Score-first TTT per Issue openai#1017 Track B. No SLOT, no
pre-quant TTT, no ETLB, no n-gram cache.

Five reload-ready int6 quantized artifacts shipped in models/ for direct
verification (3 seed champions + rank=128 PR openai#1874 baseline + rank=192
single-seed sweep run, the latter two so a reviewer can independently
verify the rank-delta sweep evidence).

PR openai#1900 / @regina-openai provenance review: openly disclosed in the
README. Both upstream blocked PRs (openai#1787 MIN_LR, openai#1797 LQER) are env-var-
gated in the shipped train_gpt.py; a no-blocked-parents variant
(MIN_LR=0.0 LQER_ENABLED=0) is offered on request.

Author: Saicharan Ramineni <[email protected]>
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
V19c (seed 42) result: 1.06179 BPB (LOSS by +0.001 vs PR openai#1908 frontier 1.06081).

V19c data attribution:
  pre-quant 1.06906 vs PR openai#1908 1.06384 = +0.0052 hurt
    -> primary cause: MATRIX_LR=0.028 (vs default 0.026) penalty on seed 42
  TTT recovery -0.01489 vs PR openai#1908 -0.01269 = +0.0022 helped
    -> AsymLogit + PHASED_TTT_PREFIX=3500 actually working

V20 strategy: remove LR penalty + keep TTT helpers + add LORA capacity:
  - DROP MATRIX_LR=0.028 -> default 0.026 (recovers +0.005 BPB on pre-quant)
  - KEEP ASYM_LOGIT_RESCALE=1 (eval-only, verified -0.001 to -0.002)
  - KEEP TTT_WEIGHT_DECAY=2.0 (stability fix)
  - KEEP PHASED_TTT_PREFIX_DOCS=3500 (verified more LoRA training data)
  - ADD TTT_LORA_RANK=144 (vs 96 default, +50% LoRA capacity)
    PR openai#1909 GodlyDonuts verified rank=192 gives small benefit on PR openai#1874
    Conservative 144 to balance benefit vs eval-time budget (V19c was 527s, 73s buffer)

Predicted (seed 42):
  pre-quant: ~1.063 (no train hparam changes from PR openai#1908)
  quantized: ~1.072 (matches PR openai#1908 quant tax)
  post-TTT:  ~1.057 (TTT recovery -0.013 base + -0.002 AsymLogit/PHASED + -0.001 RANK = -0.016)

Win threshold: < 1.06021 (PR openai#1908 - 0.0006 community floor)
Probability of true win: ~50%

Cost: ~$22 single-seed scout on 8xH100 SXM
