Record val_bpb 1.06996: Independent 3-seed reproduction of PR #1874 + TTT_LORA_RANK=192 #1909
Open

GodlyDonuts wants to merge 1 commit into openai:main from
Conversation
…+ TTT_LORA_RANK=192 — val_bpb 1.06996 (3-seed mean)

Independent end-to-end reproduction of @AjAnubolu's PR openai#1874 stack (SmearGate / AttnOutGate / LoRA-TTT / Phased Global SGD TTT / Polar Express NS / MIN_LR / LQER) on a separate 8xH100 SXM pod across three independent seeds, with one additional hyperparameter change: ttt_lora_rank default 128 -> 192.

Headline numbers (verbatim from logs in this folder):

| seed | val_bpb    | total_submission_bytes | eval_time |
|------|------------|------------------------|-----------|
| 42   | 1.06927777 | 15,954,871             | 438.3 s   |
| 314  | 1.07023963 | 15,954,924             | 440.6 s   |
| 999  | 1.07035739 | 15,947,796             | 434.3 s   |
| mean | 1.069958   | 15,952,530             | 437.7 s   |
| std  | 0.000592   | --                     | --        |

vs current merged SOTA 1.0810 (PR openai#1493): improvement 0.011042 nats, excess over 0.005-nat threshold 0.006042 nats, t-statistic = 17.67 (df = 2, one-tailed). Critical t for p < 0.01 is 6.965; for p < 0.005 is 9.925. Both pass; p-value bound < 0.005.

Compliance: train < 600 s, eval < 600 s, total submission < 16,000,000 B on all 3 seeds. Score-first TTT per Issue openai#1017 Track B. No SLOT, no pre-quant TTT, no ETLB, no n-gram cache.

Five reload-ready int6 quantized artifacts shipped in models/ for direct verification (3 seed champions + rank=128 PR openai#1874 baseline + rank=192 single-seed sweep run, the latter two so a reviewer can independently verify the rank-delta sweep evidence).

PR openai#1900 / @regina-openai provenance review: openly disclosed in the README. Both upstream blocked PRs (openai#1787 MIN_LR, openai#1797 LQER) are env-var-gated in the shipped train_gpt.py; a no-blocked-parents variant (MIN_LR=0.0 LQER_ENABLED=0) is offered on request.

Author: Saicharan Ramineni <[email protected]>
alertcat added a commit to alertcat/parameter-golf that referenced this pull request on Apr 29, 2026:
V19c (seed 42) result: 1.06179 BPB (LOSS by +0.001 vs PR openai#1908 frontier 1.06081).

V19c data attribution:
- pre-quant 1.06906 vs PR openai#1908 1.06384 = +0.0052 hurt -> primary cause: MATRIX_LR=0.028 (vs default 0.026) penalty on seed 42
- TTT recovery -0.01489 vs PR openai#1908 -0.01269 = +0.0022 helped -> AsymLogit + PHASED_TTT_PREFIX=3500 actually working

V20 strategy: remove LR penalty + keep TTT helpers + add LoRA capacity:
- DROP MATRIX_LR=0.028 -> default 0.026 (recovers +0.005 BPB on pre-quant)
- KEEP ASYM_LOGIT_RESCALE=1 (eval-only, verified -0.001 to -0.002)
- KEEP TTT_WEIGHT_DECAY=2.0 (stability fix)
- KEEP PHASED_TTT_PREFIX_DOCS=3500 (verified more LoRA training data)
- ADD TTT_LORA_RANK=144 (vs 96 default, +50% LoRA capacity)
  - PR openai#1909 GodlyDonuts verified rank=192 gives small benefit on PR openai#1874
  - conservative 144 to balance benefit vs eval-time budget (V19c was 527 s, 73 s buffer)

Predicted (seed 42):
- pre-quant: ~1.063 (no train hparam changes from PR openai#1908)
- quantized: ~1.072 (matches PR openai#1908 quant tax)
- post-TTT: ~1.057 (TTT recovery -0.013 base + -0.002 AsymLogit/PHASED + -0.001 RANK = -0.016)

Win threshold: < 1.06021 (PR openai#1908 - 0.0006 community floor)
Probability of true win: ~50%
Cost: ~$22 single-seed scout on 8xH100 SXM
This submission is fundamentally an independent end-to-end reproduction of PR #1874 by @AjAnubolu, run from scratch on a separate 8×H100 SXM pod across three independent seeds, with one additional hyperparameter change: `TTT_LORA_RANK` default 128 -> 192.

Every numerical claim below maps to a specific line in one of the unedited training logs included in the submission folder; every `.int6.ptz` artifact described is shipped reload-ready in `models/`.

### Headline numbers (verbatim from the included logs)

| seed | val_bpb    | total_submission_bytes | eval_time |
|------|------------|------------------------|-----------|
| 42   | 1.06927777 | 15,954,871             | 438.3 s   |
| 314  | 1.07023963 | 15,954,924             | 440.6 s   |
| 999  | 1.07035739 | 15,947,796             | 434.3 s   |
| mean | 1.069958   | 15,952,530             | 437.7 s   |
| std  | 0.000592   | --                     | --        |
### Comparison

Vs. current merged SOTA 1.0810 (PR #1493):

- improvement: 0.011042 nats
- excess over the 0.005-nat record threshold: 0.006042 nats
- t-statistic: 17.67 (df = 2, one-tailed); critical t is 6.965 for p < 0.01 and 9.925 for p < 0.005, so both pass (p-value bound < 0.005)
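As a sanity check, the quoted t-statistic can be recomputed from the three seed values; this is a sketch assuming a one-sample, one-tailed t-test against the 0.005-nat record threshold (that test convention is inferred from the stated df = 2):

```python
# Recompute the headline significance test from the three seed val_bpb values.
from math import sqrt
from statistics import mean, stdev

seeds = {42: 1.06927777, 314: 1.07023963, 999: 1.07035739}
vals = list(seeds.values())

sota = 1.0810             # current merged SOTA (PR #1493)
threshold = sota - 0.005  # a record must beat SOTA by at least 0.005 nats

m = mean(vals)   # 3-seed mean
s = stdev(vals)  # sample std (ddof=1), df = 2
t = (threshold - m) / (s / sqrt(len(vals)))  # one-sample, one-tailed

print(f"mean={m:.6f} std={s:.6f} t={t:.2f}")
# mean=1.069958 std=0.000592 t=17.67
```

Since 17.67 exceeds the critical value 9.925 for df = 2, the p < 0.005 bound stated above follows.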
### Byte-budget compliance

Train < 600 s, eval < 600 s, and total submission < 16,000,000 B on all three seeds. The included `train_gpt.py` runs `_compressed_code_size()` at the end of every training run: it reads its own source, runs it through `pyminify` + `lzma` + `b85`, and adds that to the size of the brotli-compressed `int6` artifact.
### What's ours, in one diff

The `ttt_lora_rank` default changes from 128 to 192. That is the entire code delta vs PR #1874. In our single-seed sweep, `rank=192` measured -0.00019 nat against `rank=128`; this is effectively within the noise floor, and the primary contribution of this PR is the independent reproduction and the provision of full logs and artifacts.
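Assuming the default is read from the environment, as the env-var gating described elsewhere in this PR suggests, the one-line delta could look like the following (a hypothetical rendering; the actual line in `train_gpt.py` may differ):

```python
# Hypothetical sketch of the single code delta vs PR #1874: the LoRA-TTT rank
# default moves from 128 to 192, still overridable via TTT_LORA_RANK.
import os

ttt_lora_rank = int(os.environ.get("TTT_LORA_RANK", "192"))  # PR #1874 used 128
```

Setting `TTT_LORA_RANK=128` in the environment then reproduces the PR #1874 baseline configuration exactly.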
### Compliance with Issue #1017 Track B

Score-first TTT: the score is recorded under `torch.no_grad()` before any TTT update. No SLOT, no pre-quant TTT, no ETLB, no n-gram cache.

### Five reload-ready artifacts shipped in `models/`

Including artifacts allows reviewers to verify the work without a full re-train (~76 MB total):

- `champion_3seed_{42,314,999}.int6.ptz`: the headline 3-seed results.
- `pr1874_baseline_rank128_seed42.int6.ptz`: baseline reproduction of PR #1874 ("Record: SP8192 + Polar Express NS + MIN_LR + LQER Asym Rank-4 — val_bpb 1.06766 (3-seed mean)").
- `sweep_rank192_seed42.int6.ptz`: rank=192 single-seed sweep run, for A/B verification of the rank delta.

### Regarding PR #1900's provenance review
We are aware of PR #1900 (@regina-openai's provenance review) regarding provenance concerns on the upstream techniques `MIN_LR` (PR #1787) and `LQER` (PR #1797); this is openly disclosed in the README. Both are env-var-gated in the shipped `train_gpt.py`, and a no-blocked-parents variant with `MIN_LR=0.0` and `LQER_ENABLED=0` (est. mean: 1.077-1.079 BPB) is available upon request.
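A sketch of that env-var gating; the variable names `MIN_LR` and `LQER_ENABLED` come from the PR text, but the defaults and the cosine schedule shown here are assumptions of this sketch:

```python
# Env-var gating sketch: with MIN_LR=0.0 and LQER_ENABLED=0 (the
# no-blocked-parents variant), both gated techniques reduce to no-ops.
import math
import os

MIN_LR = float(os.environ.get("MIN_LR", "0.0"))
LQER_ENABLED = os.environ.get("LQER_ENABLED", "0") == "1"


def lr_at(step: int, base_lr: float, total_steps: int) -> float:
    """Cosine decay floored at MIN_LR; the floor vanishes when MIN_LR=0.0."""
    lr = 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
    return max(lr, MIN_LR)
```

Running with `MIN_LR=0.0 LQER_ENABLED=0` then yields the no-blocked-parents variant without touching the code.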
### Attribution

Submitted by: Saicharan Ramineni (@GodlyDonuts)

Compute: 8×H100 80 GB SXM (RunPod) | PyTorch 2.9.1 | FlashAttention 3