Record: PR #1908 reproduction with compliant 600s wallclock - val_bpb 1.06044 (3-seed mean) #1956
Open
AayushBaniya2006 wants to merge 1 commit into openai:main from
…al_bpb 1.06044 (3-seed mean)
3-seed mean post-TTT val_bpb: 1.06043952 (std 0.00091)
- seed 42: 1.05938494 (599521ms / 15943518 bytes)
- seed 0: 1.06101359 (599665ms / 15945548 bytes)
- seed 1234: 1.06092004 (599676ms / 15950342 bytes)
vs PR openai#1908 (current candidate at 1.06081076): -0.00037124 BPB.
All three seeds are compliant under the 600,000 ms training cap; PR openai#1908's seed 42 used 601,153 ms (over cap) because FORCE_STOP_STEP=4945 ignored the wallclock check.
Identical training recipe and quantization knobs to PR openai#1908 (verbatim train_gpt.py from commit 291d3ab, AWQ_LITE_GROUP_TOP_K=1, LQER_TOP_K=3, LQER_GAIN_SELECT=0). Only difference: organic 600s wallclock cap (no FORCE_STOP_STEP).
Taken with explicit invitation from @romeerp on PR openai#1908.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
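The summary numbers can be re-derived from the per-seed results listed above. This is a minimal sketch, not part of the submission. It assumes the reported byte counts are per-seed artifact sizes and that the track's size cap is 16,000,000 bytes; neither assumption is stated in this PR. The reported std of 0.00091 matches the sample (n - 1) standard deviation.

```python
import statistics

# Per-seed results as reported in this PR: (post-TTT val_bpb, train ms, bytes).
SEEDS = {
    42: (1.05938494, 599_521, 15_943_518),
    0: (1.06101359, 599_665, 15_945_548),
    1234: (1.06092004, 599_676, 15_950_342),
}
PR1908_BPB = 1.06081076      # PR #1908 candidate score quoted in this PR
WALLCLOCK_CAP_MS = 600_000   # training cap for track_10min_16mb
SIZE_CAP_BYTES = 16_000_000  # assumption: decimal 16 MB artifact cap

bpbs = [bpb for bpb, _, _ in SEEDS.values()]
mean_bpb = statistics.mean(bpbs)
std_bpb = statistics.stdev(bpbs)  # sample std (n - 1 denominator)
delta = mean_bpb - PR1908_BPB     # negative means better than PR #1908

# prints: mean=1.06043952 std=0.00091 delta=-0.00037124
print(f"mean={mean_bpb:.8f} std={std_bpb:.5f} delta={delta:+.8f}")

# Compliance checks: every seed must be under both caps.
for seed, (_, ms, size) in SEEDS.items():
    assert ms < WALLCLOCK_CAP_MS, f"seed {seed} over wallclock cap"
    assert size < SIZE_CAP_BYTES, f"seed {seed} over size cap"
```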
Summary
3-seed mean post-TTT val_bpb = 1.06043952 (std 0.00091) on top of @romeerp's PR #1908 stack, run with organic 600 s wallclock control instead of FORCE_STOP_STEP=4945. All three seeds finish strictly under the 600,000 ms training cap.
vs PR #1908
PR #1908 used FORCE_STOP_STEP=4945, which causes train_gpt.py to ignore the wallclock cap and run to step 4945 regardless of elapsed time. Their seed-42 run consumed 601,153 ms, over the 600,000 ms cap that defines the track_10min_16mb track. This submission removes FORCE_STOP_STEP and lets training stop organically at the wallclock cap. On the GPU instance used here (8×H100 80GB SXM, RunPod community cloud), that yields stop steps within 15 of PR #1908's targets while staying compliant.
The PR #1908 author flagged this themselves: "this should be open for anyone who wants to claim a new record if they can re-run it under the 600s wallclock" (source). This submission takes them up on it.
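The difference between the two stop rules can be illustrated with a toy loop. This is a hypothetical sketch: should_stop, simulate, est_step_ms, and the 121.5 ms step time are illustrative names and numbers, not taken from train_gpt.py, whose actual check may differ. The organic rule shown here (do not start a step projected to finish past the cap) is one assumed way to end strictly under 600,000 ms.

```python
from typing import Optional, Tuple


def should_stop(step: int, elapsed_ms: float, est_step_ms: float,
                max_wallclock_ms: float = 600_000.0,
                force_stop_step: Optional[int] = None) -> bool:
    """Decide, before running `step`, whether training should end (hypothetical helper)."""
    if force_stop_step is not None:
        # Step-count stop: the wallclock is never consulted, so a slower
        # machine can run past the cap.
        return step >= force_stop_step
    # Organic stop (assumed policy): do not start a step that is projected
    # to finish after the cap, so the run ends strictly under it.
    return elapsed_ms + est_step_ms > max_wallclock_ms


def simulate(step_ms: float, force_stop_step: Optional[int] = None,
             max_wallclock_ms: float = 600_000.0) -> Tuple[int, float]:
    """Fake training loop with a constant step time; returns (steps, elapsed_ms)."""
    step, elapsed = 0, 0.0
    while not should_stop(step, elapsed, step_ms, max_wallclock_ms, force_stop_step):
        elapsed += step_ms  # stand-in for one optimizer step
        step += 1
    return step, elapsed


# Hypothetical constant step time of 121.5 ms (not a measured number).
print(simulate(121.5, force_stop_step=4945))  # (4945, 600817.5): over the cap
print(simulate(121.5))                        # (4938, 599967.0): under the cap
```

A real run also spends time on warmup and compilation and has variable step times, so this only illustrates why a fixed step count cannot guarantee compliance on a different machine.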
What changed
train_gpt.py is byte-identical to PR #1908's submission file (commit 291d3abd on romeerp/parameter-golf:codex/awq-stepmatched). All architectural and quantization logic, including activation-aware GPTQ mixed-precision, LQER asymmetric int4, and per-group lrzip-zpaq compression, is unchanged, as are all quantization knobs (AWQ_LITE_GROUP_TOP_K=1, LQER_TOP_K=3, LQER_GAIN_SELECT=0). The only difference is the wallclock control path:
- MAX_WALLCLOCK_SECONDS: 600, and the cap is honored
- FORCE_STOP_STEP: 4945 in PR #1908; unset here
Reproducing
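A reproduction launch environment can be assembled from the knobs quoted in this PR. This is a minimal, incomplete sketch: build_env and RECORD_KNOBS are hypothetical names, the SEED variable name is an assumption about train_gpt.py, and the record README's full env-var block (including the dataset and tokenizer settings) is not reproduced here.

```python
import os
from typing import Dict, Mapping, Optional

# Knobs quoted in this PR; the full env-var block lives in the record's README.
RECORD_KNOBS = {
    "MAX_WALLCLOCK_SECONDS": "600",
    "AWQ_LITE_GROUP_TOP_K": "1",
    "LQER_TOP_K": "3",
    "LQER_GAIN_SELECT": "0",
}


def build_env(seed: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a launch environment for one seed (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    env.update(RECORD_KNOBS)
    env["SEED"] = str(seed)  # assumption: train_gpt.py reads the seed from SEED
    # A FORCE_STOP_STEP inherited from the shell would bypass the wallclock cap.
    env.pop("FORCE_STOP_STEP", None)
    return env
```

The resulting dict can be passed as env= to whatever launcher is used, for example subprocess.run(..., env=env); the launch command itself is not given in this PR. Dropping FORCE_STOP_STEP explicitly guards against a leftover export from an earlier PR #1908 run.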
Same dataset and tokenizer as PR #1908: romeerp/parameter-golf-caseops-v1 (HuggingFace), variant sp8192_lossless_caps_caseops_v1_reserved. See records/track_10min_16mb/2026-04-29_PR1908Repro_Compliant600s_VAL_BPB_1.06044/README.md for the full env-var block.
Credits
This stands entirely on the work of @romeerp (PR #1908, PR #1729), @codemath3000 (PR #1855), @dexhunter (PR #1797, PR #1626), @nprime06 (PR #1787), and the rest of the PR #1855 lineage. The contribution of this submission is narrow: demonstrating that the PR #1908 stack achieves its quality strictly within the 600 s training cap when run with organic wallclock control.
Test plan
- MAX_WALLCLOCK_SECONDS=600 and no FORCE_STOP_STEP
- train_seed*.log files captured and included
- submission.json includes per-seed metadata + compliance attestation
🤖 Generated with Claude Code