Record: PR #1908 reproduction with compliant 600s wallclock — val_bpb 1.06044 (3-seed mean) by AayushBaniya2006 · Pull Request #1956 · openai/parameter-golf

AayushBaniya2006 · 2026-04-30T02:18:14Z

Summary

3-seed mean val_bpb = 1.06043952 (std 0.00091) on top of @romeerp's PR #1908 stack, run with organic 600 s wallclock control instead of FORCE_STOP_STEP=4945. All three seeds finish strictly under the 600,000 ms training cap.

Seed	Train wallclock	Post-TTT val_bpb	Artifact bytes
42	599,521 ms	1.05938494	15,943,518
0	599,665 ms	1.06101359	15,945,548
1234	599,676 ms	1.06092004	15,950,342
Mean	599,621 ms	1.06043952	15,946,469
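
The Mean row can be rechecked from the three per-seed rows. This is a minimal sketch using only the numbers in the table above; the dictionary layout and variable names are illustrative, and it assumes the reported std is the sample standard deviation (n − 1), which is the variant that reproduces 0.00091 for these values.

```python
from statistics import mean, stdev

# Per-seed values copied from the results table above.
runs = {
    42: {"train_ms": 599_521, "val_bpb": 1.05938494, "artifact_bytes": 15_943_518},
    0: {"train_ms": 599_665, "val_bpb": 1.06101359, "artifact_bytes": 15_945_548},
    1234: {"train_ms": 599_676, "val_bpb": 1.06092004, "artifact_bytes": 15_950_342},
}

bpb = [r["val_bpb"] for r in runs.values()]
mean_bpb = mean(bpb)
# Assumption: "std" in the summary means the sample standard deviation (n - 1).
std_bpb = stdev(bpb)
mean_ms = mean(r["train_ms"] for r in runs.values())
mean_bytes = mean(r["artifact_bytes"] for r in runs.values())

print(f"mean val_bpb   {mean_bpb:.8f}")
print(f"std val_bpb    {std_bpb:.5f}")
print(f"mean wallclock {mean_ms:.0f} ms")
print(f"mean artifact  {mean_bytes:.0f} bytes")
```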

vs PR #1908

	PR #1908	This submission	Δ
3-seed mean post-TTT	1.06081076	1.06043952	−0.00037124
Seed 42 train wallclock	601,153 ms (over cap)	599,521 ms	−1,632 ms
Max artifact bytes	15,996,559	15,950,342	−46,217
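
The Δ column is plain subtraction of the two value columns. A short sketch, with hypothetical key names and the figures copied from the table above, also shows how far PR #1908's seed-42 run overshot the cap:

```python
# Figures copied from the comparison table above; key names are illustrative.
pr1908 = {"mean_bpb": 1.06081076, "seed42_ms": 601_153, "max_bytes": 15_996_559}
this_pr = {"mean_bpb": 1.06043952, "seed42_ms": 599_521, "max_bytes": 15_950_342}

# Delta = this submission minus PR #1908, per row.
deltas = {key: this_pr[key] - pr1908[key] for key in pr1908}

TRAIN_CAP_MS = 600_000  # training cap stated in this PR
over_cap_ms = pr1908["seed42_ms"] - TRAIN_CAP_MS

print(f"delta val_bpb   {deltas['mean_bpb']:+.8f}")
print(f"delta wallclock {deltas['seed42_ms']:+d} ms")
print(f"delta bytes     {deltas['max_bytes']:+d}")
print(f"PR #1908 seed 42 over cap by {over_cap_ms} ms")
```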

PR #1908 used FORCE_STOP_STEP=4945, which makes train_gpt.py ignore the wallclock cap and run until step 4945 regardless of elapsed time. Their seed-42 run consumed 601,153 ms, which is 1,153 ms over the 600,000 ms cap that defines the track_10min_16mb track. This submission removes FORCE_STOP_STEP and lets training stop organically at the wallclock cap; on the GPU instance used here (8×H100 80GB SXM, RunPod community cloud) that yields stop steps within 15 of PR #1908's targets, while staying compliant.

The PR #1908 author flagged this themselves: "this should be open for anyone who wants to claim a new record if they can re-run it under the 600s wallclock" (source). This submission takes them up on it.

What changed

train_gpt.py is byte-identical to PR #1908's submission file (commit 291d3abd on romeerp/parameter-golf:codex/awq-stepmatched). All architectural and quantization logic — including activation-aware GPTQ mixed-precision, LQER asymmetric int4, and per-group lrzip-zpaq compression — is unchanged. All quantization knobs (AWQ_LITE_GROUP_TOP_K=1, LQER_TOP_K=3, LQER_GAIN_SELECT=0) are unchanged.

The only difference is the wallclock control path:

	PR #1908	This submission
`MAX_WALLCLOCK_SECONDS`	0	600
`FORCE_STOP_STEP`	4945	unset
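
The difference between the two control paths can be illustrated with a small stop rule. This is a hypothetical sketch, not code from train_gpt.py: the function `should_stop`, its parameters, and the idea of stopping before a step that would probably cross the cap are all assumptions made for illustration, and the step time of 121 ms is an illustrative figure rather than a measured one.

```python
from typing import Optional


def should_stop(
    step: int,
    elapsed_ms: float,
    avg_step_ms: float,
    max_wallclock_seconds: float = 0.0,
    force_stop_step: Optional[int] = None,
) -> bool:
    """Hypothetical stop rule; NOT taken from train_gpt.py."""
    if force_stop_step is not None:
        # Forced mode: elapsed time is ignored, only the step count matters.
        return step >= force_stop_step
    if max_wallclock_seconds > 0:
        cap_ms = max_wallclock_seconds * 1000.0
        # Assumption: stop before a step that would likely cross the cap.
        return elapsed_ms + avg_step_ms > cap_ms
    return False  # no limit configured


# PR #1908-style settings: already past 600,000 ms, yet training continues.
forced = should_stop(4944, 600_900.0, 121.0,
                     max_wallclock_seconds=0, force_stop_step=4945)

# This submission's settings: the next step would cross the cap, so stop.
organic = should_stop(4940, 599_900.0, 121.0, max_wallclock_seconds=600)

print(forced, organic)
```

Under these assumptions the forced configuration keeps training past the cap until the target step, while the organic configuration ends on whichever step is the last to fit, which is why the stop step can vary slightly from seed to seed.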

Reproducing

Same dataset and tokenizer as PR #1908: romeerp/parameter-golf-caseops-v1 (HuggingFace), variant sp8192_lossless_caps_caseops_v1_reserved. See records/track_10min_16mb/2026-04-29_PR1908Repro_Compliant600s_VAL_BPB_1.06044/README.md for the full env-var block.

Credits

This stands entirely on the work of @romeerp (PR #1908, PR #1729), @codemath3000 (PR #1855), @dexhunter (PR #1797, PR #1626), @nprime06 (PR #1787), and the rest of the PR #1855 lineage. The contribution of this submission is narrow: demonstrating that the PR #1908 stack achieves its quality strictly within the 600 s training cap when run with organic wallclock control.

Test plan

All three seeds run with MAX_WALLCLOCK_SECONDS=600 and no FORCE_STOP_STEP
All three train_seed*.log files captured and included
All three artifacts under 16,000,000 bytes
All three training wallclocks under 600,000 ms
submission.json includes per-seed metadata + compliance attestation
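
The two numeric items in the test plan can be checked mechanically. The sketch below uses the per-seed figures from the Summary table; the record layout is illustrative and is not the actual submission.json schema.

```python
ARTIFACT_CAP_BYTES = 16_000_000  # artifact limit from the test plan
TRAIN_CAP_MS = 600_000           # training wallclock limit from the test plan

# Illustrative records; field names are assumptions, values are from the PR.
seeds = [
    {"seed": 42, "train_ms": 599_521, "artifact_bytes": 15_943_518},
    {"seed": 0, "train_ms": 599_665, "artifact_bytes": 15_945_548},
    {"seed": 1234, "train_ms": 599_676, "artifact_bytes": 15_950_342},
]


def is_compliant(run: dict) -> bool:
    """True when a run is strictly under both caps."""
    return (run["train_ms"] < TRAIN_CAP_MS
            and run["artifact_bytes"] < ARTIFACT_CAP_BYTES)


violations = [run["seed"] for run in seeds if not is_compliant(run)]

# Smallest remaining margin across seeds for each cap.
min_time_headroom_ms = min(TRAIN_CAP_MS - run["train_ms"] for run in seeds)
min_size_headroom_bytes = min(
    ARTIFACT_CAP_BYTES - run["artifact_bytes"] for run in seeds
)

print("violations:", violations)
print("tightest wallclock margin:", min_time_headroom_ms, "ms")
print("tightest artifact margin:", min_size_headroom_bytes, "bytes")
```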

🤖 Generated with Claude Code

@romeerp

…al_bpb 1.06044 (3-seed mean)

3-seed mean post-TTT val_bpb: 1.06043952 (std 0.00091)
- seed 42: 1.05938494 (599521ms / 15943518 bytes)
- seed 0: 1.06101359 (599665ms / 15945548 bytes)
- seed 1234: 1.06092004 (599676ms / 15950342 bytes)

vs PR openai#1908 (current candidate at 1.06081076): -0.00037124 BPB.

All three seeds compliant under the 600,000 ms training cap; PR openai#1908's seed 42 used 601,153 ms (over cap) because FORCE_STOP_STEP=4945 ignored the wallclock check.

Identical training recipe and quantization knobs to PR openai#1908 (verbatim train_gpt.py from commit 291d3ab, AWQ_LITE_GROUP_TOP_K=1, LQER_TOP_K=3, LQER_GAIN_SELECT=0). Only difference: organic 600s wallclock cap (no FORCE_STOP_STEP).

Taken with explicit invitation from @romeerp on PR openai#1908.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ndokutovich mentioned this pull request Apr 30, 2026

Record: V21 + N-gram Tilt + LeakyReLU 0.3 — val_bpb 1.05851 (3-seed mean) #1967

Open

Devchandrasen mentioned this pull request Apr 30, 2026

Non-record: AWQ 2xH100 proxy no-compile quantized eval #1976

Open

AayushBaniya2006 wants to merge 1 commit into openai:main from AayushBaniya2006:submission/pr1908-compliant-1.06044
