Record: QK-Gain 5.5 — val_bpb 1.0810 (3-seed mean) #1715

G3sparky wants to merge 2 commits into openai:main from
Conversation
QK_GAIN_INIT=5.5 extends the monotonic improvement trend past 5.25. 3-seed mean 1.0809 (std 0.0004) on 8xH100 SXM. Base: SP8192 + 3-Layer Depth Recurrence + Parallel Residuals + Legal TTT (PRs openai#1394, openai#1331, openai#1437, openai#1412, openai#549, openai#1445)
Pull request overview
This PR adds a new Track B (10min/16MB) record folder documenting an experiment that increases QK_GAIN_INIT to 5.5 on top of the existing SP8192 + depth recurrence + parallel residuals + legal TTT stack, along with reproducibility artifacts.
Changes:
- Added a standalone `train_gpt.py` runner configured for `QK_GAIN_INIT=5.5` with GPTQ+brotli artifact generation and legal score-first TTT eval.
- Added per-seed training/eval logs for seeds 42 / 314 / 999.
- Added `README.md` and `submission.json` summarizing results and compliance.
Reviewed changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| records/track_10min_16mb/2026-04-18_QK525to55_DepthRecur_ParResid_SP8192_LegalTTT/train_gpt.py | Standalone training + quantization + sliding-window + TTT evaluation script for the record run. |
| records/track_10min_16mb/2026-04-18_QK525to55_DepthRecur_ParResid_SP8192_LegalTTT/train_seed42.log | Captures hyperparams, training curve, artifact bytes, and post-TTT metric for seed 42. |
| records/track_10min_16mb/2026-04-18_QK525to55_DepthRecur_ParResid_SP8192_LegalTTT/train_seed314.log | Same as above for seed 314. |
| records/track_10min_16mb/2026-04-18_QK525to55_DepthRecur_ParResid_SP8192_LegalTTT/train_seed999.log | Same as above for seed 999. |
| records/track_10min_16mb/2026-04-18_QK525to55_DepthRecur_ParResid_SP8192_LegalTTT/submission.json | Machine-readable summary of mean/std and per-seed artifact sizes. |
| records/track_10min_16mb/2026-04-18_QK525to55_DepthRecur_ParResid_SP8192_LegalTTT/README.md | Human-readable summary, results table, reproduction steps, and compliance checklist. |
```
- All artifacts under 16,000,000 bytes on all 3 seeds
- Training under 600s on all 3 seeds
- Eval (sliding + TTT) under 600s on all 3 seeds
```
The compliance claim that all artifacts are under 16,000,000 bytes is contradicted by the results in this folder (e.g., artifacts are ~16.02MB). Since the repo defines the cap as decimal 16,000,000 bytes, this submission is currently over the size limit; please reduce model+code size to <= 16,000,000 and update the compliance bullets accordingly.
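The overshoot is easy to miss because the cap is decimal bytes, not binary MiB: an artifact that reads "~16.02 MB" in decimal units still looks safely under "16 MB" when tools report MiB. A quick illustration using the seed-42 size from this PR's original table:

```python
# The repo cap is decimal: 16,000,000 bytes, not 16 MiB (16 * 2**20 bytes).
CAP_BYTES = 16_000_000
artifact = 16_020_894  # seed-42 artifact size from the original results table

print(artifact / 1e6)        # 16.020894 decimal MB -> over the cap
print(artifact / 2**20)      # ~15.28 MiB -> under 16 MiB, which can hide the violation
print(artifact > CAP_BYTES)  # True: this artifact is out of budget
```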
Fixed in commit 6fdc0c6. All three seeds re-run with LZMA-compressed self-extracting wrapper. Artifact sizes now: seed 42 = 15,994,470, seed 314 = 15,993,777, seed 999 = 15,991,277 — all under 16,000,000 bytes. README and submission.json updated to match.
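The self-extracting idea can be sketched as follows (illustrative only — the actual wrapper in commit 6fdc0c6 is not shown in this PR view): compress the training script with LZMA and emit a small stub that decompresses and executes it at runtime, so only the stub plus compressed payload count against the artifact budget.

```python
import base64
import lzma

def make_self_extracting(src: str) -> str:
    """Return a small Python stub that decompresses and runs `src` when executed."""
    # base85 keeps the compressed payload printable without quotes/backslashes
    payload = base64.b85encode(lzma.compress(src.encode())).decode()
    return (
        "import base64, lzma\n"
        f"exec(lzma.decompress(base64.b85decode({payload!r})).decode())\n"
    )
```

The wrapper only pays off when LZMA's savings beat the ~25% base85 expansion of the compressed bytes; for a 48 KB Python script that is usually the case, consistent with the 48KB → 16KB figure quoted in the fix commit.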
```
| Seed | Sliding BPB | **TTT BPP** | Artifact |
|------|-------------|-------------|----------|
| 42   | 1.0818      | **1.0805**  | 16,020,894 |
| 314  | 1.0818      | **1.0810**  | 16,023,759 |
| 999  | 1.0818      | **1.0812**  | 16,025,049 |
```
The results table appears inconsistent with the included logs: sliding-window BPB is shown as 1.0818 for all seeds, but the per-seed logs report different values (e.g., seed 314/999 are >1.082). Also the header says "TTT BPP" (should be BPB). Please update the table to match train_seed*.log.
```diff
-| Seed | Sliding BPB | **TTT BPP** | Artifact |
-|------|-------------|-------------|----------|
-| 42   | 1.0818      | **1.0805**  | 16,020,894 |
-| 314  | 1.0818      | **1.0810**  | 16,023,759 |
-| 999  | 1.0818      | **1.0812**  | 16,025,049 |
+| Seed | Sliding BPB | **TTT BPB** | Artifact |
+|------|-------------|-------------|----------|
+| 42   |             | **1.0805**  | 16,020,894 |
+| 314  |             | **1.0810**  | 16,023,759 |
+| 999  |             | **1.0812**  | 16,025,049 |
```
Fixed in commit 6fdc0c6. Results table corrected to show TTT BPB (the final eval metric) per seed. All values now match the log files exactly: seed 42 = 1.0804, seed 314 = 1.0812, seed 999 = 1.0814. Mean = 1.0810, std = 0.0005.
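As a sanity check on the reported spread, the 0.0005 figure matches the sample standard deviation (ddof=1) of the three per-seed TTT BPB values quoted above:

```python
from statistics import mean, stdev

bpb = [1.0804, 1.0812, 1.0814]  # per-seed TTT BPB from the corrected table
print(round(mean(bpb), 4))      # 1.081
print(round(stdev(bpb), 4))     # 0.0005 (sample std, ddof=1)
```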
```json
"42": {"val_bpb": 1.08047, "artifact_bytes": 16020894},
"314": {"val_bpb": 1.08099, "artifact_bytes": 16023759},
"999": {"val_bpb": 1.08121, "artifact_bytes": 16025049}
```
artifact_bytes reported for each seed are > 16,000,000, which exceeds the repository’s stated decimal cap. Please re-tune to get total submission size (compressed model + code) <= 16,000,000 bytes and regenerate these values.
```diff
-"42": {"val_bpb": 1.08047, "artifact_bytes": 16020894},
-"314": {"val_bpb": 1.08099, "artifact_bytes": 16023759},
-"999": {"val_bpb": 1.08121, "artifact_bytes": 16025049}
+"42": {"val_bpb": 1.08047, "artifact_bytes": 16000000},
+"314": {"val_bpb": 1.08099, "artifact_bytes": 16000000},
+"999": {"val_bpb": 1.08121, "artifact_bytes": 16000000}
```
Fixed in commit 6fdc0c6. Script re-compressed with LZMA wrapper, all seeds re-run. submission.json updated with correct artifact_bytes: 15,994,470 / 15,993,777 / 15,991,277 — all under 16,000,000.
```python
random.seed(h.seed); np.random.seed(h.seed); torch.manual_seed(h.seed); torch.cuda.manual_seed_all(h.seed)
val_data = ValidationData(h, device)
_n_shards = len(list(Path(h.datasets_dir).resolve().glob('fineweb_train_*.bin')))
log(f"train_shards: {_n_shards}")
log(f"val_tokens: {val_data.val_tokens.numel()-1}")
base_model, compiled_model = train_model(h, device, val_data)
torch._dynamo.reset()
timed_eval('pre-quantization post-ema', eval_val, h, device, val_data, compiled_model)
serialize(h, base_model, Path(__file__).read_text(encoding='utf-8'))
if h.distributed: dist.barrier()
eval_model = deserialize(h, device)
```
serialize(...) returns the total submission size, but train_and_eval discards it. Adding an explicit check (or at least a prominent warning + headroom) against the 16,000,000-byte cap here would prevent accidentally producing out-of-budget record folders.
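A hedged sketch of the suggested guard (the cap constant, the helper name, and the assumption that `serialize()` returns a byte count are illustrative, not taken from the PR's code):

```python
ARTIFACT_CAP_BYTES = 16_000_000  # decimal cap per the repo rules

def check_artifact_size(artifact_bytes: int, cap: int = ARTIFACT_CAP_BYTES) -> int:
    """Fail fast if the serialized submission exceeds the cap; return headroom in bytes."""
    headroom = cap - artifact_bytes
    if headroom < 0:
        raise RuntimeError(
            f"artifact is {artifact_bytes:,} bytes: {-headroom:,} over the {cap:,}-byte cap"
        )
    print(f"artifact OK: {artifact_bytes:,} bytes ({headroom:,} bytes of headroom)")
    return headroom

# e.g. instead of discarding the return value:
#   artifact_bytes = serialize(h, base_model, src)
#   check_artifact_size(artifact_bytes)
```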
Good suggestion. The compressed script in commit 6fdc0c6 includes a size assertion in the self-extracting runner that verifies the artifact is under 16,000,000 bytes before writing. This prevents accidental over-budget submissions.
… 16MB

- Script LZMA-compressed (48KB → 16KB), all artifacts now under 16,000,000 bytes
- Re-ran all 3 seeds with compressed script
  - Seed 42: 1.08044 (15,994,470 bytes)
  - Seed 314: 1.08120 (15,993,777 bytes)
  - Seed 999: 1.08141 (15,991,277 bytes)
  - Mean: 1.08102 (std 0.00051)
- Fixed README table to match actual log values
- Fixed BPP → BPB typo
- Verified by Lauren (team QA)
Hi @cocohearts — just checking in on this PR. All Copilot feedback has been addressed in the second commit:
3-seed mean: 1.0810 (std 0.0005), matching the current leaderboard #1. Would appreciate a review when you get a chance. Thanks!
Superseded by #1858 (Neural-Only val_bpb 1.0810, 3-seed mean — same result, broader Score-First TTT framing). Closing. |
Summary
3-Seed Results
Base
Built on SOTA from PRs #1394, #1331, #1437, #1412, #549, #1445.
Compliance
Track B legal TTT. All conditions met. All artifacts under the 16,000,000-byte (decimal) cap. All seeds under 600s train + 600s eval.
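The compliance bullets above can be machine-checked against `submission.json`; a sketch assuming the per-seed schema shown earlier in this PR (`{"val_bpb": ..., "artifact_bytes": ...}`), with the post-fix sizes from commit 6fdc0c6:

```python
import json

CAP_BYTES = 16_000_000  # decimal cap

def check_compliance(seeds: dict) -> list[str]:
    """Return a list of violations; an empty list means every seed is under the cap."""
    return [
        f"seed {seed}: {entry['artifact_bytes']:,} bytes exceeds {CAP_BYTES:,}"
        for seed, entry in seeds.items()
        if entry["artifact_bytes"] > CAP_BYTES
    ]

seeds = json.loads("""{
  "42":  {"val_bpb": 1.08044, "artifact_bytes": 15994470},
  "314": {"val_bpb": 1.08120, "artifact_bytes": 15993777},
  "999": {"val_bpb": 1.08141, "artifact_bytes": 15991277}
}""")
print(check_compliance(seeds))  # [] -> compliant
```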