Record: SmearGate BOS Fix 3-Seed Reproduction — val_bpb 1.06145 (3-seed mean) #1868

Open
Christopher-Lee-McClendon wants to merge 1 commit into openai:main from Christopher-Lee-McClendon:submission/record-1851-3seed

@Christopher-Lee-McClendon

Record: SmearGate BOS Fix — 3-Seed Reproduction of PR #1851

val_bpb = 1.06145 (3-seed mean, std 0.00068) | ~15.95 MB | 8×H100 SXM 80GB

Summary

This is a pure reproduction study of PR #1851 by @aquariouseworkman. The training script is byte-identical to the code in #1851; no new techniques or modifications are introduced.

PR #1851 submitted a single-seed result (seed 42, val_bpb = 1.06128). This PR extends it to a 3-seed evaluation (seeds 42, 314, 1234) by adding two independent runs. The post-TTT val_bpb across the three seeds has a standard deviation of 0.00068, which indicates the result is reproducible across seeds.

3-Seed Results

| Seed | Pre-Quant BPB | Quant BPB | Post-TTT BPB | Artifact (bytes) | Train Time | Eval Time |
|---|---|---|---|---|---|---|
| 42* | 1.06490240 | 1.07405660 | 1.06128183 | 15,952,086 | 599.6s | 519.5s |
| 314 | 1.06467893 | 1.07358634 | 1.06086831 | 15,952,419 | 599.6s | 525.6s |
| 1234 | 1.06593114 | 1.07503808 | 1.06220261 | 15,952,690 | 599.5s | 479.6s |
| Mean ± Std | | | 1.06145 ± 0.00068 | | | |

* Seed 42 result from original PR #1851 author @aquariouseworkman. Seeds 314 and 1234 are independent runs by @Christopher-Lee-McClendon.
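The summary row can be re-derived from the per-seed post-TTT values. The sketch below is an illustration rather than part of the submission; it assumes the reported spread is the sample standard deviation (dividing by n - 1), which is the convention that reproduces 0.00068 from these three values.

```python
import statistics

# Post-TTT val_bpb per seed, copied from the results table above.
post_ttt_bpb = {
    42: 1.06128183,
    314: 1.06086831,
    1234: 1.06220261,
}

values = list(post_ttt_bpb.values())
mean_bpb = statistics.mean(values)
# statistics.stdev is the sample standard deviation (divides by n - 1).
std_bpb = statistics.stdev(values)

print(f"mean = {mean_bpb:.5f}, std = {std_bpb:.5f}")
# prints: mean = 1.06145, std = 0.00068
```

With only three seeds the choice of estimator matters: the population form (`statistics.pstdev`) gives roughly 0.00056 on the same data.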

Technique Stack

All techniques inherited from #1851 (and its lineage):

| Technique | Source | Author |
|---|---|---|
| Base architecture (11L, MLP 4x, MuonEq-R) | PR #1787 | @nprime06 |
| SmearGate attention | PR #1797 | @dexhunter |
| SmearGate BOS fix | PR #1851 | @aquariouseworkman |
| LQER Asymmetric quantization | PR #1797 | @dexhunter |
| CaseOps SP8192 | PR #1729 | @romeerp |
| GPTQ + SP8192 | PR #1394 | @clarkkev |
| Score-first TTT (3 phases) | PR #549 | @abaybektursun |
| BOS bug identification | Issue | @cocohearts |

Compliance

| Budget | Limit | Worst-Case | Status |
|---|---|---|---|
| Artifact size | 16,000,000 bytes | 15,952,690 bytes | Pass |
| Training time | 600s | 599.6s | Pass |
| Eval time | 600s | 525.6s | Pass |
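The worst-case column is the per-budget maximum over the three seeds. A minimal sketch of that check, using the figures from the 3-seed results table, follows; the dictionary keys are illustrative names, not identifiers from `train_gpt.py`, and the strict `<` comparison is taken from the "All artifacts < 16,000,000 bytes. All runs < 600s." wording in the commit message.

```python
# Per-seed measurements copied from the 3-seed results table.
runs = {
    42: {"artifact_bytes": 15_952_086, "train_s": 599.6, "eval_s": 519.5},
    314: {"artifact_bytes": 15_952_419, "train_s": 599.6, "eval_s": 525.6},
    1234: {"artifact_bytes": 15_952_690, "train_s": 599.5, "eval_s": 479.6},
}

# Budget limits from the compliance table.
limits = {"artifact_bytes": 16_000_000, "train_s": 600.0, "eval_s": 600.0}

# Worst case = largest value seen for each budget across all seeds.
worst = {key: max(run[key] for run in runs.values()) for key in limits}
# A budget passes only if its worst case is strictly under the limit.
status = {key: worst[key] < limits[key] for key in limits}

for key in limits:
    print(f"{key}: worst={worst[key]} limit={limits[key]} pass={status[key]}")
```

The tightest margin is training time, where the worst case sits 0.4s under the 600s limit.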

Reproduction

# Install deps
pip install brotli python-minifier

# Prepare CaseOps SP8192 data
python3 prepare_caseops_data.py  # from romeerp/parameter-golf-caseops-v1

# Train (replace SEED)
SEED=42 CASEOPS_ENABLED=1 EMBED_BITS=7 SMEAR_GATE_ENABLED=1 \
SPARSE_ATTN_GATE_ENABLED=1 MIN_LR=0.1 EMBED_CLIP_SIGMAS=15.0 \
MLP_CLIP_SIGMAS=12.0 GPTQ_RESERVE_SECONDS=0.5 PHASED_TTT_NUM_PHASES=3 \
torchrun --standalone --nproc_per_node=8 train_gpt.py

Hardware: 8×H100 SXM 80GB (RunPod, matotezitanka/proteus-pytorch:community image)
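To repeat the training command for several seeds, one option is a small driver script. The sketch below is hypothetical and not part of the submission: `build_env`, `run_all`, and the `LAUNCH` flag are illustrative names, while the environment variables and the `torchrun` arguments are copied from the command above. It runs the seeds sequentially and assumes `train_gpt.py` is in the working directory.

```python
import os
import subprocess

# Settings copied from the training command above; SEED is filled in per run.
BASE_ENV = {
    "CASEOPS_ENABLED": "1",
    "EMBED_BITS": "7",
    "SMEAR_GATE_ENABLED": "1",
    "SPARSE_ATTN_GATE_ENABLED": "1",
    "MIN_LR": "0.1",
    "EMBED_CLIP_SIGMAS": "15.0",
    "MLP_CLIP_SIGMAS": "12.0",
    "GPTQ_RESERVE_SECONDS": "0.5",
    "PHASED_TTT_NUM_PHASES": "3",
}
CMD = ["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py"]
SEEDS = [42, 314, 1234]

# Hypothetical safety switch: leave False to avoid launching training.
LAUNCH = False


def build_env(seed):
    """Return the current environment plus the run settings for one seed."""
    return {**os.environ, **BASE_ENV, "SEED": str(seed)}


def run_all(seeds=SEEDS):
    """Run the training command once per seed, stopping on the first failure."""
    for seed in seeds:
        subprocess.run(CMD, env=build_env(seed), check=True)


if LAUNCH:
    run_all()
```

Set `LAUNCH = True` only on a machine matching the hardware listed above.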

Credits

3-seed reproduction of PR openai#1851 (SmearGate BOS document boundary fix).
Code is byte-identical to openai#1851 by @aquariouseworkman.

Results (post-TTT BPB):
  Seed 42:   1.06128  (original openai#1851 author)
  Seed 314:  1.06087  (this submission)
  Seed 1234: 1.06220  (this submission)
  Mean:      1.06145 ± 0.00068

All artifacts < 16,000,000 bytes. All runs < 600s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@aquariouseworkman

You are amazing!!!
