Progressive Depth + Hedge Mixer — val_bpb 1.1441 (3-seed mean) #1384

Closed

iverbovoy wants to merge 1 commit into openai:main from iverbovoy:submission/progressive-depth-hedge-mixer

Conversation

@iverbovoy

Summary

  • val_bpb: 1.1441 (3-seed mean, std 0.0051) — 5-expert Hedge Mixer eval
  • sliding_bpb: 1.1960 (3-seed mean) — standard sliding window eval
  • 3 shared blocks × 4 repeats = 12 effective layers, 17.14M params
  • Progressive depth training (2→3→4 repeats): +30% training steps vs fixed depth
  • int8 + zstd-22, ~15.88 MB artifact (see the packing sketch after this list)
  • 8×H100 SXM, PyTorch 2.5.1, 600s training + ~582s Hedge eval
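
A minimal sketch of the packing step behind the ~15.88 MB figure, assuming symmetric per-tensor int8 quantization plus the `zstandard` package; the function and names here are illustrative, not this PR's actual packing code:

```python
# Illustrative packing: quantize each float tensor to int8 with a
# per-tensor scale, then compress the serialized result at zstd level 22.
import io
import torch
import zstandard as zstd

def pack_state_dict(state_dict: dict) -> bytes:
    packed = {}
    for name, w in state_dict.items():
        scale = w.abs().max().clamp(min=1e-8) / 127.0      # symmetric [-127, 127]
        q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
        packed[name] = (q, scale)                          # keep scale for dequant
    buf = io.BytesIO()
    torch.save(packed, buf)
    return zstd.ZstdCompressor(level=22).compress(buf.getvalue())
```

int8 codes compress well under zstd because many layers have low-entropy weight distributions; level 22 is zstd's maximum and trades compression time for artifact size.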

3-Seed Results

| Seed | Steps | Roundtrip bpb | Sliding bpb | Hedge bpb |
|------|-------|---------------|-------------|-----------|
| 1337 | 5,668 | 1.2302 | 1.1965 | 1.1441 |
| 42 | 5,170 | 1.2298 | 1.1962 | 1.1491 |
| 7 | 5,405 | 1.2286 | 1.1952 | 1.1390 |
| Mean | 5,414 | 1.2295 | 1.1960 | 1.1441 |

Key Innovations

  1. Depth Recurrence: 3 shared blocks repeated 4× with cross-repeat skip connections, loop embeddings, and value embeddings (first sketch below)
  2. Progressive Depth Training: train at 2 repeats (fast) → 3 → 4 repeats (full depth), gaining +30% training steps in the same budget (second sketch below)
  3. Hedge Mixer: eval-time 5-expert online ensemble (neural + unigram + bigram + trigram + entropy) providing a −0.052 bpb improvement (third sketch below)
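
A minimal sketch of the depth recurrence (item 1) in PyTorch. `Block` is a stand-in with only a norm and an MLP; the real blocks also carry the XSA attention and value embeddings, and all names here are illustrative rather than this PR's code:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in block: pre-norm residual MLP (attention omitted for brevity)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 3 * d_model), nn.GELU(),
                                 nn.Linear(3 * d_model, d_model))

    def forward(self, x):
        return x + self.mlp(self.norm(x))

class RecurrentStack(nn.Module):
    """3 shared blocks unrolled num_repeats times (4 repeats -> 12 effective layers)."""
    def __init__(self, d_model: int, n_blocks: int = 3, max_repeats: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(Block(d_model) for _ in range(n_blocks))
        self.loop_emb = nn.Embedding(max_repeats, d_model)        # per-repeat vector
        self.skip_gain = nn.Parameter(torch.zeros(max_repeats))   # cross-repeat skip

    def forward(self, x, num_repeats: int = 4):
        h0 = x  # input to the first repeat, fed back into every later repeat
        for r in range(num_repeats):
            x = x + self.loop_emb.weight[r] + self.skip_gain[r] * h0
            for block in self.blocks:
                x = block(x)
        return x

# Usage (illustrative width): RecurrentStack(512)(torch.randn(2, 16, 512))
```

Because the blocks are shared, the parameter count is that of 3 layers while compute and effective depth are that of 12.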
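
For item 2, a sketch of the progressive depth schedule, assuming the `RecurrentStack` above. The phase boundaries are illustrative, chosen so the arithmetic lands near the reported +30% step gain:

```python
def repeats_at(step: int, total_steps: int) -> int:
    """Depth schedule: 2 repeats early, 3 mid-run, 4 for the final stretch."""
    frac = step / total_steps
    if frac < 0.35:
        return 2   # fast phase: half the block passes of full depth
    if frac < 0.55:
        return 3
    return 4       # full depth, matching how the model is evaluated

# Back of the envelope: a full-depth step costs 4 passes through the shared
# blocks, while this schedule averages 0.35*2 + 0.20*3 + 0.45*4 = 3.1 passes
# per step, so the same wall-clock budget buys about 4 / 3.1 ≈ 1.29x the steps.
```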
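
For item 3, a sketch of the mixer as textbook online Hedge (multiplicative weights) over per-token expert probabilities. The learning rate `eta` and the interface are assumptions, and how the "entropy" expert produces probabilities is not shown, so treat this as the shape of the method rather than the PR's exact rule:

```python
import math

def hedge_bits_per_token(expert_probs: list[list[float]], eta: float = 0.1) -> float:
    """Online Hedge mixture. expert_probs[k][t] is expert k's probability of
    the true token at step t; returns mean log loss in bits per token
    (dividing total bits by the byte count of the text then gives bpb)."""
    n_experts, n_steps = len(expert_probs), len(expert_probs[0])
    weights = [1.0 / n_experts] * n_experts
    total_bits = 0.0
    for t in range(n_steps):
        probs = [max(expert_probs[k][t], 1e-12) for k in range(n_experts)]
        mixed = sum(w * p for w, p in zip(weights, probs))  # predict first...
        total_bits += -math.log2(mixed)
        # ...then update: Hedge with log loss scales each weight by
        # exp(-eta * loss_k) = exp(eta * ln p_k) = p_k ** eta.
        weights = [w * p ** eta for w, p in zip(weights, probs)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return total_bits / n_steps
```

With eta = 1 the update reduces to a Bayesian mixture over experts; smaller eta adapts the weights more conservatively.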

Prior Work in This Repo

This submission consolidates our iterative work across several earlier PRs:

| PR | Score | What |
|----|-------|------|
| #148 | 1.2196 | Depth recurrence architecture |
| #784 | 1.2065 | + XSA, LeakyReLU² |
| #835 | 1.1980 | + Progressive depth |
| #856 | 1.1454 | + Hedge Mixer (1 seed) |
| This PR | 1.1441 | Clean submission, 3-seed validation |

Previous PRs will be closed in favor of this clean submission.

iverbovoy changed the title from "Record: Progressive Depth + Hedge Mixer — val_bpb 1.1441 (3-seed mean)" to "Progressive Depth + Hedge Mixer — val_bpb 1.1441 (3-seed mean)" on Apr 5, 2026
iverbovoy added a commit to iverbovoy/parameter-golf that referenced this pull request Apr 7, 2026
…eed mean)

3 shared blocks × 4 repeats (12 effective layers), MLP 3× (d=880),
int7 attention (63 levels) + int5 MLP (16 levels) mixed quantization,
8-GPU parallel Hedge Mixer eval (164s).

Key finding: int7 is the sweet spot for attention quantization —
recovers 98% of int8 hedge quality while saving 2MB for a wider model.

Improves on PR openai#1384 (1.1441) by −0.012 bpb.
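
For reference, a sketch of the mixed quantization described in the commit above: attention weights on a finer uniform grid (63 levels) than MLP weights (16 levels). The grid construction and the `"attn" in name` routing are illustrative assumptions, and the actual bit packing is not shown:

```python
import torch

def quantize_levels(w: torch.Tensor, n_levels: int) -> torch.Tensor:
    """Fake-quantize w onto a uniform grid of n_levels integer codes."""
    lo = -(n_levels // 2)
    hi = n_levels - 1 + lo           # 63 -> [-31, 31], 16 -> [-8, 7]
    scale = w.abs().max().clamp(min=1e-8) / max(-lo, hi)
    return torch.round(w / scale).clamp(lo, hi) * scale

def quantize_mixed(state_dict: dict) -> dict:
    # Finer grid where quantization hurts most (attention), coarser grid
    # where it does not (MLP), per the commit's sweet-spot finding.
    return {name: quantize_levels(w, 63 if "attn" in name else 16)
            for name, w in state_dict.items()}
```
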
@iverbovoy (Author)

Superseded by #1453 (1.1324 bpb, int7 mixed quantization). Keeping for historical reference — this was the first 3-seed validated submission in this depth recurrence line.

iverbovoy closed this on Apr 7, 2026