[Record candidate] TTT Peer-LoRA Ensemble on PR #2014, val_bpb = 1.05749 by varunneal · Pull Request #2139 · openai/parameter-golf

varunneal · 2026-05-01T23:57:50Z

val_bpb = 1.05749 (1 seed) | ~15.99 MB | 8xH100 SXM | PyTorch 2.10.0+cu130

This record introduces peer-LoRA ensembling into the test-time training (TTT) evaluation loop. After each batch's per-doc LoRAs are fully trained, we run k-1 additional forwards using other docs' trained LoRAs from the same batch. This is leakage-free: LoRA_p was trained only on doc_p's tokens, so applying it to doc_q reveals no target information. On uncertain tokens (high predictive entropy), we blend own and peer predictions in probability space; confident tokens use only their own prediction. The routing decision is target-free -- it depends only on the model's output distribution, not on validation labels.

Built on PR #2014, descending from @samacqua's work on doc-independent LoRAs.

Results

Seed	Pre-Quant BPB	Post-Quant BPB	Post-TTT BPB	Artifact
42	1.05899	1.06755	1.05749	15,986,824

Baseline PR #2014 3-seed mean: val_bpb 1.05855 (as reported by @simonbissonnette).

Delta: -0.00106 vs PR #2014 baseline (1.05855)

Key Changes vs PR #2014

1. Peer-LoRA ensemble with confidence routing

After each batch's per-doc LoRAs finish sliding-window training (k docs per batch -> k independent LoRAs), run k-1 peer forwards per doc using other docs' LoRAs:

Stash phase: during the normal sliding-window eval, stash each doc's per-token NLLs and predictive entropies (entropy of the output distribution -- no target labels used).
Peer phase: for each doc, run k-1 forwards with randomly-selected peer LoRAs from the same batch. BatchedLinearLoRA.PEER_IDX routes each batch row to a different doc's LoRA weights.
Blend: on tokens where predictive_entropy >= threshold (uncertain), blend: p = w * p_own + (1 - w) * mean(p_peers). Confident tokens use p_own only.

The routing gate is target-free: it uses the model's own entropy, not validation NLLs. This means the ensemble prediction is committed before seeing targets, avoiding post-hoc selection.

With threshold = 0.5, roughly 75% of tokens are routed through the ensemble.

2. TTT hyperparameter tuning

Per-doc LoRA LR and weight decay were tuned via line search (on a single H100, using TTT_EVAL_ONLY to skip retraining):

Param	PR #2014	This submission
`TTT_LORA_LR`	0.0001	0.00015
`TTT_WEIGHT_DECAY`	0.5	0.25

Higher LR lets the per-doc LoRAs fit more aggressively; lower weight decay gives them more freedom. Both changes improve the baseline and the peer ensemble independently.

New Env Vars

Env var	Default	Description
`TTT_PEER_ENSEMBLE_K`	4	Peers per batch incl. self (set 1 to disable)
`TTT_PEER_CONF_THRESHOLD`	0.5	Predictive entropy threshold for routing
`TTT_PEER_CONF_BLEND_W`	0.8	Weight on own prediction in blend

Reproducing

Uses the same CaseOps sp8192 dataset/tokenizer as PR #2014, sourced from HuggingFace:

Dataset: romeerp/parameter-golf-caseops-v1
Variant: sp8192_lossless_caps_caseops_v1_reserved

# Install lrzip (artifact compression)
sudo apt-get install -y lrzip

# Download data
python3 data/cached_challenge_fineweb.py --variant sp8192_lossless_caps_caseops_v1_reserved

# Run
SEED=42 torchrun --standalone --nproc_per_node=8 train_gpt.py

All hyperparameters (CASEOPS_ENABLED=1, VOCAB_SIZE=8192, ensemble settings, etc.) are baked into train_gpt.py.

Hardware / Software

8xH100 80GB SXM
PyTorch 2.10.0+cu130
lrzip 0.651 (for pergroup compression)

Attribution

See submission.json. Built on the PR #2014 stack (@simonbissonnette and earlier contributors).

…LR=0.00015, WD=0.25

varunneal · 2026-05-02T00:00:33Z

Submission mostly just for fun to get one in before the deadline. I only have one seed :p

…es, paper scan Post-deadline PR activity: PR openai#2138 Lock-In Byte Mixer confirmed BPB bug (corrected ~1.0671, not 0.979556); PR openai#2135 codemath3000 1.05651 narrowly misses 0.005 threshold; PR openai#2139 TTT Peer-LoRA Ensemble novel technique; PR openai#2140 flagged for target-token n-gram gating violation. New papers: BBQ quantization (ICLR 2026, arXiv:2603.01599), EntroLLM (2505.02380), In-Place TTT NTP-aligned (2604.06169). https://claude.ai/code/session_01CxuVyZaKMxMMc8Q4sMb2dF

cocohearts · 2026-05-02T18:15:25Z

Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The evidence is only one seed despite submission.json claiming three_seeds=true. Separately, the peer-LoRA ensemble description appears to train/use same-batch validation-derived LoRAs for final scoring, so the strict original-order score-first property is not established.

varunneal added 2 commits May 1, 2026 19:46

TTT Peer-LoRA Ensemble on PR openai#2014: K=3, threshold=0.5, w=0.8, …

bf23ef1

…LR=0.00015, WD=0.25

.

2e2bc50

varunneal closed this May 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Record candidate] TTT Peer-LoRA Ensemble on PR #2014, val_bpb = 1.05749#2139

[Record candidate] TTT Peer-LoRA Ensemble on PR #2014, val_bpb = 1.05749#2139
varunneal wants to merge 2 commits intoopenai:mainfrom
varunneal:ttt-ensemble-clean

varunneal commented May 1, 2026 •

edited

Loading

Uh oh!

varunneal commented May 2, 2026

Uh oh!

cocohearts commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

varunneal commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Results

Key Changes vs PR #2014

1. Peer-LoRA ensemble with confidence routing

2. TTT hyperparameter tuning

New Env Vars

Reproducing

Hardware / Software

Attribution

Uh oh!

varunneal commented May 2, 2026

Uh oh!

cocohearts commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

varunneal commented May 1, 2026 •

edited

Loading