Record: BIJEPAX-lite JEPA + SP8192 CaseOps PPM — val_bpb 0.97271 by NewyorkDev · Pull Request #2080 · openai/parameter-golf

NewyorkDev · 2026-05-01T03:43:56Z

BIJEPAX-lite JEPA + SP8192 CaseOps PPM

This record submits a Claude-designed, JEPA-inspired training-only auxiliary regularizer on top of the SP8192 CaseOps + per-group compression + PPM sliding stack.

The final 3-seed mean is:

ppm_sliding val_bpb: 0.97271454

Results

| Seed | Final `ppm_sliding val_bpb` | Quantized diagnostic | Artifact bytes | Train stop | Eval time | Exit |
|---|---|---|---|---|---|---|
| 42 | `0.97234287` | `1.11544494` | `15,997,180` | `2014` steps / `599.843s` | `502.131s` | `0` |
| 314 | `0.97206308` | `1.11562304` | `15,999,539` | `2012` steps / `599.586s` | `499.038s` | `0` |
| 999 | `0.97373767` | `1.11757370` | `15,997,593` | `2013` steps / `599.821s` | `496.384s` | `0` |

Three-seed sample std: 0.00089703.
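
The mean and sample standard deviation can be re-derived from the three per-seed `ppm_sliding val_bpb` values listed in Results. This is a minimal check using only the Python standard library; the variable names are illustrative and not taken from the submission.

```python
import statistics

# Final ppm_sliding val_bpb per seed, copied from the Results table.
val_bpb = {
    42: 0.97234287,
    314: 0.97206308,
    999: 0.97373767,
}

# Arithmetic mean over the three seeds.
mean_bpb = statistics.mean(val_bpb.values())

# Sample standard deviation (n - 1 in the denominator).
std_bpb = statistics.stdev(val_bpb.values())

print(f"mean = {mean_bpb:.8f}")
print(f"std  = {std_bpb:.8f}")
```

`statistics.stdev` uses the n - 1 denominator, which matches the "sample std" wording above.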

All three runs are under:

strict decimal 16,000,000 byte artifact cap
600s training cap
600s evaluation cap
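
As a quick sanity check, the per-seed numbers from Results can be compared against the three caps. The sketch below assumes each cap is a strict "less than" bound; the dictionary layout and function name are illustrative only.

```python
# Caps as stated in this PR description.
ARTIFACT_CAP_BYTES = 16_000_000
TRAIN_CAP_S = 600.0
EVAL_CAP_S = 600.0

# Per-seed values copied from the Results table.
runs = {
    42: {"artifact_bytes": 15_997_180, "train_s": 599.843, "eval_s": 502.131},
    314: {"artifact_bytes": 15_999_539, "train_s": 599.586, "eval_s": 499.038},
    999: {"artifact_bytes": 15_997_593, "train_s": 599.821, "eval_s": 496.384},
}

def under_caps(run):
    """Return True if one run is below all three caps (assumed strict)."""
    return (
        run["artifact_bytes"] < ARTIFACT_CAP_BYTES
        and run["train_s"] < TRAIN_CAP_S
        and run["eval_s"] < EVAL_CAP_S
    )

all_ok = all(under_caps(r) for r in runs.values())

# Smallest remaining byte budget across seeds.
min_byte_headroom = min(ARTIFACT_CAP_BYTES - r["artifact_bytes"] for r in runs.values())

print(all_ok, min_byte_headroom)
```

The tightest artifact is seed 314, which leaves 461 bytes under the cap.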

What is new

BIJEPAX-lite adds a small custom JEPA-style hidden-state prediction objective during training:

hop-4 forward hidden-state prediction
hop-4 backward hidden-state prediction
cosine embedding-space loss
LayerNorm-stabilized predictor heads
no cycle head in the submitted lightweight config
active only from 35% to 80% of the wallclock schedule
separate optimizer and separate module from the base GPT
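
To make the list above concrete, here is a rough pure-Python sketch of a bidirectional hop-4 cosine objective with a wallclock gate. It is not the code in `train_gpt.py`: the head structure (LayerNorm followed by a single linear map), the averaging over positions, and every function name are assumptions for illustration. Details the description does not state, such as whether targets are detached from the gradient, are left out.

```python
import math

HOP = 4                       # hop distance, from the description
ACTIVE_WINDOW = (0.35, 0.80)  # active fraction of the wallclock schedule

def aux_active(elapsed_s, budget_s=600.0):
    """True while the auxiliary loss would be switched on."""
    lo, hi = ACTIVE_WINDOW
    return lo <= elapsed_s / budget_s <= hi

def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def make_head(weight):
    """Hypothetical predictor head: LayerNorm, then one linear map."""
    def head(h):
        z = layer_norm(h)
        return [sum(w * v for w, v in zip(row, z)) for row in weight]
    return head

def cosine_loss(pred, target, eps=1e-8):
    """1 - cosine similarity between a prediction and its target."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm + eps)

def bijepa_loss(hidden, fwd_head, bwd_head, hop=HOP):
    """Average cosine loss over forward (t -> t+hop) and backward (t -> t-hop) pairs."""
    n = len(hidden)
    losses = []
    for t in range(n - hop):      # forward: predict the state hop steps ahead
        losses.append(cosine_loss(fwd_head(hidden[t]), hidden[t + hop]))
    for t in range(hop, n):       # backward: predict the state hop steps behind
        losses.append(cosine_loss(bwd_head(hidden[t]), hidden[t - hop]))
    return sum(losses) / len(losses) if losses else 0.0

# Toy usage: 6 positions, 3-dim hidden states, identity-weight heads.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hidden = [[float(t), float(t) + 1.0, 2.0 * float(t) + 3.0] for t in range(6)]
loss = bijepa_loss(hidden, make_head(identity), make_head(identity))
print(aux_active(300.0), loss)
```

In a real training loop the heads would be trained by their own optimizer and the loss would be added only while `aux_active` returns true, consistent with the "separate optimizer and separate module" item.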

The predictor heads are not serialized. Final scoring is performed by the quantized base model with the existing causal PPM sliding evaluator.

Compliance notes

TTT_ENABLED=0
LQER_TOP_K=1 keeps all seeds below the strict byte cap
SmearGate BOS masking is present for packed-document cross-boundary safety
BIJEPAX-lite trains only on training batches from DocumentPackingLoader
BIJEPAX-lite does not access validation tokens or validation byte sidecars during training
Final score is from ppm_sliding
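
For readers unfamiliar with the BOS-masking item, the idea can be sketched as follows. This assumes SmearGate is a gated blend of the previous position's embedding into the current one; the additive form, the scalar gate, and `BOS_ID` are placeholders, not the submission's actual implementation.

```python
BOS_ID = 0  # hypothetical beginning-of-document token id

def smear_with_bos_mask(tokens, embeds, gate):
    """Blend each embedding with its predecessor, except at document starts.

    tokens: list of token ids for one packed sequence
    embeds: list of embedding vectors (lists of floats), one per token
    gate:   scalar blend weight (a learned gate in a real model)
    """
    out = []
    for t, (tok, x) in enumerate(zip(tokens, embeds)):
        if t == 0 or tok == BOS_ID:
            # Masked: no previous-token content flows into a document start,
            # so packed neighbors cannot leak across the boundary.
            out.append(list(x))
        else:
            prev = embeds[t - 1]
            out.append([xi + gate * pi for xi, pi in zip(x, prev)])
    return out

# Two packed documents: [BOS, 5, 6] and [BOS, 7].
tokens = [0, 5, 6, 0, 7]
embeds = [[1.0], [2.0], [3.0], [4.0], [5.0]]
smeared = smear_with_bos_mask(tokens, embeds, gate=0.5)
print(smeared)
```

The second BOS (position 3) keeps its own embedding unchanged, so nothing from the first document reaches the second.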

The folder includes:

train_gpt.py
three seed logs
full source/log captures for each seed
submission.json
LEGALITY_AUDIT.md
STATIC_AUDIT_NOTES.md
REFERENCES.md
JEPA.mp4 as a short visual/demo asset

Acknowledgements

Thanks to Claude for designing the custom BIJEPAX-lite auxiliary objective and helping turn the JEPA idea into a runnable candidate. Thanks to Codex for implementing the run path, auditing legality, coordinating the 3-seed package, and assembling this PR. Thanks also to the Parameter Golf community for the public ideas and fast iteration that this stack builds on.

Validation

python3 -m py_compile records/track_10min_16mb/2026-05-01_BIJEPAXLite_JEPA_PPM_0.97271/train_gpt.py
python3 -m json.tool records/track_10min_16mb/2026-05-01_BIJEPAXLite_JEPA_PPM_0.97271/submission.json
3 full remote runs on 8xH100 completed with rc=0

JEPA.mp4

Attribution update

I expanded README.md and REFERENCES.md in this PR to explicitly credit the inherited public Parameter Golf components: SP8192/tokenizer and recurrence lineage (PR #1394, #1493, #1855), byte-PPM lineage (PR #1795, #1959, #1991), SmearGate/BOS masking lineage (modded-nanogpt @classiclarryd, PR #1667, #1797, #2014), compression lineage (PR #1586, #1667, #1729), quantization/optimizer/scoring pieces (PR #1530, #1886, #1923, #1344, #1145, #1967), and JEPA-Lite local precedent (PR #2027). The BIJEPAX-lite-specific contribution remains the Claude-designed training-only bidirectional hop-4 hidden-state prediction objective and the run package around it.

cocohearts · 2026-05-02T18:15:09Z

Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The PPM/byte-mixer score has the same C2 normalization problem: it scores the realized byte sequence and mixes that with the NN probability after knowing the realized token, rather than committing a full normalized distribution over possible next tokens/bytes at the scoring point. So the 0.9727 BPB headline should not be merged as a leaderboard score.
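
A toy illustration of the normalization concern, under stated assumptions: a two-symbol alphabet and made-up probabilities for the two predictors. It is not the submission's evaluator and says nothing about how `ppm_sliding` actually mixes. It only shows why a mixing rule that depends on the realized symbol can imply total probability above 1, while a rule committed before the symbol is seen cannot.

```python
import math

# Made-up next-symbol distributions from two predictors.
p_nn = {"a": 0.7, "b": 0.3}
p_ppm = {"a": 0.2, "b": 0.8}

# Committed mixture: the weight is fixed before the outcome is known.
LAMBDA = 0.5
committed = {s: LAMBDA * p_nn[s] + (1 - LAMBDA) * p_ppm[s] for s in p_nn}

# Outcome-dependent "mixture": for each possible outcome, use whichever
# predictor happened to assign it more probability.
outcome_dependent = {s: max(p_nn[s], p_ppm[s]) for s in p_nn}

committed_mass = sum(committed.values())
outcome_dependent_mass = sum(outcome_dependent.values())

def bits(p):
    """Code length in bits for an event of probability p."""
    return -math.log2(p)

print(committed_mass, outcome_dependent_mass)
print(bits(committed["a"]), bits(outcome_dependent["a"]))
```

The committed mixture sums to 1, so its code lengths correspond to a valid code. The outcome-dependent scores sum to 1.5, so the bits they report are lower than any real code could achieve.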

NewyorkDev added 2 commits April 30, 2026 23:43

Record BIJEPAX-lite JEPA PPM submission

dafa2ff

Expand JEPA attribution notes

0a2bee8

cocohearts mentioned this pull request May 2, 2026: Update leaderboard with May 1 audited rows #2146 (Draft)

NewyorkDev wants to merge 2 commits into openai:main from NewyorkDev:codex/bijepaxlite-jepa-097271
