SP8192 + LongCtx NoQV QK5.25 Prefix2750 — 1.05827 BPB (seed 42)#1963

Open
someone114514 wants to merge 2 commits into openai:main from someone114514:longctx-noqv-prefix2750-1953

@someone114514

Summary

Single-seed result: val_bpb=1.05826976 on seed 42.

This PR is a deliberately narrow follow-up to #1953. It keeps the #1953 training, quantization, tokenizer, CaseOps data, long-context eval, no_qv TTT mask, TTT local LR multiplier, and QK gain stack intact, and changes only the phased-TTT prefix schedule: the prefix grows from 2500 to 2750 docs.

Result

| Run | Seed | Prefix docs | Final BPB | Eval time | Total bytes |
| --- | --- | --- | --- | --- | --- |
| #1953 seed42 reference | 42 | 2500 | 1.05824720 | 430.0s | 15,988,861 |
| This PR | 42 | 2750 | 1.05826976 | 495.0s | 15,978,173 |

The larger prefix lands 0.00002256 BPB above the #1953 seed42 reference (within 0.00003 BPB) and adds 65.0s of eval time, while staying under the 16 MB cap.
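The comparison can be rechecked with a few lines of arithmetic. This is only a sketch over the numbers in the result table; the constant names are illustrative, and treating the 16 MB cap as 16,000,000 bytes is an assumption, not something stated in this PR.

```python
from decimal import Decimal

# Values copied from the result table in this PR description.
REF_BPB = Decimal("1.05824720")   # #1953 seed42 reference, prefix 2500
NEW_BPB = Decimal("1.05826976")   # this PR, prefix 2750
REF_EVAL_S = Decimal("430.0")
NEW_EVAL_S = Decimal("495.0")
NEW_BYTES = 15_978_173

# Assumption: "16 MB" is read as decimal megabytes (16,000,000 bytes).
CAP_BYTES = 16_000_000

bpb_delta = NEW_BPB - REF_BPB        # positive: this PR scores higher than the reference
eval_delta_s = NEW_EVAL_S - REF_EVAL_S
headroom_bytes = CAP_BYTES - NEW_BYTES

print(f"BPB delta:  {bpb_delta}")
print(f"Eval delta: {eval_delta_s}s")
print(f"Headroom:   {headroom_bytes} bytes under the assumed cap")
```

`Decimal` is used so the eight-digit BPB values subtract exactly rather than picking up binary floating-point noise.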

Files

Reproduce

The README contains the full 8x GPU command. CaseOps data should be prepared with:

python3 records/track_10min_16mb/2026-04-30_LongCtx_NoQV_QK525_Prefix2750/download_caseops_data.py --local-dir /workspace/caseops_data

@aquariouseworkman
Contributor

Must beat current SOTA by >= 0.005 nats.

@cocohearts
Collaborator

Leaderboard audit note (pre-cutoff state): I don't think this is record-ready as submitted. The PR reports only a single seed and no std/p-value evidence, and the seed-42 number is essentially a narrow follow-up around PR #1953 rather than a statistically supported new frontier row. Please rerun a matching 3-seed package if this is intended as a leaderboard claim.
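For illustration, a minimal sketch of the per-seed summary a multi-seed package would start from. The helper name `summarize_seeds` and the dict layout are hypothetical, not part of this repository, and only the seed-42 value reported in this PR is used; with a single seed the sample standard deviation is undefined, which is the gap the audit note points at.

```python
from statistics import mean, stdev


def summarize_seeds(bpb_by_seed):
    """Return (n, mean, sample_std) for per-seed val_bpb values.

    sample_std is None when fewer than two seeds are available,
    because a spread cannot be estimated from one run.
    """
    values = list(bpb_by_seed.values())
    n = len(values)
    if n == 0:
        raise ValueError("need at least one seed")
    sample_std = stdev(values) if n >= 2 else None
    return n, mean(values), sample_std


# Only seed 42 is reported in this PR, so no spread can be estimated yet.
n, avg, std = summarize_seeds({42: 1.05826976})
print(n, avg, std)
```

Once matching runs for additional seeds exist, their values can be added to the dict and the same helper returns a usable mean and standard deviation.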
