Update README leaderboard with recent record submissions #1806
cocohearts merged 4 commits into main
Hi @cocohearts, I wanted to flag the two Scylla tokenizer submissions. In their PR conversations, both authors acknowledge byte accounting errors that make the reported BPB invalid.
@cocohearts, I second @romeerp's concern. Thank you for looking into this!
Thanks @cocohearts for getting these entries onto the leaderboard; they are useful for newcomers tracing lineage. Two of the six new rows have known byte-accounting issues that meaningfully change their ranking. Flagging here so the table reflects the actual canonical numbers.

TL;DR: affected entries

What the issue is
A "record" entry's
Two known mismatch classes inflate the byte denominator and thus deflate the reported BPB:

A 1-line diagnostic anyone can run (no rerun needed)
For canonical sp8192:

The Scylla case is special because its tokenizer isn't sp8192, so the diagnostic ratio test alone won't catch it; the actual reproducer is in PR #1271's "Reproducing" section. Happy to help cross-check any specific submission.
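
To make the denominator effect concrete, here is a minimal sketch. It assumes BPB is computed as the summed token negative log-likelihood (in nats) divided by ln 2 times the UTF-8 byte count of the evaluated text. The function names and numbers are hypothetical illustrations; this is neither the repository's evaluation code nor the diagnostic referred to above.

```python
import math

def utf8_byte_count(text: str) -> int:
    """Canonical byte count: length of the UTF-8 encoding of the raw text."""
    return len(text.encode("utf-8"))

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)

# Hypothetical example: the same model loss scored against two byte counts.
text = "héllo wörld"                  # 11 characters, 13 UTF-8 bytes
true_bytes = utf8_byte_count(text)
total_nll = 10.0                      # made-up summed loss in nats

correct_bpb = bits_per_byte(total_nll, true_bytes)
# Overcounting bytes (assumed here: 2 extra bytes attributed to tokens)
# enlarges the denominator, so the reported BPB comes out lower.
inflated_bpb = bits_per_byte(total_nll, true_bytes + 2)

print(f"correct BPB:  {correct_bpb:.4f}")
print(f"inflated BPB: {inflated_bpb:.4f}")
```

Because the loss in the numerator is unchanged, overcounting bytes by a factor r lowers the reported BPB by exactly that factor, which is why a byte-accounting error can change a leaderboard ranking without any change to the model.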
hi sorry thanks for correction lemme triple check |
Thank you so much, @cocohearts!
@cocohearts thank you!! could we please also get a ruling on the legality of caseops tokenizers? it's being debated in issue #1604 and it's currently splitting the leaderboard of unmerged record submissions |
… merges (openai#1806)
* Update leaderboard with recent record submissions
* Keep only valid recent leaderboard rows
* Remove invalid Scylla record
* Remove non-record Muon TTT submission
Summary
#1019 was already present on main, so this PR leaves that existing row in place.
Checks
* git diff --check
* record submission ready for review labels were removed from:
  * Record: MTP-2 Funnel + LeakyReLU(0.75)² + Legal TTT + Parallel Muon #1031
  * Record: Scylla (novel tokenizer) + Legal Score-First TTT (val_bpb: 1.08056553) #1143
  * Record: 11L Muon Legal TTT + Entropy-Adaptive Epochs (8×H100) — val_bpb 1.1179 (3-seed mean) #1148
  * Record: Scylla + Full GPTQ + XSA-all + FA3 — val_bpb 0.9485 (3-seed mean) #1184