Add a solution of 5,784,824 bytes by XinyuZeng · Pull Request #19 · agavra/compression-golf

XinyuZeng · 2026-02-03T11:49:06Z

Thanks @agavra for the golf :)! I learned a lot about how to vibe code a performance problem through this. Compared with my first (closed) PR, in this iteration I fixed some mediocre vibe-coding settings and let the agents self-evolve, with observable performance and a trackable optimization history.

Fix MTF performance regression by limiting the alphabet to the 4096 most frequent repo names. Infrequent repos use a fallback encoding (raw indices).

- Runtime: 2+ minutes → ~12 seconds (10x faster)
- Size: 5,723,601 → 5,784,824 bytes (+61KB, +1.1%)

The full MTF over 261K unique repos cost O(n*m), where n is the number of encoded repo names and m the alphabet size, which came to billions of operations. The limited alphabet keeps the MTF benefit for frequent repos while avoiding the quadratic blowup from the long tail.
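A minimal Python sketch of a limited-alphabet MTF, written only to illustrate the idea; it is not the codec's actual code. Assumptions not stated in the PR: the alphabet is the K most frequent names (K = 4096, as above), an out-of-alphabet name is written as an escape token equal to K followed by its raw index into a full name table, and the alphabet itself is stored separately. The names `build_alphabet`, `mtf_encode_limited` and `mtf_decode_limited` are hypothetical.

```python
from collections import Counter


def build_alphabet(names, k):
    """Return the k most frequent names; equal counts keep first-seen order."""
    return [name for name, _ in Counter(names).most_common(k)]


def mtf_encode_limited(names, name_table, k=4096):
    """Encode names as MTF ranks over a top-k alphabet (hypothetical layout).

    Emits a rank r < k for frequent names, or the escape token k followed by
    the name's raw index in name_table for long-tail names.
    """
    alphabet = build_alphabet(names, k)
    in_alphabet = set(alphabet)
    raw_index = {name: i for i, name in enumerate(name_table)}
    mtf = list(alphabet)  # current MTF order, never longer than k
    tokens = []
    for name in names:
        if name in in_alphabet:
            rank = mtf.index(name)        # scans at most k entries
            tokens.append(rank)
            mtf.insert(0, mtf.pop(rank))  # move the hit to the front
        else:
            tokens.append(k)                # escape token
            tokens.append(raw_index[name])  # fallback: raw index
    return tokens, alphabet


def mtf_decode_limited(tokens, alphabet, name_table, k=4096):
    """Invert mtf_encode_limited given the same alphabet, table and k."""
    mtf = list(alphabet)
    names = []
    it = iter(tokens)
    for tok in it:
        if tok < k:
            name = mtf.pop(tok)
            mtf.insert(0, name)
        else:
            name = name_table[next(it)]  # token after the escape is a raw index
        names.append(name)
    return names


name_table = ["a/x", "b/y", "c/z", "d/w"]
names = ["a/x", "a/x", "b/y", "c/z", "a/x", "d/w", "b/y"]
tokens, alphabet = mtf_encode_limited(names, name_table, k=2)
# tokens == [0, 0, 1, 2, 2, 1, 2, 3, 1]; "c/z" and "d/w" take the escape path
assert mtf_decode_limited(tokens, alphabet, name_table, k=2) == names
```

Because each lookup and move scans at most K entries, the cost is O(n*K) instead of O(n*m) with m around 261K. The trade-off matches the numbers above: long-tail names lose the MTF benefit and cost a little size.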

Co-authored-by: Codex <[email protected]>
Co-authored-by: OpenCode <[email protected]>

Training data had a max delta of 251 (fits in u8), but test data has deltas up to 2689+, causing silent truncation and decode failures. Switch to LEB128 varint encoding, which handles arbitrary delta sizes. Minimal size impact on training data (+3 bytes: 5,784,824 -> 5,784,827). Tested on 3 random GitHub Archive hours: xinyuzeng ranks #1 on all.
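For reference, unsigned LEB128 stores 7 payload bits per byte and uses the high bit as a "more bytes follow" flag. Below is a minimal Python sketch of the format. The helper names are hypothetical, and treating deltas as non-negative is an assumption carried over from the u8 framing above, not something confirmed from the codec's source.

```python
def leb128_encode(value: int) -> bytes:
    """Encode a non-negative int as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def leb128_decode(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_deltas(deltas):
    """Concatenate the varint encodings of a delta stream (hypothetical helper)."""
    return b"".join(leb128_encode(d) for d in deltas)


def decode_deltas(buf, count):
    """Read `count` varints back out of buf (hypothetical helper)."""
    deltas, pos = [], 0
    for _ in range(count):
        d, pos = leb128_decode(buf, pos)
        deltas.append(d)
    return deltas


assert leb128_encode(2689) == b"\x81\x15"  # two bytes, no truncation
assert decode_deltas(encode_deltas([3, 251, 2689]), 3) == [3, 251, 2689]
```

With this layout deltas below 128 still take one byte, values from 128 to 16,383 take two, and larger values keep growing by one byte per 7 bits instead of being silently truncated.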

agavra · 2026-02-03T16:23:51Z

Confirmed over CI/CD, very nicely done @XinyuZeng!

┌────────────────────────┬────────────────┬────────────┐
│ Codec                  │           Size │ vs Naive   │
├────────────────────────┼────────────────┼────────────┤
│ Naive                  │    210,727,389 │   baseline │
│ xinyuzeng              │      5,784,827 │     -97.3% │
└────────────────────────┴────────────────┴────────────┘
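As a check on the last column, the reduction relative to the naive codec works out as follows, which also means the output is about 36.4 times smaller than the naive encoding:

$$
1 - \frac{5{,}784{,}827}{210{,}727{,}389} \approx 1 - 0.02745 = 0.97255 \;\Rightarrow\; -97.3\%
$$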

XinyuZeng and others added 17 commits February 3, 2026 12:29

Add xinyuzeng codec (6,522,467 bytes)

ae69e50

Improve xinyuzeng repo_id_idx (6,488,673 bytes)

0cc19ad

Document xinyuzeng failed attempts (6,488,673 bytes)

cef705d

Delimiter repo names (6,345,953 bytes)

f8e279d

Row-group id/ts deltas (6,384,798 bytes)

4ca9e0e

Note front-coding attempt (6,345,953 bytes)

b398665

Name-first repo mapping (6,286,709 bytes)

234e282

Remove repo_name_variant_idx (6,283,516 bytes)

d93a281

Split base from id_deltas (6,214,202 bytes)

4f3e811

Remove zigzag for id_deltas (6,098,398 bytes)

8a19932

Byte plane splitting for repo_ids (6,006,181 bytes)

2d3b245

MTF + byte planes for repo_name_idx (5,723,601 bytes)

42b2b60

Document failed optimization attempts (5,784,824 bytes)

2f0158a

Remove AGENTS.md from git tracking

3d0426c

Update codec description to reflect current strategy

2790b16

Co-authored-by: Codex <[email protected]>
Co-authored-by: OpenCode <[email protected]>
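Several commit titles above ("Split base from id_deltas", "Remove zigzag for id_deltas", "Byte plane splitting for repo_ids") name standard integer-coding tricks. The Python sketch below illustrates them under assumptions the PR does not spell out: ids arrive sorted, so every gap is non-negative; values fit in 4 bytes; and a general-purpose compressor runs on the output afterwards. All function names are hypothetical, and this is not the codec's code. Zigzag maps signed integers to unsigned ones (0, -1, 1, -2 become 0, 1, 2, 3). That only pays off when deltas can be negative. On non-negative deltas it doubles every value, which may be why removing it helped.

```python
from itertools import accumulate


def split_base_deltas(ids):
    """Keep the first id as a base and store each later id as a gap.

    Assumes ids is non-empty and sorted, so gaps are unsigned (no zigzag needed).
    """
    base = ids[0]
    deltas = [cur - prev for prev, cur in zip(ids, ids[1:])]
    if any(d < 0 for d in deltas):
        raise ValueError("ids must be non-decreasing")
    return base, deltas


def join_base_deltas(base, deltas):
    """Inverse of split_base_deltas: running sum starting from base."""
    return list(accumulate(deltas, initial=base))


def to_byte_planes(values, width=4):
    """Transpose fixed-width little-endian ints into `width` byte planes.

    Plane i holds byte i of every value, so the high-order planes of small
    values are mostly zero bytes that a later compressor can squeeze.
    """
    planes = [bytearray() for _ in range(width)]
    for v in values:
        for i, b in enumerate(v.to_bytes(width, "little")):
            planes[i].append(b)
    return [bytes(p) for p in planes]


def from_byte_planes(planes):
    """Inverse of to_byte_planes: re-interleave the planes into ints."""
    return [int.from_bytes(bytes(column), "little") for column in zip(*planes)]


base, deltas = split_base_deltas([100, 103, 103, 110])  # (100, [3, 0, 7])
planes = to_byte_planes(deltas)  # planes 1..3 are all zero bytes here
assert join_base_deltas(base, from_byte_planes(planes)) == [100, 103, 103, 110]
```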

XinyuZeng force-pushed the xinyuzeng branch from 7bd16e4 to cff8ee4 (February 3, 2026 12:07)

Merge branch 'main' into xinyuzeng

1ddab28

agavra merged commit b0f162f into agavra:main Feb 3, 2026

agavra merged 18 commits into agavra:main from XinyuZeng:xinyuzeng
