5854037 bytes #16

Closed
XinyuZeng wants to merge 2 commits into agavra:main from XinyuZeng:xinyu-sorted-events-squashed

@XinyuZeng

TL;DR: I tried various methods, but natebrennand's general approach still works well. The major improvement comes from using LZMA to compress the repo column, which yields better results than zstd.

  • Add columnar storage for repository data with FoR encoding
  • Implement varint delta encoding for sorted u64 values
  • Add 24-bit packing/unpacking for u32 values
  • Add patched bitpacking with outlier handling (tried, then discarded)
  • Use LZMA for repo-related data compression
  • Implement streaming decoding for varint delta and FoR values
  • Sort events by id and use zstd level 22
  • Various optimizations and cleanup
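
For readers unfamiliar with the encodings named in the list above, here is a minimal sketch of three of them: varint delta encoding for sorted integers, a streaming decoder for it, and frame-of-reference (FoR) encoding with 24-bit packed offsets. It is an illustration only, not the code from this PR, which is not shown on this page. It is written in Python for brevity, every function name is hypothetical, and the LEB128-style varint layout and little-endian byte order are assumptions.

```python
from typing import Iterable, Iterator, List, Tuple


def encode_varint_delta(sorted_values: Iterable[int]) -> bytes:
    """Delta-encode a non-decreasing sequence, then varint-encode each gap."""
    out = bytearray()
    prev = 0
    for value in sorted_values:
        delta = value - prev
        if delta < 0:
            raise ValueError("input must be non-negative and sorted")
        prev = value
        # Emit 7 bits per byte, low bits first; the high bit means "more follows".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_varint_delta(data: bytes) -> Iterator[int]:
    """Stream values back one at a time instead of building the full list."""
    value = 0
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            value += delta  # undo the delta step with a running sum
            yield value
            delta = 0
            shift = 0
    if shift:
        raise ValueError("truncated varint")


def pack_u24(values: Iterable[int]) -> bytes:
    """Store each value in exactly 3 bytes (little-endian, an assumption)."""
    out = bytearray()
    for v in values:
        if not 0 <= v < (1 << 24):
            raise ValueError("value does not fit in 24 bits")
        out += v.to_bytes(3, "little")
    return bytes(out)


def unpack_u24(data: bytes) -> Iterator[int]:
    """Inverse of pack_u24, yielding one value per 3-byte group."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3")
    for i in range(0, len(data), 3):
        yield int.from_bytes(data[i:i + 3], "little")


def encode_for(values: List[int]) -> Tuple[int, bytes]:
    """Frame of reference: keep the minimum once, pack offsets from it."""
    base = min(values, default=0)
    return base, pack_u24(v - base for v in values)


def decode_for(base: int, packed: bytes) -> Iterator[int]:
    """Stream the original values by adding the base back to each offset."""
    for offset in unpack_u24(packed):
        yield base + offset


if __name__ == "__main__":
    ids = [1000, 1003, 1003, 1500, 70000]
    assert list(decode_varint_delta(encode_varint_delta(ids))) == ids

    base, packed = encode_for([5_000_000, 5_000_010, 5_999_999])
    assert list(decode_for(base, packed)) == [5_000_000, 5_000_010, 5_999_999]
```

Both encoders shrink the integers before a general-purpose compressor such as zstd or LZMA sees them: small gaps between sorted ids cost a byte or two each, and values clustered near a common minimum fit in three bytes instead of four.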

XinyuZeng and others added 2 commits February 1, 2026 12:41
- Add columnar storage for repository data with FoR encoding
- Implement varint delta encoding for sorted u64 values
- Add 24-bit packing/unpacking for u32 values
- Add patched bitpacking with outlier handling
- Use LZMA for repo-related data compression
- Implement streaming decoding for varint delta and FoR values
- Sort events by id and use zstd level 22
- Various optimizations and cleanup

Co-authored-by: Cursor <cursoragent@cursor.com>
…achieved by xinyuzeng, replacing the previous score of 5,996,236 bytes.
@agavra
Owner

agavra commented Feb 2, 2026

Hi @XinyuZeng - very neat! In the spirit of the competition, I prefer that we avoid adding new dependencies outside of zstd (that way the focus is on the compression of the data itself instead of trying different bytewise compression libraries to shave a few bytes). I'm pretty certain that even with zstd instead of LZMA you will be near the top of the leaderboard. Do you want to try that out?

I will update the competition instructions to verify this, I hope you learned a lot during the challenge though!

@XinyuZeng
Author

Yes, switching block compression isn't fancy at all! I just found that the repo names are the major bottleneck to optimize, so I was trying better string compression here. Closing this and going to vibe a new one if I have time!

@XinyuZeng XinyuZeng closed this Feb 3, 2026
2 participants