Closed
- Add columnar storage for repository data with FoR encoding
- Implement varint delta encoding for sorted u64 values
- Add 24-bit packing/unpacking for u32 values
- Add patched bitpacking with outlier handling
- Use LZMA for repo-related data compression
- Implement streaming decoding for varint delta and FoR values
- Sort events by id and use zstd level 22
- Various optimizations and cleanup

Co-authored-by: Cursor <cursoragent@cursor.com>
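For readers unfamiliar with the encodings named in the commit message, here is a minimal Python sketch of varint delta encoding (with a streaming decoder) and frame-of-reference (FoR) encoding. It is not the code from this PR: the function names are hypothetical, the varint layout is assumed to be LEB128-style, and the inputs are assumed to be non-negative integers (non-empty for FoR).

```python
from typing import Iterable, Iterator, List, Tuple


def encode_varint_delta(sorted_values: Iterable[int]) -> bytes:
    """Delta-encode a non-decreasing, non-negative sequence as LEB128 varints."""
    out = bytearray()
    prev = 0
    for value in sorted_values:
        delta = value - prev
        if delta < 0:
            raise ValueError("input must be non-negative and sorted")
        prev = value
        # Emit 7 bits per byte; a set high bit means more bytes follow.
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_varint_delta(data: bytes) -> Iterator[int]:
    """Stream values back one at a time instead of building a full list."""
    value = 0
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            # Last byte of this varint: apply the gap and emit the value.
            value += delta
            yield value
            delta = 0
            shift = 0


def encode_for(values: List[int]) -> Tuple[int, int, List[int]]:
    """Frame of reference: keep the minimum once, plus small offsets.

    Returns (base, bit_width, offsets); bit_width is the number of bits
    a bit-packer would need per offset. Assumes a non-empty list.
    """
    base = min(values)
    offsets = [v - base for v in values]
    bit_width = max(offsets).bit_length()
    return base, bit_width, offsets


def decode_for(base: int, offsets: Iterable[int]) -> Iterator[int]:
    """Stream the original values back by re-adding the base."""
    return (base + off for off in offsets)
```

Sorted ids have small gaps, so most deltas fit in one or two bytes instead of eight. FoR serves the same goal for unsorted columns whose values cluster in a narrow range.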
…achieved by xinyuzeng, replacing the previous score of 5,996,236 bytes.
Owner
Hi @XinyuZeng - very neat! In the spirit of the competition, I'd prefer that we avoid adding new dependencies beyond zstd (that way the focus is on compressing the data itself instead of trying different bytewise compression libraries to shave off a few bytes). I'm pretty certain that even with zstd instead of LZMA you will be near the top of the leaderboard. Do you want to try that out? I will update the competition instructions to reflect this. I hope you learned a lot during the challenge, though!
Author
Yes, switching block compression isn't fancy at all! I just found that the repo names are the major bottleneck to optimize, so I was trying better string compression here. Closing this and going to vibe a new one if I have time!
TL;DR: Tried various methods, but natebrennand's general approach still works well. The major improvement comes from using LZMA to compress the repo column, which yields better results than zstd.
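As a rough illustration of compressing the repo column on its own with LZMA, here is a minimal sketch using Python's standard `lzma` module. It is not the PR's implementation: the function names, the newline-separated layout, and the preset are assumptions, and it assumes a non-empty list of names that contain no newlines.

```python
import lzma
from typing import List


def compress_repo_column(repo_names: List[str]) -> bytes:
    """Store all repo names as one newline-separated column, then LZMA it.

    Assumption: names contain no newline, so it is a safe separator.
    """
    column = "\n".join(repo_names).encode("utf-8")
    # PRESET_EXTREME trades extra CPU time for a slightly smaller output.
    return lzma.compress(column, preset=9 | lzma.PRESET_EXTREME)


def decompress_repo_column(blob: bytes) -> List[str]:
    """Invert compress_repo_column for a non-empty list of names."""
    return lzma.decompress(blob).decode("utf-8").split("\n")
```

Keeping all names in one contiguous column puts similar strings (shared owners, common prefixes) next to each other, which gives a general-purpose compressor more to work with.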