Merged
Experimental codec developed using Claude Code's Ralph Loop for automated iteration. Achieves ~6.40MB on the 1M event dataset.

Key techniques:
- 2-bit category encoding for ID deltas (0/2/4 as common zigzag values)
- 2-bit category encoding for timestamp deltas (0/1/2 as common values)
- Columnar layout with per-column zstd compression
- Timestamp sorting within row groups for optimal ID delta encoding
- 140K row group size

Validated on 5 different GitHub Archive datasets spanning March 2023 to January 2025 - consistently beats previous best by 5.6-10%.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Owner
Thanks for the submission! I guess this proves that our AI overlords are pretty darn good. Now I'm extra hoping that someone comes in and improves on this with manual human intuition, haha! Just a format issue causing the check failure. I'll merge and address it when I update the leaderboard. Confirmed with CI/CD.
Summary
This codec was developed as an experiment with Claude Code's Ralph Loop - an automated iteration system that explores solution spaces. The goal was to see how far automated exploration could push compression on this problem.
The Ralph Loop iteratively tested various compression techniques, keeping what worked and discarding what didn't. This submission represents the result of that exploration rather than a carefully hand-crafted solution.
Result: 6,402,499 bytes (~6.40 MB) on the 1M event dataset.
Techniques Discovered by the Loop
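As an illustration of the 2-bit category encoding named in the commit message, here is a minimal Python sketch. It is not the submitted codec: the function names are hypothetical, and using the fourth 2-bit code as an escape into a separate list of uncommon values is an assumption about how the remaining deltas might be handled. The only details taken from this PR are the common zigzag values 0/2/4 for ID deltas.

```python
from typing import List, Sequence, Tuple

ESCAPE = 3  # assumed: the fourth 2-bit code marks "value stored in the escape list"


def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag."""
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def deltas(values: Sequence[int]) -> List[int]:
    """Difference of each value from its predecessor (the first is taken from 0)."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def encode_categories(values: Sequence[int], common: Sequence[int]) -> Tuple[bytes, List[int]]:
    """Pack one 2-bit code per value, four codes per byte.

    Codes 0..2 select an entry of `common`; code 3 sends the value to the escape list.
    """
    code_of = {v: i for i, v in enumerate(common)}
    packed = bytearray((len(values) + 3) // 4)
    escapes: List[int] = []
    for i, v in enumerate(values):
        code = code_of.get(v, ESCAPE)
        if code == ESCAPE:
            escapes.append(v)
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed), escapes


def decode_categories(packed: bytes, escapes: Sequence[int], count: int,
                      common: Sequence[int]) -> List[int]:
    """Rebuild the value stream from the packed codes and the escape list."""
    esc = iter(escapes)
    out: List[int] = []
    for i in range(count):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        out.append(next(esc) if code == ESCAPE else common[code])
    return out


# Made-up IDs for illustration; real event IDs are far larger.
ids = [100, 101, 102, 102, 110, 111]
ID_COMMON = (0, 2, 4)  # common zigzag values for ID deltas, per the commit message

zz = [zigzag(d) for d in deltas(ids)]
packed, escapes = encode_categories(zz, ID_COMMON)
restored = decode_categories(packed, escapes, len(zz), ID_COMMON)
```

In this toy input, four of the six deltas fit in the 2-bit stream and only two go to the escape list. In a columnar layout such as the one the commit message describes, the packed codes and the escapes would presumably be stored as separate columns and each compressed with zstd.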
What Didn't Work
The loop also tried several approaches that made compression worse:
Validation
Tested on 5 different GitHub Archive datasets spanning March 2023 to January 2025 to verify the techniques generalize and aren't overfit to the training data.
🤖 Generated with Claude Code using the Ralph Loop