Some new tricks #10

Merged
agavra merged 2 commits into agavra:main from hachikuji:hachikuji-improved
Jan 30, 2026

Conversation

@hachikuji
Contributor

Approach:

  • Dictionary compression: Repo names stored once with prefix compression; repos referenced by index
  • Columnar layout: Separate streams for event types, repo indices, IDs, and timestamps
  • Global sorting: Events sorted by (type, id) to minimize deltas
  • Arithmetic coding: Repo indices encoded with frequency-based arithmetic coding
  • Timestamp encoding: 2-bit categories (zero/one/small/large) pack common cases tightly
  • Zstd finisher: Level 22 compression on the prepared binary stream
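
To make the timestamp step concrete, here is a minimal Python sketch of a 2-bit category scheme. It is not the code from this PR: the tag values, the one-byte cutoff for "small", the fixed 4-byte width for "large", and the function names `encode_deltas` and `decode_deltas` are all assumptions chosen for illustration. It also assumes the deltas are non-negative and fit in 32 bits.

```python
import struct

# Hypothetical tag values; the PR names the four categories but not their codes.
ZERO, ONE, SMALL, LARGE = 0, 1, 2, 3


def encode_deltas(deltas):
    """Pack non-negative deltas as 2-bit tags, four per byte, plus a payload.

    Zero and one need no payload at all, which is where the savings come from
    when most consecutive timestamps are equal or one second apart.
    """
    tags = bytearray((len(deltas) + 3) // 4)
    payload = bytearray()
    for i, d in enumerate(deltas):
        if d == 0:
            tag = ZERO
        elif d == 1:
            tag = ONE
        elif d < 256:
            # Assumed cutoff: "small" means the delta fits in one byte.
            tag = SMALL
            payload.append(d)
        else:
            # Assumed width: "large" is a fixed 4-byte little-endian integer.
            tag = LARGE
            payload += struct.pack("<I", d)
        tags[i // 4] |= tag << (2 * (i % 4))
    return bytes(tags), bytes(payload)


def decode_deltas(tags, payload, count):
    """Inverse of encode_deltas; count is needed because tags are padded."""
    out = []
    pos = 0
    for i in range(count):
        tag = (tags[i // 4] >> (2 * (i % 4))) & 0b11
        if tag == ZERO:
            out.append(0)
        elif tag == ONE:
            out.append(1)
        elif tag == SMALL:
            out.append(payload[pos])
            pos += 1
        else:
            out.append(struct.unpack_from("<I", payload, pos)[0])
            pos += 4
    return out
```

Keeping the tags and the payload in separate streams follows the same columnar idea as the rest of the approach: each stream is more uniform on its own, which should give the final Zstd pass more to work with.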

This gets the output down to 6.52 MB. I tried a bunch of other things, but this was my limit. Can't wait to see how it gets beaten!

@agavra agavra merged commit e0a00a7 into agavra:main Jan 30, 2026
1 check passed