Robustness improvements for natebrennand codec #15

Merged
agavra merged 5 commits into agavra:main from natebrennand:nate/robustness-v2
Jan 31, 2026

@natebrennand
Contributor

Summary

Improves robustness of the natebrennand codec by using varint encoding for event_id deltas and documenting encoding limits.

Changes

Varint encoding for event_id deltas

  • Previous: u8 encoding (max delta 255, training data max was 251)
  • New: varint u64 encoding (handles arbitrary gaps)
  • Cost: +72 bytes of encoded output (5,996,164 → 5,996,236)
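
A minimal sketch of how varint-encoded deltas can work, assuming an LEB128-style layout (7 payload bits per byte, high bit as a continuation flag). The function names `write_varint`, `read_varint`, `encode_id_deltas`, and `decode_id_deltas` are hypothetical and are not taken from the codec in this PR; the actual byte layout there may differ.

```rust
/// Append `value` as an LEB128-style varint: 7 payload bits per byte,
/// with the high bit set on every byte except the last.
fn write_varint(out: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Read one varint starting at `*pos` and advance `*pos` past it.
/// Returns `None` on truncated input or a value that does not fit in u64.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        // A u64 needs at most 10 bytes; the 10th may only carry one bit.
        if shift >= 64 || (shift == 63 && (byte & 0x7F) > 1) {
            return None;
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Encode each ID as the gap from the previous one (the first gap is from 0).
/// Assumption: IDs are sorted ascending; wrapping arithmetic keeps the
/// round trip exact even if they are not.
fn encode_id_deltas(ids: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &id in ids {
        write_varint(&mut out, id.wrapping_sub(prev));
        prev = id;
    }
    out
}

/// Inverse of `encode_id_deltas`; `None` if the buffer is malformed.
fn decode_id_deltas(buf: &[u8]) -> Option<Vec<u64>> {
    let mut ids = Vec::new();
    let mut pos = 0usize;
    let mut prev = 0u64;
    while pos < buf.len() {
        prev = prev.wrapping_add(read_varint(buf, &mut pos)?);
        ids.push(prev);
    }
    Some(ids)
}
```

Under this layout a delta below 128 takes one byte and larger deltas take two or more, so there is no fixed ceiling the way there is with a u8.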

Documented encoding limits

| Column | Encoding | Limit | Failure Mode |
|---|---|---|---|
| event_type | 4-bit packed | 16 types | Panics if >16 (GitHub has exactly 16) |
| event_id delta | varint u64 | unlimited | Safe |
| timestamp delta | i16 varint | ±9 hours | Panics if adjacent events >9h apart |
| repo_pair_idx | 24-bit | 16M pairs | Safe (60x headroom) |
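
To make the limits above concrete, here is a rough sketch of range checks for each fixed-width column. It returns `None` where the table says the codec panics, purely for illustration. The function names, the nibble order, and the little-endian byte order are assumptions and are not taken from this PR.

```rust
use std::convert::TryFrom;

/// Pack two 4-bit event-type indices per byte (first index in the low
/// nibble; that order is an assumption). `None` if any index is 16 or more.
fn pack_event_types(indices: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((indices.len() + 1) / 2);
    for pair in indices.chunks(2) {
        let lo = pair[0];
        // An odd trailing index is padded with a zero high nibble.
        let hi = pair.get(1).copied().unwrap_or(0);
        if lo > 0x0F || hi > 0x0F {
            return None;
        }
        out.push(lo | (hi << 4));
    }
    Some(out)
}

/// Gap between adjacent timestamps in seconds, narrowed to i16.
/// `None` once the gap leaves the i16 range (about 9.1 hours either way).
fn timestamp_delta(prev_secs: i64, next_secs: i64) -> Option<i16> {
    let delta = next_secs.checked_sub(prev_secs)?;
    i16::try_from(delta).ok()
}

/// Store a repo-pair index in 3 bytes (little-endian is an assumption).
/// `None` if the index needs more than 24 bits.
fn repo_pair_idx_bytes(idx: u32) -> Option<[u8; 3]> {
    if idx >= 1 << 24 {
        return None;
    }
    let b = idx.to_le_bytes();
    Some([b[0], b[1], b[2]])
}
```

For example, `timestamp_delta(0, 32_767)` fits, while a ten-hour gap such as `timestamp_delta(0, 36_000)` does not.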

Other

  • Clippy warning fixes
  • cargo fmt

🤖 Generated with Claude Code

natebrennand and others added 5 commits January 30, 2026 16:29
- Switch event_id deltas from u8 to varint-encoded u64
  - Handles arbitrary gaps safely (u8 would overflow at 256)
  - Training data max delta is 251, varint costs only +72 bytes
- Document all 16 GitHub event types in module docs
  - 4-bit encoding supports max 16 types
  - Will break if GitHub adds more event types

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Timestamp deltas use i16 (±32,767 seconds = ~9 hours max gap).
Will break if adjacent events sorted by ID are >9 hours apart.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Size increased from 5,996,164 to 5,996,236 (+72 bytes) due to
varint encoding for event_id deltas, which adds robustness for
datasets with larger gaps between event IDs.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove unnecessary .into_iter() call
- Remove redundant u64 cast (event_id_deltas already Vec<u64>)
- Add #[allow(dead_code)] to debug functions (used only with NATE_DEBUG=1)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
@agavra merged commit 6fe3bf4 into agavra:main Jan 31, 2026
1 check passed