evanlinjin

Description

Implements a single-level skiplist for CheckPoint to improve traversal performance from O(n) to O(√n).

Notes to the reviewers

The skiplist uses checkpoint indices (not block heights) with a fixed interval of 100. This ensures consistent skip pointer distribution even with sparse checkpoint chains.

Key implementation details:

  • Skip pointers are set every 100 checkpoints based on index
  • The insert() method rebuilds indices to maintain skiplist invariants
  • All existing APIs remain unchanged
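
To make the notes above concrete, here is a minimal sketch of index-based skip pointers. It is not the code in this PR: `Node`, `SKIP_INTERVAL`, `push`, and `get` are simplified, hypothetical stand-ins, and the real CheckPoint type carries more data. The sketch assumes heights strictly increase from the first checkpoint to the tip.

```rust
use std::sync::Arc;

/// Hypothetical constant mirroring the fixed interval of 100 described above.
const SKIP_INTERVAL: u32 = 100;

/// Simplified stand-in for a checkpoint node (not the real `CheckPoint`).
struct Node {
    height: u32,
    /// Position in the chain, counted from the first checkpoint (index 0).
    index: u32,
    /// Link to the previous checkpoint.
    prev: Option<Arc<Node>>,
    /// Link to the node `SKIP_INTERVAL` indices back. Only set when `index`
    /// is a non-zero multiple of `SKIP_INTERVAL`.
    skip: Option<Arc<Node>>,
}

/// Append a checkpoint on top of `tip`, assigning its index and skip pointer.
fn push(tip: Option<Arc<Node>>, height: u32) -> Arc<Node> {
    let index = tip.as_ref().map_or(0, |t| t.index + 1);
    let needs_skip_pointer = index >= SKIP_INTERVAL && index % SKIP_INTERVAL == 0;
    let skip = if needs_skip_pointer {
        // The tip sits at `index - 1`, so SKIP_INTERVAL - 1 more `prev`
        // steps reach the node at `index - SKIP_INTERVAL`.
        let mut cur = tip.clone();
        for _ in 1..SKIP_INTERVAL {
            cur = cur.and_then(|n| n.prev.clone());
        }
        cur
    } else {
        None
    };
    Arc::new(Node { height, index, prev: tip, skip })
}

/// Find the checkpoint at exactly `height`, searching down from `tip`.
fn get(tip: &Arc<Node>, height: u32) -> Option<Arc<Node>> {
    let mut cur = Arc::clone(tip);
    loop {
        if cur.height == height {
            return Some(cur);
        }
        if cur.height < height {
            return None; // walked past the target: no checkpoint there
        }
        // Follow the skip pointer only when it does not overshoot the target;
        // otherwise fall back to the single-step `prev` link.
        let next = match &cur.skip {
            Some(s) if s.height >= height => Arc::clone(s),
            _ => Arc::clone(cur.prev.as_ref()?),
        };
        cur = next;
    }
}
```

Because skip pointers are keyed on the index rather than the height, gaps between checkpoint heights do not change how many nodes each skip pointer jumps over.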

Changelog notice

Added

  • Skiplist support for CheckPoint with skip pointer and index fields
  • O(√n) traversal for get(), floor_at(), and range() methods
  • Performance benchmarks demonstrating ~265x speedup for deep searches in 10k checkpoint chains

Checklists

All Submissions:

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

🤖 Generated with Claude Code

@evanlinjin evanlinjin marked this pull request as draft September 25, 2025 08:08
@evanlinjin

Guys, this is purely done by Claude. I haven't reviewed it yet.

@evanlinjin evanlinjin force-pushed the feature/skiplist branch 2 times, most recently from 153c401 to 098c076 Compare September 25, 2025 09:25
@evanlinjin evanlinjin moved this to In Progress in BDK Chain Sep 25, 2025
@evanlinjin

Performance Benchmark Comparison

Benchmarks comparing the old O(n) implementation vs new skiplist O(√n) implementation for a 10,000 checkpoint chain:

🎯 Key Results

| Operation | Old Implementation | Skiplist Implementation | Speedup |
|---|---|---|---|
| get(100) - near start | 98.270 μs | 421 ns | 233x faster |
| get(9000) - near end | 9.668 μs | 44 ns | 220x faster |
| linear_traversal(100) | 56.965 μs | 110.66 μs | 0.5x (expected*) |

📊 Detailed Benchmarks

Finding checkpoint at position 100 (from 10k chain):

  • Old: 98.270 μs - Linear search from tip
  • New: 421 ns - Skip pointers jump directly to target
  • Improvement: 233x faster 🚀

Finding checkpoint at position 9000 (from 10k chain):

  • Old: 9.668 μs - Linear search through 1000 nodes
  • New: 44 ns - Skip pointers minimize traversal
  • Improvement: 220x faster 🚀

* Note: The linear_traversal benchmark shows the new implementation running about 2x slower (110.66 μs vs 56.965 μs). It performs the same linear traversal but carries additional overhead from the skip/index fields. The real performance gains come from the skiplist-aware methods get(), floor_at(), and range().

Summary

The skiplist implementation provides large performance improvements for checkpoint lookups, especially for deep searches in long chains. The 200x+ speedups measured on the 10,000-checkpoint chain are consistent with the expected O(√n) behavior.
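
A rough step count shows where the O(√n) figure comes from. This is a back-of-the-envelope model, not something measured in this PR. It assumes a chain of n checkpoints with skip pointers every k checkpoints, and a lookup that walks at most k single links to reach a skip node, takes at most n/k skip hops, then walks at most k single links to land on the target.

```latex
T_{\text{linear}}(n) \le n,
\qquad
T_{\text{skip}}(n, k) \le \frac{n}{k} + 2k

% With k = 100 and n = 10^4, so that k = \sqrt{n}:
T_{\text{skip}}(10^4, 100) \le \frac{10^4}{100} + 2 \cdot 100 = 300
\quad\text{versus}\quad
T_{\text{linear}}(10^4) \le 10^4
```

Under this model the bound is O(√n) when k is close to √n, which is the case for the fixed interval of 100 at around 10,000 checkpoints. For much longer chains the n/k term dominates. These are step counts only; measured times also depend on memory access patterns and on where the target falls relative to the skip nodes.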

@evanlinjin evanlinjin self-assigned this Sep 25, 2025
@evanlinjin evanlinjin force-pushed the feature/skiplist branch 2 times, most recently from 5544fee to 4b9ccd1 Compare October 17, 2025 10:36
@evanlinjin evanlinjin marked this pull request as ready for review October 17, 2025 12:14

evanlinjin commented Oct 17, 2025

~~Guys, this is purely done by Claude. I haven't reviewed it yet.~~

It's now fully reviewed by myself! Made many simplifications.

Let's merge #2055 and rebase this on top of that!

@evanlinjin

Skiplist Performance Update

After the optimizations, here are the updated benchmark results:

get() Performance

| Benchmark | Time | Notes |
|---|---|---|
| get_100_near_start | 475.89 ns | Get checkpoint near start of 100-item chain |
| get_1000_middle | 31.07 ns | Get checkpoint in middle of 1000-item chain |
| get_10000_near_end | 57.12 ns | Get checkpoint near end of 10000-item chain |
| get_10000_near_start | 535.37 ns | Get checkpoint near start of 10000-item chain |

floor_at() Performance

| Benchmark | Time | Notes |
|---|---|---|
| floor_at_1000 | 286.33 ns | Floor at height 750 in 1000-item chain |
| floor_at_10000 | 673.27 ns | Floor at height 7500 in 10000-item chain |

range() Performance

| Benchmark | Time | Notes |
|---|---|---|
| range_1000_middle_10pct | 1.67 µs | Range 450..=550 in 1000-item chain |
| range_10000_large_50pct | 97.59 µs | Range 2500..=7500 in 10000-item chain |
| range_10000_from_start | 3.11 µs | Range ..=100 in 10000-item chain |
| range_10000_near_tip | 1.21 µs | Range 9900.. in 10000-item chain |
| range_single_element | 942.21 ns | Range 5000..=5000 in 10000-item chain |

Traversal Comparison

| Benchmark | Time | Notes |
|---|---|---|
| linear_traversal_10000 | 140.90 µs | Linear search to height 100 in 10000-item chain |
| skiplist_get_10000 | 539.80 ns | Skip-enhanced search to height 100 in 10000-item chain |

Speedup: 261x faster with skip pointers!

Summary

The skip list implementation successfully achieves O(√n) time complexity for search operations. Key improvements from our optimizations:

  1. Cleaner two-phase traversal in get() and range()
  2. Simplified floor_at() from 33 lines to 1 line
  3. Restored elegant insert() implementation (removed 60+ lines)
  4. Refactored push() with clearer skip pointer logic

All tests pass and the implementation is now both performant and maintainable.
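
The two-phase traversal and the one-line floor_at() from the list above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the PR's code: `Chain`, `RangeIter`, and `SKIP_INTERVAL` are hypothetical names, and the chain is modelled as a vector of heights ordered from tip to genesis, so a jump of `SKIP_INTERVAL` positions stands in for following a skip pointer.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical jump size, standing in for one skip pointer hop.
const SKIP_INTERVAL: usize = 100;

/// Simplified chain: heights stored tip-first, strictly descending.
struct Chain {
    heights: Vec<u32>,
}

/// Iterator over in-range heights, from highest to lowest.
struct RangeIter<'a> {
    rest: std::slice::Iter<'a, u32>,
    start: Bound<u32>,
}

impl Iterator for RangeIter<'_> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let h = *self.rest.next()?;
        // Stop as soon as a height falls below the lower bound.
        let in_range = match self.start {
            Bound::Included(s) => h >= s,
            Bound::Excluded(s) => h > s,
            Bound::Unbounded => true,
        };
        in_range.then_some(h)
    }
}

impl Chain {
    fn range<R: RangeBounds<u32>>(&self, range: R) -> RangeIter<'_> {
        // Local helper, similar in spirit to the `is_above_bound` closure
        // mentioned in the commit message.
        let is_above_bound = |h: u32| match range.end_bound() {
            Bound::Included(&end) => h > end,
            Bound::Excluded(&end) => h >= end,
            Bound::Unbounded => false,
        };
        let len = self.heights.len();
        let mut pos = 0;
        // Phase 1: jump while the landing entry is still above the upper bound.
        while pos + SKIP_INTERVAL < len && is_above_bound(self.heights[pos + SKIP_INTERVAL]) {
            pos += SKIP_INTERVAL;
        }
        // Phase 2: step one entry at a time to the first height in range.
        while pos < len && is_above_bound(self.heights[pos]) {
            pos += 1;
        }
        RangeIter {
            rest: self.heights[pos..].iter(),
            start: range.start_bound().cloned(),
        }
    }

    /// Highest height at or below `height`, delegating to `range()`.
    fn floor_at(&self, height: u32) -> Option<u32> {
        self.range(..=height).next()
    }
}
```

Since range() yields heights from the tip downward, the first item of `range(..=height)` is the highest checkpoint at or below `height`, which is why the delegation needs no traversal logic of its own.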

evanlinjin and others added 9 commits October 19, 2025 10:54
Add skip pointers and index tracking to CheckPoint structure with
CHECKPOINT_SKIP_INTERVAL=100. Update get(), floor_at(), range(),
insert() and push() methods to leverage skip pointers.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Test index tracking, skip pointer placement, get/floor_at/range
performance, and insert operation with index maintenance.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Demonstrate ~265x speedup for deep searches in 10k checkpoint chains.
Linear traversal: ~108μs vs skiplist get: ~407ns.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Split skip pointer and linear traversal into separate loops for better
performance. Benchmarks show 99% improvement for middle-range queries
and 30% improvement for small chains.

Apply the same two-phase optimization from get() to range():
- Phase 1: Use skip pointers exclusively to jump close to target
- Phase 2: Linear traversal for precise positioning

Additional improvements:
- Extract is_above_bound helper as local closure
- Add comprehensive edge case tests
- Improve benchmark coverage for different access patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Replace the manual traversal logic with a simple delegation to range().
This eliminates code duplication and reuses all the optimizations from
the range() method.

The new implementation is just:
  self.range(..=height).next()

Performance impact:
- Significant improvement for smaller chains (85% faster)
- Minor regression for very large chains due to iterator setup
- Overall worth it for the massive code simplification

Remove unnecessary push_with_index() helper and restore the clean
implementation from master that uses iter::once().chain() with extend().

The complex manual index management was not needed - extend() correctly
handles index assignment and skip pointer calculation automatically.

Removes 60+ lines of unnecessary code while maintaining all functionality
and performance.

- Use early return pattern for readability
- Add `needs_skip_pointer` variable for clarity
- Simplify traversal to straightforward step counting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

evanlinjin and others added 2 commits October 19, 2025 11:34
Skip pointers at index 100+ create additional Arc references to earlier
checkpoints. The test now expects 3 strong refs to genesis instead of 2.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
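
The extra strong reference described in the commit message above can be reproduced with a minimal sketch. `Node` and `genesis_ref_counts` are hypothetical names used only for illustration; the actual test in the PR operates on CheckPoint.

```rust
use std::sync::Arc;

/// Hypothetical minimal node: a `prev` link plus an optional skip link.
struct Node {
    prev: Option<Arc<Node>>,
    skip: Option<Arc<Node>>,
}

/// Returns the strong count of `genesis` before and after a later node
/// takes a skip pointer to it.
fn genesis_ref_counts() -> (usize, usize) {
    let genesis = Arc::new(Node { prev: None, skip: None });
    let child = Arc::new(Node {
        prev: Some(Arc::clone(&genesis)),
        skip: None,
    });
    // Two strong refs: the local `genesis` handle and `child.prev`.
    let before = Arc::strong_count(&genesis);

    let skipper = Arc::new(Node {
        prev: Some(Arc::clone(&child)),
        skip: Some(Arc::clone(&genesis)),
    });
    // Three strong refs: `skipper.skip` adds one more.
    let after = Arc::strong_count(&genesis);

    drop(skipper);
    (before, after)
}
```

Any test that asserts on `Arc::strong_count` therefore needs to account for skip pointers once a chain is long enough to contain one.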