acooks (Owner) commented Dec 28, 2025

No description provided.

acooks and others added 3 commits December 28, 2025 10:32
Add benchmarking infrastructure to measure the performance of identified
hot paths in the packet processing pipeline. Benchmarks use rdtsc for
cycle-level timing and are built without ASAN, whose instrumentation
would distort the measurements.
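A minimal sketch of rdtsc-based timing, assuming GCC or Clang on x86, where `__rdtsc()` is available from `<x86intrin.h>`. The helpers `read_ticks`, `ticks_per_op` and `bump_counter` are hypothetical and not taken from the toptalk sources. On non-x86 targets the sketch falls back to `CLOCK_MONOTONIC` nanoseconds, so the units differ there.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Read a monotonically increasing tick counter: the TSC on x86 (via the
 * GCC/Clang __rdtsc() intrinsic), otherwise CLOCK_MONOTONIC nanoseconds. */
static inline uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical helper: run fn(arg) `iters` times, return mean ticks per call. */
static uint64_t ticks_per_op(void (*fn)(void *), void *arg, size_t iters)
{
    assert(fn != NULL && iters > 0);
    uint64_t start = read_ticks();
    for (size_t i = 0; i < iters; i++)
        fn(arg);
    uint64_t end = read_ticks();
    return (end - start) / iters;
}

/* Example workload: a volatile increment the compiler cannot elide. */
static void bump_counter(void *arg)
{
    volatile unsigned *p = arg;
    (*p)++;
}
```

On CPUs with an invariant TSC, rdtsc advances at a constant reference rate rather than the current core clock, so results can shift with frequency scaling. More careful harnesses also pin the benchmark thread to one core.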

Benchmarks included:
- bench-decode: Header parsing overhead (decode vs re-parse)
- bench-malloc: Per-packet allocation cost (malloc vs ring buffer)
- bench-rotation: Interval table rotation (copy vs swap)
- bench-sort: Flow sorting (full sort vs partial top-N)
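The PR reports the ring buffer's cycle savings but not its design. A minimal sketch, assuming fixed-size per-packet records that are released in the same order they were allocated (FIFO). `struct pkt_ring`, `ring_alloc` and `ring_free_oldest` are hypothetical names. Allocation is an index increment and mask into preallocated storage, with no call into the system allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 1024          /* must be a power of two */
#define SLOT_BYTES 128           /* hypothetical fixed per-packet record size */

/* Preallocated storage; records are handed out and released in FIFO order. */
struct pkt_ring {
    unsigned char slots[RING_SLOTS][SLOT_BYTES];
    uint32_t head;               /* next slot to hand out (free-running) */
    uint32_t tail;               /* oldest slot still in use (free-running) */
};

static void ring_init(struct pkt_ring *r)
{
    r->head = 0;
    r->tail = 0;
}

/* Return a free slot, or NULL if every slot is in use. Unsigned
 * subtraction keeps head - tail correct when the counters wrap. */
static void *ring_alloc(struct pkt_ring *r)
{
    if (r->head - r->tail == RING_SLOTS)
        return NULL;
    void *slot = r->slots[r->head & (RING_SLOTS - 1)];
    r->head++;
    return slot;
}

/* Release the oldest outstanding slot (FIFO release order assumed). */
static void ring_free_oldest(struct pkt_ring *r)
{
    assert(r->head != r->tail);
    r->tail++;
}
```

The FIFO-release assumption is what lets two counters manage the whole pool; records freed out of order would need a free list instead.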

Initial results show significant optimization opportunities:
- Ring buffer: 72% fewer cycles than malloc/free
- Partial sort: 86% fewer cycles than HASH_SRT
- Swap rotation: 37% fewer cycles than copy
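Swap rotation can be sketched as exchanging pointers to two preallocated tables instead of copying the current table into the previous one. `struct interval_table`, `struct intervals`, `rotate_copy` and `rotate_swap` are hypothetical; the actual interval table layout is not shown in this PR, and synchronization with readers is ignored here:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_FLOWS 256            /* hypothetical table capacity */

/* Hypothetical per-interval counters, indexed by flow slot. */
struct interval_table {
    uint64_t bytes[MAX_FLOWS];
    uint64_t packets[MAX_FLOWS];
};

struct intervals {
    struct interval_table *current;   /* filled by the packet path */
    struct interval_table *previous;  /* last completed interval */
};

/* Copy rotation: duplicate the whole table, then clear it for reuse. */
static void rotate_copy(struct intervals *iv)
{
    memcpy(iv->previous, iv->current, sizeof *iv->current);
    memset(iv->current, 0, sizeof *iv->current);
}

/* Swap rotation: the completed table becomes `previous` by exchanging
 * pointers; only the recycled buffer is cleared, so no copy is made. */
static void rotate_swap(struct intervals *iv)
{
    assert(iv->current != iv->previous);
    struct interval_table *tmp = iv->previous;
    iv->previous = iv->current;
    iv->current = tmp;
    memset(iv->current, 0, sizeof *iv->current);
}
```

Both variants still clear the recycled table; the swap removes only the full-table copy.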

Run with: make -C deps/toptalk bench

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add bench_regression.c with automated performance threshold checks
for critical operations:
- Header decoding: <500 cycles
- Ring buffer allocation: <100 cycles
- Top-N flow selection: <5000 cycles (100 flows)
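`HASH_SRT` is uthash's macro for sorting an entire hash table. The PR does not say how its partial top-N selection works; one common technique, sketched here, keeps a min-heap of the N largest flows seen so far, costing O(M log N) for M flows instead of the O(M log M) of a full sort. `struct flow`, `sift_down` and `top_n` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow record; only the sort key matters here. */
struct flow {
    uint32_t id;
    uint64_t bytes;
};

/* Restore the min-heap property (smallest `bytes` at index 0) below i. */
static void sift_down(const struct flow **heap, size_t n, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && heap[l]->bytes < heap[m]->bytes) m = l;
        if (r < n && heap[r]->bytes < heap[m]->bytes) m = r;
        if (m == i) return;
        const struct flow *t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
}

/* Select the k largest flows by bytes into out[], largest first.
 * Returns the number of flows written, i.e. min(n, k). */
static size_t top_n(const struct flow *flows, size_t n,
                    const struct flow **out, size_t k)
{
    assert(k == 0 || out != NULL);
    size_t h = 0;
    for (size_t i = 0; i < n; i++) {
        if (h < k) {
            out[h++] = &flows[i];
            if (h == k)                      /* heap just filled: heapify */
                for (size_t j = k / 2; j-- > 0;)
                    sift_down(out, k, j);
        } else if (k > 0 && flows[i].bytes > out[0]->bytes) {
            out[0] = &flows[i];              /* evict the current minimum */
            sift_down(out, k, 0);
        }
    }
    if (h < k)                               /* fewer than k flows seen */
        for (size_t j = h / 2; j-- > 0;)
            sift_down(out, h, j);
    /* Heap-sort the survivors: moving each minimum to the end leaves
     * the array in descending order. */
    for (size_t end = h; end > 1; end--) {
        const struct flow *t = out[0]; out[0] = out[end - 1]; out[end - 1] = t;
        sift_down(out, end - 1, 0);
    }
    return h;
}
```

Only the N survivors are ever sorted, which is where a partial selection saves work over sorting every flow in the table.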

Run with: make -C deps/toptalk bench-test

This guards against future changes regressing these performance
improvements. Thresholds are set conservatively high to avoid false
failures on slower or different hardware.
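A sketch of a threshold check, assuming each benchmark collects many per-operation cycle samples and compares their median against a fixed limit such as the <500-cycle decode bound. `median_cycles` and `check_threshold` are hypothetical; the PR does not show how bench_regression.c aggregates its samples:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for uint64_t cycle samples. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (sorts the array in place). The median is less
 * sensitive than the mean to interrupts and cache-cold outliers. */
static uint64_t median_cycles(uint64_t *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return n % 2 ? samples[n / 2]
                 : (samples[n / 2 - 1] + samples[n / 2]) / 2;
}

/* Print a PASS/FAIL line and return 1 if the median is not strictly
 * below max_cycles, 0 otherwise. */
static int check_threshold(const char *name, uint64_t *samples, size_t n,
                           uint64_t max_cycles)
{
    uint64_t med = median_cycles(samples, n);
    int fail = med >= max_cycles;
    printf("%-24s median %6llu cycles (limit < %llu): %s\n", name,
           (unsigned long long)med, (unsigned long long)max_cycles,
           fail ? "FAIL" : "PASS");
    return fail;
}
```

Summing the return values into the process exit status would make a make target that runs the program directly fail whenever any threshold is exceeded.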

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add Test 4 to bench_regression.c measuring the actual per-packet hot
path through update_stats_tables(). Tests two scenarios:

- Single flow: ~57 cycles/op (hash hit case)
- 10K flows: ~200 cycles/op (hash table stress)

Adds benchmark hooks to intervals.h/c:
- tt_bench_init(): Initialize data structures without pcap
- tt_bench_cleanup(): Clean up after benchmarking
- tt_bench_update_stats(): Process decoded packet through stats path

Results show the optimizations achieved the 200 cycles/packet target,
which corresponds to a theoretical ~15M pps on a 3 GHz CPU.
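The ~15M pps figure is the clock rate divided by the per-packet cost. This assumes one core spends every cycle on this path, with no capture, I/O, or other overhead:

```math
\text{throughput} = \frac{3 \times 10^{9}\ \text{cycles/s}}{200\ \text{cycles/packet}} = 1.5 \times 10^{7}\ \text{packets/s} \approx 15\ \text{Mpps}
```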

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
acooks merged commit ce19599 into master Dec 28, 2025
0 of 3 checks passed