A high-concurrency benchmark suite for Approximate Nearest Neighbor (ANN) indexes with mixed read/write workloads.
| Feature | Description | |
|---|---|---|
| β‘ | Concurrent | Real-time read/write benchmarking with configurable rates |
| π | Streaming | Incremental inserts with recall tracking over time |
| βοΈ | Comparison | Record/replay mode for lock strategy analysis |
| π§© | Flexible | Multiple query modes: round_robin, chasing, peeking |
| π | Profiling | Memory tracking via jemalloc integration |
| π¨ | Visualization | Auto-generated recall plots and diff analysis |
| Algorithm | Description |
|---|---|
| HNSW | Hierarchical Navigable Small World |
| ANNchor | ANNchor with future-skip optimization |
| Vamana | DiskANN's Vamana graph |
| ParlayANN | Parallel ANN (HNSW/Vamana variants) |
Datasets: SIFT (128d) Β· GIST (960d) Β· Deep1M (96d) Β· MSong (420d) Β· FreewayML
| Requirement | Version |
|---|---|
| Go | 1.21+ |
| GCC/G++ | C++17 support |
| CMake | 3.16+ |
| jemalloc | - |
# 1. Build native libraries
cmake -S bench -B bench/build && cmake --build bench/build -j
cmake -S utils -B utils/build && cmake --build utils/build -j
# 2. Build benchmark binary
cd bench && go build -o bench .# Single config
./bench -config config/annchor/sift/ch_b20.yaml
# Batch experiments
bash run.shConfig path:
bench/config/<algorithm>/<dataset>/*.yaml
| Pattern | Description |
|---|---|
rr.yaml |
Round-robin mode, multiple batch sizes |
ch_b{N}.yaml |
Chasing mode, batch_size=N |
pk_b{N}.yaml |
Peeking mode, batch_size=N |
overall_only.yaml |
Overall recall test (no streaming) |
seg.yaml |
Segmented index |
data:
dataset_name: sift # dataset identifier
max_elements: -1 # max index size (-1 = unlimited)
begin_num: 50000 # initial index size before streaming
data_type: float # float | int8 | uint8
data_path: ../data/sift/base.bin
incr_query_path: ../data/sift/query_stream.bin
incr_gt_path: ../data/sift/stream_i20_offset_index.txt
overall_query_path: ../data/sift/query.bin
overall_gt_path: ../data/sift/base.gt20
index:
index_type: annchor # hnsw | annchor | vamana | segmented
m: 16 # max neighbors per node
ef_construction: 200 # construction search depth
sealed_type: hnsw # (segmented only) base index type
seal_threshold: [50000, 100000] # (segmented only) segment thresholds
search:
recall_at: 10 # k for recall@k
ef_search: 35 # search depth
with_node_lock: [true, false] # node lock during search
workload:
batch_size: [20, 100, 200] # vectors per batch
num_threads: [16, 64] # worker threads
queue_size: 0 # event queue size (0 = unbounded)
rate_groups(r/w): # [read_rate, write_rate] pairs
- [20000, 20000] # balanced
- [4000, 36000] # write-heavy
- [36000, 4000] # read-heavy
with_external_rw_lock: false # external RW lock
query_mode: chasing # round_robin | chasing | peeking
compare:
enabled: [true, false] # enable lock comparison mode
simulate:
enabled: false # simulation mode
result:
output_dir: ./result/ch
profile:
memory_monitor_interval: 100 # memory sampling interval (ms)
enable_memory_profile: trueArrays in config generate multiple experiment variants:
batch_size: [20, 100, 200] # 3 variants
num_threads: [16, 64] # Γ 2 variants
rate_groups(r/w): # Γ 3 variants
- [20000, 20000]
- [4000, 36000]
- [36000, 4000]
# Total: 3 Γ 2 Γ 3 = 18 experiments| Mode | Behavior |
|---|---|
round_robin |
Queries cycle through query set sequentially |
chasing |
Queries follow the insertion watermark |
peeking |
Queries peek ahead of the watermark |
| Option | Behavior |
|---|---|
with_node_lock: true |
Lock nodes during search (default) |
with_node_lock: false |
No node lock (faster, may have consistency issues) |
with_external_rw_lock: true |
External RW lock (auto-disables node lock) |
bench/result/<algo>/<dataset>/
βββ files/
β βββ *.res # search results
β βββ *_diff.csv # recall diff data
βββ figures/
βββ *.png # recall plots
Manual plotting:
python3 utils/scripts/plot_recall_diff.py --csv <diff.csv> --out <plot.png>ccANN-Bench/
βββ bench/
β βββ algorithms/ # Index implementations (C++)
β βββ config/ # Workload configs (YAML)
β βββ internal/ # Benchmark logic (Go)
β βββ result/ # Output directory
βββ utils/
β βββ build/ # calc_recall, calc_incr_recall
β βββ scripts/ # Plotting scripts (Python)
βββ data/ # Datasets & ground truth
run.shsetsLD_PRELOADto jemalloc for memory tracking- ANNchor exposes extra stats (future-skip counters) in compare runs
- Set
compare.enabled: truefor lock strategy experiments - When
with_external_rw_lock: true, node lock is automatically disabled
MIT