ccANN-Bench

A high-concurrency benchmark suite for Approximate Nearest Neighbor (ANN) indexes with mixed read/write workloads.

Features

	Feature	Description
⚡	Concurrent	Real-time read/write benchmarking with configurable rates
🌊	Streaming	Incremental inserts with recall tracking over time
⚖️	Comparison	Record/replay mode for lock strategy analysis
🧩	Flexible	Multiple query modes: `round_robin`, `chasing`, `peeking`
📊	Profiling	Memory tracking via jemalloc integration
🎨	Visualization	Auto-generated recall plots and diff analysis

Supported Algorithms

Algorithm	Description
HNSW	Hierarchical Navigable Small World
ANNchor	ANNchor with future-skip optimization
Vamana	DiskANN's Vamana graph
ParlayANN	Parallel ANN (HNSW/Vamana variants)

Datasets: SIFT (128d) · GIST (960d) · Deep1M (96d) · MSong (420d) · FreewayML

Quick Start

Prerequisites

Requirement	Version
Go	1.21+
GCC/G++	C++17 support
CMake	3.16+
jemalloc	-

Build

# 1. Build native libraries
cmake -S bench -B bench/build && cmake --build bench/build -j
cmake -S utils -B utils/build && cmake --build utils/build -j

# 2. Build benchmark binary
cd bench && go build -o bench .

Run

# Single config
./bench -config config/annchor/sift/ch_b20.yaml

# Batch experiments
bash run.sh

Configuration

Config path: bench/config/<algorithm>/<dataset>/*.yaml

Config File Naming

Pattern	Description
`rr.yaml`	Round-robin mode, multiple batch sizes
`ch_b{N}.yaml`	Chasing mode, batch_size=N
`pk_b{N}.yaml`	Peeking mode, batch_size=N
`overall_only.yaml`	Overall recall test (no streaming)
`seg.yaml`	Segmented index

Full Config Reference

data:
  dataset_name: sift                    # dataset identifier
  max_elements: -1                      # max index size (-1 = unlimited)
  begin_num: 50000                      # initial index size before streaming
  data_type: float                      # float | int8 | uint8
  data_path: ../data/sift/base.bin
  incr_query_path: ../data/sift/query_stream.bin
  incr_gt_path: ../data/sift/stream_i20_offset_index.txt
  overall_query_path: ../data/sift/query.bin
  overall_gt_path: ../data/sift/base.gt20

index:
  index_type: annchor                   # hnsw | annchor | vamana | segmented
  m: 16                                 # max neighbors per node
  ef_construction: 200                  # construction search depth
  sealed_type: hnsw                     # (segmented only) base index type
  seal_threshold: [50000, 100000]       # (segmented only) segment thresholds

search:
  recall_at: 10                         # k for recall@k
  ef_search: 35                         # search depth
  with_node_lock: [true, false]         # node lock during search

workload:
  batch_size: [20, 100, 200]            # vectors per batch
  num_threads: [16, 64]                 # worker threads
  queue_size: 0                         # event queue size (0 = unbounded)
  rate_groups(r/w):                     # [read_rate, write_rate] pairs
    - [20000, 20000]                    # balanced
    - [4000, 36000]                     # write-heavy
    - [36000, 4000]                     # read-heavy
  with_external_rw_lock: false          # external RW lock
  query_mode: chasing                   # round_robin | chasing | peeking

compare:
  enabled: [true, false]                # enable lock comparison mode
  simulate:
    enabled: false                      # simulation mode

result:
  output_dir: ./result/ch

profile:
  memory_monitor_interval: 100          # memory sampling interval (ms)
  enable_memory_profile: true

Sweep Parameters

Arrays in config generate multiple experiment variants:

batch_size: [20, 100, 200]    # 3 variants
num_threads: [16, 64]         # × 2 variants
rate_groups(r/w):             # × 3 variants
  - [20000, 20000]
  - [4000, 36000]
  - [36000, 4000]
# Total: 3 × 2 × 3 = 18 experiments

Query Modes

Mode	Behavior
`round_robin`	Queries cycle through query set sequentially
`chasing`	Queries follow the insertion watermark
`peeking`	Queries peek ahead of the watermark

Lock Strategy

Option	Behavior
`with_node_lock: true`	Lock nodes during search (default)
`with_node_lock: false`	No node lock (faster, may have consistency issues)
`with_external_rw_lock: true`	External RW lock (auto-disables node lock)

Output

bench/result/<algo>/<dataset>/
├── files/
│   ├── *.res              # search results
│   └── *_diff.csv         # recall diff data
└── figures/
    └── *.png              # recall plots

Manual plotting:

python3 utils/scripts/plot_recall_diff.py --csv <diff.csv> --out <plot.png>

Repository Structure

ccANN-Bench/
├── bench/
│   ├── algorithms/        # Index implementations (C++)
│   ├── config/            # Workload configs (YAML)
│   ├── internal/          # Benchmark logic (Go)
│   └── result/            # Output directory
├── utils/
│   ├── build/             # calc_recall, calc_incr_recall
│   └── scripts/           # Plotting scripts (Python)
└── data/                  # Datasets & ground truth

Notes

run.sh sets LD_PRELOAD to jemalloc for memory tracking
ANNchor exposes extra stats (future-skip counters) in compare runs
Set compare.enabled: true for lock strategy experiments
When with_external_rw_lock: true, node lock is automatically disabled

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 516 Commits
.github/workflows		.github/workflows
bench		bench
data		data
utils		utils
.clang-format		.clang-format
.gitignore		.gitignore
.gitmodules		.gitmodules
Dockerfile		Dockerfile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ccANN-Bench

Table of Contents

Features

Supported Algorithms

Quick Start

Prerequisites

Build

Run

Configuration

Config File Naming

Full Config Reference

Sweep Parameters

Query Modes

Lock Strategy

Output

Repository Structure

Notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

Kwan-Yiu/ccANN-Bench

Folders and files

Latest commit

History

Repository files navigation

ccANN-Bench

Table of Contents

Features

Supported Algorithms

Quick Start

Prerequisites

Build

Run

Configuration

Config File Naming

Full Config Reference

Sweep Parameters

Query Modes

Lock Strategy

Output

Repository Structure

Notes

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages