Optimize HNSW Memory Allocation with Chunked Arrays #633

michaelbautin · 2025-09-26T21:15:30Z

Original patch was created by @PingXie (reviewed by @yairgott). These changes originated in the valkey-search project.

Original commit message below:

This change improves the memory allocation strategy of the HNSW index data structures by adopting chunked arrays. Previously, resizing the HNSW index involved expensive realloc calls, which were particularly costly in two ways:

- Memory Consumption: During resizing, realloc temporarily requires double the memory to accommodate the old data block until the new block is ready and the old block is freed. This posed a substantial issue when dealing with gigabytes of data, leading to excessive memory usage.
- Performance Overhead: The realloc operation entails copying data from the old block to the new block, invoking an expensive memcpy operation. This process becomes increasingly burdensome as the data size grows, resulting in significant performance degradation for large-scale indices.

By transitioning to chunked arrays for memory allocation, we circumvent the need for realloc calls during resizing. This approach not only reduces the memory footprint by eliminating the temporary doubling of memory but also enhances overall performance by avoiding the costly memcpy operations. The chunked array strategy allocates new memory in smaller, manageable chunks as the data grows, ensuring more efficient use of memory and improving the scalability of HNSW index operations, especially critical for indices spanning multiple gigabytes.
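
For readers unfamiliar with the technique, the idea can be sketched roughly as follows. This is a minimal illustration and not the code from this patch: the class name `ChunkedArray`, its members, and the choice of `std::unique_ptr<char[]>` for chunk storage are all assumptions made for the example. Growing the array only appends new fixed-size chunks, so existing element data is never copied or moved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of a chunked array of fixed-size elements.
// Not the implementation in hnswlib/hnswalg.h.
class ChunkedArray {
 public:
    ChunkedArray(size_t element_bytes, size_t elements_per_chunk)
        : element_bytes_(element_bytes),
          elements_per_chunk_(elements_per_chunk) {}

    // Grow capacity to at least new_capacity elements by appending chunks.
    // Only the small table of chunk pointers may be reallocated; the
    // element data in existing chunks stays where it is.
    void resize(size_t new_capacity) {
        size_t needed =
            (new_capacity + elements_per_chunk_ - 1) / elements_per_chunk_;
        while (chunks_.size() < needed) {
            // make_unique<char[]>(n) value-initializes, so the chunk is zeroed.
            chunks_.push_back(std::make_unique<char[]>(
                element_bytes_ * elements_per_chunk_));
        }
    }

    // Capacity in elements (always a multiple of elements_per_chunk_).
    size_t capacity() const { return chunks_.size() * elements_per_chunk_; }

    // Pointer to the start of element i.
    char* operator[](size_t i) {
        assert(i < capacity());
        return chunks_[i / elements_per_chunk_].get() +
               (i % elements_per_chunk_) * element_bytes_;
    }

 private:
    size_t element_bytes_;
    size_t elements_per_chunk_;
    std::vector<std::unique_ptr<char[]>> chunks_;
};
```

The trade-off in this sketch is one extra division and pointer indirection per element access, in exchange for resizes whose peak extra memory is a single chunk rather than a second copy of the whole array.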

PingXie

LGTM overall. Thanks for upstreaming this change, Michael!

hnswlib/hnswalg.h

yairgott · 2025-10-02T15:31:42Z

LGTM!

In case not already done, it would be useful to run a sanity test with an ASAN build.

michaelbautin · 2025-10-02T16:38:29Z

@yairgott: ASAN tests are already being run in GitHub Actions.

michaelbautin added 9 commits September 26, 2025 21:14

Chunked memory allocation

688f06a

Turn prefetching back on

5de0c43

Fix compilation

e23f6ca

Fix another invalid prefetch. Better output on range check error.

a070bc5

Attempting to fix Python bindings

11bffd7

Another attempt to fix the Python bindings build

17ef9d3

Fix copyFrom argument (should be const char*)

0912989

Simplify managing chunked array of neighbor list pointers. Another attempt to fix Python bindings.

55644e8

Clear level 0 memory when calling clear(). Fix HNSW pickling/unpickling in Python bindings.

cc68c8f

michaelbautin changed the title ~~Chunked memory allocation~~ Optimize HNSW Memory Allocation with Chunked Arrays Sep 29, 2025

michaelbautin requested review from yurymalkov and dyashuni September 29, 2025 05:13

michaelbautin added 2 commits September 29, 2025 07:08

Make the number of elements per chunk configurable

517f5eb

Make field declaration order consistent with initialization order to fix build

12fc750

PingXie approved these changes Oct 1, 2025

hnswlib/hnswalg.h (outdated, resolved)

Revert unintended indentation

c32cc21
