Adjust IVF fixup phase to sometimes bypass some of the neighborhood calculations #130490

benwtrent · 2025-07-02T20:00:45Z

During the fixup phase, we compare the vector against every neighbor in the cluster neighborhood, no matter what. This seems pretty wasteful, especially for a tightly clustered set of neighbors.

This change adjusts the fixup phase to only check for reassignment if the currently assigned centroid is farther from the vector than the maximum intra-distance of the neighborhood.
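A minimal sketch of this check, not the code in KMeansLocal.java. It assumes squared Euclidean distance, and that the maximum intra-distance is the largest squared distance from the assigned centroid to any of its neighbors. The names `FixupSketch` and `bestCentroid` are hypothetical.

```java
final class FixupSketch {
    // Squared Euclidean distance between two vectors of equal length.
    static float squareDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the centroid index the vector should end up on.
    // neighbors: indices of the centroids in the current centroid's neighborhood (assumed layout).
    // maxIntraDistance: assumed to be the largest squared distance from the
    // current centroid to any centroid in that neighborhood.
    static int bestCentroid(float[][] centroids, float[] vector, int current, int[] neighbors, float maxIntraDistance) {
        float best = squareDistance(vector, centroids[current]);
        if (best <= maxIntraDistance) {
            // The vector is already within the neighborhood's spread:
            // skip every neighbor comparison and keep the assignment.
            return current;
        }
        int bestIndex = current;
        for (int neighbor : neighbors) {
            float distance = squareDistance(vector, centroids[neighbor]);
            if (distance < best) {
                best = distance;
                bestIndex = neighbor;
            }
        }
        return bestIndex;
    }
}
```

Under these assumptions, a vector at `{3, 0}` assigned to a centroid at `{0, 0}` with a single neighbor at `{4, 0}` (maximum intra-distance 16) keeps its assignment even though the neighbor is closer. That is the kind of skipped comparison that trades a little assignment quality for less work.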

This further reduces index time with no perceivable recall loss. I ran it over 3 data sets, both multi-segment and force-merged.

Additionally, I noticed that we seemed to compute neighborhoods and use those calculations even when the total number of clusters is fewer than the neighborhood size. I adjusted this logic so that we only compute the neighborhoods when the number of clusters is larger than the configured fixup neighborhood size.
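The guard described here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: neighborhoods are built by brute force, an empty result stands for "no neighborhoods computed", and `NeighborhoodGuardSketch` and `computeNeighborhoods` are hypothetical names.

```java
final class NeighborhoodGuardSketch {
    // Squared Euclidean distance between two vectors of equal length.
    static float squareDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns, for each centroid, the indices of its nearest other centroids,
    // or an empty array when there are not more clusters than the neighborhood size.
    // Assumes clustersPerNeighborhood is non-negative.
    static int[][] computeNeighborhoods(float[][] centroids, int clustersPerNeighborhood) {
        int count = centroids.length;
        if (count <= clustersPerNeighborhood) {
            // Too few clusters: skip the neighborhood computation entirely.
            return new int[0][];
        }
        int[][] neighborhoods = new int[count][];
        for (int i = 0; i < count; i++) {
            final float[] center = centroids[i];
            // Collect every other centroid index.
            Integer[] others = new Integer[count - 1];
            int next = 0;
            for (int j = 0; j < count; j++) {
                if (j != i) {
                    others[next++] = j;
                }
            }
            // Sort the other centroids by distance to this one, nearest first.
            java.util.Arrays.sort(
                others,
                (x, y) -> Float.compare(squareDistance(center, centroids[x]), squareDistance(center, centroids[y]))
            );
            int[] neighborhood = new int[clustersPerNeighborhood];
            for (int m = 0; m < clustersPerNeighborhood; m++) {
                neighborhood[m] = others[m];
            }
            neighborhoods[i] = neighborhood;
        }
        return neighborhoods;
    }
}
```

A caller would treat the empty result as a signal to fall back to whatever assignment path does not use neighborhoods.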

All in all, this gives us about a 5-15% index performance boost with no substantial drop in recall (the largest drop I saw across all my runs was 0.01).

@iverase @john-wagster let me know what y'all think

elasticsearchmachine · 2025-07-02T20:01:11Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

john-wagster

lgtm

iverase · 2025-07-03T08:35:50Z

server/src/main/java/org/elasticsearch/index/codec/vectors/cluster/KMeansLocal.java

-            neighborhoods.set(i, neighbors);
+            float maxIntraDistance = queue.consumeNodesWithWorstScore(neighbors, scores);
+            // Sort neighbors by their score
+            for (int j = 0; j < neighborCount; j++) {


Why not populate the array using the pop method of the priority queue?

for (int j = neighborCount - 1; j >= 0; j--) { neighbors[j] = queue.pop(); }
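The suggestion above can be sketched with `java.util.PriorityQueue` standing in for the queue type used in the PR (an assumption, as are the names `DrainQueueSketch`, `Entry`, and `drainWorstFirst`). Popping a max-heap from the back of the array leaves the neighbors sorted nearest first, and the first element popped carries the maximum distance.

```java
final class DrainQueueSketch {
    // Hypothetical queue entry: a centroid index and its distance to the current centroid.
    static final class Entry {
        final int index;
        final float distance;

        Entry(int index, float distance) {
            this.index = index;
            this.distance = distance;
        }
    }

    // Max-heap on distance: poll() returns the farthest entry first.
    static java.util.PriorityQueue<Entry> newMaxHeap() {
        return new java.util.PriorityQueue<Entry>((a, b) -> Float.compare(b.distance, a.distance));
    }

    // Drains the heap into neighbors, filling from the last slot backwards so the
    // array ends up ordered nearest first. Returns the largest distance seen.
    // The caller is expected to size neighbors to the queue size.
    static float drainWorstFirst(java.util.PriorityQueue<Entry> maxHeap, int[] neighbors) {
        float maxDistance = maxHeap.isEmpty() ? Float.NaN : maxHeap.peek().distance;
        for (int j = neighbors.length - 1; j >= 0; j--) {
            neighbors[j] = maxHeap.poll().index;
        }
        return maxDistance;
    }
}
```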

@iverase let me benchmark this

iverase

Just left a recommendation to simplify the code. The approach makes sense to me; I have observed the same behaviour as explained in the description when running it over my local tests.

iverase · 2025-07-03T14:06:39Z

server/src/main/java/org/elasticsearch/index/codec/vectors/cluster/KMeansLocal.java

+                continue;
+            }
+            // consume the queue into the neighbors array and get the maximum intra-cluster distance
+            int[] neighbors = new int[queue.size()];


looks much nicer now

and faster!

benwtrent added 2 commits July 2, 2025 13:17

Improve ivf index time during fixup phase

2dd98f1

iter

5026756

benwtrent requested a review from iverase July 2, 2025 20:00

benwtrent added >non-issue :Search Relevance/Vectors Vector search v9.2.0 labels Jul 2, 2025

benwtrent requested a review from john-wagster July 2, 2025 20:00

elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Jul 2, 2025

benwtrent changed the title ~~Feature/improve ivf index time~~ Adjust IVF fixup phase to sometimes bypass some of the neighborhood calculations Jul 2, 2025

john-wagster approved these changes Jul 2, 2025

iverase reviewed Jul 3, 2025

iverase approved these changes Jul 3, 2025

addressing PR comments

21a89f2

benwtrent added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Jul 3, 2025

Merge branch 'main' into feature/improve-ivf-index-time

8294e11

iverase mentioned this pull request Jul 3, 2025

Don't accept clustersPerNeighborhood lower than 2 #130526

Open

iverase reviewed Jul 3, 2025

benwtrent merged commit e5da80f into elastic:main Jul 3, 2025
31 of 32 checks passed

benwtrent deleted the feature/improve-ivf-index-time branch July 3, 2025 15:11
