
Reduce quantization optimization steps at ivf query time #130493

Open: benwtrent wants to merge 3 commits into main

Conversation

benwtrent (Member)

Since we are quantizing the query per posting list centroid, I think we can get away with fewer optimization iterations.

Dropping from 5 to 2 iterations reduces latency when hitting many centroids, with no recall impact (at least on my data sets).
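
For context on why the iteration count matters at query time: the query vector is re-quantized against each posting list centroid it visits, so the quantizer's optimization loop runs roughly n_probe times per segment per query. Below is a minimal, hypothetical sketch of that shape (the class and method names and the crude interval-refinement rule are illustrative, not the actual Lucene/Elasticsearch implementation):

```java
import java.util.Arrays;

/**
 * Minimal, self-contained sketch -- NOT the actual Lucene/Elasticsearch code; all names and the
 * crude interval-refinement rule are hypothetical. It only illustrates where the iteration count
 * shows up: the query is re-quantized against each posting list centroid it visits, so the
 * optimization loop inside the quantizer runs once per visited centroid.
 */
public class IvfQueryQuantizationSketch {

    /** Query-time refinement steps; the proposal here is to drop this from 5 to 2. */
    static final int QUERY_TIME_ITERS = 2;

    /** Quantize {@code query} relative to {@code centroid}, refining the interval {@code iters} times. */
    static byte[] quantizeRelativeTo(float[] query, float[] centroid, int iters) {
        int dim = query.length;
        float[] centered = new float[dim];
        float lo = Float.POSITIVE_INFINITY, hi = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < dim; i++) {
            centered[i] = query[i] - centroid[i];
            lo = Math.min(lo, centered[i]);
            hi = Math.max(hi, centered[i]);
        }
        // Crude stand-in for the real optimization: each step tries a slightly tighter interval
        // and keeps it only if the reconstruction error improves. Fewer steps = cheaper
        // per-centroid quantization, which is the whole point of this change.
        float bestLo = lo, bestHi = hi;
        double bestErr = reconstructionError(centered, bestLo, bestHi);
        for (int it = 0; it < iters; it++) {
            float shrink = 0.05f * (bestHi - bestLo);
            double err = reconstructionError(centered, bestLo + shrink, bestHi - shrink);
            if (err < bestErr) {
                bestLo += shrink;
                bestHi -= shrink;
                bestErr = err;
            }
        }
        // Final quantization with the chosen interval; values land in [0, 255] and are stored in
        // (signed) bytes, as is typical for packed quantized vectors.
        byte[] out = new byte[dim];
        float scale = (bestHi - bestLo) / 255f;
        for (int i = 0; i < dim; i++) {
            float clamped = Math.min(Math.max(centered[i], bestLo), bestHi);
            out[i] = (byte) Math.round((clamped - bestLo) / scale);
        }
        return out;
    }

    static double reconstructionError(float[] v, float lo, float hi) {
        float scale = (hi - lo) / 255f;
        double err = 0;
        for (float x : v) {
            int q = Math.round((Math.min(Math.max(x, lo), hi) - lo) / scale);
            float recon = lo + q * scale;
            err += (x - recon) * (x - recon);
        }
        return err;
    }

    public static void main(String[] args) {
        float[] query = { 0.1f, -0.7f, 0.4f, 0.9f };
        float[] centroid = { 0.0f, -0.5f, 0.5f, 1.0f };
        // At query time this would be called once per visited centroid (n_probe per segment),
        // so the cost of the refinement loop is multiplied by the number of centroids hit.
        System.out.println(Arrays.toString(quantizeRelativeTo(query, centroid, QUERY_TIME_ITERS)));
    }
}
```

With n_probe=100, and especially with many segments (the second benchmark below), that loop runs hundreds of times per query, which is why trimming it from 5 iterations to 2 shows up directly in latency.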

baseline:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100         2.43              0.00           0.00   411.52    0.91  23766.65

candidate:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100         1.84              0.00           0.00   543.48    0.91  23766.65

Here is a more extreme case (many segments):

baseline:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100        36.10              0.00           0.00   27.70    0.87  364480.37

candidate:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100        24.94              0.00           0.00   40.10    0.87  364480.37

Need to test against more data sets, but this is a nice improvement.
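
(Note: the QPS column here is just the inverse of latency, i.e. QPS ≈ 1000 / latency(ms); for example 1000 / 1.84 ≈ 543.5 and 1000 / 24.94 ≈ 40.1, so the latency and QPS columns show the same improvement from two angles.)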

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) on Jul 2, 2025

@john-wagster (Contributor) left a comment

Agreed, we need to test this more, but the results are good and I buy that it makes sense; LGTM.

@benwtrent (Member, Author)

@john-wagster

I am still running more benchmarks, but I ran over:

glove-normalized-200.train
corpus-dbpedia-entity-arctic-0.fvec
cohere-wikipedia-docs-768d.vec
corpus-quora-E5-small.fvec.flat

and observed no recall difference.

I am running a larger 8M run of cohere-wikipedia-docs-768d.vec. But even with iter=1 at query time, there has been zero impact on recall, with nice query time latency improvements when hitting many centroids.

@benwtrent (Member, Author)

@john-wagster I ran over all 8M of the cohere 768 and observed no recall difference between 5 iterations and 1 iteration, both with multiple segments and when force merged to one.

I am thinking we should call it. If we notice funky recall in other datasets, we can revisit.

benwtrent added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) on Jul 3, 2025
Labels: auto-merge-without-approval, >non-issue, :Search Relevance/Vectors (Vector search), Team:Search Relevance, v9.2.0

3 participants