
Reduce quantization optimization steps at ivf query time #130493

Open: benwtrent wants to merge 3 commits into main

Conversation

benwtrent (Member)

Since we are quantizing the query per posting list centroid, I think we can get away with fewer optimization iterations.

Dropping from 5 to 2 iterations reduces latency when hitting many centroids, with no recall impact (at least on my data sets).
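
For context on why the iteration count matters at query time: the query vector is re-quantized against each posting list centroid it visits, so the quantizer's optimization loop runs roughly n_probe times per segment per query. Below is a minimal, hypothetical sketch of that shape (the class and method names and the crude interval-refinement rule are illustrative, not the actual Lucene/Elasticsearch implementation):

```java
import java.util.Arrays;

/**
 * Minimal, self-contained sketch -- NOT the actual Lucene/Elasticsearch code; all names and the
 * crude interval-refinement rule are hypothetical. It only illustrates where the iteration count
 * shows up: the query is re-quantized against each posting list centroid it visits, so the
 * optimization loop inside the quantizer runs once per visited centroid.
 */
public class IvfQueryQuantizationSketch {

    /** Query-time refinement steps; the proposal here is to drop this from 5 to 2. */
    static final int QUERY_TIME_ITERS = 2;

    /** Quantize {@code query} relative to {@code centroid}, refining the interval {@code iters} times. */
    static byte[] quantizeRelativeTo(float[] query, float[] centroid, int iters) {
        int dim = query.length;
        float[] centered = new float[dim];
        float lo = Float.POSITIVE_INFINITY, hi = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < dim; i++) {
            centered[i] = query[i] - centroid[i];
            lo = Math.min(lo, centered[i]);
            hi = Math.max(hi, centered[i]);
        }
        // Crude stand-in for the real optimization: each step tries a slightly tighter interval
        // and keeps it only if the reconstruction error improves. Fewer steps = cheaper
        // per-centroid quantization, which is the whole point of this change.
        float bestLo = lo, bestHi = hi;
        double bestErr = reconstructionError(centered, bestLo, bestHi);
        for (int it = 0; it < iters; it++) {
            float shrink = 0.05f * (bestHi - bestLo);
            double err = reconstructionError(centered, bestLo + shrink, bestHi - shrink);
            if (err < bestErr) {
                bestLo += shrink;
                bestHi -= shrink;
                bestErr = err;
            }
        }
        // Final quantization with the chosen interval; values land in [0, 255] and are stored in
        // (signed) bytes, as is typical for packed quantized vectors.
        byte[] out = new byte[dim];
        float scale = (bestHi - bestLo) / 255f;
        for (int i = 0; i < dim; i++) {
            float clamped = Math.min(Math.max(centered[i], bestLo), bestHi);
            out[i] = (byte) Math.round((clamped - bestLo) / scale);
        }
        return out;
    }

    static double reconstructionError(float[] v, float lo, float hi) {
        float scale = (hi - lo) / 255f;
        double err = 0;
        for (float x : v) {
            int q = Math.round((Math.min(Math.max(x, lo), hi) - lo) / scale);
            float recon = lo + q * scale;
            err += (x - recon) * (x - recon);
        }
        return err;
    }

    public static void main(String[] args) {
        float[] query = { 0.1f, -0.7f, 0.4f, 0.9f };
        float[] centroid = { 0.0f, -0.5f, 0.5f, 1.0f };
        // At query time this would be called once per visited centroid (n_probe per segment),
        // so the cost of the refinement loop is multiplied by the number of centroids hit.
        System.out.println(Arrays.toString(quantizeRelativeTo(query, centroid, QUERY_TIME_ITERS)));
    }
}
```

With n_probe=100, and especially with many segments (the second benchmark below), that loop runs hundreds of times per query, which is why trimming it from 5 iterations to 2 shows up directly in latency.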

baseline:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100         2.43              0.00           0.00   411.52    0.91  23766.65

candidate:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100         1.84              0.00           0.00   543.48    0.91  23766.65

Here is a more extreme case (many segments):

baseline:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100        36.10              0.00           0.00   27.70    0.87  364480.37

candidate:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100        24.94              0.00           0.00   40.10    0.87  364480.37

Need to test against more data sets, but this is a nice improvement.
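
(Note: the QPS column here is just the inverse of latency, i.e. QPS ≈ 1000 / latency(ms); for example 1000 / 1.84 ≈ 543.5 and 1000 / 24.94 ≈ 40.1, so the latency and QPS columns show the same improvement from two angles.)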

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) on Jul 2, 2025

@john-wagster (Contributor) left a comment

Agreed, we need to test this more, but the results are good and I buy that it makes sense; LGTM.

@benwtrent (Member, Author)

@john-wagster

I am still running more benchmarks, but I ran over:

glove-normalized-200.train
corpus-dbpedia-entity-arctic-0.fvec
cohere-wikipedia-docs-768d.vec
corpus-quora-E5-small.fvec.flat

and observed no recall difference.

I am running a larger 8M run of cohere-wikipedia-docs-768d.vec. But even with iter=1 at query time, there has been zero impact on recall, with nice query time latency improvements when hitting many centroids.

@benwtrent (Member, Author)

@john-wagster I ran over all 8M of the cohere 768 and observed no recall difference between 5 iterations and 1 iteration, both with multiple segments and when force merged to one.

I am thinking we should call it. If we notice funky recall in other datasets, we can revisit.

benwtrent added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) on Jul 3, 2025
Labels: auto-merge-without-approval, >non-issue, :Search Relevance/Vectors (Vector search), Team:Search Relevance, v9.2.0

3 participants