Consistent KNN query results with multiple leafs #14191

Draft
wants to merge 10 commits into base: main

Conversation

tteofili commented Feb 3, 2025

This is a first attempt at fixing #14180.
It's based on @jpountz's idea mentioned here.

tteofili commented Feb 3, 2025

Preliminary tests with luceneutil on Cohere-768.

with force-merge=true
baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.913         1.620  200000   100      50       32        100         no     3294    43.94       4551.45          29.43             1           596.16         1.00  post-filter        585.938       585.938
 0.646         1.317  200000   100      50       32        100     7 bits     3338    33.13       6037.19          20.29             1           742.72         1.00  post-filter        732.422       146.484
 0.449         1.066  200000   100      50       32        100     4 bits     3928    24.33       8219.29          12.32             1           670.88         1.00  post-filter        659.180        73.242

candidate

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.915         1.636  200000   100      50       32        100         no     3289    44.20       4524.68          29.79             1           596.11         1.00  post-filter        585.938       585.938
 0.652         1.336  200000   100      50       32        100     7 bits     3326    32.86       6086.98          19.58             1           742.74         1.00  post-filter        732.422       146.484
 0.445         1.049  200000   100      50       32        100     4 bits     3912    24.01       8331.25          11.82             1           670.87         1.00  post-filter        659.180        73.242

without force-merge
baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.915         1.446  200000   100      50       32        100         no     3313    38.04       5257.35          26.52             1           596.11         1.00  post-filter        585.938       585.938
 0.641         1.243  200000   100      50       32        100     7 bits     3367    28.65       6980.32          18.50             1           742.91         1.00  post-filter        732.422       146.484
 0.446         0.939  200000   100      50       32        100     4 bits     3918    21.19       9437.08          10.73             1           670.88         1.00  post-filter        659.180        73.242

candidate

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.914         1.429  200000   100      50       32        100         no     3304    43.20       4629.74          29.62             1           596.16         1.00  post-filter        585.938       585.938
 0.652         1.238  200000   100      50       32        100     7 bits     3371    28.75       6957.49          18.49             1           742.91         1.00  post-filter        732.422       146.484
 0.440         0.950  200000   100      50       32        100     4 bits     3943    21.31       9386.59          10.97             1           670.88         1.00  post-filter        659.180        73.242

These runs are meant to check that latency doesn't increase with these changes.

tteofili commented Feb 4, 2025

I've adjusted AbstractKnnVectorQuery to pick the largest LeafReaderContext (the one with the largest reader().numDocs()) for the first search. This introduces an additive O(|leafReaderContexts|) term, but possibly more efficient execution of the remaining per-leaf tasks. Perhaps the LeafReaderContext selection could still be made more efficient, though. A rough sketch of the selection step is below.
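For illustration only, a minimal sketch of that selection step (the class and method names are mine, not taken from the patch):

```java
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;

// Hypothetical helper (not from the patch): pick the leaf with the most live
// documents so the first, "seeding" search runs against the largest segment.
// A single linear scan over the leaves, hence the additive
// O(|leafReaderContexts|) cost mentioned above.
final class LargestLeafSelector {
  static LeafReaderContext largestLeaf(List<LeafReaderContext> leaves) {
    LeafReaderContext largest = null;
    for (LeafReaderContext ctx : leaves) {
      if (largest == null || ctx.reader().numDocs() > largest.reader().numDocs()) {
        largest = ctx;
      }
    }
    return largest;
  }
}
```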

tteofili commented Feb 4, 2025

My previous luceneutil runs were useless. Now, with changes in luceneutil (NoMergePolicy and no force merge on the indexing side, and an ExecutorService on the search side; see the configuration sketch after the tables below), I get very different numbers:

baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.951         1.840  200000   100      50       32        100         no    12.84      15571.47            33           593.98        585.938       585.938

candidate (use first segment)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.999         4.840  200000   100      50       32        100         no    12.68      15771.63            33           593.99        585.938       585.938

candidate (use largest segment)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.999         4.976  200000   100      50       32        100         no    13.46      14861.05            33           593.98        585.938       585.938
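For reference, the indexing/search setup described above boils down to roughly the following Lucene configuration. This is a minimal sketch, not the actual luceneutil code; the class name, index path, and thread count are placeholders.

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MultiSegmentKnnSetup {
  public static void main(String[] args) throws Exception {
    // Indexing side: disable merging and skip forceMerge so the index keeps
    // many small segments (leaves), matching the multi-leaf benchmark setup.
    Directory dir = FSDirectory.open(Paths.get("/tmp/knn-index")); // placeholder path
    IndexWriterConfig iwc =
        new IndexWriterConfig(new StandardAnalyzer()).setMergePolicy(NoMergePolicy.INSTANCE);
    try (IndexWriter writer = new IndexWriter(dir, iwc)) {
      // ... add vector documents here; do NOT call writer.forceMerge(1) ...
    }

    // Search side: pass an executor so per-leaf KNN work can run
    // concurrently across the many segments.
    ExecutorService executor = Executors.newFixedThreadPool(8); // placeholder pool size
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader, executor);
      // ... run KnnFloatVectorQuery searches against `searcher` ...
    }
    executor.shutdown();
  }
}
```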

tteofili commented Feb 4, 2025

I'm going to try a more promising way of slicing segments across threads.
