Consistent KNN query results with multiple leafs #14191

Draft
wants to merge 10 commits into base: main

Conversation

tteofili commented Feb 3, 2025

This is a first attempt at fixing #14180.
It's based on @jpountz's idea mentioned here.

tteofili commented Feb 3, 2025

Preliminary tests with luceneutil on Cohere-768.

with force-merge=true
baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.913         1.620  200000   100      50       32        100         no     3294    43.94       4551.45          29.43             1           596.16         1.00  post-filter        585.938       585.938
 0.646         1.317  200000   100      50       32        100     7 bits     3338    33.13       6037.19          20.29             1           742.72         1.00  post-filter        732.422       146.484
 0.449         1.066  200000   100      50       32        100     4 bits     3928    24.33       8219.29          12.32             1           670.88         1.00  post-filter        659.180        73.242

candidate

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.915         1.636  200000   100      50       32        100         no     3289    44.20       4524.68          29.79             1           596.11         1.00  post-filter        585.938       585.938
 0.652         1.336  200000   100      50       32        100     7 bits     3326    32.86       6086.98          19.58             1           742.74         1.00  post-filter        732.422       146.484
 0.445         1.049  200000   100      50       32        100     4 bits     3912    24.01       8331.25          11.82             1           670.87         1.00  post-filter        659.180        73.242

without force-merge
baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.915         1.446  200000   100      50       32        100         no     3313    38.04       5257.35          26.52             1           596.11         1.00  post-filter        585.938       585.938
 0.641         1.243  200000   100      50       32        100     7 bits     3367    28.65       6980.32          18.50             1           742.91         1.00  post-filter        732.422       146.484
 0.446         0.939  200000   100      50       32        100     4 bits     3918    21.19       9437.08          10.73             1           670.88         1.00  post-filter        659.180        73.242

candidate

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  selectivity   filterType  vec disk (MB)  vec RAM (MB)
 0.914         1.429  200000   100      50       32        100         no     3304    43.20       4629.74          29.62             1           596.16         1.00  post-filter        585.938       585.938
 0.652         1.238  200000   100      50       32        100     7 bits     3371    28.75       6957.49          18.49             1           742.91         1.00  post-filter        732.422       146.484
 0.440         0.950  200000   100      50       32        100     4 bits     3943    21.31       9386.59          10.97             1           670.88         1.00  post-filter        659.180        73.242

These runs are meant to check that latency doesn't increase with these changes.

tteofili commented Feb 4, 2025

I've adjusted AbstractKnnVectorQuery to pick the largest LeafReaderContext (the one with the largest reader().numDocs()) for the first search. This introduces an additive O(|leafReaderContexts|) term, but possibly more efficient execution of the remaining per-leaf tasks. Perhaps the LeafReaderContext selection could still be made more efficient, though. A rough sketch of the selection step is below.
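For illustration only, a minimal sketch of that selection step (the class and method names are mine, not taken from the patch):

```java
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;

// Hypothetical helper (not from the patch): pick the leaf with the most live
// documents so the first, "seeding" search runs against the largest segment.
// A single linear scan over the leaves, hence the additive
// O(|leafReaderContexts|) cost mentioned above.
final class LargestLeafSelector {
  static LeafReaderContext largestLeaf(List<LeafReaderContext> leaves) {
    LeafReaderContext largest = null;
    for (LeafReaderContext ctx : leaves) {
      if (largest == null || ctx.reader().numDocs() > largest.reader().numDocs()) {
        largest = ctx;
      }
    }
    return largest;
  }
}
```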

tteofili commented Feb 4, 2025

My previous luceneutil runs were useless. Now, with changes in luceneutil (NoMergePolicy and no force merge on the indexing side, and an ExecutorService on the search side; see the configuration sketch after the tables below), I get very different numbers:

baseline

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.951         1.840  200000   100      50       32        100         no    12.84      15571.47            33           593.98        585.938       585.938

candidate (use first segment)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.999         4.840  200000   100      50       32        100         no    12.68      15771.63            33           593.99        585.938       585.938

candidate (use largest segment)

recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.999         4.976  200000   100      50       32        100         no    13.46      14861.05            33           593.98        585.938       585.938
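For reference, the indexing/search setup described above boils down to roughly the following Lucene configuration. This is a minimal sketch, not the actual luceneutil code; the class name, index path, and thread count are placeholders.

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MultiSegmentKnnSetup {
  public static void main(String[] args) throws Exception {
    // Indexing side: disable merging and skip forceMerge so the index keeps
    // many small segments (leaves), matching the multi-leaf benchmark setup.
    Directory dir = FSDirectory.open(Paths.get("/tmp/knn-index")); // placeholder path
    IndexWriterConfig iwc =
        new IndexWriterConfig(new StandardAnalyzer()).setMergePolicy(NoMergePolicy.INSTANCE);
    try (IndexWriter writer = new IndexWriter(dir, iwc)) {
      // ... add vector documents here; do NOT call writer.forceMerge(1) ...
    }

    // Search side: pass an executor so per-leaf KNN work can run
    // concurrently across the many segments.
    ExecutorService executor = Executors.newFixedThreadPool(8); // placeholder pool size
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader, executor);
      // ... run KnnFloatVectorQuery searches against `searcher` ...
    }
    executor.shutdown();
  }
}
```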

tteofili commented Feb 4, 2025

I'm going to try a more promising way of slicing segments across threads.
