Add a HNSW collector that exits early when nearest neighbor queue saturates #14094
public interface HnswKnnCollector extends KnnCollector {

  /** Indicates exploration of the next HNSW candidate graph node. */
  void nextCandidate();
}
I think this kind of collector is OK, but it makes the most sense to me as a delegate collector: an abstract collector extending `KnnCollector.Delegate`.
Then, I also think that `OrdinalTranslatingKnnCollector` should inherit directly from `HnswKnnCollector`, always assuming that the passed-in collector is an `HnswKnnCollector`.
Note, the default behavior for `HnswKnnCollector#nextCandidate` can simply be a no-op, allowing for overriding.
This might require a new `HnswGraphSearcher#search` interface to keep the old collector actions, but it can be simple to add a new one that accepts an `HnswKnnCollector` and delegates to it with `new HnswKnnCollector(KnnCollector delegate)`.
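To make the delegate idea concrete, here is a minimal, self-contained sketch. It does not use Lucene's real types: `MiniKnnCollector` is a hypothetical two-method stand-in for `KnnCollector`, and `MiniHnswKnnCollector`, `AcceptAllCollector`, and `CandidateCountingCollector` are illustrative names only. It just shows a wrapper that forwards every call and exposes a `nextCandidate` hook that does nothing unless overridden.

```java
// Hypothetical stand-in for Lucene's KnnCollector, cut down to two methods for this sketch.
interface MiniKnnCollector {
  boolean collect(int docId, float similarity);

  boolean earlyTerminated();
}

// Delegate-style HNSW collector: forwards every call and adds a hook
// that does nothing by default, so subclasses can opt in.
class MiniHnswKnnCollector implements MiniKnnCollector {
  protected final MiniKnnCollector delegate;

  MiniHnswKnnCollector(MiniKnnCollector delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean collect(int docId, float similarity) {
    return delegate.collect(docId, similarity);
  }

  @Override
  public boolean earlyTerminated() {
    return delegate.earlyTerminated();
  }

  /** Called when the graph searcher starts exploring the next candidate node. */
  public void nextCandidate() {}
}

// Trivial collector that accepts everything; used only to exercise the wrapper.
class AcceptAllCollector implements MiniKnnCollector {
  int collected;

  @Override
  public boolean collect(int docId, float similarity) {
    collected++;
    return true;
  }

  @Override
  public boolean earlyTerminated() {
    return false;
  }
}

// Example subclass that overrides the hook to count explored candidates.
class CandidateCountingCollector extends MiniHnswKnnCollector {
  int candidates;

  CandidateCountingCollector(MiniKnnCollector delegate) {
    super(delegate);
  }

  @Override
  public void nextCandidate() {
    candidates++;
  }
}
```

With this shape, an existing collector can be wrapped as `new MiniHnswKnnCollector(existing)` and behaves exactly as before, while optimizations only need to override `nextCandidate`.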
I adjusted my refactoring for the seeded queries similarly. It seems nicer IMO: #14170
thanks Ben. I'll incorporate your suggestions once #14170 is in.
This introduces an `HnswKnnCollector` interface, extending `KnnCollector` for HNSW, to make it possible to hook into HNSW execution for optimizations. It then adds a new collector which uses a saturation-based threshold to dynamically halt HNSW graph exploration, in order to exit early when the exploration of new candidates is unlikely to lead to the addition of new neighbors.
The new collector records the number of neighbors added upon exploration of a new candidate (an HNSW node) and compares it with the number of neighbors added while exploring the previous candidate. When the rate of added neighbors plateaus for a number of consecutive iterations, it stops graph exploration (`earlyTerminate` returns `true`).
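The saturation check described above could look roughly like the sketch below. This is not the PR's actual class: `SaturationTracker`, `neighborAdded`, and the `saturationThreshold` and `patience` parameters are assumptions made for illustration, and "plateau" is interpreted here as the ratio of the previous queue size to the current queue size staying at or above the threshold.

```java
// Hypothetical, simplified sketch of saturation-based early termination.
// Names and parameters are illustrative and not taken from the PR.
class SaturationTracker {
  private final double saturationThreshold; // assumed: ratio at or above which a step counts as saturated
  private final int patience; // assumed: number of consecutive saturated steps tolerated
  private int previousSize; // neighbors collected as of the previous candidate
  private int currentSize; // neighbors collected so far
  private int saturatedSteps;
  private boolean saturated;

  SaturationTracker(double saturationThreshold, int patience) {
    this.saturationThreshold = saturationThreshold;
    this.patience = patience;
  }

  /** Records that one neighbor was accepted into the result queue. */
  void neighborAdded() {
    currentSize++;
  }

  /** Called each time the search moves on to a new candidate node. */
  void nextCandidate() {
    // A ratio close to 1 means the last candidate added few or no neighbors.
    // With an empty queue the ratio is treated as 0, i.e. not saturated.
    double ratio = currentSize == 0 ? 0.0 : (double) previousSize / currentSize;
    previousSize = currentSize;
    if (ratio >= saturationThreshold) {
      saturatedSteps++;
    } else {
      saturatedSteps = 0; // any real progress resets the streak
    }
    if (saturatedSteps > patience) {
      saturated = true;
    }
  }

  /** True once the plateau has lasted for more than {@code patience} consecutive candidates. */
  boolean earlyTerminated() {
    return saturated;
  }
}
```

In this sketch a higher `saturationThreshold` or `patience` makes the search more conservative (it explores longer before giving up), while lower values trade recall for speed.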