Reuse entry point scores and provide mechanisms to supply scores directly for entry points #14256

Open: 1 commit, base: main

benwtrent
Member

Spinning out of: #14226

That particular evolution of kNN querying attempts to re-enter individual segment graphs with new exit and search criteria. To avoid rescoring the new entry points, this PR provides the ability to (optionally) keep track of the scores for entry points.

Additionally, this will take advantage of entry point score retention during graph building and searching. The performance improvements are marginal.
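
To make the idea concrete, here is a minimal toy sketch of carrying an entry point's score from one graph level to the next so that it is not computed twice. It is not the code in this PR and not Lucene's searcher: `EntryPointReuseSketch`, `Scored`, `greedy`, `descend`, and the dot-product `score` are all hypothetical names chosen for illustration, and the graph is a tiny hand-built adjacency array.

```java
/** Toy sketch only; not Lucene's HNSW searcher. Every name here is hypothetical. */
final class EntryPointReuseSketch {
  /** A node together with its already-computed similarity to the query. */
  record Scored(int node, float score) {}

  /** Number of similarity computations performed so far. */
  static int scoreCalls = 0;

  /** Dot-product similarity; each call is counted. */
  static float score(float[] query, float[] vector) {
    scoreCalls++;
    float dot = 0f;
    for (int i = 0; i < query.length; i++) {
      dot += query[i] * vector[i];
    }
    return dot;
  }

  /** Greedy search on one level. Pass Float.NaN as epScore when the score is unknown. */
  static Scored greedy(float[][] vectors, int[][] neighbors, float[] query, int ep, float epScore) {
    // Only score the entry point if the caller did not supply its score.
    float bestScore = Float.isNaN(epScore) ? score(query, vectors[ep]) : epScore;
    int best = ep;
    boolean improved = true;
    while (improved) {
      improved = false;
      int current = best;
      for (int n : neighbors[current]) {
        float s = score(query, vectors[n]);
        if (s > bestScore) {
          bestScore = s;
          best = n;
          improved = true;
        }
      }
    }
    return new Scored(best, bestScore);
  }

  /** Descends from the top level to level 0, optionally reusing the previous level's score. */
  static Scored descend(float[][] vectors, int[][][] levels, float[] query, int topEntry, boolean reuse) {
    Scored ep = new Scored(topEntry, Float.NaN);
    for (int level = levels.length - 1; level >= 0; level--) {
      float known = reuse ? ep.score() : Float.NaN;
      ep = greedy(vectors, levels[level], query, ep.node(), known);
    }
    return ep;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1, 0}, {0, 1}, {1, 1}, {2, 2}};
    int[][][] levels = {
      {{1, 2}, {0, 2}, {0, 1, 3}, {2, 1}}, // level 0: all nodes
      {{3}, {}, {}, {0}}                   // level 1: only nodes 0 and 3 are linked
    };
    float[] query = {1, 1};
    for (boolean reuse : new boolean[] {true, false}) {
      scoreCalls = 0;
      Scored result = descend(vectors, levels, query, 0, reuse);
      System.out.println("reuse=" + reuse + " result=" + result + " scoreCalls=" + scoreCalls);
    }
  }
}
```

In this toy, reuse saves exactly one similarity computation per level transition, which is consistent with the gains being marginal relative to the neighbor scoring done on each level.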

@msokolov
Contributor

Hi @benwtrent, this bakes support for supplying scores in pretty deeply. If we were to use this only for the SeededKnnVectorQuery, might it suffice to create a wrapping RandomVectorScorer that supplies the cached scores while delegating all other scoring to the underlying scorer?
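
A rough sketch of what such a wrapper might look like, under stated assumptions: `NodeScorer` below is a hypothetical one-method stand-in for the scoring part of Lucene's `RandomVectorScorer` (the real interface has more methods and its `score` can throw `IOException`), and `CachedScoreScorer` is an invented name, not a class from this PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the scoring method of Lucene's RandomVectorScorer. */
interface NodeScorer {
  float score(int node);
}

/** Serves cached scores for known nodes and delegates every other node to the wrapped scorer. */
final class CachedScoreScorer implements NodeScorer {
  private final NodeScorer delegate;
  private final Map<Integer, Float> cached = new HashMap<>();

  CachedScoreScorer(NodeScorer delegate, int[] nodes, float[] scores) {
    if (nodes.length != scores.length) {
      throw new IllegalArgumentException("nodes and scores must have the same length");
    }
    this.delegate = delegate;
    for (int i = 0; i < nodes.length; i++) {
      cached.put(nodes[i], scores[i]);
    }
  }

  @Override
  public float score(int node) {
    Float known = cached.get(node);
    // Cache hit: skip the similarity computation. Cache miss: fall through to the delegate.
    return known != null ? known : delegate.score(node);
  }
}
```

The appeal of this shape is that the graph searcher itself would not need to know about cached scores; the cost is a lookup on every `score` call, so whether it is a net win would have to be measured.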

*
* @lucene.internal
*/
public sealed class MappedDISI extends DocIdSetIterator {
Contributor

whoa I had to look up what a sealed class is - I guess it is like final but you can allow some classes to inherit?

Member Author

correct
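
For readers who have not used the feature, a small illustration of sealed classes (final since Java 17): the `permits` clause lists the only classes allowed to extend the sealed class directly, and each permitted subclass must itself be `final`, `sealed`, or `non-sealed`. The class names below are hypothetical and unrelated to `MappedDISI`.

```java
/** Only the classes named in `permits` may extend this class directly. */
abstract sealed class SketchIterator permits DenseSketchIterator, SparseSketchIterator {
  abstract int nextDoc();
}

/** A permitted subclass that closes the hierarchy at this point. */
final class DenseSketchIterator extends SketchIterator {
  private int doc = -1;

  @Override
  int nextDoc() {
    return ++doc;
  }
}

/** A permitted subclass that reopens the hierarchy: anyone may extend this one. */
non-sealed class SparseSketchIterator extends SketchIterator {
  private final int[] docs;
  private int index = -1;

  SparseSketchIterator(int[] docs) {
    this.docs = docs;
  }

  @Override
  int nextDoc() {
    index++;
    return index < docs.length ? docs[index] : Integer.MAX_VALUE;
  }
}

// class RogueIterator extends SketchIterator {}  // would not compile: not listed in `permits`
```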

eps = ArrayUtil.growExact(eps, candidates.size());
epsScores = ArrayUtil.growExact(epsScores, candidates.size());
epCount = candidates.size();
candidates.collectNodesAndScores(eps, epsScores);
Contributor

you are re-using the scores from upper levels when scoring the lower levels?

@msokolov
Contributor

I also tried something based on the simpler approach I mentioned and also saw very minor gains in the seeded search with reentry when reusing scores.

@benwtrent
Member Author

> I also tried something based on the simpler approach I mentioned and also saw very minor gains in the seeded search with reentry when reusing scores.

Yeah, I don't expect this to be a huge performance gain. I would expect the actual bottlenecks addressed in your changes (thrashing, over-exploration, etc.) to dominate any savings.
