ESQL - Add K mandatory param for KNN function #129763

Merged
carlosdelest merged 21 commits into elastic:main on Jul 2, 2025

@carlosdelest carlosdelest commented Jun 20, 2025

Right now, the KNN function takes k as an option:

| WHERE KNN(vector, [0, 1, 2], {"k": 10})

This is a problem because it makes k optional. Setting a default value makes no sense until we use LIMIT to set k (#129353).

Until we use LIMIT, this PR makes k a mandatory parameter of the KNN function:

  • Users must explicitly set the value of K they use.
  • K becomes a first-class citizen in KNN. This makes sense, as it will be the option users tweak most.

The new function format will be:

| WHERE KNN(vector, [0, 1, 2], 10)

Users who want to set num_candidates can still do so via an option:

| WHERE KNN(vector, [0, 1, 2], 10, {"num_candidates": 50})
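For readers building such queries from client code, the two call shapes above can be sketched as a small string builder. This is only an illustration: `KnnClause` and `render` are hypothetical names that do not exist in Elasticsearch, and the sketch assumes integer vector components and numeric option values.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical client-side helper (not part of Elasticsearch) that renders a KNN call. */
final class KnnClause {
    private KnnClause() {}

    /** k is a required positional argument, so there is no overload without it. */
    static String render(String field, int[] vector, int k) {
        return render(field, vector, k, Map.of());
    }

    /** Other settings, such as num_candidates, still travel in the trailing options map. */
    static String render(String field, int[] vector, int k, Map<String, Object> options) {
        StringBuilder sb = new StringBuilder("KNN(")
            .append(field).append(", ")
            .append(Arrays.toString(vector)).append(", ")
            .append(k);
        if (!options.isEmpty()) {
            // Assumption: option values are numeric, so they are written unquoted.
            String body = options.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": " + e.getValue())
                .collect(Collectors.joining(", "));
            sb.append(", {").append(body).append("}");
        }
        return sb.append(")").toString();
    }
}
```

Because `k` is a primitive `int` parameter, a caller cannot omit it, which mirrors the intent of making it mandatory in the function signature.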

After #129353 is done, we can make this parameter optional and check that a LIMIT can be used to set K on behalf of the user. Users will be able to override the default K by setting it explicitly.

This PR removes a test for the k option, which makes no sense now, and closes the following related issues for this test:

Closes #129447
Closes #129512

@carlosdelest carlosdelest added the Team:Analytics, auto-backport, :Analytics/ES|QL, Team:Search Relevance, v8.19.0, and Team:Search - Relevance labels Jun 20, 2025

@@ -511,9 +511,6 @@ tests:
- class: org.elasticsearch.entitlement.runtime.policy.FileAccessTreeTests
method: testWindowsAbsolutPathAccess
issue: https://github.com/elastic/elasticsearch/issues/129168
- class: org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT
Member Author

This test is removed in this PR, as k is already covered by all the other tests.

@@ -29,31 +29,12 @@ chartreuse | [127.0, 255.0, 0.0]
// end::knn-function-result[]
;

knnSearchWithKOption
Member Author

Removed test - k is already added to all other tests

@@ -100,14 +109,6 @@ public Knn(
description = "Floating point number used to decrease or increase the relevance scores of the query."
+ "Defaults to 1.0."
),
@MapParam.MapParamEntry(
Member Author

k is removed as an option

@@ -2172,24 +2173,25 @@ private void checkFullTextFunctionNullArgs(String functionInvocation, String arg
);
}

public void testFullTextFunctionsConstantQuery() throws Exception {
Member Author

Renamed, as we are now also checking that other params are not null.

@carlosdelest carlosdelest marked this pull request as ready for review June 20, 2025 13:43
@elasticsearchmachine elasticsearchmachine removed the Team:Search Relevance and Team:Search - Relevance labels Jun 20, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the serverless-linked label Jun 20, 2025
Contributor

@ioanatia ioanatia left a comment

does what it says 👍

Member

@kderusso kderusso left a comment

LGTM, one question

@ioanatia ioanatia removed the auto-backport and v8.19.0 labels Jun 23, 2025
ioanatia and others added 7 commits June 26, 2025 20:26
# Conflicts:
#	docs/reference/query-languages/esql/kibana/definition/functions/knn.json
#	docs/reference/query-languages/esql/kibana/docs/functions/knn.md
#	muted-tests.yml
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
…o non-issue/knn-k-param

# Conflicts:
#	docs/reference/query-languages/esql/kibana/definition/functions/knn.json
#	muted-tests.yml
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@carlosdelest
Member Author

Hey @ioanatia @tteofili @kderusso, I've removed the usage of TransportVersions by not serializing the k parameter in a6faf49.

Once the query builder is created as part of query rewriting, there is no need to store k, as it will be included in the query builder options. So we don't really need to serialize it to data nodes.

I think this is better for not dealing with transport versions, but LMK otherwise 👍
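The idea of folding k into the builder options can be pictured with a minimal sketch. It is not the code from a6faf49: `KnnQueryOptions` and `withK` are hypothetical names, and a plain map stands in for whatever structure the real query builder uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration (not Elasticsearch code) of carrying k inside the builder options. */
final class KnnQueryOptions {
    private KnnQueryOptions() {}

    /**
     * Copies the user-supplied options and adds k to them. Whatever is built from the
     * returned map already knows k, so the function node would not need to ship k separately.
     */
    static Map<String, Object> withK(int k, Map<String, Object> userOptions) {
        Map<String, Object> merged = new LinkedHashMap<>(userOptions);
        merged.put("k", k);
        return merged;
    }
}
```

Under this assumption, only the rewritten query builder has to cross the wire, which is why no new transport version would be required.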

@carlosdelest carlosdelest enabled auto-merge (squash) July 2, 2025 07:05

@@ -236,6 +236,9 @@ tests:
- class: org.elasticsearch.packaging.test.DockerTests
method: test012SecurityCanBeDisabled
issue: https://github.com/elastic/elasticsearch/issues/116636
- class: org.elasticsearch.index.shard.StoreRecoveryTests
Contributor

Bad merge? Same for the other entries that are added?

Member Author

🤦 ouch. Sorry about that.

Opened #130523 to fix

return TypeResolution.TYPE_RESOLVED;
}

return isType(k(), dt -> dt == INTEGER, sourceText(), THIRD, "integer").and(isFoldable(k(), sourceText(), THIRD))
Contributor

Are we missing a test here for what happens if k is not a constant (or not foldable)?

Member Author

I believe that is covered by this test
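For context, the check under discussion requires k to be an integer and to be foldable, in that order. A toy model of that rule is sketched below. `KArgumentCheck` and `Arg` are hypothetical stand-ins rather than the ES|QL expression classes, and the error strings are illustrative, not the messages Elasticsearch emits.

```java
import java.util.Optional;

/** Toy model (not ES|QL code) of the rule: k must be an integer and a constant. */
final class KArgumentCheck {
    private KArgumentCheck() {}

    /** Hypothetical stand-in for an expression: its data type name and whether it folds to a constant. */
    static final class Arg {
        final String dataType;
        final boolean foldable;

        Arg(String dataType, boolean foldable) {
            this.dataType = dataType;
            this.foldable = foldable;
        }
    }

    /** Returns an illustrative error message, or empty when k is acceptable. */
    static Optional<String> check(Arg k) {
        // Type check first, then foldability, matching the order of the chained checks in the diff.
        if (!"integer".equals(k.dataType)) {
            return Optional.of("third argument of [KNN] must be [integer], found [" + k.dataType + "]");
        }
        if (!k.foldable) {
            return Optional.of("third argument of [KNN] must be a constant");
        }
        return Optional.empty();
    }
}
```

In this model, a literal such as `10` passes, while a reference to an integer column fails the second check because it cannot be folded to a constant at planning time.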

@carlosdelest
Member Author

bwc tests fail - opened #130441 to fix

@carlosdelest carlosdelest merged commit 315aba6 into elastic:main Jul 2, 2025
32 checks passed
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
Labels
:Analytics/ES|QL, >non-issue, serverless-linked, Team:Analytics, v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] EsqlSpecIT test {knn-function.KnnSearchWithKOption SYNC} failing
[CI] EsqlSpecIT test {knn-function.KnnSearchWithKOption ASYNC} failing
5 participants