From d89e3889f5bcba6935a1a299a14bd28cfbdb3b75 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 15:05:38 -0400 Subject: [PATCH 01/13] Update linear retriever documentation --- docs/reference/rest-api/common-parms.asciidoc | 59 ++++++++++++++----- docs/reference/search/retriever.asciidoc | 12 +++- 2 files changed, 54 insertions(+), 17 deletions(-) diff --git a/docs/reference/rest-api/common-parms.asciidoc b/docs/reference/rest-api/common-parms.asciidoc index 162e486158a95..f07f5e8bf8dff 100644 --- a/docs/reference/rest-api/common-parms.asciidoc +++ b/docs/reference/rest-api/common-parms.asciidoc @@ -1349,39 +1349,68 @@ according to each retriever's specifications. end::compound-retriever-filter[] tag::linear-retriever-components[] + +[NOTE] +==== +Either `query` or `retrievers` must be specified. +Combining `query` and `retrievers` is not supported. +==== + +`query`:: +(Optional, String) ++ +The query to use when using the <>. + +`fields`:: +(Optional, array of strings) ++ +The fields to query when using the <>. +Fields can include boost values using the `^` notation (e.g., `"field^2"`). +If not specified, uses the index's default fields from the `index.query.default_field` index setting, which is `*` by default. + +`normalizer`:: +(Optional, String) ++ +The normalizer to use when using the <>. +See <> for supported values. +Required when `query` is specified. + +[WARNING] +==== +Avoid using `none` as that will disable normalization and may bias the result set towards lexical matches. +See <> for more information. +==== + `retrievers`:: -(Required, array of objects) +(Optional, array of objects) + A list of the sub-retrievers' configurations that we will take into account and whose result sets we will merge through a weighted sum. Each configuration can have a different weight and normalization depending on the specified retriever. -Each entry specifies the following parameters: +include::common-parms.asciidoc[tag=compound-retriever-rank-window-size] -* `retriever`:: +include::common-parms.asciidoc[tag=compound-retriever-filter] + +Each entry in the `retrievers` array specifies the following parameters: + +`retriever`:: (Required, a <> object) + Specifies the retriever for which we will compute the top documents. The retriever will produce `rank_window_size` results, which will later be merged based on the specified `weight` and `normalizer`. -* `weight`:: +`weight`:: (Optional, float) + The weight by which each score of this retriever's top docs will be multiplied. Must be greater than or equal to 0. Defaults to 1.0. -* `normalizer`:: +`normalizer`:: (Optional, String) + -Specifies how we will normalize the retriever's scores, before applying the specified `weight`. -Available values are: `minmax`, and `none`. Defaults to `none`. - -** `none` -** `minmax` : -A `MinMaxScoreNormalizer` that normalizes scores based on the following formula -+ -``` -score = (score - min) / (max - min) -``` +Specifies how the retriever's score will be normalized before applying the specified `weight`. +See <> for supported values. +Defaults to `none`. See also <> for an example of using a linear retriever to independently configure and apply normalizers to each sub-retriever.
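To see how the parameters documented above fit together, here is a minimal sketch of the explicit `retrievers` form (the index and field names are hypothetical, and the weights are arbitrary):

[source,console]
----
GET /my-index/_search
{
  "retriever": {
    "linear": {
      "retrievers": [
        {
          "retriever": {
            "standard": {
              "query": { "match": { "text": "elasticsearch" } }
            }
          },
          "weight": 2,
          "normalizer": "minmax"
        },
        {
          "retriever": {
            "standard": {
              "query": { "match": { "title": "elasticsearch" } }
            }
          },
          "weight": 1,
          "normalizer": "minmax"
        }
      ],
      "rank_window_size": 10
    }
  }
}
----

Each sub-retriever produces its top `rank_window_size` documents, whose scores are `minmax`-normalized and multiplied by that entry's `weight` before the weighted sum is computed.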
diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc index a3cc4734fd23a..0fac02f0950a0 100644 --- a/docs/reference/search/retriever.asciidoc +++ b/docs/reference/search/retriever.asciidoc @@ -282,9 +282,17 @@ A retriever that normalizes and linearly combines the scores of other retrievers include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=linear-retriever-components] -include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size] +[[linear-retriever-normalizers]] +===== Normalizers -include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter] +The `linear` retriever supports the following normalizers: + +* `none`: No normalization +* `minmax`: Normalizes scores based on the following formula: + ``` + score = (score - min) / (max - min) + ``` +* `l2_norm`: Normalizes scores using the L2 norm of the score values [[rrf-retriever]] ==== RRF Retriever From 7bd8a6828b174cd5e43fad0186adaff53625fae4 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 15:29:26 -0400 Subject: [PATCH 02/13] Added multi-field query format section --- docs/reference/search/retriever.asciidoc | 184 +++++++++++++++++++++++ 1 file changed, 184 insertions(+) diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc index 0fac02f0950a0..07bf02d2c407a 100644 --- a/docs/reference/search/retriever.asciidoc +++ b/docs/reference/search/retriever.asciidoc @@ -920,6 +920,190 @@ GET movies/_search <1> The `rule` retriever is the outermost retriever, applying rules to the search results that were previously reranked using the `rrf` retriever. <2> The `rrf` retriever returns results from all of its sub-retrievers, and the output of the `rrf` retriever is used as input to the `rule` retriever. +[discrete] +[[multi-field-query-format]] +=== Multi-field query format [multi-field-query-format] + +The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers. +This format automatically generates appropriate inner retrievers based on the field types and query parameters. +This is a great way to search an index, knowing little to nothing about its schema, while also handling normalization across lexical and semantic matches. + +[[multi-field-field-grouping]] +==== Field grouping [multi-field-field-grouping] + +The multi-field query format groups queried fields into two categories: + +- **Lexical fields**: fields that support term queries, such as `keyword` and `text` fields. +- **Semantic fields**: <>. + +Each field group is queried separately and the scores/ranks are normalized such that each contributes 50% to the final score/rank. +This balances the importance of lexical and semantic fields. +Most indices contain more lexical than semantic fields, and without this grouping the results would often bias towards lexical field matches. + +[WARNING] +==== +In the `linear` retriever, this grouping relies on using a normalizer other than `none` (i.e., `minmax` or `l2_norm`). +If you use the `none` normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches. 
+==== + +[[multi-field-field-boosting]] +==== Linear retriever field boosting + +When using the `linear` retriever, fields can be boosted using the `^` notation: + +[source,console] +---- +GET books/_search +{ + "retriever": { + "linear": { + "query": "elasticsearch", + "fields": [ + "title^3", <1> + "description^2", <2> + "title_semantic", <3> + "description_semantic^2" + ], + "normalizer": "minmax" + } + } +} +---- + +<1> 3x weight +<2> 2x weight +<3> 1x weight (default) + +Due to how the <> are normalized, per-field boosts have no effect on the range of the final score. +Instead, they affect the importance of the field's score within its group. + +For example, if the schema looks like: + +[source,console] +---- +PUT /books +{ + "mappings": { + "properties": { + "title": { + "type": "text", + "copy_to": "title_semantic" + }, + "description": { + "type": "text", + "copy_to": "description_semantic" + }, + "title_semantic": { + "type": "semantic_text" + }, + "description_semantic": { + "type": "semantic_text" + } + } + } +} +---- + +And we run this query: + +[source,console] +---- +GET books/_search +{ + "retriever": { + "linear": { + "query": "elasticsearch", + "fields": [ + "title", + "description", + "title_semantic", + "description_semantic" + ], + "normalizer": "minmax" + } + } +} +---- + +The score breakdown would be: + +* Lexical fields (50% of score): + * `title`: 50% of lexical fields group score, 25% of final score + * `description`: 50% of lexical fields group score, 25% of final score +* Semantic fields (50% of score): + * `title_semantic`: 50% of semantic fields group score, 25% of final score + * `description_semantic`: 50% of semantic fields group score, 25% of final score + +If we apply per-field boosts like so: + +[source,console] +---- +GET books/_search +{ + "retriever": { + "linear": { + "query": "elasticsearch", + "fields": [ + "title^3", + "description^2", + "title_semantic", + "description_semantic^2" + ], + "normalizer": "minmax" + } + } +} +---- + +The score breakdown would change to: + +* Lexical fields (50% of score): + * `title`: 60% of lexical fields group score, 30% of final score + * `description`: 40% of lexical fields group score, 20% of final score +* Semantic fields (50% of score): + * `title_semantic`: 33% of semantic fields group score, 16.5% of final score + * `description_semantic`: 66% of semantic fields group score, 33% of final score + +[[multi-field-wildcard-field-patterns]] +==== Wildcard field patterns + +Field names support the `*` wildcard character to match multiple fields: + +[source,console] +---- +GET books/_search +{ + "retriever": { + "rrf": { + "query": "machine learning", + "fields": [ + "title*", <1> + "*_text" <2> + ] + } + } +} +---- + +<1> Match fields that start with `title` +<2> Match fields that end with `_text` + +Note, however, that wildcard field patterns will only resolve to fields that either: + +- Support term queries, such as `keyword` and `text` fields +- Are `semantic_text` fields + +==== Limitations + +- **Single index**: Multi-field queries currently work with single index searches only +- **CCS (Cross Cluster Search)**: Multi-field queries do not support remote cluster searches + +==== Examples + +- <> +- <> + + [discrete] [[retriever-common-parameters]] === Common usage guidelines From 89dc8fa45c7ffc697434daefa2c507742d60ac31 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 15:31:32 -0400 Subject: [PATCH 03/13] Remove markdown anchors --- docs/reference/search/retriever.asciidoc | 4 ++-- 1 file 
changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc index 07bf02d2c407a..d2ac789b1ea4e 100644 --- a/docs/reference/search/retriever.asciidoc +++ b/docs/reference/search/retriever.asciidoc @@ -922,14 +922,14 @@ GET movies/_search [discrete] [[multi-field-query-format]] -=== Multi-field query format [multi-field-query-format] +=== Multi-field query format The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers. This format automatically generates appropriate inner retrievers based on the field types and query parameters. This is a great way to search an index, knowing little to nothing about its schema, while also handling normalization across lexical and semantic matches. [[multi-field-field-grouping]] -==== Field grouping [multi-field-field-grouping] +==== Field grouping The multi-field query format groups queried fields into two categories: From 85faa1f7a948b94ea81053d01cfaedee02eb2b20 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 15:44:24 -0400 Subject: [PATCH 04/13] Formatting fixes --- docs/reference/search/retriever.asciidoc | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc index d2ac789b1ea4e..9a145b9304ded 100644 --- a/docs/reference/search/retriever.asciidoc +++ b/docs/reference/search/retriever.asciidoc @@ -289,9 +289,11 @@ The `linear` retriever supports the following normalizers: * `none`: No normalization * `minmax`: Normalizes scores based on the following formula: - ``` - score = (score - min) / (max - min) - ``` ++ +.... +score = (score - min) / (max - min) +.... 
+ * `l2_norm`: Normalizes scores using the L2 norm of the score values [[rrf-retriever]] @@ -1028,11 +1030,11 @@ GET books/_search The score breakdown would be: * Lexical fields (50% of score): - * `title`: 50% of lexical fields group score, 25% of final score - * `description`: 50% of lexical fields group score, 25% of final score + ** `title`: 50% of lexical fields group score, 25% of final score + ** `description`: 50% of lexical fields group score, 25% of final score * Semantic fields (50% of score): - * `title_semantic`: 50% of semantic fields group score, 25% of final score - * `description_semantic`: 50% of semantic fields group score, 25% of final score + ** `title_semantic`: 50% of semantic fields group score, 25% of final score + ** `description_semantic`: 50% of semantic fields group score, 25% of final score If we apply per-field boosts like so: @@ -1058,11 +1060,11 @@ GET books/_search The score breakdown would change to: * Lexical fields (50% of score): - * `title`: 60% of lexical fields group score, 30% of final score - * `description`: 40% of lexical fields group score, 20% of final score + ** `title`: 60% of lexical fields group score, 30% of final score + ** `description`: 40% of lexical fields group score, 20% of final score * Semantic fields (50% of score): - * `title_semantic`: 33% of semantic fields group score, 16.5% of final score - * `description_semantic`: 66% of semantic fields group score, 33% of final score + ** `title_semantic`: 33% of semantic fields group score, 16.5% of final score + ** `description_semantic`: 66% of semantic fields group score, 33% of final score [[multi-field-wildcard-field-patterns]] ==== Wildcard field patterns From 55352fbd0f364f2546956992dbda40422ff58b5b Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 15:48:52 -0400 Subject: [PATCH 05/13] Fix rank_window_size default value --- docs/reference/rest-api/common-parms.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/reference/rest-api/common-parms.asciidoc b/docs/reference/rest-api/common-parms.asciidoc index f07f5e8bf8dff..ac1206d8af6d5 100644 --- a/docs/reference/rest-api/common-parms.asciidoc +++ b/docs/reference/rest-api/common-parms.asciidoc @@ -1337,7 +1337,7 @@ This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance. The final ranked result set is pruned down to the search request's <>. `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`. -Defaults to the `size` parameter. +Defaults to 10. end::compound-retriever-rank-window-size[] tag::compound-retriever-filter[] @@ -1374,7 +1374,7 @@ If not specified, uses the index's default fields from the `index.query.default_ The normalizer to use when using the <>. See <> for supported values. Required when `query` is specified. - ++ [WARNING] ==== Avoid using `none` as that will disable normalization and may bias the result set towards lexical matches. 
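To make the `minmax` normalizer concrete, here is a small worked example (the raw scores are hypothetical). If a retriever's top documents score 2.0, 5.0, and 10.0, then `min = 2.0`, `max = 10.0`, and applying `score = (score - min) / (max - min)` gives:

....
(2.0  - 2.0) / (10.0 - 2.0) = 0.0
(5.0  - 2.0) / (10.0 - 2.0) = 0.375
(10.0 - 2.0) / (10.0 - 2.0) = 1.0
....

Each normalized score is then multiplied by that retriever's `weight` before the weighted sum is taken, so `minmax` keeps every retriever's contribution in the same [0, 1] range regardless of how its raw scores are distributed.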
From 57df361f271d8f9ab5d45c95e8ebea57bc6c090f Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 16:07:22 -0400 Subject: [PATCH 06/13] Updated RRF retriever param documentation --- docs/reference/rest-api/common-parms.asciidoc | 20 ++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/docs/reference/rest-api/common-parms.asciidoc b/docs/reference/rest-api/common-parms.asciidoc index ac1206d8af6d5..8c1a922eeb8a3 100644 --- a/docs/reference/rest-api/common-parms.asciidoc +++ b/docs/reference/rest-api/common-parms.asciidoc @@ -1310,8 +1310,26 @@ See <>. end::wait_for_active_shards[] tag::rrf-retrievers[] + +[NOTE] +==== +Either `query` or `retrievers` must be specified. +Combining `query` and `retrievers` is not supported. +==== + +`query`:: +(Optional, String) ++ +The query to use when using the <>. + +`fields`:: +(Optional, array of strings) ++ +The fields to query when using the <>. +If not specified, uses the index's default fields from the `index.query.default_field` index setting, which is `*` by default. + `retrievers`:: -(Required, array of retriever objects) +(Optional, array of retriever objects) + A list of child retrievers to specify which sets of returned top documents will have the RRF formula applied to them. Each child retriever carries an From cfe645821dbf3166f1f776981a2ddbefe9edcb28 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 16:36:02 -0400 Subject: [PATCH 07/13] Added RRF with the multi-field query format example --- .../retrievers-examples.asciidoc | 184 ++++++++++++++++-- 1 file changed, 165 insertions(+), 19 deletions(-) diff --git a/docs/reference/search/search-your-data/retrievers-examples.asciidoc b/docs/reference/search/search-your-data/retrievers-examples.asciidoc index 5ff97673b8926..c17e5e6dfcdcb 100644 --- a/docs/reference/search/search-your-data/retrievers-examples.asciidoc +++ b/docs/reference/search/search-your-data/retrievers-examples.asciidoc @@ -30,7 +30,11 @@ PUT retrievers_example } }, "text": { - "type": "text" + "type": "text", + "copy_to": "text_semantic" + }, + "text_semantic": { + "type": "semantic_text" }, "year": { "type": "integer" @@ -285,32 +289,32 @@ This returns the following response based on the normalized weighted score for e "value": 3, "relation": "eq" }, - "max_score": -1, + "max_score": 3.5, "hits": [ { "_index": "retrievers_example", "_id": "2", - "_score": -1 + "_score": 3.5 }, { "_index": "retrievers_example", "_id": "1", - "_score": -2 + "_score": 2.3 }, { "_index": "retrievers_example", "_id": "3", - "_score": -3 + "_score": 0.1 } ] } } ---- // TESTRESPONSE[s/"took": 42/"took": $body.took/] -// TESTRESPONSE[s/"max_score": -1/"max_score": $body.hits.max_score/] -// TESTRESPONSE[s/"_score": -1/"_score": $body.hits.hits.0._score/] -// TESTRESPONSE[s/"_score": -2/"_score": $body.hits.hits.1._score/] -// TESTRESPONSE[s/"_score": -3/"_score": $body.hits.hits.2._score/] +// TESTRESPONSE[s/"max_score": 3.5/"max_score": $body.hits.max_score/] +// TESTRESPONSE[s/"_score": 3.5/"_score": $body.hits.hits.0._score/] +// TESTRESPONSE[s/"_score": 2.3/"_score": $body.hits.hits.1._score/] +// TESTRESPONSE[s/"_score": 0.1/"_score": $body.hits.hits.2._score/] ============== By normalizing scores and leveraging `function_score` queries, we can also implement more complex ranking strategies, @@ -402,38 +406,180 @@ Which would return the following results: "value": 4, "relation": "eq" }, - "max_score": -1, + "max_score": 3.5, "hits": [ { "_index": "retrievers_example", "_id": "3", - 
"_score": -1 + "_score": 3.5 }, { "_index": "retrievers_example", "_id": "2", - "_score": -2 + "_score": 2.0 }, { "_index": "retrievers_example", "_id": "4", - "_score": -3 + "_score": 1.1 }, { "_index": "retrievers_example", "_id": "1", - "_score": -4 + "_score": 0.1 } ] } } ---- // TESTRESPONSE[s/"took": 42/"took": $body.took/] -// TESTRESPONSE[s/"max_score": -1/"max_score": $body.hits.max_score/] -// TESTRESPONSE[s/"_score": -1/"_score": $body.hits.hits.0._score/] -// TESTRESPONSE[s/"_score": -2/"_score": $body.hits.hits.1._score/] -// TESTRESPONSE[s/"_score": -3/"_score": $body.hits.hits.2._score/] -// TESTRESPONSE[s/"_score": -4/"_score": $body.hits.hits.3._score/] +// TESTRESPONSE[s/"max_score": 3.5/"max_score": $body.hits.max_score/] +// TESTRESPONSE[s/"_score": 3.5/"_score": $body.hits.hits.0._score/] +// TESTRESPONSE[s/"_score": 2.0/"_score": $body.hits.hits.1._score/] +// TESTRESPONSE[s/"_score": 1.1/"_score": $body.hits.hits.2._score/] +// TESTRESPONSE[s/"_score": 0.1/"_score": $body.hits.hits.3._score/] +============== + +[discrete] +[[retrievers-examples-rrf-multi-field-query-format]] +==== Example: RRF with the multi-field query format + +There's an even simpler way to execute a hybrid search though: We can use the <>, which allows us to query multiple fields without explicitly specifying inner retrievers. +One of the major challenges with hybrid search is normalizing the scores across matches on all field types. +Scores from <> and <> fields don't always fall in the same range, so we need to normalize the ranks across matches on these fields to generate a result set. +For example, BM25 scores from `text` fields are unbounded, while vector similarity scores from `text_embedding` models are bounded between [0, 1]. +The multi-field query format <>. + +The following example uses the multi-field query format to query every field specified in the `index.query.default_field` index setting, which is set to `*` by default. +This default value will cause the retriever to query every field that either: + +- Supports term queries, such as `keyword` and `text` fields +- Is a `semantic_text` field + +In this example, that would translate to the `text`, `text_semantic`, `year`, `topic`, and `timestamp` fields. + +[source,console] +---- +GET /retrievers_example/_search +{ + "retriever": { + "rrf": { + "query": "artificial intelligence" + } + } +} +---- + +This returns the following response based on the final rrf score for each result. + +.Example response +[%collapsible] +============== +[source,console-result] +---- +{ + "took": 42, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 0.8333334, + "hits": [ + { + "_index": "retrievers_example", + "_id": "1", + "_score": 0.8333334 + }, + { + "_index": "retrievers_example", + "_id": "2", + "_score": 0.8333334 + }, + { + "_index": "retrievers_example", + "_id": "3", + "_score": 0.25 + } + ] + } +} +---- +// TESTRESPONSE[skip:Requires inference] +============== + +We can also use the `fields` parameter to explicitly specify the fields to query. +The following example uses the multi-field query format to query the `text` and `text_semantic` fields. + +[source,console] +---- +GET /retrievers_example/_search +{ + "retriever": { + "rrf": { + "query": "artificial intelligence", + "fields": ["text", "text_semantic"] + } + } +} +---- + +[NOTE] +==== +The `fields` parameter also accepts <>. 
+==== + +This returns the following response based on the final rrf score for each result. + +.Example response +[%collapsible] +============== +[source,console-result] +---- +{ + "took": 42, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 0.8333334, + "hits": [ + { + "_index": "retrievers_example", + "_id": "1", + "_score": -4 + "_score": 0.8333334 + }, + { + "_index": "retrievers_example", + "_id": "2", + "_score": 0.8333334 + }, + { + "_index": "retrievers_example", + "_id": "3", + "_score": 0.25 + } + ] + } +} +---- +// TESTRESPONSE[skip:Requires inference] ============== [discrete] From 3bcf0c745ef9b980c98e3a4d21b30ade7918d26c Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Tue, 8 Jul 2025 16:45:22 -0400 Subject: [PATCH 08/13] Added linear retriever with the multi-field query format example --- .../retrievers-examples.asciidoc | 71 +++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/docs/reference/search/search-your-data/retrievers-examples.asciidoc b/docs/reference/search/search-your-data/retrievers-examples.asciidoc index c17e5e6dfcdcb..509760597f1d5 100644 --- a/docs/reference/search/search-your-data/retrievers-examples.asciidoc +++ b/docs/reference/search/search-your-data/retrievers-examples.asciidoc @@ -582,6 +582,77 @@ This returns the following response based on the final rrf score for each result // TESTRESPONSE[skip:Requires inference] ============== +[discrete] +[[retrievers-examples-linear-multi-field-query-format]] +==== Example: Linear retriever with the multi-field query format + +We can also use the <> with the `linear` retriever. +It works much the same way as <>, with a couple key differences: + +- We can use `^` notation to specify a <> +- We must set the `normalizer` parameter to specify the normalization method used to combine <> + +The following example uses the `linear` retriever to query the `text`, `text_semantic`, and `topic` fields, with a boost of 2 on the `topic` field: + +[source,console] +---- +GET /retrievers_example/_search +{ + "retriever": { + "linear": { + "query": "artificial intelligence", + "fields": ["text", "text_semantic", "topic^2"], + "normalizer": "minmax" + } + } +} +---- + +This returns the following response based on the normalized score for each result: + +.Example response +[%collapsible] +============== +[source,console-result] +---- +{ + "took": 42, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 2.0, + "hits": [ + { + "_index": "retrievers_example", + "_id": "2", + "_score": 2.0 + }, + { + "_index": "retrievers_example", + "_id": "1", + "_score": 1.2 + }, + { + "_index": "retrievers_example", + "_id": "3", + "_score": 0.1 + } + ] + } +} +---- +// TESTRESPONSE[skip:Requires inference] +============== + [discrete] [[retrievers-examples-collapsing-retriever-results]] ==== Example: Grouping results by year with `collapse` From 4de438a8047eb6722ca4c4264dc7e2fd99244fd1 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Wed, 9 Jul 2025 10:38:27 -0400 Subject: [PATCH 09/13] Fix tests in retrievers-examples --- .../retrievers-examples.asciidoc | 93 +++++++++++++++---- 1 file changed, 73 insertions(+), 20 deletions(-) diff --git a/docs/reference/search/search-your-data/retrievers-examples.asciidoc 
b/docs/reference/search/search-your-data/retrievers-examples.asciidoc index 509760597f1d5..a1b949f681595 100644 --- a/docs/reference/search/search-your-data/retrievers-examples.asciidoc +++ b/docs/reference/search/search-your-data/retrievers-examples.asciidoc @@ -466,9 +466,11 @@ GET /retrievers_example/_search "rrf": { "query": "artificial intelligence" } - } + }, + "_source": false } ---- +// TEST[continued] This returns the following response based on the final rrf score for each result. @@ -488,31 +490,47 @@ This returns the following response based on the final rrf score for each result }, "hits": { "total": { - "value": 3, + "value": 5, "relation": "eq" }, "max_score": 0.8333334, "hits": [ { "_index": "retrievers_example", - "_id": "1", + "_id": "2", "_score": 0.8333334 }, { "_index": "retrievers_example", - "_id": "2", - "_score": 0.8333334 + "_id": "3", + "_score": 0.82 }, { "_index": "retrievers_example", - "_id": "3", + "_id": "4", + "_score": 0.48 + }, + { + "_index": "retrievers_example", + "_id": "1", + "_score": 0.40 + }, + { + "_index": "retrievers_example", + "_id": "5", "_score": 0.25 } ] } } ---- -// TESTRESPONSE[skip:Requires inference] +// TESTRESPONSE[s/"took": 42/"took": $body.took/] +// TESTRESPONSE[s/"max_score": 0.8333334/"max_score": $body.hits.max_score/] +// TESTRESPONSE[s/"_score": 0.8333334/"_score": $body.hits.hits.0._score/] +// TESTRESPONSE[s/"_score": 0.82/"_score": $body.hits.hits.1._score/] +// TESTRESPONSE[s/"_score": 0.48/"_score": $body.hits.hits.2._score/] +// TESTRESPONSE[s/"_score": 0.40/"_score": $body.hits.hits.3._score/] +// TESTRESPONSE[s/"_score": 0.25/"_score": $body.hits.hits.4._score/] ============== We can also use the `fields` parameter to explicitly specify the fields to query. @@ -527,9 +545,11 @@ GET /retrievers_example/_search "query": "artificial intelligence", "fields": ["text", "text_semantic"] } - } + }, + "_source": false } ---- +// TEST[continued] [NOTE] ==== @@ -554,32 +574,47 @@ This returns the following response based on the final rrf score for each result }, "hits": { "total": { - "value": 3, + "value": 5, "relation": "eq" }, "max_score": 0.8333334, "hits": [ { "_index": "retrievers_example", - "_id": "1", - "_score": -4 + "_id": "2", "_score": 0.8333334 }, { "_index": "retrievers_example", - "_id": "2", - "_score": 0.8333334 + "_id": "3", + "_score": 0.82 }, { "_index": "retrievers_example", - "_id": "3", + "_id": "4", + "_score": 0.48 + }, + { + "_index": "retrievers_example", + "_id": "1", + "_score": 0.40 + }, + { + "_index": "retrievers_example", + "_id": "5", "_score": 0.25 } ] } } ---- -// TESTRESPONSE[skip:Requires inference] +// TESTRESPONSE[s/"took": 42/"took": $body.took/] +// TESTRESPONSE[s/"max_score": 0.8333334/"max_score": $body.hits.max_score/] +// TESTRESPONSE[s/"_score": 0.8333334/"_score": $body.hits.hits.0._score/] +// TESTRESPONSE[s/"_score": 0.82/"_score": $body.hits.hits.1._score/] +// TESTRESPONSE[s/"_score": 0.48/"_score": $body.hits.hits.2._score/] +// TESTRESPONSE[s/"_score": 0.40/"_score": $body.hits.hits.3._score/] +// TESTRESPONSE[s/"_score": 0.25/"_score": $body.hits.hits.4._score/] ============== [discrete] @@ -604,9 +639,11 @@ GET /retrievers_example/_search "fields": ["text", "text_semantic", "topic^2"], "normalizer": "minmax" } - } + }, + "_source": false } ---- +// TEST[continued] This returns the following response based on the normalized score for each result: @@ -626,7 +663,7 @@ This returns the following response based on the normalized score for each resul }, "hits": { "total": { - 
"value": 3, + "value": 5, "relation": "eq" }, "max_score": 2.0, @@ -638,19 +675,35 @@ This returns the following response based on the normalized score for each resul }, { "_index": "retrievers_example", - "_id": "1", + "_id": "3", "_score": 1.2 }, { "_index": "retrievers_example", - "_id": "3", + "_id": "4", + "_score": 1.0 + }, + { + "_index": "retrievers_example", + "_id": "1", + "_score": 0.8 + }, + { + "_index": "retrievers_example", + "_id": "5", "_score": 0.1 } ] } } ---- -// TESTRESPONSE[skip:Requires inference] +// TESTRESPONSE[s/"took": 42/"took": $body.took/] +// TESTRESPONSE[s/"max_score": 2.0/"max_score": $body.hits.max_score/] +// TESTRESPONSE[s/"_score": 2.0/"_score": $body.hits.hits.0._score/] +// TESTRESPONSE[s/"_score": 1.2/"_score": $body.hits.hits.1._score/] +// TESTRESPONSE[s/"_score": 1.0/"_score": $body.hits.hits.2._score/] +// TESTRESPONSE[s/"_score": 0.8/"_score": $body.hits.hits.3._score/] +// TESTRESPONSE[s/"_score": 0.1/"_score": $body.hits.hits.4._score/] ============== [discrete] From 127a31495f8cf28765546b7b30be5ad1349c4442 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Wed, 9 Jul 2025 13:08:44 -0400 Subject: [PATCH 10/13] Test: Remove troublesome find/replace --- .../search/search-your-data/retrievers-examples.asciidoc | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/reference/search/search-your-data/retrievers-examples.asciidoc b/docs/reference/search/search-your-data/retrievers-examples.asciidoc index a1b949f681595..de32d420c429b 100644 --- a/docs/reference/search/search-your-data/retrievers-examples.asciidoc +++ b/docs/reference/search/search-your-data/retrievers-examples.asciidoc @@ -1617,8 +1617,6 @@ The output of which, albeit a bit verbose, will provide all the necessary info t } ---- // TESTRESPONSE[s/"took": 42/"took": $body.took/] -// TESTRESPONSE[s/\.\.\./$body.hits.hits.0._explanation.details.1.details.0.details.0.details.0.details.0.details.0/] -// TESTRESPONSE[s/\*\*\*/$body.hits.hits.0._explanation.details.1.details.0.details.0.details.0.details.1.details.0/] // TESTRESPONSE[s/jnrdZFKS3abUgWVsVdj2Vg/$body.hits.hits.0._node/] ============== From c686dabe6c46ee3328290008c513b825a8c74ab9 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Wed, 9 Jul 2025 15:55:40 -0400 Subject: [PATCH 11/13] Fixed explain example test --- .../retrievers-examples.asciidoc | 93 ++++++++++++------- 1 file changed, 58 insertions(+), 35 deletions(-) diff --git a/docs/reference/search/search-your-data/retrievers-examples.asciidoc b/docs/reference/search/search-your-data/retrievers-examples.asciidoc index de32d420c429b..e0a97c8ffc896 100644 --- a/docs/reference/search/search-your-data/retrievers-examples.asciidoc +++ b/docs/reference/search/search-your-data/retrievers-examples.asciidoc @@ -1547,59 +1547,65 @@ The output of which, albeit a bit verbose, will provide all the necessary info t "_score": 0.5, "_explanation": { "value": 0.5, - "description": "rrf score: [0.5] computed for initial ranks [0, 1] with rankConstant: [1] as sum of [1 / (rank + rankConstant)] for each query", + "description": "sum of:", "details": [ { - "value": 0.0, - "description": "rrf score: [0], result not found in query at index [0]", - "details": [] - }, - { - "value": 1, - "description": "rrf score: [0.5], for rank [1] in query at index [1] computed as [1 / (1 + 1)], for matching query with score", + "value": 0.5, + "description": "rrf score: [0.5] computed for initial ranks [0, 1] with rankConstant: [1] as sum of [1 / (rank + rankConstant)] for each query", "details": [ { - 
"value": 0.8333334, - "description": "rrf score: [0.8333334] computed for initial ranks [2, 1] with rankConstant: [1] as sum of [1 / (rank + rankConstant)] for each query", + "value": 0.0, + "description": "rrf score: [0], result not found in query at index [0]", + "details": [] + }, + { + "value": 1, + "description": "rrf score: [0.5], for rank [1] in query at index [1] computed as [1 / (1 + 1)], for matching query with score", "details": [ { - "value": 2, - "description": "rrf score: [0.33333334], for rank [2] in query at index [0] computed as [1 / (2 + 1)], for matching query with score", + "value": 0.8333334, + "description": "rrf score: [0.8333334] computed for initial ranks [2, 1] with rankConstant: [1] as sum of [1 / (rank + rankConstant)] for each query", "details": [ { - "value": 2.8129659, - "description": "sum of:", + "value": 2, + "description": "rrf score: [0.33333334], for rank [2] in query at index [0] computed as [1 / (2 + 1)], for matching query with score", "details": [ { - "value": 1.4064829, - "description": "weight(text:information in 0) [PerFieldSimilarity], result of:", - "details": [ - *** - ] - }, - { - "value": 1.4064829, - "description": "weight(text:retrieval in 0) [PerFieldSimilarity], result of:", + "value": 2.8129659, + "description": "sum of:", "details": [ - *** + { + "value": 1.4064829, + "description": "weight(text:information in 1) [PerFieldSimilarity], result of:", + "details": [ + *** + ] + }, + { + "value": 1.4064829, + "description": "weight(text:retrieval in 1) [PerFieldSimilarity], result of:", + "details": [ + *** + ] + } ] } ] - } - ] - }, - { - "value": 1, - "description": "rrf score: [0.5], for rank [1] in query at index [1] computed as [1 / (1 + 1)], for matching query with score", - "details": [ + }, { "value": 1, - "description": "doc [0] with an original score of [1.0] is at rank [1] from the following source queries.", + "description": "rrf score: [0.5], for rank [1] in query at index [1] computed as [1 / (1 + 1)], for matching query with score", "details": [ { - "value": 1.0, - "description": "found vector with calculated similarity: 1.0", - "details": [] + "value": 1, + "description": "doc [1] with an original score of [1.0] is at rank [1] from the following source queries.", + "details": [ + { + "value": 1.0, + "description": "found vector with calculated similarity: 1.0", + "details": [] + } + ] } ] } @@ -1608,6 +1614,22 @@ The output of which, albeit a bit verbose, will provide all the necessary info t ] } ] + }, + { + "value": 0.0, + "description": "match on required clause, product of:", + "details": [ + { + "value": 0.0, + "description": "# clause", + "details": [] + }, + { + "value": 1.0, + "description": "FieldExistsQuery [field=_primary_term]", + "details": [] + } + ] } ] } @@ -1617,6 +1639,7 @@ The output of which, albeit a bit verbose, will provide all the necessary info t } ---- // TESTRESPONSE[s/"took": 42/"took": $body.took/] +// TESTRESPONSE[s/\*\*\*/$body.hits.hits.0._explanation.details.0.details.1.details.0.details.0.details.0.details.1.details.0/] // TESTRESPONSE[s/jnrdZFKS3abUgWVsVdj2Vg/$body.hits.hits.0._node/] ============== From 3aec0eefcda86a1182bf472747eaeca49bae9044 Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Wed, 9 Jul 2025 16:08:59 -0400 Subject: [PATCH 12/13] Fixed tests in retriever --- docs/reference/search/retriever.asciidoc | 29 ++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc index 
9a145b9304ded..01ca29c9d1704 100644 --- a/docs/reference/search/retriever.asciidoc +++ b/docs/reference/search/retriever.asciidoc @@ -121,6 +121,28 @@ POST /restaurants/_bulk?refresh PUT /movies +PUT /books +{ + "mappings": { + "properties": { + "title": { + "type": "text", + "copy_to": "title_semantic" + }, + "description": { + "type": "text", + "copy_to": "description_semantic" + }, + "title_semantic": { + "type": "semantic_text" + }, + "description_semantic": { + "type": "semantic_text" + } + } + } +} + PUT _query_rules/my-ruleset { "rules": [ @@ -151,6 +173,8 @@ PUT _query_rules/my-ruleset DELETE /restaurants DELETE /movies + +DELETE /books -------------------------------------------------- // TEARDOWN //// @@ -971,6 +995,7 @@ GET books/_search } } ---- +// TEST[continued] <1> 3x weight <2> 2x weight @@ -1005,6 +1030,7 @@ PUT /books } } ---- +// TEST[skip:index created in test setup] And we run this query: @@ -1026,6 +1052,7 @@ GET books/_search } } ---- +// TEST[continued] The score breakdown would be: @@ -1056,6 +1083,7 @@ GET books/_search } } ---- +// TEST[continued] The score breakdown would change to: @@ -1086,6 +1114,7 @@ GET books/_search } } ---- +// TEST[continued] <1> Match fields that start with `title` <2> Match fields that end with `_text` From 7db83bf8d406c28dac5163f8f91a6713acb4c49b Mon Sep 17 00:00:00 2001 From: Mike Pellegrini Date: Thu, 10 Jul 2025 08:26:25 -0400 Subject: [PATCH 13/13] Fix heading anchors --- docs/reference/search/retriever.asciidoc | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc index 01ca29c9d1704..c8636162af0e0 100644 --- a/docs/reference/search/retriever.asciidoc +++ b/docs/reference/search/retriever.asciidoc @@ -954,6 +954,7 @@ The `linear` and `rrf` retrievers support a multi-field query format that provid This format automatically generates appropriate inner retrievers based on the field types and query parameters. This is a great way to search an index, knowing little to nothing about its schema, while also handling normalization across lexical and semantic matches. +[discrete] [[multi-field-field-grouping]] ==== Field grouping @@ -972,6 +973,7 @@ In the `linear` retriever, this grouping relies on using a normalizer other than If you use the `none` normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches. ==== +[discrete] [[multi-field-field-boosting]] ==== Linear retriever field boosting @@ -1094,6 +1096,7 @@ The score breakdown would change to: ** `title_semantic`: 33% of semantic fields group score, 16.5% of final score ** `description_semantic`: 66% of semantic fields group score, 33% of final score +[discrete] [[multi-field-wildcard-field-patterns]] ==== Wildcard field patterns @@ -1124,11 +1127,15 @@ Note, however, that wildcard field patterns will only resolve to fields that eit - Support term queries, such as `keyword` and `text` fields - Are `semantic_text` fields +[discrete] +[[multi-field-limitations]] ==== Limitations - **Single index**: Multi-field queries currently work with single index searches only - **CCS (Cross Cluster Search)**: Multi-field queries do not support remote cluster searches +[discrete] +[[multi-field-examples]] ==== Examples - <>
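As a closing illustration of the field boosting arithmetic described earlier, and assuming (as the score breakdowns above imply) that a field's share of its group score is its boost divided by the sum of the boosts in that group, the `title^3, description^2, title_semantic, description_semantic^2` example works out to:

....
Lexical group (50% of final score):
  title:       3 / (3 + 2) = 60% of group -> 30% of final score
  description: 2 / (3 + 2) = 40% of group -> 20% of final score

Semantic group (50% of final score):
  title_semantic:       1 / (1 + 2) ≈ 33% of group -> 16.5% of final score
  description_semantic: 2 / (1 + 2) ≈ 66% of group -> 33% of final score
....

The boosts redistribute weight within each group, while each group's 50% share of the final score is unchanged.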