#### Parameters [linear-retriever-parameters]

::::{note}
Either `query` or `retrievers` must be specified.
Combining `query` and `retrievers` is not supported.
::::

`query` {applies_to}`stack: ga 9.1`
: (Optional, String)

The query to run when using the [multi-field query format](#multi-field-query-format).

`fields` {applies_to}`stack: ga 9.1`
: (Optional, array of strings)

The fields to query when using the [multi-field query format](#multi-field-query-format).
Fields can include boost values using the `^` notation (e.g., `"field^2"`).
If not specified, the fields listed in the `index.query.default_field` index setting are queried; this setting defaults to `*`.
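
As a rough illustration of the `^` notation, the following Python sketch splits a field entry into a field name and a numeric boost. `parse_field_spec` is a hypothetical helper, not part of any Elasticsearch client, and it assumes that an entry without `^` gets a boost of 1.0.

```python
def parse_field_spec(spec: str) -> tuple[str, float]:
    """Split a field entry such as "title^3" into (field_name, boost).

    Hypothetical helper for illustration only. Entries without a "^"
    suffix are assumed to get the default boost of 1.0.
    """
    name, sep, boost = spec.partition("^")
    if not sep:
        return spec, 1.0
    if not name:
        raise ValueError(f"missing field name in {spec!r}")
    return name, float(boost)


fields = ["title^3", "description^2", "title_semantic", "description_semantic^2"]
parsed = [parse_field_spec(f) for f in fields]
print(parsed)
# [('title', 3.0), ('description', 2.0), ('title_semantic', 1.0), ('description_semantic', 2.0)]
```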

`normalizer` {applies_to}`stack: ga 9.1`
: (Optional, String)

The normalizer to apply when using the [multi-field query format](#multi-field-query-format).
See [normalizers](#linear-retriever-normalizers) for supported values.
Required when `query` is specified.

::::{warning}
Avoid using `none`, as it disables normalization and may bias the result set toward lexical matches.
See [field grouping](#multi-field-field-grouping) for more information.
::::

`retrievers`
: (Optional, array of objects)

A list of sub-retriever configurations whose result sets are merged through a weighted sum.
Each configuration can specify its own weight and normalizer.

`rank_window_size`
: (Optional, integer)

This value determines the size of the individual result sets per query.
A higher value will improve result relevance at the cost of performance.
The final ranked result set is pruned down to the search request’s [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param).
`rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
Defaults to 10.

`filter`
: (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever’s specifications.

Each entry in the `retrievers` array specifies the following parameters:

`retriever`
: (Required, a `retriever` object)

`normalizer`
: (Optional, String)

Specifies how the retriever’s score will be normalized before applying the specified `weight`.
See [normalizers](#linear-retriever-normalizers) for supported values.
Defaults to `none`.

See also [this hybrid search example](docs-content://solutions/search/retrievers-examples.md#retrievers-examples-linear-retriever), which uses a linear retriever to show how to configure and apply a separate normalizer to each retriever.
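
The per-entry `weight` and `normalizer` settings feed a weighted sum. The Python sketch below models that combination under a few assumptions that are not taken from this page: each result set is a mapping from document ID to score, a document missing from a result set contributes 0 for that retriever, and the merged list is cut to `size`. The function names are hypothetical, and this is not the Elasticsearch implementation.

```python
from collections import defaultdict
from typing import Callable

Scores = dict[str, float]  # document ID -> score


def none_normalizer(scores: Scores) -> Scores:
    """Stand-in for `none`: scores pass through unchanged."""
    return dict(scores)


def minmax_normalizer(scores: Scores) -> Scores:
    """Stand-in for `minmax`: (score - min) / (max - min)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # formula undefined; this sketch maps every score to 0.0
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combine(
    results: list[tuple[Scores, float, Callable[[Scores], Scores]]],
    size: int,
) -> list[tuple[str, float]]:
    """Sum weight * normalized score per document and keep the top `size`."""
    combined: dict[str, float] = defaultdict(float)
    for scores, weight, normalize in results:
        for doc, score in normalize(scores).items():
            # Documents absent from this result set get no contribution (0).
            combined[doc] += weight * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]


lexical = {"a": 12.0, "b": 6.0, "c": 3.0}  # e.g. BM25 scores
semantic = {"b": 0.9, "c": 0.8, "d": 0.5}  # e.g. vector similarity scores
top = linear_combine(
    [(lexical, 2.0, minmax_normalizer), (semantic, 1.5, minmax_normalizer)],
    size=3,
)
print([(doc, round(score, 3)) for doc, score in top])
# [('b', 2.167), ('a', 2.0), ('c', 1.125)]
```

Document `b` ranks first because it scores reasonably in both result sets, while `a` relies on its lexical score alone.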

#### Normalizers [linear-retriever-normalizers]

The `linear` retriever supports the following normalizers:

* `none`: No normalization.
* `minmax`: Normalizes scores based on the following formula:

  ```
  score = (score - min) / (max - min)
  ```
* `l2_norm`: Normalizes scores using the L2 norm of the score values, that is, each score is divided by the square root of the sum of the squared scores.
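
To make the two formulas concrete, here is a minimal Python sketch of `minmax` and `l2_norm` applied to one result set's scores. It assumes `l2_norm` divides each score by the square root of the sum of squared scores; the handling of degenerate inputs (all scores equal, or all zero) is a choice made for this sketch and may differ from Elasticsearch.

```python
import math


def minmax_normalize(scores: list[float]) -> list[float]:
    """`minmax`: score = (score - min) / (max - min), mapping into [0, 1]."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # The formula divides by zero here; returning 0.0 is this sketch's choice.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def l2_normalize(scores: list[float]) -> list[float]:
    """`l2_norm` (as assumed here): divide each score by sqrt(sum of squares)."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0.0:
        # All scores are zero (or the list is empty); leave them at 0.0.
        return [0.0 for _ in scores]
    return [s / norm for s in scores]


scores = [3.0, 4.0, 0.0]
print(minmax_normalize(scores))  # [0.75, 1.0, 0.0]
print(l2_normalize(scores))      # [0.6, 0.8, 0.0]
```

`minmax` always maps the best score to 1.0 and the worst to 0.0, whereas `l2_norm` keeps the ratios between scores intact.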

## RRF Retriever [rrf-retriever]

An [RRF](/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.

#### Parameters [rrf-retriever-parameters]

::::{note}
Either `query` or `retrievers` must be specified.
Combining `query` and `retrievers` is not supported.
::::

`query` {applies_to}`stack: ga 9.1`
: (Optional, String)

The query to run when using the [multi-field query format](#multi-field-query-format).

`fields` {applies_to}`stack: ga 9.1`
: (Optional, array of strings)

The fields to query when using the [multi-field query format](#multi-field-query-format).
If not specified, the fields listed in the `index.query.default_field` index setting are queried; this setting defaults to `*`.

`retrievers`
: (Optional, array of retriever objects)

A list of child retrievers whose returned top documents are combined using the RRF formula.
Each child retriever carries an equal weight in the RRF formula. When `retrievers` is specified, two or more child retrievers are required.

`rank_constant`
: (Optional, integer)

This value determines how much influence documents in individual result sets per query have over the final ranked result set. A higher value gives lower-ranked documents more influence. This value must be greater than or equal to `1`. Defaults to `60`.

`rank_window_size`
: (Optional, integer)

This value determines the size of the individual result sets per query.
A higher value will improve result relevance at the cost of performance.
The final ranked result set is pruned down to the search request’s [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param).
`rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
Defaults to 10.

`filter`
: (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever’s specifications.
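
The RRF parameters above can be illustrated with a short Python sketch. It assumes the usual formulation in which a document at 1-based position r in a child result set contributes 1 / (`rank_constant` + r), with contributions summed across child retrievers; each child list is first cut to `rank_window_size`, and the fused list is cut to `size`. The `rrf` function is hypothetical.

```python
from collections import defaultdict


def rrf(
    result_lists: list[list[str]],
    rank_constant: int = 60,
    rank_window_size: int = 10,
    size: int = 10,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    if rank_constant < 1:
        raise ValueError("rank_constant must be >= 1")
    if rank_window_size < size or rank_window_size < 1:
        raise ValueError("rank_window_size must be >= size and >= 1")
    scores: dict[str, float] = defaultdict(float)
    for ranked_docs in result_lists:
        # Only the top rank_window_size documents of each child take part.
        for rank, doc in enumerate(ranked_docs[:rank_window_size], start=1):
            scores[doc] += 1.0 / (rank_constant + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:size]


lexical = ["a", "b", "c"]   # child retriever 1, best match first
semantic = ["c", "a", "d"]  # child retriever 2, best match first
print([doc for doc, _ in rrf([lexical, semantic], size=3)])  # ['a', 'c', 'b']
```

Because 1 / (k + 1) and 1 / (k + r) move closer together as k grows, a higher `rank_constant` narrows the gap between top-ranked and lower-ranked documents.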

A simple hybrid search example (lexical search + dense vector search) combining a `standard` retriever with a `knn` retriever using RRF:
}
```
## Multi-field query format [multi-field-query-format]
```yaml {applies_to}
stack: ga 9.1
```

The `linear` and `rrf` retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers.
This format automatically generates appropriate inner retrievers based on the field types and query parameters.
It is a convenient way to search an index when you know little or nothing about its schema, and it also handles normalization across lexical and semantic matches.

### Field grouping [multi-field-field-grouping]

The multi-field query format groups queried fields into two categories:

- **Lexical fields**: fields that support term queries, such as `keyword` and `text` fields.
- **Semantic fields**: `semantic_text` fields.

Each field group is queried separately, and the scores or ranks are normalized so that each group contributes 50% to the final score or rank.
This balances the importance of lexical and semantic fields.
Most indices contain more lexical than semantic fields, and without this grouping the results would often be biased toward lexical field matches.

::::{warning}
In the `linear` retriever, this grouping relies on using a normalizer other than `none` (that is, `minmax` or `l2_norm`).
If you use the `none` normalizer, the scores across field groups are not normalized and the results may be biased toward lexical field matches.
::::
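
The following Python sketch is a simplified model of why the 50/50 split depends on normalization. It treats each field group's score for a document as a single number, gives each group a fixed weight of 0.5, and uses made-up scores in which lexical scores are on a much larger scale than semantic ones. It does not show how Elasticsearch actually builds the inner retrievers.

```python
from typing import Callable

Scores = dict[str, float]  # document ID -> group score


def identity(scores: Scores) -> Scores:
    """Stand-in for the `none` normalizer."""
    return dict(scores)


def minmax(scores: Scores) -> Scores:
    """Stand-in for the `minmax` normalizer (0.0 if all scores are equal)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}


def combine_groups(
    lexical: Scores, semantic: Scores, normalize: Callable[[Scores], Scores]
) -> list[str]:
    """Normalize each group separately, weight each at 0.5, rank documents."""
    lex, sem = normalize(lexical), normalize(semantic)
    docs = lex.keys() | sem.keys()
    combined = {d: 0.5 * lex.get(d, 0.0) + 0.5 * sem.get(d, 0.0) for d in docs}
    return sorted(combined, key=combined.get, reverse=True)


# Made-up per-document group scores: lexical scores on a much larger scale.
lexical = {"a": 14.0, "b": 9.0, "c": 4.0}
semantic = {"a": 0.5, "b": 0.3, "c": 0.9}
print(combine_groups(lexical, semantic, identity))  # ['a', 'b', 'c']
print(combine_groups(lexical, semantic, minmax))    # ['a', 'c', 'b']
```

With the identity normalizer (standing in for `none`), the lexical scale dominates and document `c`, the strongest semantic match, ranks last; with `minmax`, both groups are mapped to [0, 1] and `c` moves up to second place.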

### Linear retriever field boosting [multi-field-field-boosting]

When using the `linear` retriever, fields can be boosted using the `^` notation:

```console
GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title^3", <1>
        "description^2", <2>
        "title_semantic", <3>
        "description_semantic^2"
      ],
      "normalizer": "minmax"
    }
  }
}
```

1. 3x weight
2. 2x weight
3. 1x weight (default)

Due to how the [field group scores](#multi-field-field-grouping) are normalized, per-field boosts have no effect on the range of the final score.
Instead, they affect the importance of the field's score within its group.

For example, if the schema looks like:

```console
PUT /books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "title_semantic"
      },
      "description": {
        "type": "text",
        "copy_to": "description_semantic"
      },
      "title_semantic": {
        "type": "semantic_text"
      },
      "description_semantic": {
        "type": "semantic_text"
      }
    }
  }
}
```

And we run this query:

```console
GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title",
        "description",
        "title_semantic",
        "description_semantic"
      ],
      "normalizer": "minmax"
    }
  }
}
```

The score breakdown would be:

* Lexical fields (50% of score):
  * `title`: 50% of lexical fields group score, 25% of final score
  * `description`: 50% of lexical fields group score, 25% of final score
* Semantic fields (50% of score):
  * `title_semantic`: 50% of semantic fields group score, 25% of final score
  * `description_semantic`: 50% of semantic fields group score, 25% of final score

If we apply per-field boosts like so:

```console
GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title^3",
        "description^2",
        "title_semantic",
        "description_semantic^2"
      ],
      "normalizer": "minmax"
    }
  }
}
```

The score breakdown would change to:

* Lexical fields (50% of score):
  * `title`: 60% of lexical fields group score, 30% of final score
  * `description`: 40% of lexical fields group score, 20% of final score
* Semantic fields (50% of score):
  * `title_semantic`: 33.3% of semantic fields group score, 16.7% of final score
  * `description_semantic`: 66.7% of semantic fields group score, 33.3% of final score
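
The percentages in these breakdowns follow from two assumptions implied by the examples: each field group gets an equal share of the final score, and within a group a field's share is its boost divided by the sum of the group's boosts. This Python sketch computes the shares from the boost values under exactly those assumptions; `score_shares` is a hypothetical helper that models the documented breakdown, not Elasticsearch internals.

```python
def score_shares(groups: dict[str, dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Map each field to (share of its group, share of the final score)."""
    group_share = 1 / len(groups)  # each group gets an equal slice
    shares = {}
    for boosts in groups.values():
        total = sum(boosts.values())
        for field, boost in boosts.items():
            within = boost / total  # boost relative to the group's total boost
            shares[field] = (within, within * group_share)
    return shares


groups = {
    "lexical": {"title": 3.0, "description": 2.0},
    "semantic": {"title_semantic": 1.0, "description_semantic": 2.0},
}
for field, (within, final) in score_shares(groups).items():
    print(f"{field}: {within:.1%} of group, {final:.1%} of final score")
```

For the boosted query, this prints 60.0% / 30.0% for `title`, 40.0% / 20.0% for `description`, 33.3% / 16.7% for `title_semantic`, and 66.7% / 33.3% for `description_semantic`.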

### Wildcard field patterns [multi-field-wildcard-field-patterns]

Field names support the `*` wildcard character to match multiple fields:

```console
GET books/_search
{
  "retriever": {
    "rrf": {
      "query": "machine learning",
      "fields": [
        "title*", <1>
        "*_text" <2>
      ]
    }
  }
}
```

1. Match fields that start with `title`
2. Match fields that end with `_text`

Note, however, that wildcard field patterns will only resolve to fields that either:

- Support term queries, such as `keyword` and `text` fields
- Are `semantic_text` fields
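
A minimal Python sketch of how wildcard patterns might expand against a mapping and then split into the two field groups. It uses `fnmatch.fnmatchcase` from the standard library, which also treats `?` and `[...]` as special characters, unlike the documented `*`-only syntax. The flat `{field: type}` mapping, the example field names, and the small set of lexical types are simplifications for illustration; many other field types also support term queries.

```python
from fnmatch import fnmatchcase

# Simplified, partial type sets assumed for this sketch.
LEXICAL_TYPES = {"keyword", "text"}
SEMANTIC_TYPES = {"semantic_text"}


def resolve_fields(patterns: list[str], mapping: dict[str, str]) -> dict[str, list[str]]:
    """Expand `*` patterns against a flat {field: type} mapping and group the hits."""
    groups: dict[str, list[str]] = {"lexical": [], "semantic": []}
    for field, field_type in mapping.items():
        if not any(fnmatchcase(field, p) for p in patterns):
            continue
        if field_type in LEXICAL_TYPES:
            groups["lexical"].append(field)
        elif field_type in SEMANTIC_TYPES:
            groups["semantic"].append(field)
        # Any other type (e.g. dense_vector) is skipped by this sketch.
    return groups


mapping = {
    "title": "text",
    "title_semantic": "semantic_text",
    "title_vector": "dense_vector",
    "body_text": "text",
    "summary": "text",
}
print(resolve_fields(["title*", "*_text"], mapping))
# {'lexical': ['title', 'body_text'], 'semantic': ['title_semantic']}
```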

### Limitations

- **Single index**: Multi-field queries currently work only with single-index searches.
- **Cross-cluster search (CCS)**: Multi-field queries do not support remote cluster searches.

### Examples

<!-- - [RRF with the multi-field query format](docs-content://solutions/search/retrievers-examples.md#retrievers-examples-rrf-multi-field-query-format) -->
<!-- - [Linear retriever with the multi-field query format](docs-content://solutions/search/retrievers-examples.md#retrievers-examples-linear-multi-field-query-format) -->

## Common usage guidelines [retriever-common-parameters]