Commit 9c0f439

jbaiera and dakrone authored
Add entry for ingest pipeline field access pattern feature (#2994)
Adds a docs section explaining the new field access pattern feature on ingest pipelines.

Co-authored-by: Lee Hinman <[email protected]>
1 parent dc4fa4e commit 9c0f439

1 file changed

+227
-1
lines changed

manage-data/ingest/transform-enrich/ingest-pipelines.md

Lines changed: 227 additions & 1 deletion
@@ -388,7 +388,7 @@ PUT _ingest/pipeline/my-pipeline
388388
Use dot notation to access object fields.
389389

390390
::::{important}
391-
If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them first. Other ingest processors cannot access flattened objects.
391+
If your document contains flattened objects, use the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor to expand them. If you wish to maintain your document structure, use the [`flexible`](ingest-pipelines.md#access-source-pattern-flexible) access pattern in your pipeline definition. Otherwise, ingest processors cannot access dotted field names.
392392
::::
393393

394394

@@ -431,6 +431,232 @@ PUT _ingest/pipeline/my-pipeline
431431
}
432432
```
433433

434+
## Ingest field access pattern [access-source-pattern]
435+
```{applies_to}
436+
serverless: ga
437+
stack: ga 9.2
438+
```
439+
440+
The default ingest pipeline access pattern does not recognize dotted field names in documents. Retrieving flattened and dotted field names from an ingest document requires a different field retrieval algorithm that does not have this limitation. Some existing pipelines rely on this limitation in their logic. To keep supporting the original behavior while adding support for dotted field names, ingest pipelines now let you configure an access pattern that applies to all processors in the pipeline.
441+
442+
The `field_access_pattern` property on an ingest pipeline defines how ingest document fields are read and written for all processors in the current pipeline. It accepts two values: `classic` (which is the default) and `flexible`.
443+
444+
```console
445+
PUT _ingest/pipeline/my-pipeline
446+
{
447+
"field_access_pattern": "classic", <1>
448+
"processors": [
449+
{
450+
"set": {
451+
"description": "Set some searchable tags in our document's flattened field",
452+
"field": "event.tags.ingest.processed_by", <2>
453+
"value": "my-pipeline"
454+
}
455+
}
456+
]
457+
}
458+
```
459+
1. All processors in this pipeline will use the `classic` access pattern.
460+
2. The logic for resolving field paths used by processors to read and write values to ingest documents is based on the access pattern.
461+
462+
### Classic field access pattern [access-source-pattern-classic]
463+
464+
The `classic` access pattern is the default access pattern and has been available since ingest node was first released. Field paths given to processors (for example, `event.tags.ingest.processed_by`) are split on the dot character (`.`). The processor then uses the resulting field names to traverse the document until a value is found. When writing a value to a document, if its parent fields do not exist in the source, the processor creates nested objects for the missing fields.
465+
466+
```console
467+
POST /_ingest/pipeline/_simulate
468+
{
469+
"pipeline" : {
470+
"description": "example pipeline",
471+
"field_access_pattern": "classic", <1>
472+
"processors": [
473+
{
474+
"set" : {
475+
"description" : "Copy the foo.bar field into the a.b.c.d field if it exists",
476+
"copy_from" : "foo.bar", <2>
477+
"field" : "a.b.c.d", <3>
478+
"ignore_empty_value": true
479+
}
480+
}
481+
]
482+
},
483+
"docs": [
484+
{
485+
"_index": "index",
486+
"_id": "id",
487+
"_source": {
488+
"foo": {
489+
"bar": "baz" <4>
490+
}
491+
}
492+
},
493+
{
494+
"_index": "index",
495+
"_id": "id",
496+
"_source": {
497+
"foo.bar": "baz" <5>
498+
}
499+
}
500+
]
501+
}
502+
```
503+
1. Explicitly declares the `classic` access pattern for the pipeline. This is the default value.
504+
2. We are reading a value from the field `foo.bar`.
505+
3. We are writing its value to the field `a.b.c.d`.
506+
4. This document uses nested JSON objects in its structure.
507+
5. This document uses dotted field names in its structure.
508+
509+
```console-result
510+
{
511+
"docs": [
512+
{
513+
"doc": {
514+
"_id": "id",
515+
"_index": "index",
516+
"_version": "-3",
517+
"_source": {
518+
"foo": {
519+
"bar": "baz" <1>
520+
},
521+
"a": {
522+
"b": {
523+
"c": {
524+
"d": "baz" <2>
525+
}
526+
}
527+
}
528+
},
529+
"_ingest": {
530+
"timestamp": "2017-05-04T22:30:03.187Z"
531+
}
532+
}
533+
},
534+
{
535+
"doc": {
536+
"_id": "id",
537+
"_index": "index",
538+
"_version": "-3",
539+
"_source": {
540+
"foo.bar": "baz" <3>
541+
},
542+
"_ingest": {
543+
"timestamp": "2017-05-04T22:30:03.188Z"
544+
}
545+
}
546+
}
547+
]
548+
}
549+
```
550+
1. The first document's `foo.bar` field is located because it uses nested JSON. The processor looks for a `foo` field, and then a `bar` field.
551+
2. The value from the `foo.bar` field is written to a nested JSON structure at field `a.b.c.d`. The processor creates objects for each field in the path.
552+
3. The second document uses a dotted field name for `foo.bar`. The `classic` access pattern does not recognize dotted field names, and so nothing is copied.
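
The `classic` behavior shown in these results can be modeled in a few lines of Python. This is a simplified sketch for illustration, not the Elasticsearch implementation: the helper names `classic_get` and `classic_set` are hypothetical, and the ingest document is assumed to be a plain dictionary.

```python
def classic_get(doc, path):
    """Split the path on dots and walk nested objects one name at a time."""
    current = doc
    for name in path.split("."):
        if not isinstance(current, dict) or name not in current:
            return None  # a dotted key such as "foo.bar" is never consulted
        current = current[name]
    return current


def classic_set(doc, path, value):
    """Create a nested object for each missing parent, then write the leaf."""
    *parents, leaf = path.split(".")
    current = doc
    for name in parents:
        # Assumption: existing parents are objects; this sketch does not
        # handle a parent that already holds a scalar value.
        current = current.setdefault(name, {})
    current[leaf] = value


nested = {"foo": {"bar": "baz"}}
dotted = {"foo.bar": "baz"}

print(classic_get(nested, "foo.bar"))  # baz
print(classic_get(dotted, "foo.bar"))  # None

classic_set(nested, "a.b.c.d", classic_get(nested, "foo.bar"))
print(nested["a"])  # {'b': {'c': {'d': 'baz'}}}
```

In this model the dotted document yields no value, which mirrors why nothing is copied for the second document in the simulation.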
553+
554+
To read dotted field names with the `classic` access pattern, you must first expand them with the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor. This approach is not always workable, though. Consider the following document:
555+
556+
```json
557+
{
558+
"event": {
559+
"tags": {
560+
"http.host": "localhost:9200",
561+
"http.host.name": "localhost",
562+
"http.host.port": 9200
563+
}
564+
}
565+
}
566+
```
567+
If the `event.tags` field were processed with the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor, the field values would collide. The `http.host` field cannot be a text value and an object value at the same time.
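
The collision can be reproduced with a small Python model of dot expansion. This is only a sketch under stated assumptions: `expand_dots` is a hypothetical helper, and it raises an error on a clash purely to make the conflict visible. It does not claim to match how the `dot_expander` processor itself reports or resolves conflicts.

```python
def expand_dots(flat):
    """Expand dotted keys into nested objects, failing on a value/object clash."""
    expanded = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        current = expanded
        for name in parents:
            child = current.setdefault(name, {})
            if not isinstance(child, dict):
                raise ValueError(
                    f"'{name}' in '{dotted_key}' already holds a value, not an object"
                )
            current = child
        if isinstance(current.get(leaf), dict):
            raise ValueError(f"'{dotted_key}' already holds an object, not a value")
        current[leaf] = value
    return expanded


# The contents of event.tags from the document above.
tags = {
    "http.host": "localhost:9200",
    "http.host.name": "localhost",
    "http.host.port": 9200,
}

try:
    expand_dots(tags)
except ValueError as err:
    print(f"collision: {err}")
```

Expanding `http.host` stores a string at `http.host`, so the later key `http.host.name` has no object to descend into.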
568+
569+
### Flexible field access pattern [access-source-pattern-flexible]
570+
571+
The `flexible` access pattern allows ingest pipelines to access both nested fields and dotted field names without using the [`dot_expander`](elasticsearch://reference/enrich-processor/dot-expand-processor.md) processor. Additionally, when writing a value to a field that does not exist, any parent fields that are missing are concatenated to the start of the new key. Use the `flexible` access pattern if your documents have dotted field names, or if you prefer to write missing fields to the document with dotted names.
572+
573+
```console
574+
POST /_ingest/pipeline/_simulate
575+
{
576+
"pipeline" : {
577+
"description": "example pipeline",
578+
"field_access_pattern": "flexible", <1>
579+
"processors": [
580+
{
581+
"set" : {
582+
"description" : "Copy the foo.bar field into the a.b.c.d field if it exists",
583+
"copy_from" : "foo.bar", <2>
584+
"field" : "a.b.c.d", <3>
585+
"ignore_empty_value": true
586+
}
587+
}
588+
]
589+
},
590+
"docs": [
591+
{
592+
"_index": "index",
593+
"_id": "id",
594+
"_source": {
595+
"foo": {
596+
"bar": "baz" <4>
597+
},
598+
"a": {} <5>
599+
}
600+
},
601+
{
602+
"_index": "index",
603+
"_id": "id",
604+
"_source": {
605+
"foo.bar": "baz" <6>
606+
}
607+
}
608+
]
609+
}
610+
```
611+
1. Using the `flexible` access pattern in the pipeline.
612+
2. We are reading a value from the field `foo.bar`.
613+
3. We are writing its value to the field `a.b.c.d`.
614+
4. The first document uses nested JSON objects in its structure.
615+
5. The first document has an existing `a` field in the root.
616+
6. The second document uses a dotted field name.
617+
618+
```console-result
619+
{
620+
"docs": [
621+
{
622+
"doc": {
623+
"_id": "id",
624+
"_index": "index",
625+
"_version": "-3",
626+
"_source": {
627+
"foo": {
628+
"bar": "baz" <1>
629+
},
630+
"a": {
631+
"b.c.d": "baz" <2>
632+
}
633+
},
634+
"_ingest": {
635+
"timestamp": "2017-05-04T22:30:03.187Z"
636+
}
637+
}
638+
},
639+
{
640+
"doc": {
641+
"_id": "id",
642+
"_index": "index",
643+
"_version": "-3",
644+
"_source": {
645+
"foo.bar": "baz", <3>
646+
"a.b.c.d": "baz" <4>
647+
},
648+
"_ingest": {
649+
"timestamp": "2017-05-04T22:30:03.188Z"
650+
}
651+
}
652+
}
653+
]
654+
}
655+
```
656+
1. The `flexible` access pattern supports nested object fields. The processor looks for a `foo` field, and then a `bar` field.
657+
2. The value from the `foo.bar` field is written to the dotted field name `b.c.d` underneath the field `a`. The processor concatenates the missing field names together as a prefix on the key.
658+
3. The `flexible` access pattern also supports dotted field names. The processor looks for a field named `foo`, and after not finding it, looks for a field named `foo.bar`.
659+
4. The value from the `foo.bar` field is written to the dotted field name `a.b.c.d`. Since none of those fields exist in the document yet, they are concatenated together into a dotted field name.
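
The read and write behavior described in these callouts can be approximated in Python. This is a simplified model for illustration, not the Elasticsearch implementation: `flexible_get` and `flexible_set` are hypothetical names, the document is assumed to be a plain dictionary, and details such as the exact order in which candidate keys are tried are assumptions.

```python
def flexible_get(doc, path):
    """Resolve a path through nested objects, dotted keys, or a mix of both."""
    return _resolve(doc, path.split("."))


def _resolve(current, names):
    if not names:
        return current
    if not isinstance(current, dict):
        return None
    # Try the shortest key first ("foo"), then longer dotted keys ("foo.bar").
    for i in range(1, len(names) + 1):
        key = ".".join(names[:i])
        if key in current:
            found = _resolve(current[key], names[i:])
            if found is not None:
                return found
    return None


def flexible_set(doc, path, value):
    """Descend through existing parent objects, then write the remaining
    names as a single dotted key."""
    names = path.split(".")
    current, i = doc, 0
    while i < len(names) - 1:
        step = None
        for j in range(i + 1, len(names)):
            key = ".".join(names[i:j])
            if isinstance(current.get(key), dict):
                step = (key, j)
                break
        if step is None:
            break  # no further existing parent; stop descending
        current, i = current[step[0]], step[1]
    current[".".join(names[i:])] = value


first = {"foo": {"bar": "baz"}, "a": {}}
second = {"foo.bar": "baz"}

for doc in (first, second):
    flexible_set(doc, "a.b.c.d", flexible_get(doc, "foo.bar"))

print(first["a"])  # {'b.c.d': 'baz'}
print(second)      # {'foo.bar': 'baz', 'a.b.c.d': 'baz'}
```

Under these assumptions, the sketch reproduces both simulated results: the existing `a` object receives a `b.c.d` key, and the document without any parents receives a single `a.b.c.d` key at the root.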
434660

435661
## Access metadata fields in a processor [access-metadata-fields]
436662
