Add template_id to patterned-text type #131401

Open
wants to merge 16 commits into
base: main

parkertimmins
Contributor

For a patterned-text mapper foo, add a sub-field called foo.template_id. This is an 8-byte hash of the template doc_value column. Unlike the template, template_id is accessible and can be used for querying, aggregations, etc. The template_id is stored as a KeywordField and can use any features of the KeywordFieldType. template_id has doc_values, and is not stored or indexed. It uses doc value skippers, which should be quite fast given that the index will be sorted on template.
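
To make the idea of an 8-byte template hash concrete, here is a minimal sketch. It is not the code in this PR: the hash function (FNV-1a, 64-bit), the base64 encoding, the class name `TemplateIdSketch`, and the sample template string are all assumptions chosen for illustration. The description above only says that template_id is an 8-byte hash of the template that ends up in a keyword sub-field.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; the PR may use a different hash and encoding.
public class TemplateIdSketch {
    // Standard 64-bit FNV-1a constants.
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Hash the UTF-8 bytes of the template into a 64-bit (8-byte) value.
    public static long hash64(String template) {
        long h = FNV_OFFSET_BASIS;
        for (byte b : template.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    // Encode the 8 hash bytes as a short string that could be stored as a keyword value.
    public static String templateId(String template) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(hash64(template)).array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Two log lines that share a template would map to the same id.
        System.out.println(templateId("user  logged in from "));
    }
}
```

Because equal templates always produce equal ids, documents sorted on template also end up grouped by template_id, which is the property the doc value skippers are expected to benefit from.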

@@ -474,6 +496,10 @@ private FieldType resolveFieldType(
final IndexMode indexMode,
final String fullFieldName
) {
if (requireDocValueSkippers) {
Contributor Author

It seems weird to have both useDocValuesSkipper and requireDocValuesSkipper fields. But I don't think it's a good idea to use the name-based way of deciding whether or not to use docValueSkippers. It seems like we should just enforce that they are always used for KeywordFields that are created for the purpose of being a templateId.

Contributor

This reads more like enableDocValuesSkipper and useDocValuesSkipper. Can we clean up the logic to apply this?

Contributor Author

We talked offline about changing useDocValuesSkipper to enableDocValuesSkipper and requireDocValuesSkipper to forceDocValuesSkipper. It's still a bit weird to have two such parameters, but one is needed to generally enable the use of doc values skippers, and another is needed for types (such as templateId) which are certain they want to use skippers, and don't require any name-based logic to decide (like host.name). With future changes to skippers, we can probably clean this up some.
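
A rough sketch of how the two parameters could interact, assuming the renamed flags from the comment above. The class `SkipperDecisionSketch`, the method signature, and the simplified host.name check are hypothetical and stand in for the real name-based logic, which likely considers more (index version, sort config). The sketch also assumes that forcing overrides the general enable flag, which is still under discussion in this review.

```java
// Hypothetical illustration of the enable/force split; not the PR's actual code.
public class SkipperDecisionSketch {

    public static boolean shouldUseSkipper(
        boolean forceDocValuesSkipper,   // set by types such as templateId that always want skippers
        boolean enableDocValuesSkipper,  // general switch allowing skippers at all
        boolean isLogsdb,                // whether the index mode is LOGSDB
        String fullFieldName
    ) {
        if (forceDocValuesSkipper) {
            // Assumption: forced fields bypass both the switch and the name-based check.
            return true;
        }
        // Simplified stand-in for the name-based decision mentioned above.
        return enableDocValuesSkipper && isLogsdb && "host.name".equals(fullFieldName);
    }

    public static void main(String[] args) {
        System.out.println(shouldUseSkipper(true, false, false, "foo.template_id"));
        System.out.println(shouldUseSkipper(false, true, true, "host.name"));
    }
}
```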

Integer.MAX_VALUE,
indexCreatedVersion,
IndexMode.LOGSDB,
null,
Contributor Author

Should we be wiring an IndexSortConfig through here? I'm still a bit confused about how we want to control sorting.

Contributor

I also find it weird that it's propagated through this class. Looking at the uses, I see shouldUseDocValuesSkipper, which is just inappropriate: this logic belongs in LogsdbIndexModeSettingsProvider, which can inject a setting to enable skiplists when appropriate.

@parkertimmins parkertimmins marked this pull request as ready for review July 17, 2025 02:27
@elasticsearchmachine elasticsearchmachine added the needs:triage label (Requires assignment of a team area label) Jul 17, 2025
@@ -222,6 +223,7 @@ public Builder(final String name, final MappingParserContext mappingParserContex
mappingParserContext.getIndexSettings().getMode(),
mappingParserContext.getIndexSettings().getIndexSortConfig(),
USE_DOC_VALUES_SKIPPER.get(mappingParserContext.getSettings()),
Contributor

@kkrik-es kkrik-es Jul 17, 2025

Should we remove this, or use regular indexing when this is not set? Maybe something to discuss with @martijnvg when he's back.

search:
index: test
body:
docvalue_fields: [ "foo.template_id" ]
Contributor

We should also test that it can be used in index sort config.

Contributor

@kkrik-es kkrik-es left a comment

Well done, some minor comments.

@parkertimmins
Contributor Author

Fixes #128937

@parkertimmins parkertimmins added the >non-issue and :StorageEngine/Mapping (The storage related side of mappings) labels and removed the needs:triage label (Requires assignment of a team area label) Jul 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@parkertimmins parkertimmins requested a review from kkrik-es July 18, 2025 00:04