Unicode Support for Case Insensitive Matching #122536

Draft: wants to merge 3 commits into base: main

@john-wagster (Contributor) commented Feb 13, 2025

Related to fixing: #109385

I have been investigating a good way to make our various regex engines support Unicode case-insensitive matching, similar to utilities like java.util.regex.Pattern.

Unfortunately, there is no great way to do this without hard-coding a number of edge cases from the Unicode spec, found here:
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt
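For illustration only (this sketch is not code from the PR), the gap can be seen with java.util.regex.Pattern itself. With `CASE_INSENSITIVE` alone, Pattern folds case only for US-ASCII; adding `UNICODE_CASE` makes it Unicode-aware. Even then it matches character by character, so a multi-character folding such as sharp s to "ss" (an `F` entry in CaseFolding.txt) is not covered. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Elasticsearch or Lucene.
class UnicodeCaseDemo {
    // CASE_INSENSITIVE alone: case is folded only for US-ASCII characters.
    static boolean matchesAsciiOnly(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // CASE_INSENSITIVE | UNICODE_CASE: case folding follows Unicode case mappings.
    static boolean matchesUnicode(String regex, String input) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String smallEAcute = "\u00e9";   // LATIN SMALL LETTER E WITH ACUTE
        String capitalEAcute = "\u00c9"; // LATIN CAPITAL LETTER E WITH ACUTE
        String sharpS = "\u00df";        // LATIN SMALL LETTER SHARP S

        // ASCII letters fold under either flag combination.
        System.out.println(matchesAsciiOnly("k", "K"));
        // Non-ASCII letters need UNICODE_CASE.
        System.out.println(matchesAsciiOnly(smallEAcute, capitalEAcute));
        System.out.println(matchesUnicode(smallEAcute, capitalEAcute));
        // A one-character pattern cannot match the two-character folding "SS".
        System.out.println(matchesUnicode(sharpS, "SS"));
    }
}
```

Edge cases of this kind are what a regex engine would have to special-case if it wanted to go beyond per-character matching.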

I've considered various options to deal with some of the edge cases which are detailed in the Lucene PR for RegExp here: apache/lucene#14192

I will refine this PR based on the outcomes in the Lucene PR and see if I can reconcile our various regex approaches to incorporate a consistent case-insensitive flag. For now, this draft PR is a placeholder for that work so I don't lose it.

I have a set of utilities for validating and generating these mappings, which I've held back for now until I figure out where to put them.

@john-wagster added the >bug and :Search Relevance/Search (Catch all for Search Relevance) labels Feb 13, 2025
@elasticsearchmachine (Collaborator) commented

Hi @john-wagster, I've created a changelog YAML for you.

Labels: >bug, :Search Relevance/Search (Catch all for Search Relevance), v9.2.0