Unicode Support for Case Insensitive Matching #122536

john-wagster · 2025-02-13T20:04:03Z

Related to fixing: #109385

I have been investigating a good way to make our various regex engines support Unicode case-insensitive matches, similar to utilities like java.util.regex.Pattern.
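For reference, here is a minimal sketch of the java.util.regex.Pattern behavior mentioned above. It is not code from this PR; the class and method names (`UnicodeCaseDemo`, `matchesAsciiOnly`, `matchesUnicode`) are hypothetical and only illustrate how `CASE_INSENSITIVE` differs with and without `UNICODE_CASE`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of this PR.
class UnicodeCaseDemo {

    // CASE_INSENSITIVE alone only folds case for US-ASCII characters.
    static boolean matchesAsciiOnly(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
            .matcher(input)
            .matches();
    }

    // Adding UNICODE_CASE enables Unicode-aware case-insensitive matching.
    static boolean matchesUnicode(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
            .matcher(input)
            .matches();
    }

    public static void main(String[] args) {
        String regex = "caf\u00e9";  // "café"
        String input = "CAF\u00c9";  // "CAFÉ"
        System.out.println(matchesAsciiOnly(regex, input)); // false: é/É is outside ASCII
        System.out.println(matchesUnicode(regex, input));   // true
    }
}
```

Note that java.util.regex.Pattern does not apply full case folding (for example, it will not match "ß" against "SS"), so it is a point of comparison rather than a complete model of the Unicode rules.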

Unfortunately, there is no great way to do this without hard-coding a number of edge cases from the Unicode case folding data found here:
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt
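To give a sense of the data involved, here is a rough sketch of reading lines in the CaseFolding.txt format (`<code>; <status>; <mapping>; # <name>`) and keeping only the single-code-point foldings (status `C` and `S`). This is an assumption-laden illustration, not the utilities or the approach used in this PR; `CaseFoldingParser` and `parseSimpleFoldings` are hypothetical names, and skipping the full (`F`) and Turkic (`T`) entries is a simplification chosen for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of this PR.
class CaseFoldingParser {

    // Parses lines in the CaseFolding.txt format and returns a map from
    // code point to its simple (single code point) case folding.
    static Map<Integer, Integer> parseSimpleFoldings(List<String> lines) {
        Map<Integer, Integer> foldings = new HashMap<>();
        for (String line : lines) {
            // Drop the trailing "# NAME" comment, and whole-line comments.
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;
            }
            String[] fields = data.split(";");
            if (fields.length < 3) {
                continue;
            }
            String status = fields[1].trim();
            // C = common, S = simple. F (full, multi-character) and
            // T (Turkic) are skipped here as an assumption of this sketch.
            if (!status.equals("C") && !status.equals("S")) {
                continue;
            }
            int source = Integer.parseInt(fields[0].trim(), 16);
            int target = Integer.parseInt(fields[2].trim(), 16);
            foldings.put(source, target);
        }
        return foldings;
    }

    public static void main(String[] args) {
        // Sample lines written in the file's format.
        List<String> sample = List.of(
            "# comment line",
            "0041; C; 0061; # LATIN CAPITAL LETTER A",
            "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
            "0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE",
            "03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA",
            "03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA",
            "1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S"
        );
        Map<Integer, Integer> foldings = parseSimpleFoldings(sample);
        foldings.forEach((from, to) -> System.out.printf("U+%04X -> U+%04X%n", from, to));
    }
}
```

The sample already hints at the kind of edge case in question: both Σ (U+03A3) and final sigma ς (U+03C2) fold to σ (U+03C3), so one folded form can correspond to more than two code points.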

I've considered various options for handling some of these edge cases; they are detailed in the Lucene PR for RegExp here: apache/lucene#14192

I will refine this PR based on the outcome of the Lucene PR and see whether I can reconcile our various regex approaches around a consistent case-insensitive flag. For now, this draft PR is a placeholder for that work so I don't lose it.

I have a set of utilities for validating and generating these mappings, which I've held back for now until I figure out where to put them.

elasticsearchmachine · 2025-02-13T20:19:11Z

Hi @john-wagster, I've created a changelog YAML for you.

added support for regex matches for things like wildcard outside the …

c475a40

…ascii range

elasticsearchmachine added the v9.1.0 label Feb 13, 2025

john-wagster requested a review from mayya-sharipova February 13, 2025 20:04

[CI] Auto commit changes from spotless

d923170

john-wagster added >bug :Search Relevance/Search Catch all for Search Relevance labels Feb 13, 2025

Update docs/changelog/122536.yaml

082ef83

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025
