Fix Operations.reverse() to not add non-deterministic dead states (#14212)

Operations.reverse() can create dead states, and those dead states can be non-deterministic. That is worse than merely creating dead states, because it causes Automaton.isDeterministic() to return false, i.e. the automaton is treated as an NFA. This can lead to unnecessary det() calls, especially as the automaton gets bigger or more complex.

Operations.reverse() serves multiple use-cases today:
* Search engine use-cases trying to speed up leading wildcards
* Testing/academic use-case (Brzozowski minimize)

In the search engine use-case, it is used by both Lucene and Solr.

Lucene uses this method for infinite automata (e.g. a leading wildcard) to compute a common suffix, if the expression has one (e.g. "*foo"). Such a query has to evaluate many candidate terms, so the automaton is reversed as part of computing the common suffix. memcmp can then be used to filter out candidates quickly.

Solr uses this method where users can opt in to also indexing the reversed form of every term, with a special marker to prevent false positives from the extra reversed terms. At query time, the reversed wildcard queries can be turned into something that looks more like a prefix query:
https://github.com/apache/solr/blob/bca4cd630b9cff66ecc0431397a99f5289a6462b/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L1291-L1324

Move Operations.reverse(Automaton, Set) to AutomatonTestUtil, since it is too difficult to improve while also supporting this hook.

Fix Operations.reverse(Automaton) to remove dead states.
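To make the problem concrete, here is a minimal sketch of how a reversed automaton can end up with a dead state that carries two transitions on the same label, and how pruning dead states restores determinism. It uses a toy automaton, not Lucene's Automaton or Operations classes: the class ReverseSketch and all of its methods are hypothetical names invented for this illustration, and the toy allows a set of initial states, which Lucene's representation does not. It is not the algorithm used in this commit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy automaton for illustration only; NOT Lucene's Automaton/Operations API.
class ReverseSketch {
    final Set<Integer> initial;
    final Set<Integer> accept;
    final List<int[]> edges; // each edge is {from, label, to}

    ReverseSketch(Set<Integer> initial, Set<Integer> accept, List<int[]> edges) {
        this.initial = initial;
        this.accept = accept;
        this.edges = edges;
    }

    // Deterministic toy input: 0 -a-> 1 (accept), 0 -b-> 2, 1 -b-> 2.
    // State 2 cannot reach an accept state.
    static ReverseSketch example() {
        List<int[]> e = new ArrayList<>();
        e.add(new int[] {0, 'a', 1});
        e.add(new int[] {0, 'b', 2});
        e.add(new int[] {1, 'b', 2});
        return new ReverseSketch(Set.of(0), Set.of(1), e);
    }

    // Naive reversal: flip every edge and swap the initial and accept sets.
    ReverseSketch reverse() {
        List<int[]> flipped = new ArrayList<>();
        for (int[] e : edges) {
            flipped.add(new int[] {e[2], e[1], e[0]});
        }
        return new ReverseSketch(accept, initial, flipped);
    }

    // States reachable from 'start', following edges forward or backward.
    private Set<Integer> reachable(Set<Integer> start, boolean forward) {
        Set<Integer> seen = new HashSet<>(start);
        Deque<Integer> work = new ArrayDeque<>(start);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int[] e : edges) {
                int src = forward ? e[0] : e[2];
                int dst = forward ? e[2] : e[0];
                if (src == s && seen.add(dst)) {
                    work.push(dst);
                }
            }
        }
        return seen;
    }

    // Keep only live states: reachable from an initial state AND able to reach an accept state.
    ReverseSketch removeDeadStates() {
        Set<Integer> live = reachable(initial, true);
        live.retainAll(reachable(accept, false));
        List<int[]> kept = new ArrayList<>();
        for (int[] e : edges) {
            if (live.contains(e[0]) && live.contains(e[2])) {
                kept.add(e);
            }
        }
        Set<Integer> init = new HashSet<>(initial);
        init.retainAll(live);
        Set<Integer> acc = new HashSet<>(accept);
        acc.retainAll(live);
        return new ReverseSketch(init, acc, kept);
    }

    // Deterministic here means: at most one initial state, and no state
    // with two outgoing edges on the same label.
    boolean isDeterministic() {
        if (initial.size() > 1) {
            return false;
        }
        Set<String> seen = new HashSet<>();
        for (int[] e : edges) {
            if (!seen.add(e[0] + ":" + e[1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ReverseSketch input = example();
        ReverseSketch reversed = input.reverse();
        System.out.println("input deterministic:    " + input.isDeterministic());
        System.out.println("reversed deterministic: " + reversed.isDeterministic());
        System.out.println("pruned deterministic:   " + reversed.removeDeadStates().isDeterministic());
    }
}
```

In this toy, the reversed automaton has two 'b' transitions leaving state 2, so the determinism check fails even though state 2 is unreachable from the new initial state and contributes nothing to the accepted language. After pruning, only the single edge 1 -a-> 0 remains and the check passes again.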
Showing 4 changed files with 65 additions and 43 deletions.