expression: Let TiDB use Hyperscan to support multi-pattern-match #23497
Hi @blacktear23, would you please:
@bb7133 this PR looks big, but almost all of the code is contained in 3 files, and there are also a lot of comments. The PR has only 2 function implementations; all of the provided functions are based on them. So splitting it up may not produce PRs that are much smaller than this one.
I saw your blog post about this PR and would like permission to reprint it on PingCAP's blog. I have sent you an email, and you will be able to review the post before it is published, so I hope you will reply soon~
@blacktear23: PR needs rebase.
Once this PR is ready for review, please re-request the planner's review. Thanks!
What problem does this PR solve?
Problem Summary:
Let TiDB support multi-pattern matching, powered by Hyperscan.
Proposal: #23504
What is changed and how it works?
What's Changed:
Add built-in functions to support multi-pattern matching. All functions have the `hs_` prefix.

- `hs_build_db(id, pattern)`: an aggregate function that builds a Hyperscan database and returns it encoded as `base64`. The `id` parameter should be a number and `pattern` should be a string. The `id` parameter can be omitted: calling `hs_build_db(pattern)` generates a Hyperscan database without pattern IDs.
- `hs_build_db_json(patterns, [encodeFormat])`: builds a Hyperscan database from a JSON-format pattern source. `encodeFormat` can be `hex` or `base64`; the default is `hex`.
- `hs_match(input, patterns, [format])`: returns true if the input matches any of the patterns. `format` can be `lines`, `json`, `hex`, or `base64`; the default is `lines`.
- `hs_match_all(input, patterns, [format])`: returns true if the input matches all of the patterns. `format` can be `lines` or `json`; the default is `lines`.
- `hs_match_ids(input, patterns, [format])`: returns the list of IDs of the patterns that matched the input. `format` can be `lines`, `json`, `hex`, or `base64`; the default is `lines`.
- `hs_match_json(input, patterns)`: shorthand for `hs_match(input, patterns, "json")`.
- `hs_match_all_json(input, patterns)`: shorthand for `hs_match_all(input, patterns, "json")`.
- `hs_match_ids_json(input, patterns)`: shorthand for `hs_match_ids(input, patterns, "json")`.
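To make the any/all/ids semantics of the functions listed above concrete, here is a minimal Python sketch. It is not the PR's implementation: Python's `re` module stands in for Hyperscan (whose regex dialect differs), the helper names `parse_patterns`, `match_any`, `match_all`, and `match_ids` are hypothetical, and only the `lines` and `json` formats are modelled. The ID rule for `json` (array index plus 1 when `id` is omitted) follows the Patterns Format section; using the 1-based line position as the ID for `lines` is an assumption of this sketch.

```python
import json
import re


def parse_patterns(source, fmt="lines"):
    """Return a list of (id, compiled_regex) pairs.

    Only the text formats are modelled. "hex" and "base64" carry a
    serialized Hyperscan database and are out of scope for this sketch.
    """
    if fmt == "lines":
        # One pattern per line. Using the 1-based position as the ID is an
        # assumption of this sketch, not something the PR states.
        items = [{"pattern": p} for p in source.splitlines() if p.strip()]
    elif fmt == "json":
        items = json.loads(source)
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # A missing "id" defaults to the array index plus 1.
    return [(item.get("id", i + 1), re.compile(item["pattern"]))
            for i, item in enumerate(items)]


def match_any(text, source, fmt="lines"):
    """Analogue of hs_match: true if at least one pattern matches."""
    return any(rx.search(text) for _, rx in parse_patterns(source, fmt))


def match_all(text, source, fmt="lines"):
    """Analogue of hs_match_all: true if every pattern matches."""
    return all(rx.search(text) for _, rx in parse_patterns(source, fmt))


def match_ids(text, source, fmt="lines"):
    """Analogue of hs_match_ids: IDs of the patterns that match."""
    return [pid for pid, rx in parse_patterns(source, fmt) if rx.search(text)]


line_patterns = "foo\nba[rz]"
json_patterns = '[{"id": 10, "pattern": "foo"}, {"pattern": "ba[rz]"}]'

any_hit = match_any("a bar", line_patterns)             # only "ba[rz]" matches
all_hit = match_all("a bar", line_patterns)             # "foo" does not match
ids_lines = match_ids("foo baz", line_patterns)         # both patterns match
ids_json = match_ids("foo baz", json_patterns, "json")  # explicit id, then index + 1
```

In this sketch `any_hit` is true, `all_hit` is false, `ids_lines` is `[1, 2]`, and `ids_json` is `[10, 2]`: the second JSON entry has no `id`, so it gets its array index (1) plus 1.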
Limitations
The `hs_match` family of functions treats the patterns parameter as a constant: the Hyperscan database is built only once, at the first row, and cannot change afterwards. If the patterns parameter changes during evaluation, later rows are still matched against the database built from the first row. For example, a query like the one below:
will not execute correctly.
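The effect of this build-once behaviour can be illustrated with a small Python model. This is a sketch of the limitation as described above, not the PR's code: `ConstantPatternMatcher` is a hypothetical class, Python's `re` stands in for Hyperscan, and the rows are made-up data in which the patterns value differs from row to row.

```python
import re


class ConstantPatternMatcher:
    """Compiles the pattern set on the first call and reuses it afterwards,
    mimicking a function that treats its patterns argument as a constant."""

    def __init__(self):
        self._compiled = None

    def match(self, text, patterns):
        if self._compiled is None:
            # Built once from the first row's patterns; never rebuilt.
            self._compiled = [re.compile(p)
                              for p in patterns.splitlines() if p.strip()]
        return any(rx.search(text) for rx in self._compiled)


# Hypothetical rows of (input, patterns); the patterns column varies per row.
rows = [
    ("apple pie", "apple"),
    ("banana split", "banana"),
]

matcher = ConstantPatternMatcher()
results = [matcher.match(text, patterns) for text, patterns in rows]

# What a per-row rebuild would give, for comparison.
expected = [bool(re.search(patterns, text)) for text, patterns in rows]
```

Here `results` is `[True, False]` while `expected` is `[True, True]`: the second row is checked against the `apple` pattern cached from the first row, so its own `banana` pattern is never used.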
Patterns Format
- `lines`: newline-separated patterns, one per line, example:
- `json`: a JSON array of patterns, example:
  The `pattern` field is required and contains the regexp pattern. The `id` field can be omitted; if it is omitted, the ID is assigned as the array index plus 1.
- `hex`: a hex-encoded, marshaled Hyperscan database generated by the `hs_build_db_json` function.
- `base64`: a base64-encoded, marshaled Hyperscan database generated by the `hs_build_db_json` function.

How to build:
This PR introduces build tags for conditional compilation. If you want to enable the Hyperscan functions and run the tests:
Or, if you want to build a `tidb-server` with Hyperscan support:
Some examples
Related changes
- `pingcap/docs`/`pingcap/docs-cn`:

Check List
Tests
Side effects
When matching multi patterns
Release note