-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: escape underscores when simplifying starts_with
#19077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bb36713 to
352bbff
Compare
|
@alamb / @andygrove : |
352bbff to
d41adfe
Compare
|
I've reviewed the changes. Looks correct to me. |
30507bd to
c55f53d
Compare
|
Fixed the formatting, and rebased on the latest |
|
Should you replace ‘\’ as well ? ‘%’, ‘_’ and ‘\’ are the 3 special characters within a like expression See 'predicate.rs' |
Agh, but of course ... |
42f63c6 to
f59f039
Compare
|
Backslashes are now also escaped. I also adjusted the SLT's: they were not failing before the fix, because the simplification for |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ---- | ||
| logical_plan | ||
| 01)Projection: test.column1_utf8 LIKE Utf8("foo\%%") AS c1, test.column1_large_utf8 LIKE LargeUtf8("foo\%%") AS c2, test.column1_utf8view LIKE Utf8View("foo\%%") AS c3, test.column1_utf8 LIKE Utf8("f_o%") AS c4, test.column1_large_utf8 LIKE LargeUtf8("f_o%") AS c5, test.column1_utf8view LIKE Utf8View("f_o%") AS c6 | ||
| 01)Projection: test.column1_utf8 LIKE Utf8("foo\%%") AS c1, test.column1_large_utf8 LIKE LargeUtf8("foo\%%") AS c2, test.column1_utf8view LIKE Utf8View("foo\%%") AS c3, test.column1_utf8 LIKE Utf8("f\_o%") AS c4, test.column1_large_utf8 LIKE LargeUtf8("f\_o%") AS c5, test.column1_utf8view LIKE Utf8View("f\_o%") AS c6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 -- _ matches exactly one character -- so escaping them is the right thing to do
| // 1. 'ja%' (input pattern) | ||
| // 2. 'ja\%' (escape special char '%') | ||
| // 3. 'ja\%%' (add suffix for starts_with) | ||
| // Example: starts_with(col, 'j_a%') -> col LIKE 'j\_a\%%' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example: starts_with(col, 'j\_a%') -> col LIKE 'j\\\_a\%%' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I adjusted the steps below it but not this line ...
Fixed now
f59f039 to
dd01c71
Compare
|
fmt test failure is unrelated. Merging up to get fix from main |
Which issue does this PR close?
starts_withfunction are simplified incorrectly when the prefix contains an underscore #19076.What changes are included in this PR?
The
simplifyimplementation forStartsWithFuncis adjusted to also escape underscores instead of only percent signs.Are these changes tested?
Yes, new sql logic tests were added testing
starts_withfunctions with string literals and patterns that contain an underscore.Similar tests were added for
ends_with, even though that function has no simplification and is not affected by this bug. If simplification is ever introduced there, escaping the underscore cannot be overlooked then.Are there any user-facing changes?
No