@willemv willemv commented Dec 3, 2025

Which issue does this PR close?

What changes are included in this PR?

The simplify implementation for StartsWithFunc is adjusted to also escape underscores instead of only percent signs.

Are these changes tested?

Yes, new sql logic tests were added testing starts_with functions with string literals and patterns that contain an underscore.

Similar tests were added for ends_with, even though that function has no simplification and is not affected by this bug. If a simplification is ever introduced there, these tests ensure that escaping the underscore is not overlooked.

Are there any user-facing changes?

No

@github-actions github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Dec 3, 2025
@willemv willemv force-pushed the fix/19076-starts-with-simplification branch from bb36713 to 352bbff Compare December 3, 2025 20:30
@willemv
Author

willemv commented Dec 3, 2025

@alamb / @andygrove :
Hi, I'm a new contributor. I'm pinging you to trigger the CI tasks, as documented in your contributor guide.

@willemv willemv force-pushed the fix/19076-starts-with-simplification branch from 352bbff to d41adfe Compare December 3, 2025 21:11
@pepijnve
Contributor

pepijnve commented Dec 3, 2025

I've reviewed the changes. Looks correct to me.
FYI, this was introduced by #14119

@willemv willemv force-pushed the fix/19076-starts-with-simplification branch 2 times, most recently from 30507bd to c55f53d Compare December 4, 2025 06:28
@willemv
Author

willemv commented Dec 4, 2025

Fixed the formatting, and rebased on the latest main

@bert-beyondloops
Contributor

Should you replace ‘\’ as well ?

‘%’, ‘_’ and ‘\’ are the 3 special characters within a like expression

See 'predicate.rs'

fn contains_like_pattern(pattern: &str) -> bool {
    memchr3(b'%', b'_', b'\\', pattern.as_bytes()).is_some()
}

@willemv
Author

willemv commented Dec 4, 2025

Should you replace ‘\’ as well ?

‘%’, ‘_’ and ‘\’ are the 3 special characters within a like expression

See 'predicate.rs'

fn contains_like_pattern(pattern: &str) -> bool {
    memchr3(b'%', b'_', b'\\', pattern.as_bytes()).is_some()
}

Agh, but of course ...

@willemv willemv force-pushed the fix/19076-starts-with-simplification branch 2 times, most recently from 42f63c6 to f59f039 Compare December 4, 2025 11:33
@willemv
Author

willemv commented Dec 4, 2025

Backslashes are now also escaped.
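
The escaping described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `prefix_to_like_pattern` is hypothetical, and it only assumes that `\` is the LIKE escape character and that `%` and `_` are the wildcards, as stated earlier in the thread.

```rust
/// Hypothetical helper (not DataFusion's actual code): turn a literal
/// `starts_with` prefix into an equivalent LIKE pattern.
fn prefix_to_like_pattern(prefix: &str) -> String {
    let mut pattern = String::with_capacity(prefix.len() + 1);
    for c in prefix.chars() {
        // `\` is the escape character; `%` and `_` are wildcards.
        // All three must be escaped to be matched literally.
        if matches!(c, '\\' | '%' | '_') {
            pattern.push('\\');
        }
        pattern.push(c);
    }
    // Trailing wildcard: accept anything after the literal prefix.
    pattern.push('%');
    pattern
}
```

With this sketch, the prefix `f_o` becomes `f\_o%`, and the prefix `j\_a%` becomes `j\\\_a\%%`, which is the form discussed in the review comments further down.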

I also adjusted the SLTs: they were not failing before the fix, because the simplification for starts_with is not invoked for a plain select starts_with('literal', 'anotherliteral'). That's probably because constant expressions are evaluated before the simplification runs.

Contributor

@alamb alamb left a comment

This makes sense to me -- thank you @willemv

I think this was initially introduced here (fyi @jatin510 )

Thank you @pepijnve for the assist

----
logical_plan
-01)Projection: test.column1_utf8 LIKE Utf8("foo\%%") AS c1, test.column1_large_utf8 LIKE LargeUtf8("foo\%%") AS c2, test.column1_utf8view LIKE Utf8View("foo\%%") AS c3, test.column1_utf8 LIKE Utf8("f_o%") AS c4, test.column1_large_utf8 LIKE LargeUtf8("f_o%") AS c5, test.column1_utf8view LIKE Utf8View("f_o%") AS c6
+01)Projection: test.column1_utf8 LIKE Utf8("foo\%%") AS c1, test.column1_large_utf8 LIKE LargeUtf8("foo\%%") AS c2, test.column1_utf8view LIKE Utf8View("foo\%%") AS c3, test.column1_utf8 LIKE Utf8("f\_o%") AS c4, test.column1_large_utf8 LIKE LargeUtf8("f\_o%") AS c5, test.column1_utf8view LIKE Utf8View("f\_o%") AS c6
Contributor

🤦 -- _ matches exactly one character -- so escaping them is the right thing to do
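
To make the difference concrete, here is a small, self-contained LIKE matcher. It is only a sketch for illustration: `like` and `like_at` are hypothetical helpers, not the Arrow or DataFusion implementation, and they assume `\` is the escape character.

```rust
/// Minimal LIKE matcher for illustration only (hypothetical helper):
/// `%` matches any run of characters, `_` matches exactly one character,
/// and `\` makes the following character literal.
fn like(text: &str, pattern: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    like_at(&t, &p)
}

fn like_at(t: &[char], p: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: match only if the text is exhausted too.
        None => t.is_empty(),
        // `%`: try every split point, including the empty match.
        Some((&'%', rest)) => (0..=t.len()).any(|i| like_at(&t[i..], rest)),
        // `_`: consume exactly one character, whatever it is.
        Some((&'_', rest)) => !t.is_empty() && like_at(&t[1..], rest),
        // `\x`: the escaped character must match literally.
        Some((&'\\', rest)) if !rest.is_empty() => {
            t.first() == Some(&rest[0]) && like_at(&t[1..], &rest[1..])
        }
        // Any other character matches itself.
        Some((c, rest)) => t.first() == Some(c) && like_at(&t[1..], rest),
    }
}
```

Under these assumptions, `like("fXo-bar", "f_o%")` is true even though `"fXo-bar".starts_with("f_o")` is false, which is the mismatch this PR fixes. With the escaped pattern `f\_o%`, only text that really begins with `f_o` matches.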

// 1. 'ja%' (input pattern)
// 2. 'ja\%' (escape special char '%')
// 3. 'ja\%%' (add suffix for starts_with)
// Example: starts_with(col, 'j_a%') -> col LIKE 'j\_a\%%'
Contributor

Example: starts_with(col, 'j\_a%') -> col LIKE 'j\\\_a\%%' ?

Author

Indeed, I adjusted the steps below it but not this line ...

Fixed now

@willemv willemv force-pushed the fix/19076-starts-with-simplification branch from f59f039 to dd01c71 Compare December 5, 2025 07:16
@alamb
Contributor

alamb commented Dec 5, 2025

fmt test failure is unrelated. Merging up to get fix from main

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

starts_with function are simplified incorrectly when the prefix contains an underscore

4 participants