-
Notifications
You must be signed in to change notification settings - Fork 267
perf: Improve performance of CAST from string to int #3017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Improve performance of CAST from string to int #3017
Conversation
|
Initial benchmarks showed improvement to match with Spark's performance. I will continue to profile the code to see if we can squeeze in additional optimization |
| cast_array.append_value(cast_value); | ||
| } else { | ||
| cast_array.append_null() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made null check conditional to remove unwanted branching
| if len == 1 { | ||
| return none_or_err(eval_mode, type_name, str); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same (removed unwanted If branching)
| let mut parse_sign_and_digits = true; | ||
|
|
||
| for (i, ch) in trimmed_str.char_indices() { | ||
| for &ch in &trimmed_bytes[idx..] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaner and faster approach to access the chars directly
|
Results : |
|
Proceeding with some rather unsafe options to see if we can squeeze in further optimizations |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3017 +/- ##
============================================
+ Coverage 56.12% 59.55% +3.43%
- Complexity 976 1379 +403
============================================
Files 119 167 +48
Lines 11743 15496 +3753
Branches 2251 2569 +318
============================================
+ Hits 6591 9229 +2638
- Misses 4012 4970 +958
- Partials 1140 1297 +157 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
cargo bench results (main vs feature branch) |
|
Tried a bunch of unsafe , low level (SIMD) ops but the gains were diminishing and perhaps would make code maintenance difficult. |
|
Thanks @coderfender. I think it would be useful to add a criterion benchmark as well, so we can more easily measure the improvement compared to the main branch. |
|
@andygrove , sure Here are the benchmarks compared through |
| } | ||
| while end > start && bytes[end - 1].is_ascii_whitespace() { | ||
| end -= 1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not creating a new string through trim function and just looping through whitespaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually wrong in an earlier review when I suggested stopping using trim. trim does just return a slice on a &str, not a new String.
It may be worth considering using trim_ascii instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look at using trim_ascii in a separate PR along with some other minor changes.
| if eval_mode == EvalMode::Legacy { | ||
| // truncate decimal in legacy mode | ||
| parse_sign_and_digits = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The eval_mode does not change for different rows. It would likely be more performant to have separate implementations for legacy vs other modes to avoid the conditional in the hot loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion . Let me go ahead and make separate paths for each eval mode to prevent hot loops and update this thread with benchmarks
|
In other notes, I was also experimenting in implementing a two pass fast algorithm using a switch fallthrough but the implementation became super complicated with diminishing returns largely due to rust's inability to not have a fallthrough switch statement : |
We could also look at https://crates.io/crates/atoi_simd (as a separate PR) |
|
Sure @andygrove . If my understanding is right, we just want to be inspired by and implement a spark-like parser but not completely rely on that package for our string -> int parsing needs is it ? |
|
Removed branching below are the latest bench numbers : main - main branch |
|
TODO : remove common methods to a utility function to follow DRY principle |
|
|
@andygrove , I removed redundant processing of decimals (in try and ansi eval modes) and that improved the benchmarks as well |
|
These are nice speedups. Thanks @coderfender. I will review again later today. |
|
Thank you @andygrove |
| type_name: &str, | ||
| min_value: T, | ||
| ) -> SparkResult<Option<T>> { | ||
| match eval_mode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @coderfender. This is a really nice improvement. I have some ideas for some additional improvements, so I will follow up with a smaller PR once this is merged.
|
Sure thank you very much @andygrove |
Which issue does this PR close?
Closes #2981
Rationale for this change
What changes are included in this PR?
How are these changes tested?