
Conversation

@coderfender
Contributor

Which issue does this PR close?

Closes #2981

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@coderfender
Contributor Author

Initial benchmarks show an improvement that brings us roughly in line with Spark's performance. I will continue to profile the code to see if we can squeeze out additional optimizations.

@coderfender coderfender changed the title perf: Improve string to date perf perf: Improve string to int perf Jan 1, 2026
@coderfender coderfender marked this pull request as ready for review January 3, 2026 02:57
cast_array.append_value(cast_value);
} else {
cast_array.append_null()
}
Contributor Author

Made the null check conditional to remove unwanted branching.
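
For illustration, one common shape for this kind of change, assuming an Arrow-style builder (a sketch of the idea only, not the exact diff; parse_int is a hypothetical stand-in for the actual cast routine):

use arrow::array::{Array, Int32Array, Int32Builder, StringArray};

// Hypothetical stand-in for the Spark-compatible parse routine.
fn parse_int(s: &str) -> Option<i32> {
    s.trim().parse::<i32>().ok()
}

fn cast_utf8_to_i32(array: &StringArray) -> Int32Array {
    let mut builder = Int32Builder::with_capacity(array.len());
    if array.null_count() == 0 {
        // No nulls in the input: the per-row is_null branch is skipped entirely.
        for i in 0..array.len() {
            match parse_int(array.value(i)) {
                Some(v) => builder.append_value(v),
                None => builder.append_null(),
            }
        }
    } else {
        for i in 0..array.len() {
            if array.is_null(i) {
                builder.append_null();
            } else {
                match parse_int(array.value(i)) {
                    Some(v) => builder.append_value(v),
                    None => builder.append_null(),
                }
            }
        }
    }
    builder.finish()
}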

if len == 1 {
return none_or_err(eval_mode, type_name, str);
}
}
Contributor Author

Same as above (removed unwanted if branching).

let mut parse_sign_and_digits = true;

for (i, ch) in trimmed_str.char_indices() {
for &ch in &trimmed_bytes[idx..] {
Contributor Author

A cleaner and faster approach: iterate over the bytes directly instead of decoding chars via char_indices.
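
Roughly, the before/after looks like this (sketch with illustrative names, not the actual code):

// Before (sketch): decode UTF-8 code points and track their indices.
fn digits_from_chars(trimmed_str: &str) -> i32 {
    let mut acc = 0i32;
    for (_i, ch) in trimmed_str.char_indices() {
        match ch.to_digit(10) {
            Some(d) => acc = acc * 10 + d as i32,
            None => break,
        }
    }
    acc
}

// After (sketch): iterate the raw bytes; ASCII digits are single bytes,
// so no code-point decoding is needed in the hot loop.
fn digits_from_bytes(trimmed_bytes: &[u8]) -> i32 {
    let mut acc = 0i32;
    for &ch in trimmed_bytes {
        if !ch.is_ascii_digit() {
            break;
        }
        acc = acc * 10 + (ch - b'0') as i32;
    }
    acc
}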

@coderfender
Contributor Author

Results:

================================================================================================
Running benchmark cast operation from : StringType to : IntegerType
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Cast function to : IntegerType , ansi mode enabled : false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------------------
Spark                                                                 668            683          14         15.7          63.8       1.0X
Comet (Scan)                                                          708            739          30         14.8          67.5       0.9X
Comet (Scan + Exec)                                                   816            829          18         12.9          77.8       0.8X


================================================================================================
Running benchmark cast operation from : StringType to : ShortType
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Cast function to : ShortType , ansi mode enabled : false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------------
Spark                                                               689            692           2         15.2          65.7       1.0X
Comet (Scan)                                                        639            658          21         16.4          60.9       1.1X
Comet (Scan + Exec)                                                 793            798           4         13.2          75.6       0.9X


================================================================================================
Running benchmark cast operation from : StringType to : ByteType
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Cast function to : ByteType , ansi mode enabled : false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
Spark                                                              659            694          39         15.9          62.9       1.0X
Comet (Scan)                                                       639            655          12         16.4          60.9       1.0X
Comet (Scan + Exec)                                                795            810          20         13.2          75.8       0.8X

@coderfender
Contributor Author

Proceeding with some rather unsafe options to see if we can squeeze in further optimizations

@codecov-commenter

codecov-commenter commented Jan 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.55%. Comparing base (f09f8af) to head (f1299eb).
⚠️ Report is 822 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3017      +/-   ##
============================================
+ Coverage     56.12%   59.55%   +3.43%     
- Complexity      976     1379     +403     
============================================
  Files           119      167      +48     
  Lines         11743    15496    +3753     
  Branches       2251     2569     +318     
============================================
+ Hits           6591     9229    +2638     
- Misses         4012     4970     +958     
- Partials       1140     1297     +157     

☔ View full report in Codecov by Sentry.

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

cargo bench results (main vs feature branch)

  | Type | Before (main) | After (feature) | Improvement |
  |------|---------------|-----------------|-------------|
  | i8   | 26.5 µs       | 19.8 µs         | 1.34x (34%) |
  | i16  | 27.2 µs       | 21.3 µs         | 1.27x (27%) |
  | i32  | 26.8 µs       | 19.9 µs         | 1.34x (34%) |
  | i64  | 31.8 µs       | 25.6 µs         | 1.24x (24%) |

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

Tried a bunch of unsafe, low-level (SIMD) ops, but the gains were diminishing and would likely make the code harder to maintain.

@andygrove
Member

Thanks @coderfender. I think it would be useful to add a criterion benchmark as well, so we can more easily measure the improvement compared to the main branch.

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

@andygrove, sure.

Here are the benchmarks compared through critcmp (we already have benchmarks for the cast_string_to_int functions). Please let me know if you think we need further info here.

group                                    feature                                main
-----                                    -------                                ----
cast_string_to_int/cast_string_to_i16    1.00     21.3±1.92µs        ? ?/sec    1.27     27.2±2.95µs        ? ?/sec
cast_string_to_int/cast_string_to_i32    1.00     19.9±0.40µs        ? ?/sec    1.34     26.8±1.08µs        ? ?/sec
cast_string_to_int/cast_string_to_i64    1.00     25.6±0.85µs        ? ?/sec    1.24     31.8±0.35µs        ? ?/sec
cast_string_to_int/cast_string_to_i8     1.00     19.8±0.36µs        ? ?/sec    1.34     26.5±0.43µs        ? ?/sec
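
For context, the criterion harness behind these numbers has roughly this shape (a minimal sketch; the actual benchmark lives in benches/cast_from_string.rs and its internals differ — the parse closure below is a placeholder, not the Comet kernel):

use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical input: a batch of numeric strings with surrounding whitespace.
fn test_strings() -> Vec<String> {
    (0..1000).map(|i| format!(" {} ", i - 500)).collect()
}

fn criterion_benchmark(c: &mut Criterion) {
    let input = test_strings();
    c.bench_function("cast_string_to_i32", |b| {
        b.iter(|| {
            // Placeholder for the actual cast kernel under test.
            input
                .iter()
                .map(|s| s.trim().parse::<i32>().ok())
                .collect::<Vec<Option<i32>>>()
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);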

}
while end > start && bytes[end - 1].is_ascii_whitespace() {
end -= 1;
}
Contributor Author

Not creating a new string through the trim function; just looping past the whitespace instead.

Member

I was actually wrong in an earlier review when I suggested we stop using trim. trim just returns a slice of the &str, not a new String.

It may be worth considering using trim_ascii instead.
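
For illustration, both return borrowed slices, so neither allocates; trim_ascii just skips the Unicode whitespace tables (a sketch, assuming Rust 1.80+ for trim_ascii):

fn main() {
    let s = "  \t 1234 \n";
    // str::trim returns a slice of the original &str; no new String is allocated.
    let t1: &str = s.trim();
    // str::trim_ascii also returns a slice, but only strips ASCII whitespace,
    // which avoids the Unicode whitespace check per character.
    let t2: &str = s.trim_ascii();
    assert_eq!(t1, "1234");
    assert_eq!(t2, "1234");

    // The byte-slice version matches the manual start/end loop above.
    let b: &[u8] = s.as_bytes().trim_ascii();
    assert_eq!(b, &b"1234"[..]);
}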

Member

I will look at using trim_ascii in a separate PR along with some other minor changes.

Comment on lines 2010 to 2012
if eval_mode == EvalMode::Legacy {
// truncate decimal in legacy mode
parse_sign_and_digits = false;
Member

The eval_mode does not change for different rows. It would likely be more performant to have separate implementations for legacy vs other modes to avoid the conditional in the hot loop.
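
The shape of that refactor, roughly (a sketch with hypothetical function names, not the actual Comet code):

enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

// Dispatch on eval_mode once, outside the per-row loop, so the hot loop
// carries no per-row mode check.
fn cast_all(values: &[&str], eval_mode: EvalMode) -> Vec<Option<i32>> {
    match eval_mode {
        EvalMode::Legacy => values.iter().map(|s| parse_legacy(s)).collect(),
        EvalMode::Ansi | EvalMode::Try => values.iter().map(|s| parse_strict(s)).collect(),
    }
}

// Hypothetical: legacy mode truncates a trailing decimal part ("12.9" -> 12).
fn parse_legacy(s: &str) -> Option<i32> {
    let head = s.trim().split('.').next().unwrap_or("");
    head.parse::<i32>().ok()
}

// Hypothetical: non-legacy modes parse the whole trimmed string.
fn parse_strict(s: &str) -> Option<i32> {
    s.trim().parse::<i32>().ok()
}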

Contributor Author

Great suggestion. Let me go ahead and make separate paths for each eval mode to keep the conditional out of the hot loop, and I will update this thread with benchmarks.

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

On another note, I also experimented with implementing a two-pass fast-path algorithm using switch fallthrough, but the implementation became very complicated for diminishing returns, largely because Rust's match has no fallthrough:
https://github.com/CameronHosking/fast-atoi/blob/master/fast_atoi.h
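
For reference, the closest straightforward Rust shape ends up as a plain checked loop over the digit bytes (which the compiler can unroll), rather than a length switch with fallthrough; a sketch:

// Sketch: accumulate ASCII digits with overflow checks. A length-based switch
// with fallthrough, as in fast_atoi, has no direct analogue in Rust's match.
fn parse_ascii_digits(bytes: &[u8]) -> Option<u64> {
    if bytes.is_empty() {
        return None;
    }
    let mut acc: u64 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        acc = acc.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
    }
    Some(acc)
}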

@andygrove
Member

On another note, I also experimented with implementing a two-pass fast-path algorithm using switch fallthrough, but the implementation became very complicated for diminishing returns, largely because Rust's match has no fallthrough: https://github.com/CameronHosking/fast-atoi/blob/master/fast_atoi.h

We could also look at https://crates.io/crates/atoi_simd (as a separate PR)

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

Sure @andygrove. If my understanding is right, we just want to take inspiration from it and implement a Spark-like parser, rather than rely entirely on that package for our string -> int parsing needs, correct?

@coderfender
Contributor Author

Removed branching. Below are the latest bench numbers:

main - main branch
feature - feature branch (before separate implementations for each eval mode)
feature_remove_branching - after separate implementations for each eval mode

group                                    feature                                feature_remove_branching               main
-----                                    -------                                ------------------------               ----
cast_string_to_int/cast_string_to_i16    1.12     20.0±0.17µs        ? ?/sec    1.00     17.8±1.92µs        ? ?/sec    1.53     27.3±0.21µs        ? ?/sec
cast_string_to_int/cast_string_to_i32    1.00     19.9±0.34µs        ? ?/sec    1.54    30.6±46.63µs        ? ?/sec    1.38     27.5±0.81µs        ? ?/sec
cast_string_to_int/cast_string_to_i64    1.08     26.3±2.55µs        ? ?/sec    1.00     24.2±0.35µs        ? ?/sec    1.35     32.8±0.26µs        ? ?/sec
cast_string_to_int/cast_string_to_i8     1.16     20.3±1.23µs        ? ?/sec    1.00     17.5±0.21µs        ? ?/sec    1.57     27.4±1.00µs        ? ?/sec

@coderfender
Contributor Author

TODO: extract common methods into a utility function to follow the DRY principle.

@coderfender
Contributor Author

group                                    feature                                feature_separate_eval_mode             main
-----                                    -------                                --------------------------             ----
cast_string_to_int/cast_string_to_i16    1.14     20.0±0.17µs        ? ?/sec    1.00     17.6±0.21µs        ? ?/sec    1.55     27.3±0.21µs        ? ?/sec
cast_string_to_int/cast_string_to_i32    1.14     19.9±0.34µs        ? ?/sec    1.00     17.4±0.51µs        ? ?/sec    1.58     27.5±0.81µs        ? ?/sec
cast_string_to_int/cast_string_to_i64    1.07     26.3±2.55µs        ? ?/sec    1.00     24.5±1.44µs        ? ?/sec    1.34     32.8±0.26µs        ? ?/sec
cast_string_to_int/cast_string_to_i8     1.15     20.3±1.23µs        ? ?/sec    1.00     17.6±0.28µs        ? ?/sec    1.55     27.4±1.00µs        ? ?/sec

@coderfender
Contributor Author

@andygrove, I removed the redundant processing of decimals (in Try and Ansi eval modes) and that improved the benchmarks as well.

@andygrove
Member

These are nice speedups. Thanks @coderfender. I will review again later today.

$ cargo bench --bench cast_from_string -- --baseline main
    Finished `bench` profile [optimized + debuginfo] target(s) in 0.17s
     Running benches/cast_from_string.rs (target/release/deps/cast_from_string-9c9caa2565779a11)
Gnuplot not found, using plotters backend
cast_string_to_int/cast_string_to_i8
                        time:   [10.269 µs 10.308 µs 10.356 µs]
                        change: [−44.807% −44.528% −44.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
cast_string_to_int/cast_string_to_i16
                        time:   [10.260 µs 10.295 µs 10.335 µs]
                        change: [−44.683% −44.444% −44.202%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe
cast_string_to_int/cast_string_to_i32
                        time:   [10.348 µs 10.380 µs 10.417 µs]
                        change: [−43.119% −42.940% −42.761%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  11 (11.00%) high mild
  3 (3.00%) high severe
cast_string_to_int/cast_string_to_i64
                        time:   [14.733 µs 14.773 µs 14.824 µs]
                        change: [−39.389% −38.514% −37.613%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

@coderfender
Contributor Author

Thank you @andygrove

type_name: &str,
min_value: T,
) -> SparkResult<Option<T>> {
match eval_mode {
Member

👍

Member

@andygrove andygrove left a comment

Thanks @coderfender. This is a really nice improvement. I have some ideas for some additional improvements, so I will follow up with a smaller PR once this is merged.

@andygrove andygrove changed the title perf: Improve string to int perf perf: Improve performance of CAST from string to int Jan 6, 2026
@coderfender
Contributor Author

Sure, thank you very much @andygrove.

@andygrove andygrove merged commit 069681a into apache:main Jan 6, 2026
120 checks passed
