
Conversation

@coderfender
Contributor

Which issue does this PR close?

Closes #2981

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@coderfender
Contributor Author

Initial benchmarks show an improvement that brings us roughly in line with Spark's performance. I will continue to profile the code to see if we can squeeze out additional optimizations.

@coderfender coderfender changed the title perf: Improve string to date perf perf: Improve string to int perf Jan 1, 2026
@coderfender coderfender marked this pull request as ready for review January 3, 2026 02:57
cast_array.append_value(cast_value);
} else {
cast_array.append_null()
}
Contributor Author

Made the null check conditional to remove unwanted branching.
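
For illustration, one common shape for this kind of change, assuming an Arrow-style builder (a sketch of the idea only, not the exact diff; parse_int is a hypothetical stand-in for the actual cast routine):

use arrow::array::{Array, Int32Array, Int32Builder, StringArray};

// Hypothetical stand-in for the Spark-compatible parse routine.
fn parse_int(s: &str) -> Option<i32> {
    s.trim().parse::<i32>().ok()
}

fn cast_utf8_to_i32(array: &StringArray) -> Int32Array {
    let mut builder = Int32Builder::with_capacity(array.len());
    if array.null_count() == 0 {
        // No nulls in the input: the per-row is_null branch is skipped entirely.
        for i in 0..array.len() {
            match parse_int(array.value(i)) {
                Some(v) => builder.append_value(v),
                None => builder.append_null(),
            }
        }
    } else {
        for i in 0..array.len() {
            if array.is_null(i) {
                builder.append_null();
            } else {
                match parse_int(array.value(i)) {
                    Some(v) => builder.append_value(v),
                    None => builder.append_null(),
                }
            }
        }
    }
    builder.finish()
}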

if len == 1 {
return none_or_err(eval_mode, type_name, str);
}
}
Contributor Author

Same as above (removed unwanted if branching).

let mut parse_sign_and_digits = true;

for (i, ch) in trimmed_str.char_indices() {
for &ch in &trimmed_bytes[idx..] {
Contributor Author

A cleaner and faster approach: iterate over the bytes directly instead of decoding chars via char_indices.
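
Roughly, the before/after looks like this (sketch with illustrative names, not the actual code):

// Before (sketch): decode UTF-8 code points and track their indices.
fn digits_from_chars(trimmed_str: &str) -> i32 {
    let mut acc = 0i32;
    for (_i, ch) in trimmed_str.char_indices() {
        match ch.to_digit(10) {
            Some(d) => acc = acc * 10 + d as i32,
            None => break,
        }
    }
    acc
}

// After (sketch): iterate the raw bytes; ASCII digits are single bytes,
// so no code-point decoding is needed in the hot loop.
fn digits_from_bytes(trimmed_bytes: &[u8]) -> i32 {
    let mut acc = 0i32;
    for &ch in trimmed_bytes {
        if !ch.is_ascii_digit() {
            break;
        }
        acc = acc * 10 + (ch - b'0') as i32;
    }
    acc
}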

@coderfender
Contributor Author

Results:

================================================================================================
Running benchmark cast operation from : StringType to : IntegerType
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Cast function to : IntegerType , ansi mode enabled : false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------------------
Spark                                                                 668            683          14         15.7          63.8       1.0X
Comet (Scan)                                                          708            739          30         14.8          67.5       0.9X
Comet (Scan + Exec)                                                   816            829          18         12.9          77.8       0.8X


================================================================================================
Running benchmark cast operation from : StringType to : ShortType
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Cast function to : ShortType , ansi mode enabled : false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------------
Spark                                                               689            692           2         15.2          65.7       1.0X
Comet (Scan)                                                        639            658          21         16.4          60.9       1.1X
Comet (Scan + Exec)                                                 793            798           4         13.2          75.6       0.9X


================================================================================================
Running benchmark cast operation from : StringType to : ByteType
================================================================================================

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 16.0
Apple M2 Max
Cast function to : ByteType , ansi mode enabled : false:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
Spark                                                              659            694          39         15.9          62.9       1.0X
Comet (Scan)                                                       639            655          12         16.4          60.9       1.0X
Comet (Scan + Exec)                                                795            810          20         13.2          75.8       0.8X

@coderfender
Contributor Author

Proceeding with some rather unsafe options to see if we can squeeze in further optimizations

@codecov-commenter

codecov-commenter commented Jan 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.55%. Comparing base (f09f8af) to head (f1299eb).
⚠️ Report is 822 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3017      +/-   ##
============================================
+ Coverage     56.12%   59.55%   +3.43%     
- Complexity      976     1379     +403     
============================================
  Files           119      167      +48     
  Lines         11743    15496    +3753     
  Branches       2251     2569     +318     
============================================
+ Hits           6591     9229    +2638     
- Misses         4012     4970     +958     
- Partials       1140     1297     +157     

☔ View full report in Codecov by Sentry.

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

cargo bench results (main vs feature branch)

  | Type | Before (main) | After (feature) | Improvement |
  |------|---------------|-----------------|-------------|
  | i8   | 26.5 µs       | 19.8 µs         | 1.34x (34%) |
  | i16  | 27.2 µs       | 21.3 µs         | 1.27x (27%) |
  | i32  | 26.8 µs       | 19.9 µs         | 1.34x (34%) |
  | i64  | 31.8 µs       | 25.6 µs         | 1.24x (24%) |

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

Tried a bunch of unsafe, low-level (SIMD) ops, but the gains were diminishing and would likely make the code harder to maintain.

@andygrove
Member

Thanks @coderfender. I think it would be useful to add a criterion benchmark as well, so we can more easily measure the improvement compared to the main branch.

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

@andygrove, sure.

Here are the benchmarks compared through critcmp (we already have benchmarks for the cast_string_to_int functions). Please let me know if you think we need further info here.

group                                    feature                                main
-----                                    -------                                ----
cast_string_to_int/cast_string_to_i16    1.00     21.3±1.92µs        ? ?/sec    1.27     27.2±2.95µs        ? ?/sec
cast_string_to_int/cast_string_to_i32    1.00     19.9±0.40µs        ? ?/sec    1.34     26.8±1.08µs        ? ?/sec
cast_string_to_int/cast_string_to_i64    1.00     25.6±0.85µs        ? ?/sec    1.24     31.8±0.35µs        ? ?/sec
cast_string_to_int/cast_string_to_i8     1.00     19.8±0.36µs        ? ?/sec    1.34     26.5±0.43µs        ? ?/sec
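
For context, the criterion harness behind these numbers has roughly this shape (a minimal sketch; the actual benchmark lives in benches/cast_from_string.rs and its internals differ — the parse closure below is a placeholder, not the Comet kernel):

use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical input: a batch of numeric strings with surrounding whitespace.
fn test_strings() -> Vec<String> {
    (0..1000).map(|i| format!(" {} ", i - 500)).collect()
}

fn criterion_benchmark(c: &mut Criterion) {
    let input = test_strings();
    c.bench_function("cast_string_to_i32", |b| {
        b.iter(|| {
            // Placeholder for the actual cast kernel under test.
            input
                .iter()
                .map(|s| s.trim().parse::<i32>().ok())
                .collect::<Vec<Option<i32>>>()
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);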

}
while end > start && bytes[end - 1].is_ascii_whitespace() {
end -= 1;
}
Contributor Author

Not creating a new string through the trim function; just looping past the whitespace instead.

Member

I was actually wrong in an earlier review when I suggested we stop using trim. trim just returns a slice of the &str, not a new String.

It may be worth considering using trim_ascii instead.
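
For illustration, both return borrowed slices, so neither allocates; trim_ascii just skips the Unicode whitespace tables (a sketch, assuming Rust 1.80+ for trim_ascii):

fn main() {
    let s = "  \t 1234 \n";
    // str::trim returns a slice of the original &str; no new String is allocated.
    let t1: &str = s.trim();
    // str::trim_ascii also returns a slice, but only strips ASCII whitespace,
    // which avoids the Unicode whitespace check per character.
    let t2: &str = s.trim_ascii();
    assert_eq!(t1, "1234");
    assert_eq!(t2, "1234");

    // The byte-slice version matches the manual start/end loop above.
    let b: &[u8] = s.as_bytes().trim_ascii();
    assert_eq!(b, &b"1234"[..]);
}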

Member

I will look at using trim_ascii in a separate PR along with some other minor changes.

Comment on lines 2010 to 2012
if eval_mode == EvalMode::Legacy {
// truncate decimal in legacy mode
parse_sign_and_digits = false;
Member

The eval_mode does not change for different rows. It would likely be more performant to have separate implementations for legacy vs other modes to avoid the conditional in the hot loop.
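
The shape of that refactor, roughly (a sketch with hypothetical function names, not the actual Comet code):

enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

// Dispatch on eval_mode once, outside the per-row loop, so the hot loop
// carries no per-row mode check.
fn cast_all(values: &[&str], eval_mode: EvalMode) -> Vec<Option<i32>> {
    match eval_mode {
        EvalMode::Legacy => values.iter().map(|s| parse_legacy(s)).collect(),
        EvalMode::Ansi | EvalMode::Try => values.iter().map(|s| parse_strict(s)).collect(),
    }
}

// Hypothetical: legacy mode truncates a trailing decimal part ("12.9" -> 12).
fn parse_legacy(s: &str) -> Option<i32> {
    let head = s.trim().split('.').next().unwrap_or("");
    head.parse::<i32>().ok()
}

// Hypothetical: non-legacy modes parse the whole trimmed string.
fn parse_strict(s: &str) -> Option<i32> {
    s.trim().parse::<i32>().ok()
}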

Contributor Author

Great suggestion. Let me go ahead and make separate paths for each eval mode to keep the conditional out of the hot loop, and I will update this thread with benchmarks.

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

On another note, I also experimented with implementing a two-pass fast-path algorithm using switch fallthrough, but the implementation became very complicated for diminishing returns, largely because Rust's match has no fallthrough:
https://github.com/CameronHosking/fast-atoi/blob/master/fast_atoi.h
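
For reference, the closest straightforward Rust shape ends up as a plain checked loop over the digit bytes (which the compiler can unroll), rather than a length switch with fallthrough; a sketch:

// Sketch: accumulate ASCII digits with overflow checks. A length-based switch
// with fallthrough, as in fast_atoi, has no direct analogue in Rust's match.
fn parse_ascii_digits(bytes: &[u8]) -> Option<u64> {
    if bytes.is_empty() {
        return None;
    }
    let mut acc: u64 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        acc = acc.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
    }
    Some(acc)
}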

@andygrove
Member

On another note, I also experimented with implementing a two-pass fast-path algorithm using switch fallthrough, but the implementation became very complicated for diminishing returns, largely because Rust's match has no fallthrough: https://github.com/CameronHosking/fast-atoi/blob/master/fast_atoi.h

We could also look at https://crates.io/crates/atoi_simd (as a separate PR)

@coderfender
Contributor Author

coderfender commented Jan 5, 2026

Sure @andygrove. If my understanding is right, we just want to take inspiration from it and implement a Spark-like parser, rather than rely entirely on that package for our string -> int parsing needs, correct?

@coderfender
Contributor Author

Removed branching. Below are the latest bench numbers:

main - main branch
feature - feature branch (before separate implementations for each eval mode)
feature_remove_branching - after separate implementations for each eval mode

group                                    feature                                feature_remove_branching               main
-----                                    -------                                ------------------------               ----
cast_string_to_int/cast_string_to_i16    1.12     20.0±0.17µs        ? ?/sec    1.00     17.8±1.92µs        ? ?/sec    1.53     27.3±0.21µs        ? ?/sec
cast_string_to_int/cast_string_to_i32    1.00     19.9±0.34µs        ? ?/sec    1.54    30.6±46.63µs        ? ?/sec    1.38     27.5±0.81µs        ? ?/sec
cast_string_to_int/cast_string_to_i64    1.08     26.3±2.55µs        ? ?/sec    1.00     24.2±0.35µs        ? ?/sec    1.35     32.8±0.26µs        ? ?/sec
cast_string_to_int/cast_string_to_i8     1.16     20.3±1.23µs        ? ?/sec    1.00     17.5±0.21µs        ? ?/sec    1.57     27.4±1.00µs        ? ?/sec

@coderfender
Contributor Author

TODO: extract common methods into a utility function to follow the DRY principle.

@coderfender
Contributor Author

group                                    feature                                feature_separate_eval_mode             main
-----                                    -------                                --------------------------             ----
cast_string_to_int/cast_string_to_i16    1.14     20.0±0.17µs        ? ?/sec    1.00     17.6±0.21µs        ? ?/sec    1.55     27.3±0.21µs        ? ?/sec
cast_string_to_int/cast_string_to_i32    1.14     19.9±0.34µs        ? ?/sec    1.00     17.4±0.51µs        ? ?/sec    1.58     27.5±0.81µs        ? ?/sec
cast_string_to_int/cast_string_to_i64    1.07     26.3±2.55µs        ? ?/sec    1.00     24.5±1.44µs        ? ?/sec    1.34     32.8±0.26µs        ? ?/sec
cast_string_to_int/cast_string_to_i8     1.15     20.3±1.23µs        ? ?/sec    1.00     17.6±0.28µs        ? ?/sec    1.55     27.4±1.00µs        ? ?/sec

@coderfender
Contributor Author

@andygrove, I removed the redundant processing of decimals (in Try and Ansi eval modes) and that improved the benchmarks as well.

@andygrove
Member

These are nice speedups. Thanks @coderfender. I will review again later today.

$ cargo bench --bench cast_from_string -- --baseline main
    Finished `bench` profile [optimized + debuginfo] target(s) in 0.17s
     Running benches/cast_from_string.rs (target/release/deps/cast_from_string-9c9caa2565779a11)
Gnuplot not found, using plotters backend
cast_string_to_int/cast_string_to_i8
                        time:   [10.269 µs 10.308 µs 10.356 µs]
                        change: [−44.807% −44.528% −44.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
cast_string_to_int/cast_string_to_i16
                        time:   [10.260 µs 10.295 µs 10.335 µs]
                        change: [−44.683% −44.444% −44.202%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe
cast_string_to_int/cast_string_to_i32
                        time:   [10.348 µs 10.380 µs 10.417 µs]
                        change: [−43.119% −42.940% −42.761%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  11 (11.00%) high mild
  3 (3.00%) high severe
cast_string_to_int/cast_string_to_i64
                        time:   [14.733 µs 14.773 µs 14.824 µs]
                        change: [−39.389% −38.514% −37.613%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

@coderfender
Contributor Author

Thank you @andygrove

type_name: &str,
min_value: T,
) -> SparkResult<Option<T>> {
match eval_mode {
Member

👍

Member

@andygrove andygrove left a comment

Thanks @coderfender. This is a really nice improvement. I have some ideas for some additional improvements, so I will follow up with a smaller PR once this is merged.

@andygrove andygrove changed the title perf: Improve string to int perf perf: Improve performance of CAST from string to int Jan 6, 2026
@coderfender
Contributor Author

Sure, thank you very much @andygrove.

@andygrove andygrove merged commit 069681a into apache:main Jan 6, 2026
120 checks passed
