Speedup take_bytes (-35% -69%) by precalculating capacity #7422

Merged: 18 commits into apache:main on Apr 26, 2025

Conversation

@Dandandan (Contributor) commented Apr 18, 2025

Which issue does this PR close?

Closes #7432

Rationale for this change

Performance improvements:

take str 512            time:   [3.9906 µs 3.9937 µs 3.9970 µs]
                        change: [-45.040% -44.844% -44.657%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

take str 1024           time:   [9.0347 µs 9.0435 µs 9.0519 µs]
                        change: [-69.387% -69.048% -68.716%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

take str null indices 512
                        time:   [2.4167 µs 2.4209 µs 2.4269 µs]
                        change: [-48.977% -48.536% -48.234%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

take str null indices 1024
                        time:   [5.9829 µs 5.9940 µs 6.0151 µs]
                        change: [-57.375% -54.878% -52.153%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  2 (2.00%) high mild
  3 (3.00%) high severe

take str null values 1024
                        time:   [6.1283 µs 6.1357 µs 6.1418 µs]
                        change: [-23.035% -22.635% -22.315%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  10 (10.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  9 (9.00%) high severe

take str null values null indices 1024
                        time:   [5.2781 µs 5.2905 µs 5.3024 µs]
                        change: [-9.2518% -8.9305% -8.6089%] (p = 0.00 < 0.05)
                        Performance has improved.
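
To make the rationale concrete, here is a minimal sketch of the idea, not the PR's actual code: the function and its arguments are hypothetical simplifications of a byte array's values/offsets buffers. Summing the lengths of the selected elements up front lets both output buffers be allocated exactly once instead of growing as values are appended.

```rust
/// Illustrative sketch of capacity precalculation for a `take` on a
/// variable-length byte array (e.g. Utf8/Binary). Hypothetical code, not
/// the PR diff; nulls and bounds checks are omitted for brevity.
fn take_bytes_sketch(values: &[u8], offsets: &[i32], indices: &[u32]) -> (Vec<u8>, Vec<i32>) {
    // Precompute the total number of output bytes by summing the length of
    // every selected element from the source offsets.
    let capacity: usize = indices
        .iter()
        .map(|&i| (offsets[i as usize + 1] - offsets[i as usize]) as usize)
        .sum();

    // Allocate both output buffers exactly once.
    let mut new_values = Vec::with_capacity(capacity);
    let mut new_offsets = Vec::with_capacity(indices.len() + 1);
    new_offsets.push(0i32);

    for &i in indices {
        let (start, end) = (offsets[i as usize] as usize, offsets[i as usize + 1] as usize);
        new_values.extend_from_slice(&values[start..end]);
        new_offsets.push(new_values.len() as i32);
    }

    (new_values, new_offsets)
}
```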

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the arrow label (Changes to the arrow crate) Apr 18, 2025
@Dandandan marked this pull request as draft April 18, 2025 16:09
@Dandandan marked this pull request as ready for review April 21, 2025 20:14
@github-actions bot added the arrow-flight label (Changes to the arrow-flight crate) Apr 21, 2025
@Dandandan changed the title from "Speedup take_bytes (-35% -65%)" to "Speedup take_bytes (-35% -65%) by precalculating capacity" Apr 21, 2025
@github-actions bot removed the arrow-flight label (Changes to the arrow-flight crate) Apr 21, 2025
@Dandandan changed the title from "Speedup take_bytes (-35% -65%) by precalculating capacity" to "Speedup take_bytes (-35% -69%) by precalculating capacity" Apr 21, 2025
@Dandandan (Contributor, Author)

This is ready now

@mbutrovich (Contributor)

Is it fair to say that in general you've done a few refactors recently that replace MutableBuffer with Vec or collect directly into the target Buffer type? Is there a particular pattern with MutableBuffer that we should be avoiding?

@Dandandan (Contributor, Author)

> Is it fair to say that in general you've done a few refactors recently that replace MutableBuffer with Vec or collect directly into the target Buffer type? Is there a particular pattern with MutableBuffer that we should be avoiding?

I think at this point there is little reason to use MutableBuffer over Vec, as the latter provides a more performant (specialization over T, better inlining), slightly safer, and more complete API. The same probably applies to a lot of the Builder-type APIs.

It might be worth spending some time documenting the best and most performant way to construct / transform Arrow arrays and applying it across the arrow-rs crate (marking stuff as deprecated if needed).
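
As a concrete illustration of the Vec-over-MutableBuffer point, here is a small sketch assuming the arrow-buffer API as commonly used (`Buffer::from_vec`, `MutableBuffer::push`); it is not code from this PR.

```rust
use arrow_buffer::{Buffer, MutableBuffer};

// Building a buffer of i32 offsets two ways. The Vec path benefits from
// std's specialization and inlining, and turns into a Buffer by handing
// over the Vec's allocation rather than copying it.
fn via_vec(n: i32) -> Buffer {
    let offsets: Vec<i32> = (0..=n).collect();
    Buffer::from_vec(offsets)
}

fn via_mutable_buffer(n: i32) -> Buffer {
    // Reserve space for n + 1 i32 values, then push them one by one.
    let mut buf = MutableBuffer::new((n as usize + 1) * std::mem::size_of::<i32>());
    for v in 0..=n {
        buf.push(v);
    }
    buf.into()
}
```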

@mbutrovich (Contributor)

I will take a proper review pass this afternoon.

@mbutrovich (Contributor) commented Apr 24, 2025

Do you see the same performance regression for take stringview? I'm still just digging into the code to see if that is even possible.

Main:

take stringview 512     time:   [287.26 ns 287.75 ns 288.28 ns]

take stringview 1024    time:   [489.07 ns 489.99 ns 491.18 ns]

PR:

take stringview 512     time:   [360.36 ns 361.02 ns 361.95 ns]

take stringview 1024    time:   [633.66 ns 633.79 ns 633.98 ns]

@Dandandan (Contributor, Author)

> Do you see the same performance regression for take stringview? I'm still just digging into the code to see if that is even possible.
>
> Main:
>
> take stringview 512     time:   [287.26 ns 287.75 ns 288.28 ns]
>
> take stringview 1024    time:   [489.07 ns 489.99 ns 491.18 ns]
>
> PR:
>
> take stringview 512     time:   [360.36 ns 361.02 ns 361.95 ns]
>
> take stringview 1024    time:   [633.66 ns 633.79 ns 633.98 ns]

I am not at a machine right now to test, but StringView doesn't use this code path.
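
For context, an illustrative sketch (not the arrow-rs implementation) of why StringView is unaffected: a view array's views are fixed 16-byte entries, and `take` just gathers them while reusing the existing data buffers, so the output size is already known and there is no variable-length capacity to precompute.

```rust
// Illustrative sketch (hypothetical code): taking from a view array gathers
// fixed-width 16-byte views; the underlying data buffers are shared as-is,
// so the output views buffer is exactly indices.len() * 16 bytes by
// construction and no capacity precalculation is needed.
fn take_views_sketch(views: &[u128], indices: &[u32]) -> Vec<u128> {
    indices.iter().map(|&i| views[i as usize]).collect()
}
```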

@mbutrovich (Contributor)

> I am not at a machine right now to test, but StringView doesn't use this code path.

Generated code looks the same, so it must be some sort of noise on my machine.

@mbutrovich (Contributor) left a comment

> I think at this point there is little reason to use MutableBuffer over Vec, as the latter provides a more performant (specialization over T, better inlining), slightly safer, and more complete API. The same probably applies to a lot of the Builder-type APIs.
>
> It might be worth spending some time documenting the best and most performant way to construct / transform Arrow arrays and applying it across the arrow-rs crate (marking stuff as deprecated if needed).

This LGTM and is a great improvement. I also would love to see this documentation if you don't mind dumping your thoughts on the topic while they're fresh!

@Dandandan merged commit 07093a4 into apache:main Apr 26, 2025
26 checks passed
Successfully merging this pull request may close these issues.

Speedup take_bytes by precalculating capacity