Speedup take_bytes (-35% -69%) by precalculating capacity #7422

Merged: 18 commits into apache:main on Apr 26, 2025

Conversation

@Dandandan (Contributor) commented Apr 18, 2025

Which issue does this PR close?

Closes #7432

Rationale for this change

Performance improvements:

take str 512            time:   [3.9906 µs 3.9937 µs 3.9970 µs]
                        change: [-45.040% -44.844% -44.657%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

take str 1024           time:   [9.0347 µs 9.0435 µs 9.0519 µs]
                        change: [-69.387% -69.048% -68.716%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

take str null indices 512
                        time:   [2.4167 µs 2.4209 µs 2.4269 µs]
                        change: [-48.977% -48.536% -48.234%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

take str null indices 1024
                        time:   [5.9829 µs 5.9940 µs 6.0151 µs]
                        change: [-57.375% -54.878% -52.153%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  2 (2.00%) high mild
  3 (3.00%) high severe

take str null values 1024
                        time:   [6.1283 µs 6.1357 µs 6.1418 µs]
                        change: [-23.035% -22.635% -22.315%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  10 (10.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  9 (9.00%) high severe

take str null values null indices 1024
                        time:   [5.2781 µs 5.2905 µs 5.3024 µs]
                        change: [-9.2518% -8.9305% -8.6089%] (p = 0.00 < 0.05)
                        Performance has improved.
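
To make the rationale concrete, here is a minimal sketch of the idea, not the PR's actual code: the function and its arguments are hypothetical simplifications of a byte array's values/offsets buffers. Summing the lengths of the selected elements up front lets both output buffers be allocated exactly once instead of growing as values are appended.

```rust
/// Illustrative sketch of capacity precalculation for a `take` on a
/// variable-length byte array (e.g. Utf8/Binary). Hypothetical code, not
/// the PR diff; nulls and bounds checks are omitted for brevity.
fn take_bytes_sketch(values: &[u8], offsets: &[i32], indices: &[u32]) -> (Vec<u8>, Vec<i32>) {
    // Precompute the total number of output bytes by summing the length of
    // every selected element from the source offsets.
    let capacity: usize = indices
        .iter()
        .map(|&i| (offsets[i as usize + 1] - offsets[i as usize]) as usize)
        .sum();

    // Allocate both output buffers exactly once.
    let mut new_values = Vec::with_capacity(capacity);
    let mut new_offsets = Vec::with_capacity(indices.len() + 1);
    new_offsets.push(0i32);

    for &i in indices {
        let (start, end) = (offsets[i as usize] as usize, offsets[i as usize + 1] as usize);
        new_values.extend_from_slice(&values[start..end]);
        new_offsets.push(new_values.len() as i32);
    }

    (new_values, new_offsets)
}
```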

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the arrow label (Changes to the arrow crate) Apr 18, 2025
@Dandandan marked this pull request as draft April 18, 2025 16:09
@Dandandan marked this pull request as ready for review April 21, 2025 20:14
@github-actions bot added the arrow-flight label (Changes to the arrow-flight crate) Apr 21, 2025
@Dandandan changed the title from "Speedup take_bytes (-35% -65%)" to "Speedup take_bytes (-35% -65%) by precalculating capacity" Apr 21, 2025
@github-actions bot removed the arrow-flight label (Changes to the arrow-flight crate) Apr 21, 2025
@Dandandan changed the title from "Speedup take_bytes (-35% -65%) by precalculating capacity" to "Speedup take_bytes (-35% -69%) by precalculating capacity" Apr 21, 2025
@Dandandan (Contributor, Author)

This is ready now

@mbutrovich (Contributor)

Is it fair to say that in general you've done a few refactors recently that replace MutableBuffer with Vec or collect directly into the target Buffer type? Is there a particular pattern with MutableBuffer that we should be avoiding?

@Dandandan (Contributor, Author)

> Is it fair to say that in general you've done a few refactors recently that replace MutableBuffer with Vec or collect directly into the target Buffer type? Is there a particular pattern with MutableBuffer that we should be avoiding?

I think at this point there is little reason to use MutableBuffer over Vec, as the latter provides a more performant (specialization over T, better inlining), slightly safer, and more complete API. The same probably applies to a lot of the Builder-type APIs.

It might be worth spending some time documenting the best and most performant way to construct / transform Arrow arrays and applying it across the arrow-rs crate (marking stuff as deprecated if needed).
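
As a concrete illustration of the Vec-over-MutableBuffer point, here is a small sketch assuming the arrow-buffer API as commonly used (`Buffer::from_vec`, `MutableBuffer::push`); it is not code from this PR.

```rust
use arrow_buffer::{Buffer, MutableBuffer};

// Building a buffer of i32 offsets two ways. The Vec path benefits from
// std's specialization and inlining, and turns into a Buffer by handing
// over the Vec's allocation rather than copying it.
fn via_vec(n: i32) -> Buffer {
    let offsets: Vec<i32> = (0..=n).collect();
    Buffer::from_vec(offsets)
}

fn via_mutable_buffer(n: i32) -> Buffer {
    // Reserve space for n + 1 i32 values, then push them one by one.
    let mut buf = MutableBuffer::new((n as usize + 1) * std::mem::size_of::<i32>());
    for v in 0..=n {
        buf.push(v);
    }
    buf.into()
}
```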

@mbutrovich (Contributor)

I will take a proper review pass this afternoon.

@mbutrovich (Contributor) commented Apr 24, 2025

Do you see the same performance regression for take stringview? I'm still just digging into the code to see if that is even possible.

Main:

take stringview 512     time:   [287.26 ns 287.75 ns 288.28 ns]

take stringview 1024    time:   [489.07 ns 489.99 ns 491.18 ns]

PR:

take stringview 512     time:   [360.36 ns 361.02 ns 361.95 ns]

take stringview 1024    time:   [633.66 ns 633.79 ns 633.98 ns]

@Dandandan (Contributor, Author)

> Do you see the same performance regression for take stringview? I'm still just digging into the code to see if that is even possible.
>
> Main:
>
> take stringview 512     time:   [287.26 ns 287.75 ns 288.28 ns]
>
> take stringview 1024    time:   [489.07 ns 489.99 ns 491.18 ns]
>
> PR:
>
> take stringview 512     time:   [360.36 ns 361.02 ns 361.95 ns]
>
> take stringview 1024    time:   [633.66 ns 633.79 ns 633.98 ns]

I am not at a machine right now to test, but StringView doesn't use this code path.
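
For context, an illustrative sketch (not the arrow-rs implementation) of why StringView is unaffected: a view array's views are fixed 16-byte entries, and `take` just gathers them while reusing the existing data buffers, so the output size is already known and there is no variable-length capacity to precompute.

```rust
// Illustrative sketch (hypothetical code): taking from a view array gathers
// fixed-width 16-byte views; the underlying data buffers are shared as-is,
// so the output views buffer is exactly indices.len() * 16 bytes by
// construction and no capacity precalculation is needed.
fn take_views_sketch(views: &[u128], indices: &[u32]) -> Vec<u128> {
    indices.iter().map(|&i| views[i as usize]).collect()
}
```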

@mbutrovich (Contributor)

> I am not at a machine right now to test, but StringView doesn't use this code path.

Generated code looks the same, so it must be some sort of noise on my machine.

@mbutrovich (Contributor) left a comment

> I think at this point there is little reason to use MutableBuffer over Vec, as the latter provides a more performant (specialization over T, better inlining), slightly safer, and more complete API. The same probably applies to a lot of the Builder-type APIs.
>
> It might be worth spending some time documenting the best and most performant way to construct / transform Arrow arrays and applying it across the arrow-rs crate (marking stuff as deprecated if needed).

This LGTM and is a great improvement. I also would love to see this documentation if you don't mind dumping your thoughts on the topic while they're fresh!

@Dandandan merged commit 07093a4 into apache:main Apr 26, 2025
26 checks passed
Successfully merging this pull request may close these issues.

Speedup take_bytes by precalculating capacity