Polish better decoder (Testing) #7428

Open
wants to merge 65 commits into
base: main

zhuqi-lucas
Contributor

@zhuqi-lucas zhuqi-lucas commented Apr 19, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Apr 19, 2025
@alamb
Contributor

alamb commented Apr 29, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing polish_better_decoder (0c3aa9b) to cee5124 diff
BENCH_NAME=arrow_reader_row_filter
BENCH_COMMAND=cargo bench --all-features --bench arrow_reader_row_filter
BENCH_FILTER=
BENCH_BRANCH_NAME=polish_better_decoder
Results will be posted here when complete

@alamb
Contributor

alamb commented Apr 29, 2025

🤖: Benchmark completed

Details

group                                                                                 main                                   polish_better_decoder
-----                                                                                 ----                                   ---------------------
arrow_reader_row_filter/Composite/all_columns/async                                   1.04      2.9±0.02ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
arrow_reader_row_filter/Composite/all_columns/sync                                    1.00      3.3±0.02ms        ? ?/sec    1.00      3.3±0.03ms        ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/async                         1.00      2.6±0.02ms        ? ?/sec    1.03      2.6±0.01ms        ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/sync                          1.00      2.8±0.02ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/async                1.03      3.1±0.02ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/sync                 1.00      3.3±0.02ms        ? ?/sec    1.01      3.3±0.02ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/async      1.00      2.9±0.01ms        ? ?/sec    1.01      2.9±0.01ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/sync       1.00      3.1±0.02ms        ? ?/sec    1.01      3.1±0.02ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/async              1.00      6.5±0.04ms        ? ?/sec    1.00      6.5±0.03ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/sync               1.00      6.5±0.02ms        ? ?/sec    1.03      6.7±0.06ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/async    1.00      5.7±0.03ms        ? ?/sec    1.00      5.7±0.04ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/sync     1.00      5.7±0.04ms        ? ?/sec    1.03      5.9±0.03ms        ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/async                                 1.00      2.1±0.01ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/sync                                  1.00      2.2±0.01ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/async                       1.00      2.0±0.01ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/sync                        1.00      2.2±0.01ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/async                        1.02      3.2±0.03ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/sync                         1.00      3.4±0.04ms        ? ?/sec    1.01      3.4±0.02ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/async              1.00      2.8±0.01ms        ? ?/sec    1.03      2.9±0.02ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/sync               1.00      3.0±0.02ms        ? ?/sec    1.01      3.0±0.02ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/async                        1.01      8.0±0.05ms        ? ?/sec    1.00      7.8±0.06ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/sync                         1.01      8.3±0.07ms        ? ?/sec    1.00      8.1±0.07ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/async              1.01      7.7±0.04ms        ? ?/sec    1.00      7.6±0.04ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/sync               1.01      8.0±0.04ms        ? ?/sec    1.00      7.9±0.05ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/async                      1.01      3.1±0.03ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/sync                       1.00      3.4±0.01ms        ? ?/sec    1.02      3.5±0.03ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/async            1.00      2.8±0.01ms        ? ?/sec    1.02      2.9±0.02ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/sync             1.00      3.0±0.02ms        ? ?/sec    1.01      3.1±0.02ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async                            1.03     23.9±0.14ms        ? ?/sec    1.00     23.1±0.14ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/sync                             1.00     23.9±0.13ms        ? ?/sec    1.00     23.8±0.12ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/async                  1.00     15.2±0.06ms        ? ?/sec    1.00     15.2±0.11ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/sync                   1.01     15.2±0.10ms        ? ?/sec    1.00     15.0±0.08ms        ? ?/sec
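
For anyone who wants to summarize a table like the one above instead of eyeballing it, here is a minimal sketch in Python. It assumes the table is critcmp-style output, where the first number in each column is a relative factor and 1.00 marks the faster side; that reading is an assumption, not something stated in this thread. The names `ROWS`, `ROW_RE`, `parse_rows`, and `speedup` are hypothetical helpers made up for illustration. Only three rows are copied in; the others have the same layout.

```python
import re
from statistics import geometric_mean

# Three rows copied from the table above (whitespace condensed).
ROWS = """\
arrow_reader_row_filter/Composite/all_columns/async  1.04  2.9±0.02ms  ? ?/sec  1.00  2.8±0.03ms  ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/sync  1.00  6.5±0.02ms  ? ?/sec  1.03  6.7±0.06ms  ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async  1.03  23.9±0.14ms  ? ?/sec  1.00  23.1±0.14ms  ? ?/sec
"""

# Benchmark name, then (factor, mean±stddev in ms, throughput placeholder)
# once for main and once for the branch.
ROW_RE = re.compile(
    r"^(\S+)\s+([\d.]+)\s+([\d.]+)±([\d.]+)ms\s+\? \?/sec"
    r"\s+([\d.]+)\s+([\d.]+)±([\d.]+)ms\s+\? \?/sec\s*$"
)


def parse_rows(text):
    """Parse table rows into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue
        main_factor, main_ms, main_sd, br_factor, br_ms, br_sd = map(
            float, m.groups()[1:]
        )
        rows.append(
            {
                "name": m.group(1),
                "main_factor": main_factor,
                "main_ms": main_ms,
                "branch_factor": br_factor,
                "branch_ms": br_ms,
            }
        )
    return rows


def speedup(row):
    """Return main/branch as a ratio; values above 1 mean the branch is faster.

    The factor columns are used rather than the printed milliseconds,
    because the millisecond values are rounded to one decimal place.
    """
    return row["main_factor"] / row["branch_factor"]


rows = parse_rows(ROWS)
speedups = [speedup(r) for r in rows]
overall = geometric_mean(speedups)

if __name__ == "__main__":
    for r, s in zip(rows, speedups):
        print(f"{r['name']}: {s:.3f}x")
    print(f"geometric mean over {len(rows)} rows: {overall:.3f}x")
```

A geometric mean is used because the values are ratios. With only three rows it says little; pasting in the full table would give a fairer summary of whether the branch differs from main beyond noise.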

@alamb
Contributor

alamb commented Apr 29, 2025

> 🤖: Benchmark completed

Interesting, the decoder cache doesn't seem to help much on my test machine (which is some crappy gcp VM). I couldn't reproduce the results listed on #7363 (comment) 🤔

@zhuqi-lucas
Contributor Author

> 🤖: Benchmark completed
>
> Interesting, the decoder cache doesn't seem to help much on my test machine (which is some crappy gcp VM). I couldn't reproduce the results listed on #7363 (comment) 🤔

Thank you @alamb, there seems to be no obvious improvement compared to main. Compared to the original better-decode, this branch only improves PointLookup on the big 1000000-line data set.

I agree, we need to find out how to mock the ClickBench results from the arrow-rs side.

@alamb
Contributor

alamb commented Apr 30, 2025

> I agree, we need to find out how to mock the ClickBench results from the arrow-rs side.
