Parquet decoder / decoded page Cache #7363
I have run some performance tests, see:
|
Here are the current clickbench results for the code in #6921
|
I did some analysis on the queries that show the largest slowdown (queries 24-26 and queries 30-31):
SELECT "SearchEngineID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchEngineID", "ClientIP" ORDER BY c DESC LIMIT 10;
SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits WHERE "SearchPhrase" <> '' GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
Basically they all have the filter "SearchPhrase" <> ''. How selective is that filter?
|
I profiled this command:
And I see that about 15% of the time is being spent "skipping" records -- aka evaluating the RowSelection in the parquet reader. This is likely the same observation that led @XiangpengHao to propose adding bitmask support in
I need to think about what the right next steps are |
Exactly, the predicate
Each selector is 16 bytes, causing a lot of memory overhead. This also means that for each small select and skip, we will call the corresponding The associated
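For illustration only (not from the original comment), a small sketch using the parquet crate's public RowSelector / RowSelection types, showing how a scattered filter result becomes many tiny select/skip runs -- which is where the 16-byte-per-selector memory overhead and the per-run calls come from:

use parquet::arrow::arrow_reader::{RowSelection, RowSelector};

fn main() {
    // A filter that matches every other row produces one selector per run:
    // select 1, skip 1, select 1, skip 1, ...
    let selectors: Vec<RowSelector> = (0..8)
        .map(|i| if i % 2 == 0 { RowSelector::select(1) } else { RowSelector::skip(1) })
        .collect();

    // Each RowSelector holds a usize row count plus a skip flag, padded to 16 bytes,
    // so a highly fragmented selection carries real memory and per-call overhead.
    println!("selector size: {} bytes", std::mem::size_of::<RowSelector>());
    println!("selectors for 8 rows: {}", selectors.len());

    // The reader consumes the runs as a RowSelection, issuing one read/skip call per run.
    let _selection = RowSelection::from(selectors);
}
|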
One potential item I can help with is to make the reader an optional, opt-in reader, and merge it in. With that, we will have a testbed for future optimizations like better selection mechanisms. |
|
Actually, I think all we need to show is that the code in arrow-rs helps in some cases and doesn't degrade performance for others -- we don't need to show that it completely fixes the issue downstream in DataFusion. I will try and gather evidence to determine if the decoded page cache can always be used. |
I am interested in this topic -- is there anything I can help with, such as testing to compare the performance, or code improvements as a follow-up? |
Thank you so much @zhuqi-lucas ! That is great news. I think the first thing we should do is
Does that make sense? |
It makes sense @alamb , thank you for the guidance. I will try to do it. |
group better-decoder main
----- -------------- ----
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs 1.01 889.0±13.52µs ? ?/sec 1.00 879.2±31.80µs ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs 1.13 776.7±36.97µs ? ?/sec 1.00 687.5±31.70µs ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs 1.00 890.6±12.19µs ? ?/sec 1.00 887.0±22.95µs ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs 1.01 403.8±12.84µs ? ?/sec 1.00 399.8±7.63µs ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs 1.16 376.2±21.20µs ? ?/sec 1.00 323.7±14.98µs ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs 1.01 403.7±11.99µs ? ?/sec 1.00 398.3±5.42µs ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs 1.05 535.7±30.66µs ? ?/sec 1.00 509.6±21.13µs ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs 1.18 444.1±19.58µs ? ?/sec 1.00 377.1±21.79µs ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs 1.02 536.7±37.88µs ? ?/sec 1.00 527.5±39.52µs ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs 1.00 105.9±1.16µs ? ?/sec 1.18 125.0±6.83µs ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs 1.00 155.1±7.94µs ? ?/sec 1.08 168.0±7.36µs ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs 1.00 111.3±1.80µs ? ?/sec 1.18 130.8±7.56µs ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs 1.01 165.7±3.21µs ? ?/sec 1.00 164.7±2.88µs ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string 1.01 228.8±5.38µs ? ?/sec 1.00 226.3±2.31µs ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs 1.02 186.4±7.74µs ? ?/sec 1.00 182.8±3.39µs ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs 1.01 170.8±3.93µs ? ?/sec 1.00 169.3±2.11µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs 1.03 576.2±22.74µs ? ?/sec 1.00 561.3±5.71µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs 1.00 562.9±10.24µs ? ?/sec 1.04 586.0±27.51µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs 1.03 587.3±16.88µs ? ?/sec 1.00 570.5±10.65µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs 1.03 268.0±10.85µs ? ?/sec 1.00 259.3±10.75µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs 1.00 415.7±8.55µs ? ?/sec 1.03 428.1±16.34µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs 1.04 275.0±8.26µs ? ?/sec 1.00 264.4±4.68µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs 1.02 57.8±2.04µs ? ?/sec 1.00 56.8±0.76µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs 1.00 122.1±0.79µs ? ?/sec 1.01 123.5±2.70µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, no NULLs 1.00 61.9±0.67µs ? ?/sec 1.00 61.7±0.50µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs 1.00 19.9±0.23µs ? ?/sec 1.00 19.8±0.36µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs 1.00 104.9±0.78µs ? ?/sec 1.00 104.8±3.11µs ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs 1.00 24.9±0.17µs ? ?/sec 1.00 24.8±0.32µs ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, mandatory, no NULLs 1.03 347.9±4.17µs ? ?/sec 1.00 339.2±6.45µs ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, half NULLs 1.04 359.6±10.81µs ? ?/sec 1.00 345.4±7.48µs ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, no NULLs 1.02 352.9±3.67µs ? ?/sec 1.00 345.8±13.92µs ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, mandatory, no NULLs 1.05 40.7±4.30µs ? ?/sec 1.00 38.6±2.02µs ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, half NULLs 1.01 208.0±6.85µs ? ?/sec 1.00 206.2±9.71µs ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, no NULLs 1.01 45.4±2.26µs ? ?/sec 1.00 44.9±2.58µs ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, mandatory, no NULLs 1.00 45.7±0.53µs ? ?/sec 1.01 46.3±2.38µs ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, half NULLs 1.00 111.5±1.41µs ? ?/sec 1.00 111.3±1.94µs ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, no NULLs 1.03 50.9±2.17µs ? ?/sec 1.00 49.3±0.55µs ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, mandatory, no NULLs 1.01 8.4±0.32µs ? ?/sec 1.00 8.3±0.31µs ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, half NULLs 1.01 93.1±2.14µs ? ?/sec 1.00 92.1±1.41µs ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, no NULLs 1.02 12.6±0.37µs ? ?/sec 1.00 12.4±0.33µs ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, mandatory, no NULLs 1.02 88.9±0.74µs ? ?/sec 1.00 87.0±2.80µs ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, half NULLs 1.04 150.1±1.55µs ? ?/sec 1.00 144.1±1.36µs ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, no NULLs 1.03 93.2±3.52µs ? ?/sec 1.00 90.9±0.86µs ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, mandatory, no NULLs 1.02 13.5±0.18µs ? ?/sec 1.00 13.3±0.32µs ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, half NULLs 1.04 112.6±1.35µs ? ?/sec 1.00 107.7±3.41µs ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, no NULLs 1.02 18.1±0.52µs ? ?/sec 1.00 17.7±0.28µs ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, mandatory, no NULLs 1.00 172.9±1.77µs ? ?/sec 1.01 174.7±10.11µs ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, half NULLs 1.01 251.7±5.00µs ? ?/sec 1.00 249.7±5.55µs ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, no NULLs 1.02 178.4±6.38µs ? ?/sec 1.00 174.3±6.38µs ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, mandatory, no NULLs 1.00 22.8±0.76µs ? ?/sec 1.01 23.0±1.19µs ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, half NULLs 1.00 176.8±6.67µs ? ?/sec 1.01 178.2±4.92µs ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, no NULLs 1.01 27.7±0.66µs ? ?/sec 1.00 27.5±0.93µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs 1.01 70.8±0.53µs ? ?/sec 1.00 70.3±2.16µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs 1.02 90.1±0.63µs ? ?/sec 1.00 88.4±0.85µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs 1.03 74.4±2.08µs ? ?/sec 1.00 72.3±0.56µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs 1.00 101.8±0.71µs ? ?/sec 1.01 102.8±3.79µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs 1.02 153.8±1.49µs ? ?/sec 1.00 150.9±2.13µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs 1.03 107.5±2.69µs ? ?/sec 1.00 104.8±0.94µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs 1.00 45.2±0.38µs ? ?/sec 1.00 45.0±1.96µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, half NULLs 1.03 125.4±3.75µs ? ?/sec 1.00 122.1±1.83µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, no NULLs 1.00 50.5±0.50µs ? ?/sec 1.00 50.5±1.81µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, mandatory, no NULLs 1.00 80.7±0.65µs ? ?/sec 1.00 80.9±3.00µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, half NULLs 1.03 145.7±4.25µs ? ?/sec 1.00 141.4±1.62µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, no NULLs 1.01 85.6±0.74µs ? ?/sec 1.00 84.7±2.72µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, mandatory, no NULLs 1.02 47.3±0.47µs ? ?/sec 1.00 46.6±0.75µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, half NULLs 1.02 125.1±1.31µs ? ?/sec 1.00 122.3±1.26µs ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, no NULLs 1.03 53.7±2.12µs ? ?/sec 1.00 52.3±1.58µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs 1.01 67.6±0.66µs ? ?/sec 1.00 67.0±2.51µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs 1.02 92.6±0.75µs ? ?/sec 1.00 90.4±0.72µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs 1.04 70.9±2.17µs ? ?/sec 1.00 68.4±0.73µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs 1.00 100.1±0.94µs ? ?/sec 1.00 99.9±4.14µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs 1.02 161.3±1.50µs ? ?/sec 1.00 157.6±1.58µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs 1.03 105.9±3.80µs ? ?/sec 1.00 103.1±1.41µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs 1.01 105.6±1.79µs ? ?/sec 1.00 104.3±6.07µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, half NULLs 1.04 163.6±4.37µs ? ?/sec 1.00 157.6±1.60µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, no NULLs 1.05 112.4±4.12µs ? ?/sec 1.00 106.8±1.38µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, mandatory, no NULLs 1.02 88.0±1.14µs ? ?/sec 1.00 86.4±3.21µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, half NULLs 1.03 156.2±3.92µs ? ?/sec 1.00 151.5±1.50µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, no NULLs 1.04 93.4±3.51µs ? ?/sec 1.00 90.0±1.12µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, mandatory, no NULLs 1.02 60.5±0.61µs ? ?/sec 1.00 59.1±0.73µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, half NULLs 1.03 139.9±3.52µs ? ?/sec 1.00 136.2±1.57µs ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, no NULLs 1.06 68.2±3.92µs ? ?/sec 1.00 64.5±1.95µs ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs 1.00 56.3±0.45µs ? ?/sec 1.02 57.5±1.03µs ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs 1.00 75.4±2.12µs ? ?/sec 1.01 76.0±0.52µs ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs 1.01 59.9±2.22µs ? ?/sec 1.00 59.3±0.50µs ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs 1.00 73.9±0.52µs ? ?/sec 1.02 75.1±0.36µs ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs 1.00 124.3±3.51µs ? ?/sec 1.03 127.8±1.18µs ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs 1.00 78.5±0.85µs ? ?/sec 1.02 79.8±0.68µs ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs 1.00 18.4±0.15µs ? ?/sec 1.00 18.4±0.34µs ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs 1.00 96.0±1.05µs ? ?/sec 1.01 97.0±2.01µs ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs 1.00 23.5±0.72µs ? ?/sec 1.00 23.6±0.34µs ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs 1.00 53.1±0.47µs ? ?/sec 1.02 53.9±1.07µs ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs 1.00 115.0±1.44µs ? ?/sec 1.01 115.9±1.38µs ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs 1.00 58.3±2.06µs ? ?/sec 1.00 58.3±0.51µs ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs 1.00 19.7±0.71µs ? ?/sec 1.08 21.4±0.14µs ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs 1.00 94.7±3.61µs ? ?/sec 1.05 99.7±1.06µs ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs 1.00 24.1±0.34µs ? ?/sec 1.08 26.2±0.43µs ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs 1.02 54.0±1.51µs ? ?/sec 1.00 53.0±0.70µs ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs 1.02 77.7±0.81µs ? ?/sec 1.00 76.5±0.94µs ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs 1.01 55.5±0.35µs ? ?/sec 1.00 55.0±0.71µs ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs 1.02 74.4±2.54µs ? ?/sec 1.00 72.8±0.63µs ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs 1.00 131.5±0.87µs ? ?/sec 1.00 131.5±2.56µs ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs 1.01 77.9±0.92µs ? ?/sec 1.00 77.1±0.87µs ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs 1.03 78.5±3.34µs ? ?/sec 1.00 76.1±1.67µs ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs 1.02 133.7±1.21µs ? ?/sec 1.00 130.8±2.32µs ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs 1.02 82.5±0.74µs ? ?/sec 1.00 80.6±0.95µs ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs 1.01 61.3±2.01µs ? ?/sec 1.00 60.4±1.18µs ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs 1.01 126.5±0.93µs ? ?/sec 1.00 125.2±2.15µs ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs 1.01 65.0±0.95µs ? ?/sec 1.00 64.3±0.92µs ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs 1.02 34.8±1.92µs ? ?/sec 1.00 34.2±0.84µs ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs 1.00 110.5±1.13µs ? ?/sec 1.00 110.8±2.15µs ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs 1.01 39.0±1.13µs ? ?/sec 1.00 38.9±0.39µs ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings half NULLs 1.01 4.2±0.04ms ? ?/sec 1.00 4.1±0.03ms ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings no NULLs 1.01 6.6±0.04ms ? ?/sec 1.00 6.5±0.06ms ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs 1.01 406.1±19.03µs ? ?/sec 1.00 403.2±7.59µs ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs 1.12 391.0±24.22µs ? ?/sec 1.00 349.8±17.00µs ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs 1.00 399.3±8.95µs ? ?/sec 1.02 406.4±11.74µs ? ?/sec
arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs 1.01 591.7±43.01µs ? ?/sec 1.00 585.5±35.36µs ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, half NULLs 1.16 461.1±20.71µs ? ?/sec 1.00 396.9±20.56µs ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, no NULLs 1.02 595.6±27.25µs ? ?/sec 1.00 585.6±41.04µs ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs 1.03 292.0±6.07µs ? ?/sec 1.00 284.8±2.80µs ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs 1.02 333.6±3.46µs ? ?/sec 1.00 326.1±4.68µs ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs 1.02 294.6±3.21µs ? ?/sec 1.00 288.4±3.38µs ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs 1.00 108.4±3.67µs ? ?/sec 1.18 127.4±6.85µs ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs 1.00 154.6±8.05µs ? ?/sec 1.11 172.2±12.16µs ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs 1.00 113.2±3.89µs ? ?/sec 1.16 131.6±7.17µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs 1.00 223.4±5.59µs ? ?/sec 1.02 227.7±6.06µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs 1.01 220.0±8.45µs ? ?/sec 1.00 217.9±8.67µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs 1.00 233.2±7.53µs ? ?/sec 1.01 235.5±4.78µs ? ?/sec |
Thanks @zhuqi-lucas ! Is there any way you could look into creating a benchmark for evaluating filters? I can do so too if you prefer.
The idea is to create a benchmark for evaluating row filters (what @XiangpengHao is trying to optimize) that captures the common use case and is what we are trying to optimize in DataFusion. You add a row filter with this API (see the sketch after the suggestions below):
I suggest a benchmark that
For the filters, it is important to capture both selective filters (that select a small number of contiguous ranges) as well as non-selective filters (that select rows that are scattered throughout the data). Here are suggestions. Filters:
For the projections, it is important to capture both when the column with the predicate appears in the output as well as when the column without the predicate appears. Here are suggestions. Projections (which columns are selected out):
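To make the API concrete, here is a minimal sketch of adding a row filter with the parquet crate's synchronous ParquetRecordBatchReaderBuilder (the file path, leaf column index, and the "int64 > 0" predicate are illustrative assumptions, not part of the suggestion above):

use std::fs::File;

use arrow_array::Int64Array;
use parquet::arrow::arrow_reader::{ArrowPredicateFn, ParquetRecordBatchReaderBuilder, RowFilter};
use parquet::arrow::ProjectionMask;

fn read_with_row_filter(path: &str) -> parquet::errors::Result<()> {
    let file = File::open(path)?;
    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;

    // The predicate only needs the filter column (leaf index 0 here is an
    // assumption about the benchmark file's schema).
    let mask = ProjectionMask::leaves(builder.parquet_schema(), [0]);
    let predicate = ArrowPredicateFn::new(mask, |batch| {
        // Keep rows where the filter column is > 0
        arrow::compute::kernels::cmp::gt(batch.column(0), &Int64Array::new_scalar(0))
    });

    let reader = builder
        .with_row_filter(RowFilter::new(vec![Box::new(predicate)]))
        .build()?;

    for batch in reader {
        let _batch = batch?; // only rows passing the predicate are decoded and emitted
    }
    Ok(())
}

A benchmark would vary the predicate (point lookup, clustered ranges, scattered rows, ...) and the projection (including or excluding the filter column) around this same call pattern.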
|
The result shows only a little improvement from caching pages -- almost no difference from the main branch. Maybe I am missing something? Or do I need to increase the batch size or page size for caching?
critcmp main better_decode
group better_decode main
----- ------------- ----
arrow_reader_row_filter/filter_case: int64 = 0 project_case: all_columns/ 1.02 802.1±8.17µs ? ?/sec 1.00 788.9±12.42µs ? ?/sec
arrow_reader_row_filter/filter_case: int64 = 0 project_case: exclude_filter_column/ 1.02 731.3±6.36µs ? ?/sec 1.00 716.1±13.29µs ? ?/sec
arrow_reader_row_filter/filter_case: int64 even project_case: all_columns/ 1.02 18.5±0.56ms ? ?/sec 1.00 18.1±0.09ms ? ?/sec
arrow_reader_row_filter/filter_case: int64 even project_case: exclude_filter_column/ 1.02 13.6±0.26ms ? ?/sec 1.00 13.3±0.08ms ? ?/sec
arrow_reader_row_filter/filter_case: ts > 50_000 project_case: all_columns/ 1.00 1254.5±30.86µs ? ?/sec 1.01 1263.4±12.15µs ? ?/sec
arrow_reader_row_filter/filter_case: ts > 50_000 project_case: exclude_filter_column/ 1.00 1075.5±43.17µs ? ?/sec 1.00 1074.6±9.40µs ? ?/sec
arrow_reader_row_filter/filter_case: utf8View <> '' project_case: all_columns/ 1.00 18.4±0.22ms ? ?/sec 1.00 18.3±0.06ms ? ?/sec
arrow_reader_row_filter/filter_case: utf8View <> '' project_case: exclude_filter_column/ 1.00 15.2±0.18ms ? ?/sec 1.01 15.4±0.10ms ? ?/sec
arrow_reader_row_filter/filter_case: utf8View = 'const' project_case: all_columns/ 1.00 893.4±15.74µs ? ?/sec 1.07 959.3±8.81µs ? ?/sec
arrow_reader_row_filter/filter_case: utf8View = 'const' project_case: exclude_filter_column/ 1.00 851.5±7.54µs ? ?/sec 1.07 912.2±10.13µs ? ?/sec |
I may know the reason: the cache is currently only used by the async read path, so the sync read test cases will not hit the cache. I will try to add it to the sync read and test again, or find a good way to use the async read.
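As a reference for the async path (the one that exercises the cache), a minimal sketch of driving the async reader, assuming the parquet crate's tokio-based "async" feature plus the futures crate; the path and batch size are placeholders:

use futures::TryStreamExt;
use parquet::arrow::ParquetRecordBatchStreamBuilder;

async fn read_async(path: &str) -> parquet::errors::Result<()> {
    let file = tokio::fs::File::open(path).await?;
    let stream = ParquetRecordBatchStreamBuilder::new(file)
        .await?
        .with_batch_size(8192)
        .build()?;
    // Collecting the stream drives the async decoder, which is the code path
    // where the predicate page cache is used.
    let _batches: Vec<_> = stream.try_collect().await?;
    Ok(())
}
|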
I am starting to check it out... |
Thank you @alamb , I already changed the benchmark to trigger the cache logic, and the test works well on the main branch, but it seems to get stuck when switching to the page cache branch. I am trying to debug it. |
I spent some more time today thinking about what filter patterns are important to test and came up with the following six patterns (to replace the 4 I suggested above). What do you think @zhuqi-lucas ? @XiangpengHao did I miss any important filter patterns?
|
Thank you @alamb , LGTM, I will try to address those cases. One more question: do we need to change the page size to test and compare, since we cache the page? |
I think what is important in this case is that the test parquet file that is created has multiple pages for each column -- I think the default writer's limit is 20k rows per page, so I don't think any changes are needed. However, we should double check that the file has multiple pages -- maybe using @XiangpengHao 's great https://parquet-viewer.xiangpeng.systems/
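For reference, a minimal sketch (not from the discussion) of writing a test file that is guaranteed to contain several data pages per column by lowering the writer's page limits; the 10k limit and the column contents are arbitrary assumptions:

use std::{fs::File, sync::Arc};

use arrow_array::{ArrayRef, Int64Array, RecordBatch};
use parquet::arrow::ArrowWriter;
use parquet::file::properties::WriterProperties;

fn write_multi_page_file(path: &str) -> parquet::errors::Result<()> {
    // Force small data pages so each column chunk contains several pages,
    // which is what the row-filter benchmark needs in order to exercise the page cache.
    let props = WriterProperties::builder()
        .set_data_page_row_count_limit(10_000)
        // the row-count limit is only checked every `write_batch_size` values
        .set_write_batch_size(10_000)
        .build();

    let col: ArrayRef = Arc::new(Int64Array::from_iter_values(0..100_000));
    let batch = RecordBatch::try_from_iter([("int64", col)])?;

    let mut writer = ArrowWriter::try_new(File::create(path)?, batch.schema(), Some(props))?;
    writer.write(&batch)?;
    writer.close()?; // the footer is written on close
    Ok(())
}
|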
Thank you @alamb , good to know the default writer's limit is 20k rows per page. And a good tool to know! |
I am also thinking it might make more sense to make specific microbenchmarks for |
@alamb @XiangpengHao As for why the main branch did not trigger it: currently the skip-dictionary-page path is not exercised outside of the page cache benchmark logic. |
When I increased the data set from 100000 to 1000000, there were regressions for PointLookup:
And I tried to fix it with this commit:
It seems to fix the PointLookup regression with the big data set:
critcmp better-decoder polish_better_decoder
group better-decoder polish_better_decoder
----- -------------- ---------------------
arrow_reader_row_filter/Composite/all_columns/async 1.03 12.9±0.14ms ? ?/sec 1.00 12.6±0.12ms ? ?/sec
arrow_reader_row_filter/Composite/all_columns/sync 1.00 14.5±0.10ms ? ?/sec 1.00 14.6±0.31ms ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/async 1.03 12.7±0.44ms ? ?/sec 1.00 12.4±0.20ms ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/sync 1.02 13.3±0.32ms ? ?/sec 1.00 13.1±0.21ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/async 1.00 12.3±0.11ms ? ?/sec 1.00 12.3±0.16ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/sync 1.00 13.1±0.12ms ? ?/sec 1.00 13.2±0.14ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/async 1.01 12.3±0.44ms ? ?/sec 1.00 12.1±0.17ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/sync 1.00 12.7±0.10ms ? ?/sec 1.01 12.8±0.17ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/async 1.01 41.7±0.59ms ? ?/sec 1.00 41.4±0.51ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/sync 1.00 41.7±0.34ms ? ?/sec 1.00 41.6±0.36ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/async 1.00 33.9±0.28ms ? ?/sec 1.00 33.8±0.29ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/sync 1.00 34.3±0.28ms ? ?/sec 1.00 34.2±0.33ms ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/async 3.89 10.2±0.07ms ? ?/sec 1.00 2.6±0.03ms ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/sync 1.02 2.9±0.14ms ? ?/sec 1.00 2.8±0.04ms ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/async 3.91 10.2±0.08ms ? ?/sec 1.00 2.6±0.04ms ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/sync 1.00 2.8±0.03ms ? ?/sec 1.00 2.8±0.05ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/async 1.00 15.1±0.56ms ? ?/sec 1.01 15.2±0.23ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/sync 1.00 16.3±0.21ms ? ?/sec 1.01 16.5±0.18ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/async 1.00 14.0±0.16ms ? ?/sec 1.01 14.1±0.15ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/sync 1.01 14.8±0.54ms ? ?/sec 1.00 14.6±0.13ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/async 1.00 20.6±0.14ms ? ?/sec 1.03 21.3±0.25ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/sync 1.00 21.8±0.19ms ? ?/sec 1.03 22.4±0.17ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/async 1.00 19.8±0.24ms ? ?/sec 1.02 20.2±0.18ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/sync 1.00 20.5±0.22ms ? ?/sec 1.03 21.1±0.16ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/async 1.00 15.1±0.13ms ? ?/sec 1.01 15.2±0.17ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/sync 1.00 16.5±0.15ms ? ?/sec 1.00 16.4±0.14ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/async 1.01 14.1±0.13ms ? ?/sec 1.00 14.0±0.18ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/sync 1.00 14.6±0.10ms ? ?/sec 1.00 14.6±0.18ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async 1.00 95.9±0.73ms ? ?/sec 1.00 95.8±0.94ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/sync 1.01 104.4±0.78ms ? ?/sec 1.00 103.1±1.08ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/async 1.01 72.7±0.44ms ? ?/sec 1.00 71.8±0.57ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/sync 1.02 73.6±0.94ms ? ?/sec 1.00 72.1±0.74ms ? ?/sec |
Amazing -- thank you @zhuqi-lucas. I suggest we start a new PR with the proposal (e.g. the contents of #6921 and your fixes). It seems like it is quite close now -- so we should start preparing a PR to merge. (edit -- I see you have done this in #7428) Also, I will unfortunately be offline until next weekend starting in a few hours, so I will likely not be able to respond until then. Maybe @XiangpengHao @Dandandan can help review the changes in the meantime, and again thank you for all your work on this issue |
Thank you @alamb, I think the following steps are needed:
critcmp main polish_better_decoder
group main polish_better_decoder
----- ---- ---------------------
arrow_reader_row_filter/Composite/all_columns/async 1.05 13.1±0.17ms ? ?/sec 1.00 12.6±0.12ms ? ?/sec
arrow_reader_row_filter/Composite/all_columns/sync 1.00 14.5±0.13ms ? ?/sec 1.01 14.6±0.31ms ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/async 1.02 12.6±0.55ms ? ?/sec 1.00 12.4±0.20ms ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/sync 1.01 13.2±0.17ms ? ?/sec 1.00 13.1±0.21ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/async 1.00 11.7±0.09ms ? ?/sec 1.05 12.3±0.16ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/sync 1.00 13.1±0.69ms ? ?/sec 1.00 13.2±0.14ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/async 1.00 11.7±0.56ms ? ?/sec 1.03 12.1±0.17ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/sync 1.00 12.5±0.11ms ? ?/sec 1.03 12.8±0.17ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/async 1.01 41.8±1.50ms ? ?/sec 1.00 41.4±0.51ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/sync 1.01 42.0±1.56ms ? ?/sec 1.00 41.6±0.36ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/async 1.02 34.5±1.28ms ? ?/sec 1.00 33.8±0.29ms ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/sync 1.01 34.4±0.67ms ? ?/sec 1.00 34.2±0.33ms ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/async 1.00 2.4±0.02ms ? ?/sec 1.09 2.6±0.03ms ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/sync 1.00 2.7±0.04ms ? ?/sec 1.07 2.8±0.04ms ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/async 1.00 2.5±0.05ms ? ?/sec 1.06 2.6±0.04ms ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/sync 1.00 2.7±0.05ms ? ?/sec 1.04 2.8±0.05ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/async 1.01 15.4±0.91ms ? ?/sec 1.00 15.2±0.23ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/sync 1.00 15.9±0.15ms ? ?/sec 1.04 16.5±0.18ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/async 1.00 13.4±0.17ms ? ?/sec 1.05 14.1±0.15ms ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/sync 1.00 14.3±0.66ms ? ?/sec 1.02 14.6±0.13ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/async 1.01 21.4±0.84ms ? ?/sec 1.00 21.3±0.25ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/sync 1.01 22.6±0.93ms ? ?/sec 1.00 22.4±0.17ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/async 1.00 20.0±0.20ms ? ?/sec 1.01 20.2±0.18ms ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/sync 1.00 21.0±0.19ms ? ?/sec 1.01 21.1±0.16ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/async 1.00 15.1±0.13ms ? ?/sec 1.00 15.2±0.17ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/sync 1.02 16.7±0.80ms ? ?/sec 1.00 16.4±0.14ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/async 1.00 13.6±0.15ms ? ?/sec 1.03 14.0±0.18ms ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/sync 1.00 14.5±0.13ms ? ?/sec 1.01 14.6±0.18ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async 1.10 105.4±2.71ms ? ?/sec 1.00 95.8±0.94ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/sync 1.02 105.5±3.65ms ? ?/sec 1.00 103.1±1.08ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/async 1.04 74.9±1.85ms ? ?/sec 1.00 71.8±0.57ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/sync 1.03 73.9±2.76ms ? ?/sec 1.00 72.1±0.74ms ? ?/sec
Updated:
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async 1.10 105.4±2.71ms ? ?/sec 1.00 95.8±0.94ms ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async shows a 10% improvement, so it seems in theory we are right, because all_columns/async is exactly the case our page cache improves!
|
It looks like the performance still needs to be improved. Tested locally on my Mac, a lot of queries are still slower compared to older_push_down, which is without the page cache:
./bench.sh compare older_push_down test_default_parquet_push_down
Comparing older_push_down and test_default_parquet_push_down
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ older_push_down ┃ test_default_parquet_push_down ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 0.29ms │ 0.29ms │ no change │
│ QQuery 1 │ 50.39ms │ 47.96ms │ no change │
│ QQuery 2 │ 77.56ms │ 78.10ms │ no change │
│ QQuery 3 │ 77.95ms │ 85.75ms │ 1.10x slower │
│ QQuery 4 │ 579.45ms │ 520.55ms │ +1.11x faster │
│ QQuery 5 │ 560.83ms │ 583.54ms │ no change │
│ QQuery 6 │ 0.33ms │ 0.37ms │ 1.10x slower │
│ QQuery 7 │ 63.95ms │ 57.39ms │ +1.11x faster │
│ QQuery 8 │ 718.15ms │ 730.24ms │ no change │
│ QQuery 9 │ 786.44ms │ 791.21ms │ no change │
│ QQuery 10 │ 206.47ms │ 178.59ms │ +1.16x faster │
│ QQuery 11 │ 213.81ms │ 205.09ms │ no change │
│ QQuery 12 │ 679.51ms │ 611.23ms │ +1.11x faster │
│ QQuery 13 │ 1002.85ms │ 915.22ms │ +1.10x faster │
│ QQuery 14 │ 763.19ms │ 655.43ms │ +1.16x faster │
│ QQuery 15 │ 649.33ms │ 642.34ms │ no change │
│ QQuery 16 │ 1390.12ms │ 1409.99ms │ no change │
│ QQuery 17 │ 1219.49ms │ 1296.90ms │ 1.06x slower │
│ QQuery 18 │ 2932.03ms │ 2748.39ms │ +1.07x faster │
│ QQuery 19 │ 63.44ms │ 72.62ms │ 1.14x slower │
│ QQuery 20 │ 752.05ms │ 687.70ms │ +1.09x faster │
│ QQuery 21 │ 927.06ms │ 724.69ms │ +1.28x faster │
│ QQuery 22 │ 1558.98ms │ 1284.78ms │ +1.21x faster │
│ QQuery 23 │ 3402.25ms │ 3753.29ms │ 1.10x slower │
│ QQuery 24 │ 471.04ms │ 378.47ms │ +1.24x faster │
│ QQuery 25 │ 420.43ms │ 317.64ms │ +1.32x faster │
│ QQuery 26 │ 522.57ms │ 405.79ms │ +1.29x faster │
│ QQuery 27 │ 1375.58ms │ 1144.70ms │ +1.20x faster │
│ QQuery 28 │ 8767.85ms │ 8608.97ms │ no change │
│ QQuery 29 │ 458.33ms │ 451.06ms │ no change │
│ QQuery 30 │ 689.41ms │ 662.65ms │ no change │
│ QQuery 31 │ 758.38ms │ 755.04ms │ no change │
│ QQuery 32 │ 2981.93ms │ 2424.51ms │ +1.23x faster │
│ QQuery 33 │ 2886.54ms │ 2747.83ms │ no change │
│ QQuery 34 │ 3298.54ms │ 3322.28ms │ no change │
│ QQuery 35 │ 834.81ms │ 876.42ms │ no change │
│ QQuery 36 │ 40.41ms │ 94.12ms │ 2.33x slower │
│ QQuery 37 │ 35.59ms │ 50.69ms │ 1.42x slower │
│ QQuery 38 │ 39.18ms │ 92.59ms │ 2.36x slower │
│ QQuery 39 │ 35.07ms │ 133.18ms │ 3.80x slower │
│ QQuery 40 │ 36.55ms │ 43.06ms │ 1.18x slower │
│ QQuery 41 │ 36.18ms │ 42.14ms │ 1.16x slower │
│ QQuery 42 │ 37.00ms │ 40.48ms │ 1.09x slower │
└──────────────┴─────────────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (older_push_down) │ 42401.30ms │
│ Total Time (test_default_parquet_push_down) │ 40673.29ms │
│ Average Time (older_push_down) │ 986.08ms │
│ Average Time (test_default_parquet_push_down) │ 945.89ms │
│ Queries Faster │ 15 │
│ Queries Slower │ 12 │
│ Queries with No Change │ 16 │
└───────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ older_push_down ┃ test_default_parquet_push_down ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 1.12ms │ 1.16ms │ no change │
│ QQuery 1 │ 25.44ms │ 27.52ms │ 1.08x slower │
│ QQuery 2 │ 54.65ms │ 60.26ms │ 1.10x slower │
│ QQuery 3 │ 56.82ms │ 61.16ms │ 1.08x slower │
│ QQuery 4 │ 484.52ms │ 482.47ms │ no change │
│ QQuery 5 │ 524.38ms │ 551.00ms │ 1.05x slower │
│ QQuery 6 │ 1.24ms │ 1.25ms │ no change │
│ QQuery 7 │ 43.00ms │ 44.17ms │ no change │
│ QQuery 8 │ 632.15ms │ 676.51ms │ 1.07x slower │
│ QQuery 9 │ 658.82ms │ 692.69ms │ 1.05x slower │
│ QQuery 10 │ 154.14ms │ 149.79ms │ no change │
│ QQuery 11 │ 183.86ms │ 173.48ms │ +1.06x faster │
│ QQuery 12 │ 664.23ms │ 619.06ms │ +1.07x faster │
│ QQuery 13 │ 916.31ms │ 897.86ms │ no change │
│ QQuery 14 │ 717.15ms │ 650.81ms │ +1.10x faster │
│ QQuery 15 │ 561.85ms │ 579.71ms │ no change │
│ QQuery 16 │ 1430.77ms │ 1388.65ms │ no change │
│ QQuery 17 │ 1249.19ms │ 1150.01ms │ +1.09x faster │
│ QQuery 18 │ 2739.07ms │ 2768.37ms │ no change │
│ QQuery 19 │ 41.63ms │ 53.35ms │ 1.28x slower │
│ QQuery 20 │ 724.72ms │ 717.24ms │ no change │
│ QQuery 21 │ 819.39ms │ 728.06ms │ +1.13x faster │
│ QQuery 22 │ 1551.09ms │ 1232.23ms │ +1.26x faster │
│ QQuery 23 │ 3565.46ms │ 3089.17ms │ +1.15x faster │
│ QQuery 24 │ 457.95ms │ 365.63ms │ +1.25x faster │
│ QQuery 25 │ 381.44ms │ 306.09ms │ +1.25x faster │
│ QQuery 26 │ 526.80ms │ 409.19ms │ +1.29x faster │
│ QQuery 27 │ 1415.36ms │ 1278.68ms │ +1.11x faster │
│ QQuery 28 │ 8528.35ms │ 8521.98ms │ no change │
│ QQuery 29 │ 401.68ms │ 389.67ms │ no change │
│ QQuery 30 │ 728.25ms │ 654.12ms │ +1.11x faster │
│ QQuery 31 │ 744.53ms │ 784.18ms │ 1.05x slower │
│ QQuery 32 │ 3257.23ms │ 2953.12ms │ +1.10x faster │
│ QQuery 33 │ 3187.75ms │ 2755.04ms │ +1.16x faster │
│ QQuery 34 │ 3735.45ms │ 3459.67ms │ +1.08x faster │
│ QQuery 35 │ 877.66ms │ 885.68ms │ no change │
│ QQuery 36 │ 26.23ms │ 83.86ms │ 3.20x slower │
│ QQuery 37 │ 23.04ms │ 39.59ms │ 1.72x slower │
│ QQuery 38 │ 21.55ms │ 80.23ms │ 3.72x slower │
│ QQuery 39 │ 22.02ms │ 124.36ms │ 5.65x slower │
│ QQuery 40 │ 21.54ms │ 32.89ms │ 1.53x slower │
│ QQuery 41 │ 21.64ms │ 30.26ms │ 1.40x slower │
│ QQuery 42 │ 23.77ms │ 28.83ms │ 1.21x slower │
└──────────────┴─────────────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (older_push_down) │ 42203.20ms │
│ Total Time (test_default_parquet_push_down) │ 39979.07ms │
│ Average Time (older_push_down) │ 981.47ms │
│ Average Time (test_default_parquet_push_down) │ 929.75ms │
│ Queries Faster │ 15 │
│ Queries Slower │ 15 │
│ Queries with No Change │ 13 │
└───────────────────────────────────────────────┴────────────┘
And compared with no filter pushdown, there are even more slower cases:
./bench.sh compare main test_default_parquet_push_down
Comparing main and test_default_parquet_push_down
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ test_default_parquet_push_down ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 0.32ms │ 0.29ms │ +1.09x faster │
│ QQuery 1 │ 46.96ms │ 47.96ms │ no change │
│ QQuery 2 │ 75.59ms │ 78.10ms │ no change │
│ QQuery 3 │ 74.12ms │ 85.75ms │ 1.16x slower │
│ QQuery 4 │ 556.03ms │ 520.55ms │ +1.07x faster │
│ QQuery 5 │ 563.52ms │ 583.54ms │ no change │
│ QQuery 6 │ 0.31ms │ 0.37ms │ 1.18x slower │
│ QQuery 7 │ 52.23ms │ 57.39ms │ 1.10x slower │
│ QQuery 8 │ 720.15ms │ 730.24ms │ no change │
│ QQuery 9 │ 741.10ms │ 791.21ms │ 1.07x slower │
│ QQuery 10 │ 171.95ms │ 178.59ms │ no change │
│ QQuery 11 │ 187.66ms │ 205.09ms │ 1.09x slower │
│ QQuery 12 │ 597.16ms │ 611.23ms │ no change │
│ QQuery 13 │ 877.71ms │ 915.22ms │ no change │
│ QQuery 14 │ 605.11ms │ 655.43ms │ 1.08x slower │
│ QQuery 15 │ 630.66ms │ 642.34ms │ no change │
│ QQuery 16 │ 1422.47ms │ 1409.99ms │ no change │
│ QQuery 17 │ 1221.90ms │ 1296.90ms │ 1.06x slower │
│ QQuery 18 │ 2773.23ms │ 2748.39ms │ no change │
│ QQuery 19 │ 66.30ms │ 72.62ms │ 1.10x slower │
│ QQuery 20 │ 682.62ms │ 687.70ms │ no change │
│ QQuery 21 │ 800.86ms │ 724.69ms │ +1.11x faster │
│ QQuery 22 │ 1521.09ms │ 1284.78ms │ +1.18x faster │
│ QQuery 23 │ 4223.95ms │ 3753.29ms │ +1.13x faster │
│ QQuery 24 │ 286.83ms │ 378.47ms │ 1.32x slower │
│ QQuery 25 │ 274.47ms │ 317.64ms │ 1.16x slower │
│ QQuery 26 │ 320.45ms │ 405.79ms │ 1.27x slower │
│ QQuery 27 │ 945.72ms │ 1144.70ms │ 1.21x slower │
│ QQuery 28 │ 8206.32ms │ 8608.97ms │ no change │
│ QQuery 29 │ 459.59ms │ 451.06ms │ no change │
│ QQuery 30 │ 493.35ms │ 662.65ms │ 1.34x slower │
│ QQuery 31 │ 585.82ms │ 755.04ms │ 1.29x slower │
│ QQuery 32 │ 2436.43ms │ 2424.51ms │ no change │
│ QQuery 33 │ 2916.52ms │ 2747.83ms │ +1.06x faster │
│ QQuery 34 │ 2975.16ms │ 3322.28ms │ 1.12x slower │
│ QQuery 35 │ 866.75ms │ 876.42ms │ no change │
│ QQuery 36 │ 104.22ms │ 94.12ms │ +1.11x faster │
│ QQuery 37 │ 62.50ms │ 50.69ms │ +1.23x faster │
│ QQuery 38 │ 107.57ms │ 92.59ms │ +1.16x faster │
│ QQuery 39 │ 167.64ms │ 133.18ms │ +1.26x faster │
│ QQuery 40 │ 46.49ms │ 43.06ms │ +1.08x faster │
│ QQuery 41 │ 45.49ms │ 42.14ms │ +1.08x faster │
│ QQuery 42 │ 42.51ms │ 40.48ms │ no change │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main) │ 39956.84ms │
│ Total Time (test_default_parquet_push_down) │ 40673.29ms │
│ Average Time (main) │ 929.23ms │
│ Average Time (test_default_parquet_push_down) │ 945.89ms │
│ Queries Faster │ 12 │
│ Queries Slower │ 15 │
│ Queries with No Change │ 16 │
└───────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ test_default_parquet_push_down ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 1.10ms │ 1.16ms │ 1.05x slower │
│ QQuery 1 │ 24.27ms │ 27.52ms │ 1.13x slower │
│ QQuery 2 │ 58.03ms │ 60.26ms │ no change │
│ QQuery 3 │ 58.91ms │ 61.16ms │ no change │
│ QQuery 4 │ 478.56ms │ 482.47ms │ no change │
│ QQuery 5 │ 549.38ms │ 551.00ms │ no change │
│ QQuery 6 │ 1.18ms │ 1.25ms │ 1.06x slower │
│ QQuery 7 │ 40.05ms │ 44.17ms │ 1.10x slower │
│ QQuery 8 │ 645.79ms │ 676.51ms │ no change │
│ QQuery 9 │ 671.10ms │ 692.69ms │ no change │
│ QQuery 10 │ 133.89ms │ 149.79ms │ 1.12x slower │
│ QQuery 11 │ 159.81ms │ 173.48ms │ 1.09x slower │
│ QQuery 12 │ 561.74ms │ 619.06ms │ 1.10x slower │
│ QQuery 13 │ 750.14ms │ 897.86ms │ 1.20x slower │
│ QQuery 14 │ 525.03ms │ 650.81ms │ 1.24x slower │
│ QQuery 15 │ 553.88ms │ 579.71ms │ no change │
│ QQuery 16 │ 1417.77ms │ 1388.65ms │ no change │
│ QQuery 17 │ 1104.70ms │ 1150.01ms │ no change │
│ QQuery 18 │ 3037.46ms │ 2768.37ms │ +1.10x faster │
│ QQuery 19 │ 45.81ms │ 53.35ms │ 1.16x slower │
│ QQuery 20 │ 733.71ms │ 717.24ms │ no change │
│ QQuery 21 │ 789.70ms │ 728.06ms │ +1.08x faster │
│ QQuery 22 │ 1299.84ms │ 1232.23ms │ +1.05x faster │
│ QQuery 23 │ 3952.24ms │ 3089.17ms │ +1.28x faster │
│ QQuery 24 │ 273.40ms │ 365.63ms │ 1.34x slower │
│ QQuery 25 │ 274.14ms │ 306.09ms │ 1.12x slower │
│ QQuery 26 │ 320.12ms │ 409.19ms │ 1.28x slower │
│ QQuery 27 │ 900.06ms │ 1278.68ms │ 1.42x slower │
│ QQuery 28 │ 7812.82ms │ 8521.98ms │ 1.09x slower │
│ QQuery 29 │ 390.07ms │ 389.67ms │ no change │
│ QQuery 30 │ 420.68ms │ 654.12ms │ 1.55x slower │
│ QQuery 31 │ 571.58ms │ 784.18ms │ 1.37x slower │
│ QQuery 32 │ 2585.80ms │ 2953.12ms │ 1.14x slower │
│ QQuery 33 │ 2621.59ms │ 2755.04ms │ 1.05x slower │
│ QQuery 34 │ 3144.83ms │ 3459.67ms │ 1.10x slower │
│ QQuery 35 │ 855.71ms │ 885.68ms │ no change │
│ QQuery 36 │ 80.10ms │ 83.86ms │ no change │
│ QQuery 37 │ 35.71ms │ 39.59ms │ 1.11x slower │
│ QQuery 38 │ 79.06ms │ 80.23ms │ no change │
│ QQuery 39 │ 123.78ms │ 124.36ms │ no change │
│ QQuery 40 │ 29.03ms │ 32.89ms │ 1.13x slower │
│ QQuery 41 │ 27.67ms │ 30.26ms │ 1.09x slower │
│ QQuery 42 │ 25.98ms │ 28.83ms │ 1.11x slower │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main) │ 38166.21ms │
│ Total Time (test_default_parquet_push_down) │ 39979.07ms │
│ Average Time (main) │ 887.59ms │
│ Average Time (test_default_parquet_push_down) │ 929.75ms │
│ Queries Faster │ 4 │
│ Queries Slower │ 24 │
│ Queries with No Change │ 15 │
└───────────────────────────────────────────────┴────────────┘ |
I am trying to pick up PR #6624, apply it, and then see the performance. |
Thanks @zhuqi-lucas -- I am back and plan to spend time on this project this week |
Thank you @alamb , welcome back! |
I created a draft POC PR for bitmap/range enum selection. It's not ready for review yet -- I am trying to make all of ClickBench show no regression -- but it is independent of the page cache PR. |
Since our benchmark doesn't seem to track the actual performance of ClickBench in DataFusion, I am going to try and figure out why: |
@alamb @XiangpengHao I found that some of the regression comes from page cache misses, which add time to decode pages even with the page cache enabled. For example, our default batch size for ClickBench is 8192; in the Q27 ClickBench benchmark this causes more than 20% page cache misses because some batches span more than one page. With the above commit, the performance no longer regresses. The original flamegraph for Q27 ClickBench:
Explanation details:
Most cases:
Assumption & observation: each page consists of multiple batches.
Then our pipeline looks like this:
Load Page 1
Load batch 1 -> evaluate predicates -> filter 1 -> load & emit batch 1
Load batch 2 -> evaluate predicates -> filter 2 -> load & emit batch 2
Load batch 3 -> evaluate predicates -> filter 3 -> load & emit batch 3
Load Page 2
Load batch 4 -> evaluate predicates -> filter 4 -> load & emit batch 4
Load batch 5 -> evaluate predicates -> filter 5 -> load & emit batch 5
But some cases:
Load Page1
Load Page2
Load Page3
Load batch 1 -> evaluate predicates -> filter 1 -> load & emit batch 1
When we load Page 2, the cache is updated to Page 2, so the next time we access Page 1, it will miss. |
This is a really nice finding, maybe we should cache 2 data pages instead of one? |
Thank you @XiangpengHao for double checking. I tried 2 data pages and it still regresses; with 3 data pages it's OK. Caching 3 pages is the smallest setting with no regression for Q27, so I added 3 data pages in the above commit. I think it's enough for most cases. |
Maybe we can change the cache so it is based on row count (aka caches enough pages for the total output record batch) rather than some fixed number |
Thank you @alamb for the suggestion. I agree this would be the ideal solution, but it seems there is no row count information available for the default Parquet V1 data page -- it only has num_values. I will investigate whether we can do it.
#[derive(Clone)]
pub enum Page {
/// Data page Parquet format v1.
DataPage {
/// The underlying data buffer
buf: Bytes,
/// Number of values in this page
num_values: u32,
/// Encoding for values in this page
encoding: Encoding,
/// Definition level encoding
def_level_encoding: Encoding,
/// Repetition level encoding
rep_level_encoding: Encoding,
/// Optional statistics for this page
statistics: Option<Statistics>,
},
/// Data page Parquet format v2.
DataPageV2 {
/// The underlying data buffer
buf: Bytes,
/// Number of values in this page
num_values: u32,
/// Encoding for values in this page
encoding: Encoding,
/// Number of null values in this page
num_nulls: u32,
/// Number of rows in this page
num_rows: u32,
/// Length of definition levels
def_levels_byte_len: u32,
/// Length of repetition levels
rep_levels_byte_len: u32,
/// Is this page compressed
is_compressed: bool,
/// Optional statistics for this page
statistics: Option<Statistics>,
},
/// Dictionary page.
DictionaryPage {
/// The underlying data buffer
buf: Bytes,
/// Number of values in this page
num_values: u32,
/// Encoding for values in this page
encoding: Encoding,
/// Is dictionary page sorted
is_sorted: bool,
},
} |
Isn't num_values enough? You could cache pages until you have read the |
@alamb Let me try num_values. I can see all page types have this field -- if it works, it's good! |
I tried it now, and there seems to be some regression in the test results -- the 3-page cache has better performance. I am not sure why and need to investigate further...
commit 3321f73128022e2868af91a7fe87799a2a0b06ad
Author: zhuqi-lucas <821684824@qq.com>
Date: Thu May 8 14:42:27 2025 +0800
Add dynamic page cache based on target batch size
diff --git a/parquet/src/arrow/async_reader/arrow_reader.rs b/parquet/src/arrow/async_reader/arrow_reader.rs
index 92e585756..4cfec5552 100644
--- a/parquet/src/arrow/async_reader/arrow_reader.rs
+++ b/parquet/src/arrow/async_reader/arrow_reader.rs
@@ -19,7 +19,6 @@ use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};
use std::{collections::VecDeque, sync::Arc};
-
use arrow_array::ArrayRef;
use arrow_array::{cast::AsArray, Array, RecordBatch, RecordBatchReader};
use arrow_schema::{ArrowError, DataType, Schema, SchemaRef};
@@ -220,53 +219,95 @@ impl RecordBatchReader for FilteredParquetRecordBatchReader {
struct CachedPage {
dict: Option<(usize, Page)>, // page offset -> page
- data: Option<(usize, Page)>, // page offset -> page
+ data: VecDeque<(usize, Page)>, // page offset -> page, use dynamic pages according to the batch size
+ total_data_rows: usize,
}
struct PredicatePageCacheInner {
pages: HashMap<usize, CachedPage>, // col_id (Parquet's leaf column index) -> CachedPage
+ /// How many rows the reader is currently asking for per batch
+ batch_size: usize,
}
impl PredicatePageCacheInner {
pub(crate) fn get_page(&self, col_id: usize, offset: usize) -> Option<Page> {
self.pages.get(&col_id).and_then(|pages| {
- pages
- .dict
- .iter()
- .chain(pages.data.iter())
- .find(|(page_offset, _)| *page_offset == offset)
- .map(|(_, page)| page.clone())
+
+ if let Some((off, page)) = &pages.dict {
+ if *off == offset {
+ return Some(page.clone());
+ }
+ }
+
+ pages.data.iter().find(|(off, _)| *off == offset).map(|(_, page)| page.clone())
})
}
- /// Insert a page into the cache.
- /// Inserting a page will override the existing page, if any.
- /// This is because we only need to cache 2 pages per column, see below.
+ /// Insert or refresh a page in the per-column cache.
+ ///
+ /// This cache maintains:
+ /// 1. **One dictionary page** (`PageType::DICTIONARY_PAGE`), replacing any prior dict page.
+ /// 2. **A dynamic set of data pages**, each tracked by its file offset and row count,
+ /// stored in a FIFO `VecDeque<(offset, Page)>`.
+ ///
+ /// As you process batches across multiple pages, we must keep every page
+ /// needed to emit one full batch. After inserting a new data page, we:
+ /// 1. Increase `total_data_rows` by that page’s `num_values()`.
+ /// 2. **Evict the oldest page** only while:
+ /// - Removing it still leaves **at least** `batch_size` rows in cache, **and**
+ /// - More than **one** data page remains.
+ ///
+ /// This policy ensures:
+ /// - **Complete coverage**: all pages required to serve one batch stay cached.
+ /// - **Memory boundedness**: total cached rows ≥ `batch_size`, never far above it.
+ /// - **At least one page retained**: even if a single page exceeds `batch_size`.
+ ///
+ /// # Parameters
+ /// - `col_id`: Parquet leaf column index, used as the cache key.
+ /// - `offset`: File offset for this page (unique identifier).
+ /// - `page`: The decoded `Page` to insert or refresh.
pub(crate) fn insert_page(&mut self, col_id: usize, offset: usize, page: Page) {
let is_dict = page.page_type() == PageType::DICTIONARY_PAGE;
+ let rows_in_page = page.num_values() as usize;
- let cached_pages = self.pages.entry(col_id);
- match cached_pages {
- Entry::Occupied(mut entry) => {
+ match self.pages.entry(col_id) {
+ Entry::Occupied(mut occ) => {
+ let cp = occ.get_mut();
if is_dict {
- entry.get_mut().dict = Some((offset, page));
+ // refresh dictionary page
+ cp.dict = Some((offset, page));
} else {
- entry.get_mut().data = Some((offset, page));
+ // add new data page
+ cp.data.push_back((offset, page.clone()));
+ cp.total_data_rows += rows_in_page;
+
+ // evict only while the oldest page is not needed to cover batch_size
+ while cp.data.len() > 1 {
+ // look at the front (oldest) page’s row count
+ let &(_, ref oldest) = cp.data.front().unwrap();
+ let oldest_rows = oldest.num_values() as usize;
+ // if removing it still leaves enough rows for one batch, pop it
+ if cp.total_data_rows - oldest_rows >= self.batch_size {
+ cp.data.pop_front();
+ cp.total_data_rows -= oldest_rows;
+ } else {
+ break;
+ }
+ }
}
}
- Entry::Vacant(entry) => {
- let cached_page = if is_dict {
- CachedPage {
- dict: Some((offset, page)),
- data: None,
- }
+
+ Entry::Vacant(vac) => {
+ // first insertion for this column
+ let mut data = VecDeque::new();
+ let dict = if is_dict {
+ Some((offset, page))
} else {
- CachedPage {
- dict: None,
- data: Some((offset, page)),
- }
+ data.push_back((offset, page.clone()));
+ None
};
- entry.insert(cached_page);
+ let total_data_rows = if is_dict { 0 } else { rows_in_page };
+ vac.insert(CachedPage { dict, data, total_data_rows });
}
}
}
@@ -328,10 +369,11 @@ pub(crate) struct PredicatePageCache {
}
impl PredicatePageCache {
- pub(crate) fn new(capacity: usize) -> Self {
+ pub(crate) fn new(capacity: usize, batch_size: usize) -> Self {
Self {
inner: Mutex::new(PredicatePageCacheInner {
pages: HashMap::with_capacity(capacity),
+ batch_size
}),
}
}
@@ -513,7 +555,7 @@ mod tests {
fn test_predicate_page_cache_basic_operations() {
use super::*;
- let cache = PredicatePageCache::new(2);
+ let cache = PredicatePageCache::new(2, 8192);
let page1 = Page::dummy_page(PageType::DATA_PAGE, 100);
let page2 = Page::dummy_page(PageType::DICTIONARY_PAGE, 200);
@@ -538,7 +580,7 @@ mod tests {
fn test_predicate_page_cache_replacement() {
use super::*;
- let cache = PredicatePageCache::new(2);
+ let cache = PredicatePageCache::new(2, 8192);
let data_page1 = Page::dummy_page(PageType::DATA_PAGE, 100);
let data_page2 = Page::dummy_page(PageType::DATA_PAGE_V2, 200);
@@ -556,7 +598,7 @@ mod tests {
fn test_predicate_page_cache_multiple_columns() {
use super::*;
- let cache = PredicatePageCache::new(2);
+ let cache = PredicatePageCache::new(2, 8192);
let page1 = Page::dummy_page(PageType::DATA_PAGE, 100);
let page2 = Page::dummy_page(PageType::DATA_PAGE_V2, 200);
diff --git a/parquet/src/arrow/async_reader/mod.rs b/parquet/src/arrow/async_reader/mod.rs
index a034e927f..3f2936826 100644
--- a/parquet/src/arrow/async_reader/mod.rs
+++ b/parquet/src/arrow/async_reader/mod.rs
@@ -599,6 +599,7 @@ where
self.metadata.as_ref(),
offset_index,
projection_to_cache,
+ batch_size
);
let mut selection =
@@ -902,6 +903,7 @@ impl<'a> InMemoryRowGroup<'a> {
metadata: &'a ParquetMetaData,
offset_index: Option<&'a [OffsetIndexMetaData]>,
projection_to_cache: Option<ProjectionMask>,
+ batch_size: usize,
) -> Self {
let rg_metadata = metadata.row_group(row_group_idx);
let to_cache_column_cnt = projection_to_cache
@@ -920,7 +922,7 @@ impl<'a> InMemoryRowGroup<'a> {
row_count: rg_metadata.num_rows() as usize,
metadata,
row_group_idx,
- cache: Arc::new(PredicatePageCache::new(to_cache_column_cnt)),
+ cache: Arc::new(PredicatePageCache::new(to_cache_column_cnt, batch_size)),
projection_to_cache,
}
} |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There are times when enabling a row filter actually slows down parquet decoding, and one reason for this is having to decompress (e.g. ZSTD) the pages twice
Describe the solution you'd like
#6921 has a cache to avoid this second decode
I would like to get that PR merged
Steps:
Describe alternatives you've considered
Additional context
I am filing this as a separate ticket as there are a lot of other ideas on #6921 that make it kind of hard to follow