Pax/support bloom filter pushdown #1333

gongxun0928 · 2025-08-27T09:27:17Z

This PR is based on #1324.

feat: pushdown bloom filter to pax table am

This optimization pushes the Bloom filter conditions of runtime filters
down to the PAX table AM layer.

Because the filter is applied before SeqNext() is reached, rows that cannot
match are discarded before they are converted from columnar format into a
TupleTableSlot, which results in faster query execution.
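
The idea can be illustrated with a minimal sketch. This is not the PAX implementation: `SimpleBloomFilter`, its two hash functions, and `FilterColumn` are hypothetical names invented for this example, and the column is modeled as a plain vector of integers. The sketch only shows why testing the join key against a Bloom filter at the column level lets the scan skip rows before any row-format conversion happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified Bloom filter over 32-bit integer join keys.
// It uses two hash functions and a fixed-size bit array.
class SimpleBloomFilter {
 public:
  explicit SimpleBloomFilter(std::size_t num_bits) : bits_(num_bits, false) {
    assert(num_bits > 0);
  }

  // Called for every key on the build (hash) side of the join.
  void Add(int32_t key) {
    bits_[Hash1(key) % bits_.size()] = true;
    bits_[Hash2(key) % bits_.size()] = true;
  }

  // false: the key is definitely absent. true: the key may be present.
  bool MayContain(int32_t key) const {
    return bits_[Hash1(key) % bits_.size()] &&
           bits_[Hash2(key) % bits_.size()];
  }

 private:
  static uint64_t Hash1(int32_t key) {
    return (static_cast<uint32_t>(key) * 0x9E3779B97F4A7C15ULL) >> 32;
  }
  static uint64_t Hash2(int32_t key) {
    uint64_t x = static_cast<uint32_t>(key) ^ 0xC2B2AE3D27D4EB4FULL;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 31;
    return x;
  }

  std::vector<bool> bits_;
};

// Scans one column and returns the indexes of rows whose key may match.
// Only these rows would need to be materialized as row-format tuples.
std::vector<std::size_t> FilterColumn(const std::vector<int32_t>& column,
                                      const SimpleBloomFilter& filter) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < column.size(); ++i) {
    if (filter.MayContain(column[i])) {
      selected.push_back(i);
    }
  }
  return selected;
}
```

A Bloom filter never produces false negatives, so every row that really joins survives the scan. It may let a few non-matching rows through, which is why the hash join above the scan still evaluates the join condition.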

CREATE TABLE t2(c1 int, c2 int, c3 int, c4 int, c5 int) with (appendonly=true, orientation=column) distributed REPLICATED;
INSERT INTO t2 VALUES (1,1,1,1,1), (2,2,2,2,2), (3,3,3,3,3), (4,4,4,4,4);
INSERT INTO t2 select * FROM t2;
INSERT INTO t2 select * FROM t2;
INSERT INTO t2 select * FROM t2;
CREATE TABLE t3(c1 int, c2 int, c3 int, c4 int, c5 int) using pax;

gpadmin=# insert into t3 select b,i,i,i,i from generate_series(1,5) b,generate_series(1,2000000) i;
INSERT 0 10000000
gpadmin=# analyze t3;
ANALYZE
gpadmin=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF)
SELECT t3.c3 FROM t3, t2 WHERE t3.c2 = t2.c4;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3) (actual time=355.000..1482.000 rows=160 loops=1)
   ->  Hash Join (actual time=0.000..1482.000 rows=96 loops=1)
         Hash Cond: (t3.c2 = t2.c4)
         Extra Text: (seg0)   Hash chain length 8.0 avg, 8 max, using 4 of 524288 buckets.
         ->  Seq Scan on t3 (actual time=0.000..691.000 rows=6000000 loops=1)
         ->  Hash (actual time=0.000..0.000 rows=32 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 4098kB
               ->  Seq Scan on t2 (actual time=0.000..0.000 rows=32 loops=1)
 Optimizer: GPORCA
(9 rows)

gpadmin=# set gp_enable_runtime_filter_pushdown to on;
SET
gpadmin=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF)
SELECT t3.c3 FROM t3, t2 WHERE t3.c2 = t2.c4;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3) (actual time=99.000..303.000 rows=160 loops=1)
   ->  Hash Join (actual time=1.000..303.000 rows=96 loops=1)
         Hash Cond: (t3.c2 = t2.c4)
         Extra Text: (seg0)   Hash chain length 8.0 avg, 8 max, using 4 of 524288 buckets.
         ->  Seq Scan on t3 (actual time=1.000..301.000 rows=12 loops=1)
               Rows Removed by Pushdown Runtime Filter: 5999988
         ->  Hash (actual time=0.000..0.000 rows=32 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 4098kB
               ->  Seq Scan on t2 (actual time=0.000..0.000 rows=32 loops=1)
 Optimizer: GPORCA
(10 rows)

gpadmin=# set pax_enable_row_filter to on;
SET
gpadmin=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF)
SELECT t3.c3 FROM t3, t2 WHERE t3.c2 = t2.c4;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3) (actual time=48.000..138.000 rows=160 loops=1)
   ->  Hash Join (actual time=0.000..138.000 rows=96 loops=1)
         Hash Cond: (t3.c2 = t2.c4)
         Extra Text: (seg0)   Hash chain length 8.0 avg, 8 max, using 4 of 524288 buckets.
         ->  Seq Scan on t3 (actual time=0.000..137.000 rows=12 loops=1)
               Rows Removed by Pushdown Runtime Filter: 5999988
         ->  Hash (actual time=0.000..0.000 rows=32 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 4098kB
               ->  Seq Scan on t2 (actual time=0.000..0.000 rows=32 loops=1)
 Optimizer: GPORCA
(10 rows)

What does this PR do?

Type of Change

Bug fix (non-breaking change)
New feature (non-breaking change)
Breaking change (fix or feature with breaking changes)
Documentation update

Breaking Changes

Test Plan

Unit tests added/updated
Integration tests added/updated
Passed make installcheck
Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Followed contribution guide
Added/updated documentation
Reviewed code for security implications
Requested review from cloudberry committers

Additional Context

CI Skip Instructions


gongxun0928 force-pushed the pax/support-bloom-filter-pushdown branch from a02e2c3 to 1fb74ea on August 27, 2025 10:31

gongxun0928 force-pushed the pax/support-bloom-filter-pushdown branch from 1fb74ea to f83db6e on September 10, 2025 17:19

gongxun0928 requested a review from gfphoenix78 on September 11, 2025 01:40

gongxun0928 force-pushed the pax/support-bloom-filter-pushdown branch 2 times, most recently from 52f7483 to 8b62803 on October 10, 2025 06:33

my-ship-it reviewed Oct 11, 2025

contrib/pax_storage/src/cpp/storage/filter/pax_row_filter.cc (resolved)

contrib/pax_storage/src/cpp/storage/filter/pax_sparse_pg_path.cc (resolved)

gfphoenix78 reviewed Oct 12, 2025

contrib/pax_storage/src/cpp/storage/micro_partition_row_filter_reader.cc (resolved)

contrib/pax_storage/src/cpp/storage/micro_partition_row_filter_reader.cc (resolved)

src/include/nodes/execnodes.h (outdated, resolved)

gongxun0928 force-pushed the pax/support-bloom-filter-pushdown branch 3 times, most recently from 38ed61a to 4e85665 on October 22, 2025 16:38

gongxun0928 requested review from gfphoenix78 and my-ship-it on October 23, 2025 10:34

gongxun0928 force-pushed the pax/support-bloom-filter-pushdown branch from 4e85665 to 11367ec on October 28, 2025 13:42

gfphoenix78 reviewed Oct 29, 2025

contrib/pax_storage/src/cpp/storage/micro_partition_row_filter_reader.cc (outdated, resolved)

contrib/pax_storage/src/test/regress/sql/gp_runtime_filter.sql (outdated, resolved)

gongxun0928 force-pushed the pax/support-bloom-filter-pushdown branch from 11367ec to cb135ad on October 29, 2025 08:07

gongxun0928 requested a review from gfphoenix78 on October 30, 2025 01:40

gongxun0928 added 2 commits October 31, 2025 17:45

performance: eliminate unnecessary null_counts calculations

1ca52f8

Once the null_counts array has been calculated in advance, there is no need to call GetColumnDatum to keep updating null_counts; the datum can be read directly.
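
A rough sketch of why a precomputed null_counts array allows direct reads is given below. The layout is an assumption made only for illustration and is not taken from the PAX source: the column is modeled as a per-row null flag plus a dense array holding only the non-null values, and `SparseColumn`, `BuildNullCounts`, and `ReadRow` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (hypothetical): one null flag per row, and a dense array
// that stores the non-null values only, in row order.
struct SparseColumn {
  std::vector<bool> is_null;
  std::vector<int32_t> values;
};

// null_counts[i] is the number of NULL rows in the range [0, i).
// It is computed once, in a single pass over the null flags.
std::vector<std::size_t> BuildNullCounts(const SparseColumn& col) {
  std::vector<std::size_t> null_counts(col.is_null.size());
  std::size_t nulls = 0;
  for (std::size_t i = 0; i < col.is_null.size(); ++i) {
    null_counts[i] = nulls;
    if (col.is_null[i]) {
      ++nulls;
    }
  }
  // Every non-null row must have exactly one stored value.
  assert(col.is_null.size() - nulls == col.values.size());
  return null_counts;
}

// Reads one row without maintaining any running counter: the position in
// the dense array is the row number minus the NULLs that precede it.
// Returns false if the row is NULL, otherwise stores the value in *out.
bool ReadRow(const SparseColumn& col,
             const std::vector<std::size_t>& null_counts,
             std::size_t row, int32_t* out) {
  if (col.is_null[row]) {
    return false;
  }
  *out = col.values[row - null_counts[row]];
  return true;
}
```

With the prefix counts in place, rows can be read in any order, so rows already rejected by a filter do not have to be visited just to keep a counter in sync.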

gongxun0928 force-pushed the pax/support-bloom-filter-pushdown branch from cb135ad to 1ca52f8 on October 31, 2025 09:45
