Commit 1d8df4f
[SPARK-45606][SQL] Release restrictions on multi-layer runtime filter
### What changes were proposed in this pull request?
Before #39170, Spark only supported inserting a runtime filter on the application side of a shuffle join, and only on a single layer. Because inserting another runtime filter on a column that already has one was not considered worthwhile, Spark restricts it at https://github.com/apache/spark/blob/7057952f6bc2c5cf97dd408effd1b18bee1cb8f4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala#L346
For example
`select * from bf1 join bf2 on bf1.c1 = bf2.c2 and bf1.c1 = bf2.b2 where bf2.a2 = 62`
This SQL has two join conditions. Without the restriction mentioned above, two runtime filters would be inserted on `bf1.c1`.
At that time, the restriction was reasonable.
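The single-layer restriction can be illustrated with a toy model. This is only a sketch: `JoinKey`, `inject`, and `skipIfAlreadyFiltered` are hypothetical names invented for illustration and are not part of Spark's `InjectRuntimeFilter` rule.

```scala
// Toy model only: these types are hypothetical and are not Spark's optimizer classes.
object SingleLayerRestriction {
  // One equi-join condition: the application-side column would be filtered
  // using values taken from the creation-side column.
  final case class JoinKey(applicationCol: String, creationCol: String)

  // Returns the join keys that receive a runtime filter. When
  // `skipIfAlreadyFiltered` is true, a column that already has a
  // filter does not get a second one.
  def inject(keys: Seq[JoinKey], skipIfAlreadyFiltered: Boolean): Seq[JoinKey] =
    keys.foldLeft(Vector.empty[JoinKey]) { (acc, key) =>
      val alreadyFiltered = acc.exists(_.applicationCol == key.applicationCol)
      if (skipIfAlreadyFiltered && alreadyFiltered) acc else acc :+ key
    }

  def main(args: Array[String]): Unit = {
    // bf1.c1 = bf2.c2 and bf1.c1 = bf2.b2
    val keys = Seq(JoinKey("bf1.c1", "bf2.c2"), JoinKey("bf1.c1", "bf2.b2"))
    println(inject(keys, skipIfAlreadyFiltered = true).size)  // 1 filter on bf1.c1
    println(inject(keys, skipIfAlreadyFiltered = false).size) // 2 filters on bf1.c1
  }
}
```

In this model the second join condition targets the same application-side column, so the restriction drops it.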
After #39170, Spark supports inserting a runtime filter on one side of any shuffle join across multiple layers. With that change, the restriction on multi-layer runtime filters mentioned above looks outdated.
For example
`select * from bf1 join bf2 join bf3 on bf1.c1 = bf2.c2 and bf3.c3 = bf1.c1 where bf2.a2 = 5`
Assume bf2 is the build side and a runtime filter is inserted for bf1. We can't insert the same runtime filter for bf3 because there is already a runtime filter on `bf1.c1`.
This behavior differs from the original intent and is unexpected.
The change in this PR doesn't affect the restriction mentioned above.
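The multi-layer case can be sketched with a similar toy model. The two checks below are assumptions made for illustration: the first mimics a check that refuses whenever either join key already has a runtime filter, the second shows one plausible relaxation that looks only at the key that would receive the filter. Neither is Spark's actual code, and all names are hypothetical.

```scala
// Toy model only: hypothetical helpers, not Spark's InjectRuntimeFilter implementation.
object MultiLayerRestriction {
  // Strict check (toy): refuse if either join key already carries a runtime filter.
  def blockedByEitherSide(
      filtered: Set[String],
      applicationKey: String,
      creationKey: String): Boolean =
    filtered.contains(applicationKey) || filtered.contains(creationKey)

  // Relaxed check (assumption): refuse only if the key that would
  // receive the new filter already has one.
  def blockedByApplicationSide(filtered: Set[String], applicationKey: String): Boolean =
    filtered.contains(applicationKey)

  def main(args: Array[String]): Unit = {
    // After the first join, bf2 was the build side, so bf1.c1 already has a filter.
    val filtered = Set("bf1.c1")
    // Second join condition: bf3.c3 = bf1.c1, with bf3.c3 as the candidate application key.
    println(blockedByEitherSide(filtered, "bf3.c3", "bf1.c1")) // true: bf3 gets no filter
    println(blockedByApplicationSide(filtered, "bf3.c3"))      // false: bf3 can be filtered
    // A second filter on bf1.c1 itself is still refused.
    println(blockedByApplicationSide(filtered, "bf1.c1"))      // true
  }
}
```

Under the relaxed check, the duplicate-filter restriction on `bf1.c1` from the first example still holds, while `bf3.c3` becomes eligible.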
### Why are the changes needed?
Release the restrictions on multi-layer runtime filters.
This expands the optimization surface.
### Does this PR introduce _any_ user-facing change?
'No'.
New feature.
### How was this patch tested?
Test cases updated.
Micro benchmark for q9 in TPC-H.
**TPC-H 100**
Query | Master(ms) | PR(ms) | Difference(ms) | Percent
-- | -- | -- | -- | --
q9 | 26491 | 20725 | 5766| 27.82%
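The arithmetic behind the table can be checked with a short sketch. The helper names are hypothetical. The computation suggests that the `Percent` column is the difference divided by the PR time rather than by the master time.

```scala
// Recomputes the q9 row of the table; helper names are illustrative only.
object Q9Speedup {
  // Absolute saving in milliseconds.
  def differenceMs(masterMs: Int, prMs: Int): Int = masterMs - prMs

  // `part` expressed as a percentage of `whole`.
  def percentOf(part: Int, whole: Int): Double = part.toDouble / whole * 100

  def main(args: Array[String]): Unit = {
    val masterMs = 26491
    val prMs = 20725
    val diff = differenceMs(masterMs, prMs)
    println(diff)                      // 5766
    println(percentOf(diff, prMs))     // about 27.82, matching the table
    println(percentOf(diff, masterMs)) // about 21.77, relative to master
  }
}
```

Measured against the master time instead, the same 5766 ms saving is roughly a 21.77% reduction.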
### Was this patch authored or co-authored using generative AI tooling?
'No'.
Closes #43449 from beliefer/SPARK-45606.
Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Jiaan Geng <[email protected]>
1 parent a912706 commit 1d8df4f
File tree
2 files changed: +18 -23 lines changed
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer
  - core/src/test/scala/org/apache/spark/sql

Lines changed: 15 additions & 18 deletions
Lines changed: 3 additions & 5 deletions