Introduce selection vector repartitioning #15423
base: main
This is starting to look nice, @goldmedal.
Branch updated from 87a84f9 to 3a151dd.
After rebasing onto main, the hash join no longer uses
#15339: it looks like the join plan is being changed.
Also cc @zebsme
You should be able to get the test back by also setting
Thanks, it works. I also added a test for normal hash partitioning in RepartitionExec.
I'm working on HashAggregate (goldmedal#3) based on this PR.
Thanks for your contribution, @goldmedal!
Makes sense to me 👍
# TODO: Selection vector partitioning should be used for the hash join
# once https://github.com/apache/datafusion/issues/15382 is fixed.
I didn't implement planner support for the hash join, to keep this PR from becoming too large and complex. I expect #15382 to implement the required parts.
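The gating described here can be pictured with a minimal Rust sketch. `HashPartitionMode` and the `datafusion.optimizer.prefer_hash_selection_vector_partitioning` option are named in this PR, but the variant names, the `choose_hash_partition_mode` function, and the per-consumer capability flag are hypothetical illustrations, not DataFusion's actual API. The idea is that the planner only picks selection-vector partitioning when the option is enabled and the consuming operator can interpret the mask; otherwise it keeps the existing behavior.

```rust
/// Hypothetical stand-in for the `HashPartitionMode` introduced by this PR.
/// The variant names are illustrative assumptions, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashPartitionMode {
    /// Existing behavior: copy the matching rows into each output partition.
    HashPartitioned,
    /// New behavior: share the batch and attach a selection vector column.
    SelectionVector,
}

/// Hypothetical planner hook (assumption, not DataFusion API).
///
/// `prefer_selection_vector` models the value of
/// `datafusion.optimizer.prefer_hash_selection_vector_partitioning`;
/// `consumer_supports_selection_vector` models whether the downstream
/// operator (e.g. a hash join that is not yet updated) understands the mask.
pub fn choose_hash_partition_mode(
    prefer_selection_vector: bool,
    consumer_supports_selection_vector: bool,
) -> HashPartitionMode {
    if prefer_selection_vector && consumer_supports_selection_vector {
        HashPartitionMode::SelectionVector
    } else {
        // Fall back to the old strategy so existing operators keep working.
        HashPartitionMode::HashPartitioned
    }
}
```

Keeping the old mode as the fallback is what keeps operators that have not been taught about the mask working unchanged in this sketch.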
Isn't partitioning with selection vectors always better for hash repartitioning? 🤔 What is the reason for keeping the old strategy?
I think that to support selection vectors, executors need to be updated to interpret an additional metadata column. However, since executors are part of the public interface and some downstream projects may depend on them directly, it's better to ensure backward compatibility.
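A minimal, std-only Rust sketch of the difference between the two strategies, under stated assumptions: a slice of keys stands in for the hash columns of a `RecordBatch`, `std::collections::hash_map::DefaultHasher` stands in for DataFusion's hash functions, and every function name is hypothetical. Classic repartitioning copies each row into exactly one output partition; selection-vector repartitioning hands every partition the same batch plus a boolean mask (the additional metadata column mentioned above), which is why every downstream operator has to learn to honor that mask.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Partition index for a key: hash(key) % num_partitions.
/// Assumption: DefaultHasher stands in for DataFusion's own hashing.
fn partition_of<K: Hash>(key: &K, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Classic hash repartitioning (hypothetical name): each output partition
/// receives its own copy of only the rows that hash to it.
fn repartition_by_copy<K: Hash + Clone>(keys: &[K], num_partitions: usize) -> Vec<Vec<K>> {
    let mut out = vec![Vec::new(); num_partitions];
    for key in keys {
        out[partition_of(key, num_partitions)].push(key.clone());
    }
    out
}

/// Selection-vector repartitioning (hypothetical name): every output
/// partition shares the whole batch and gets a boolean mask marking the
/// rows it owns. Each row is `true` in exactly one mask.
fn repartition_by_selection<K: Hash>(keys: &[K], num_partitions: usize) -> Vec<Vec<bool>> {
    let mut masks = vec![vec![false; keys.len()]; num_partitions];
    for (row, key) in keys.iter().enumerate() {
        masks[partition_of(key, num_partitions)][row] = true;
    }
    masks
}

/// What a downstream operator must now do (hypothetical name): honor the
/// mask instead of assuming every row in the batch belongs to its partition.
fn apply_selection<K: Clone>(keys: &[K], mask: &[bool]) -> Vec<K> {
    keys.iter()
        .zip(mask)
        .filter_map(|(key, &selected)| if selected { Some(key.clone()) } else { None })
        .collect()
}
```

In this sketch, applying partition `p`'s mask yields exactly the rows that the copy-based strategy puts in partition `p`. The saving is skipping the per-partition copy; the cost is that any operator that ignores the mask would silently process rows that belong to other partitions, which is the compatibility concern raised above.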
I agree with @2010YOUY01. It would be a breaking change for
The CI failure isn't related to this PR. It could be fixed by #15493 |
Branch updated from ab5a0e3 to 0a0055d.
Which issue does this PR close?
Rationale for this change
This is preparatory work for #15382 and #15383.
What changes are included in this PR?
- datafusion.optimizer.prefer_hash_selection_vector_partitioning
- RepartitionExec
- HashPartitionMode, to decide the hash partitioning behavior when planning physical plans.
Are these changes tested?
Are there any user-facing changes?