[multi-stage] [optimization] Move inequi join out of hashjoin when there is no join key #9728

Open
61yao opened this issue Nov 4, 2022 · 4 comments · May be fixed by #14942
Labels
enhancement, multi-stage (Related to the multi-stage query engine)

Comments

@61yao
Contributor

61yao commented Nov 4, 2022

JoinOperator currently executes as for (row : broadcastTable) { join(left, right) } and only then applies the inequi-join condition.

The equi-join condition is applied before the inequi-join condition.

See the testInequi join failure in HashJoinOperatorTest:

#9743

@61yao
Contributor Author

61yao commented Nov 6, 2022

@walterddr I reproduced this one in a unit test.

walterddr added the multi-stage label (Related to the multi-stage query engine) on Nov 7, 2022
@61yao
Contributor Author

61yao commented Nov 9, 2022

I did a bit of investigation. This is how I imagine this could be fixed.

  1. Have a BroadcastNestedLoopJoin class to handle inequi joins.
    When we only check an inequi-join condition, the hash table is probably not useful.
  2. Have a JoinFilterFactory that registers JoinFilterFunction implementations.

JoinFilterFunction should take (leftPos, rightPos, leftObj, rightObj)

This is because we don't want to pre-join the rows, creating a copy of each pair, and only then apply the filter.
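The proposal above could be sketched roughly as follows. This is only an illustration, not Pinot code: the class and interface names are taken from the comment, but the method name `test`, the row representation as `Object[]`, and the reading of `leftPos`/`rightPos` as column indexes into the left and right rows are all assumptions. The point it shows is that the filter is evaluated on the two source rows directly, and a joined row is allocated only for pairs that pass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the name comes from the proposal, the signature details are assumed.
interface JoinFilterFunction {
  // leftPos/rightPos are assumed to be column indexes into leftRow/rightRow.
  boolean test(int leftPos, int rightPos, Object[] leftRow, Object[] rightRow);
}

// Hypothetical nested-loop join over a broadcast (right) table; no hash table is built.
class BroadcastNestedLoopJoinSketch {
  // Example filter for integer columns: left[leftPos] > right[rightPos].
  static final JoinFilterFunction GREATER_THAN =
      (lp, rp, l, r) -> ((Integer) l[lp]) > ((Integer) r[rp]);

  static List<Object[]> join(List<Object[]> left, List<Object[]> broadcast,
      int leftPos, int rightPos, JoinFilterFunction filter) {
    List<Object[]> out = new ArrayList<>();
    for (Object[] l : left) {
      for (Object[] r : broadcast) {
        // Evaluate the filter on the un-joined rows, so rejected pairs are never copied.
        if (filter.test(leftPos, rightPos, l, r)) {
          Object[] joined = new Object[l.length + r.length];
          System.arraycopy(l, 0, joined, 0, l.length);
          System.arraycopy(r, 0, joined, l.length, r.length);
          out.add(joined);
        }
      }
    }
    return out;
  }
}
```

A JoinFilterFactory, as proposed, would presumably map a comparison operator from the query plan to a function such as `GREATER_THAN`; that lookup is left out here because the issue does not describe it.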

61yao changed the title from "[multi-stage] [bug] Inequi join wrong result" to "[multi-stage] [optimization] Move Inequi join out of hashJoin" on Nov 10, 2022
61yao changed the title from "[multi-stage] [optimization] Move Inequi join out of hashJoin" to "[multi-stage] [bug] Inequi join produce wrong result when there is a mix of equi join and inequijoin" on Nov 10, 2022
@61yao
Contributor Author

61yao commented Nov 10, 2022

EDIT: I was wrong. Let me get the actual test case and update this.

61yao changed the title from "[multi-stage] [bug] Inequi join produce wrong result when there is a mix of equi join and inequijoin" to "[multi-stage] [optimization] Move inequi join out to hashjoin" on Nov 10, 2022
@61yao
Contributor Author

61yao commented Nov 10, 2022

@walterddr Sorry for the confusion. It actually works, because an empty key always gives the same hash, which is 1, so every row ends up under the same key. Let's move this out of hashJoin in the future.
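The observation above can be reproduced in isolation. The sketch below is not Pinot's hash join: it uses a plain `java.util.HashMap` keyed by a `List<Object>` as a stand-in for the operator's hash table, and the class and method names are hypothetical. It relies only on the documented behavior that `Arrays.hashCode` of an empty array (and `hashCode` of an empty `List`) is 1. With no key columns, every broadcast row is stored under the same empty key, so probing returns the whole table and the hash join degenerates into a nested loop with hashing overhead on top.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the build side of a hash join.
class EmptyKeyHashJoinSketch {
  // Extracts the key column values; with no key columns this is an empty list.
  static List<Object> key(Object[] row, int[] keyCols) {
    List<Object> k = new ArrayList<>();
    for (int c : keyCols) {
      k.add(row[c]);
    }
    return k;
  }

  // Builds the hash table over the broadcast side, grouping rows by key.
  static Map<List<Object>, List<Object[]>> build(List<Object[]> broadcast, int[] keyCols) {
    Map<List<Object>, List<Object[]>> table = new HashMap<>();
    for (Object[] row : broadcast) {
      table.computeIfAbsent(key(row, keyCols), k -> new ArrayList<>()).add(row);
    }
    return table;
  }

  // Hash of an empty key array, as computed by java.util.Arrays.
  static int emptyKeyHash() {
    return Arrays.hashCode(new Object[0]);
  }
}
```

Calling `build` with an empty `keyCols` array yields a table with a single entry holding all rows, which is why the results are correct but the hash table adds no value for a pure inequi join.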

61yao changed the title from "[multi-stage] [optimization] Move inequi join out to hashjoin" to "[multi-stage] [optimization] Move inequi join out of hashjoin when there is no join key" on Nov 10, 2022

3 participants