-
Notifications
You must be signed in to change notification settings - Fork 873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arrow-ord: add support for nested types to partition
#7131
Conversation
cc @tustvold since it seems like you've worked in this code recently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @asubiotto -- this looks great to me.
I think it just needs one more test and it would be good to merge
if !v.data_type().is_nested() { | ||
return Ok(distinct(&v1, &v2)?.values().clone()); | ||
} | ||
// Given that we're only comparing values, null ordering in the input or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if using eq
would be faster 🤔
https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean for non-nested types? eq
doesn't support nested types similarly to distinct
and given they both shell out to compare_op
I don't think there should be much of a perf difference between distinct
and eq
+ mapping nulls to booleans (which would be necessary).
partition
This support is currently incorrectly assumed by `BoundedWindowAggExec`, so partitioning on a nested type (e.g. struct) causes a nested comparison failure on execution. This commit adds a check to use distinct on non-nested types and falls back to using make_comparator on nested types.
07a5285
to
536082c
Compare
Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @asubiotto
This support is currently incorrectly assumed by `BoundedWindowAggExec`, so partitioning on a nested type (e.g. struct) causes a nested comparison failure on execution. This commit adds a check to use distinct on non-nested types and falls back to using make_comparator on nested types.
This support is currently incorrectly assumed by
BoundedWindowAggExec
, so partitioning on a nested type (e.g. struct) causes a nested comparison failure on execution.This commit adds a check to use distinct on non-nested types and falls back to using make_comparator on nested types.
Which issue does this PR close?
Rationale for this change
Please see #7130 for more in depth explanation and alternatives considered.
What changes are included in this PR?
If statement to use the old path on non-nested types and a fallback path to use
make_comparator
to check for value distinctness.Are there any user-facing changes?
Previously failing use cases are now supported.