Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow-ord: add support for nested types to partition #7131

Merged
merged 1 commit into from
Feb 19, 2025

Conversation

asubiotto
Copy link
Contributor

@asubiotto asubiotto commented Feb 13, 2025

This support is currently incorrectly assumed by BoundedWindowAggExec, so partitioning on a nested type (e.g. struct) causes a nested comparison failure on execution.

This commit adds a check to use distinct on non-nested types and falls back to using make_comparator on nested types.

Which issue does this PR close?

Rationale for this change

Please see #7130 for more in depth explanation and alternatives considered.

What changes are included in this PR?

If statement to use the old path on non-nested types and a fallback path to use make_comparator to check for value distinctness.

Are there any user-facing changes?

Previously failing use cases are now supported.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 13, 2025
@asubiotto
Copy link
Contributor Author

cc @tustvold since it seems like you've worked in this code recently

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @asubiotto -- this looks great to me.

I think it just needs one more test and it would be good to merge

if !v.data_type().is_nested() {
return Ok(distinct(&v1, &v2)?.values().clone());
}
// Given that we're only comparing values, null ordering in the input or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using eq would be faster 🤔

https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean for non-nested types? eq doesn't support nested types similarly to distinct and given they both shell out to compare_op I don't think there should be much of a perf difference between distinct and eq + mapping nulls to booleans (which would be necessary).

@alamb alamb changed the title arrow-ord: add support for partitioning nested types arrow-ord: add support for nested types to partition Feb 15, 2025
This support is currently incorrectly assumed by `BoundedWindowAggExec`, so
partitioning on a nested type (e.g. struct) causes a nested comparison failure
on execution.

This commit adds a check to use distinct on non-nested types and falls back to
using make_comparator on nested types.
@asubiotto
Copy link
Contributor Author

Thanks for the review!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @asubiotto

@alamb alamb merged commit f6ac87e into apache:main Feb 19, 2025
17 checks passed
@asubiotto asubiotto deleted the asubiotto-pnested branch February 19, 2025 12:03
friendlymatthew pushed a commit to friendlymatthew/arrow-rs that referenced this pull request Feb 21, 2025
This support is currently incorrectly assumed by `BoundedWindowAggExec`, so
partitioning on a nested type (e.g. struct) causes a nested comparison failure
on execution.

This commit adds a check to use distinct on non-nested types and falls back to
using make_comparator on nested types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

arrow-ord: support partitioning on nested types
2 participants