[SPARK-53425][PYTHON][TESTS] Add more table argument tests for Arrow Python UDTFs #52170
# TODO(SPARK-53426): Support named table argument with DataFrame API
# input_df = self.spark.range(3)  # [0, 1, 2]
# result_df = NamedArgsUDTF(table_data=input_df.asTable(), multiplier=lit(5))
Fix: #52171
Thank you @ueshin for the fix!
Thanks, the test passes now!
4b03475 to 709ab71
709ab71 to d13c7ce
Otherwise, LGTM.
result_df = self.spark.sql(
    """
    SELECT * FROM partition_sum_udtf(
        TABLE(partition_test_data) PARTITION BY category
    ) ORDER BY partition_key
    """
)
This is also potentially flaky, like the tests in the previous PR. Could it use terminate
to be more stable?
Thanks for pointing this out. Fixed.
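For readers following the thread, here is a minimal plain-Python sketch of the pattern being suggested. It assumes the concern is that eval may be called more than once per partition (for example, once per input batch), so a result emitted from eval could depend on how the rows were split. That reading is an assumption, not something stated in the review. `PartitionSumSketch` and `run_partitioned` are hypothetical names, the driver is a toy with no Spark dependency, and none of this is the code from the PR.

```python
from itertools import groupby
from operator import itemgetter


class PartitionSumSketch:
    """Hypothetical stand-in for a partitioned UDTF; not the class from this PR."""

    def __init__(self):
        self._key = None
        self._total = 0

    def eval(self, batch):
        # May run several times for one partition, once per batch of rows.
        for key, value in batch:
            self._key = key
            self._total += value
        # Only accumulate here; emit no rows.
        return iter(())

    def terminate(self):
        # Runs once after the last batch of the partition: emit a single row.
        if self._key is not None:
            yield (self._key, self._total)


def run_partitioned(rows, batch_size):
    """Toy driver (an assumption, not Spark): a fresh instance per key, rows fed in batches."""
    out = []
    by_key = sorted(rows, key=itemgetter(0))
    for _, group in groupby(by_key, key=itemgetter(0)):
        partition = list(group)
        udtf = PartitionSumSketch()
        for start in range(0, len(partition), batch_size):
            out.extend(udtf.eval(partition[start:start + batch_size]))
        out.extend(udtf.terminate())
    return out


rows = [("A", 10), ("B", 5), ("A", 20), ("B", 15), ("A", 30)]
one_row_batches = run_partitioned(rows, batch_size=1)
single_batch = run_partitioned(rows, batch_size=100)
```

Because the only output comes from terminate, the result in this sketch does not change with the batch size, which is the kind of stability the comment appears to be asking for.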
result_df = self.spark.sql(
    """
    SELECT * FROM dept_status_count_udtf(
        TABLE(SELECT * FROM employee_data)
        PARTITION BY (department, status)
    ) ORDER BY dept, status
    """
)
ditto.
Thanks! Merging to master.
Late LGTM.
What changes were proposed in this pull request?
This PR adds more tests covering the various forms of table argument support for Arrow Python UDTFs.
It also exposed some existing issues that need to be fixed:
Why are the changes needed?
To improve test coverage
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
Yes