SNOW-1918055: Update agg error for unsupported aggregation functions #3133
base: main
Signed-off-by: Labanya Mukhopadhyay <[email protected]>
bool
    True if all functions in the list are snowflake supported aggregation functions, otherwise,
    return False
list
    The list of unsupported functions used for aggregation.
nit: usually I see docstrings for functions like this explicitly mention `tuple[bool, list]` as the return type, and describe what each member of the tuple means rather than separating out the values.
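A minimal sketch of the docstring style the reviewer suggests. The helper name `check_agg_funcs_supported` and the `_SUPPORTED_AGG_NAMES` set are hypothetical stand-ins for illustration only; they are not the Snowpark pandas implementation or its real lookup.

```python
from collections.abc import Iterable

# Hypothetical stand-in for the real Snowflake lookup; NOT the Snowpark pandas mapping.
_SUPPORTED_AGG_NAMES = {"sum", "min", "max", "mean", "count"}


def check_agg_funcs_supported(agg_funcs: Iterable[str]) -> tuple[bool, list[str]]:
    """
    Check whether every aggregation function name is supported.

    Returns:
        tuple[bool, list[str]]
            - The first element is True if all functions are supported, False otherwise.
            - The second element lists the unsupported function names in input order;
              it is empty when the first element is True.
    """
    unsupported = [name for name in agg_funcs if name not in _SUPPORTED_AGG_NAMES]
    return len(unsupported) == 0, unsupported
```

Documenting the tuple as a single return value keeps the two members tied together, so callers know the list is only meaningful when the flag is False.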
for value in agg_func.values()
)
if not is_supported_func:
    supported_flag = False
Why not just return here? Is your intent to combine the `unsupported_arguments` lists?
The first unsupported function will be returned. The case with multiple unsupported functions still needs to be handled, which will require returning in this loop, so I'll make those changes and also use `repr_aggregate_function(agg_func, agg_kwargs)`!
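The two options discussed here (return on the first unsupported function, or combine the unsupported names across all columns) can be sketched as follows. This assumes `agg_func` is a dict mapping each column to a function name or a list of names; `find_unsupported` and `_SUPPORTED_AGG_NAMES` are hypothetical, not the PR's code.

```python
from typing import Union

AggSpec = Union[str, list[str]]

# Hypothetical stand-in for the real Snowflake lookup.
_SUPPORTED_AGG_NAMES = {"sum", "min", "max", "mean", "count"}


def find_unsupported(agg_func: dict[str, AggSpec]) -> tuple[bool, list[str]]:
    """Collect every unsupported name across all columns instead of returning early."""
    unsupported: list[str] = []
    for value in agg_func.values():
        # A column may map to a single function name or to a list of names.
        funcs = value if isinstance(value, list) else [value]
        # extend (not reassign) so names found for earlier columns are kept
        unsupported.extend(f for f in funcs if f not in _SUPPORTED_AGG_NAMES)
    return not unsupported, unsupported
```

Returning early is simpler if only the first name is reported; accumulating is needed if the error should mention every unsupported function.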
""" | ||
# validate agg_func, only snowflake builtin agg function or dict of snowflake builtin agg | ||
# function can be implemented in distributed way. | ||
unsupported_arguments: list[str] = [] | ||
supported_flag = True |
I prefer the name "is_supported
" so there's 0 ambiguity about the meaning of the flag's T/F value.
) = check_is_aggregation_supported_in_snowflake(agg_func, agg_kwargs, axis)
if not is_supported:
    raise AttributeError(
        f"'SeriesGroupBy' object has no attribute '{unsupported_arguments}'"
What happens if this is an aggregation that native pandas supports but we do not?
This should return False, since we check whether there is a corresponding Snowflake aggregation function via `get_snowflake_agg_func()`. The overall logic for checking whether a function is supported should not change here.
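The reply relies on a lookup that returns `None` when no Snowflake counterpart exists, so a function that native pandas supports but Snowflake does not still yields False. A minimal sketch of that pattern follows; the mapping and both function names are hypothetical and only illustrate the `is not None` check, not the real `get_snowflake_agg_func()`.

```python
from typing import Optional

# Hypothetical mapping from pandas aggregation names to Snowflake SQL function names.
_PANDAS_TO_SNOWFLAKE = {
    "sum": "SUM",
    "min": "MIN",
    "max": "MAX",
    "mean": "AVG",
    "count": "COUNT",
}


def lookup_snowflake_agg(name: str) -> Optional[str]:
    """Return the Snowflake function for a pandas aggregation name, or None if unmapped."""
    return _PANDAS_TO_SNOWFLAKE.get(name)


def is_supported(name: str) -> bool:
    # Anything without a mapping is unsupported, even if native pandas accepts it.
    return lookup_snowflake_agg(name) is not None
```

In this sketch the lookup is case-sensitive, so `"COUNT"` is unsupported even though `"count"` is supported.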
basic_snowpark_pandas_df = pd.DataFrame(
    data=8 * [range(3)], columns=["a", "b", "c"]
)
# basic_snowpark_pandas_df = basic_snowpark_pandas_df.groupby(['a', 'b']).sum()
Did you mean to delete this?
@@ -54,12 +54,13 @@
- Fixed a bug where creating a Dataframe with large number of values raised `Unsupported feature 'SCOPED_TEMPORARY'.` error if thread-safe session was disabled.
- Fixed a bug where `df.describe` raised internal SQL execution error when the dataframe is created from reading a stage file and CTE optimization is enabled.
- Fixed a bug where `df.order_by(A).select(B).distinct()` would generate invalid SQL when simplified query generation was enabled using `session.conf.set("use_simplified_query_generation", True)`.
- Disabled simplified query generation by default.
Good catch!
@@ -851,40 +851,50 @@ def _is_supported_snowflake_agg_func(
    The value can be different for different aggregation functions.
Returns:
    is_valid: bool. Whether it is valid to implement with snowflake or not.
I suspect this comment is the result of copy-pasting: "Whether it is valid to implement with snowflake or not." is not consistent with the semantics of this function (to check whether the aggregation function is supported by Snowflake).
""" | ||
if isinstance(agg_func, tuple) and len(agg_func) == 2: | ||
# For named aggregations, like `df.agg(new_col=("old_col", "sum"))`, | ||
# take the second part of the named aggregation. | ||
agg_func = agg_func[0] | ||
return get_snowflake_agg_func(agg_func, agg_kwargs, axis, _is_df_agg) is not None | ||
if get_snowflake_agg_func(agg_func, agg_kwargs, axis, _is_df_agg) is None: | ||
return False, agg_func |
Is `agg_func` guaranteed to be a list?
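The question matters because `agg_func` may arrive as a single name, a named-aggregation tuple, or a list, so returning it directly where a `list[str]` is expected can hand back a bare string. A minimal sketch of normalizing the shapes first, assuming those three input forms; `normalize_agg_funcs` is a hypothetical helper, and it takes the function from the second element of a named aggregation, as the code comment above describes.

```python
from typing import Union

# A named aggregation such as df.agg(new_col=("old_col", "sum")) is (column, function).
NamedAgg = tuple[str, str]
AggFunc = Union[str, NamedAgg, list[str]]


def normalize_agg_funcs(agg_func: AggFunc) -> list[str]:
    """Return the aggregation function names as a list, whatever shape was passed in."""
    if isinstance(agg_func, tuple) and len(agg_func) == 2:
        # Named aggregation: the function is the second element, not the column.
        return [agg_func[1]]
    if isinstance(agg_func, str):
        return [agg_func]
    return list(agg_func)
```

Normalizing up front lets the unsupported-function list always be built with `list` operations, regardless of how the caller spelled the aggregation.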
unsupported_list: list[str] = []
for func in agg_funcs:
    is_supported, unsupported_list = _is_supported_snowflake_agg_func(
        func, agg_kwargs, axis, _is_df_agg
    )
Does `unsupported_list` need to be appended to in the for loop? It seems like it is replaced/overwritten on every iteration here.
Could we please add a unit test for `_are_all_agg_funcs_supported_by_snowflake`? We could use some functions that are definitely not going to be supported. I suspect the current code change has a bug. Thanks!
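A minimal sketch of the requested tests against a simplified stand-in for the loop above, assuming the per-function check returns `(True, [])` on success and `(False, [name])` on failure. `_is_supported`, `are_all_agg_funcs_supported`, and `_SUPPORTED_AGG_NAMES` are hypothetical; a real test would call `_are_all_agg_funcs_supported_by_snowflake` directly.

```python
from collections.abc import Iterable

# Hypothetical stand-in for the real Snowflake lookup.
_SUPPORTED_AGG_NAMES = {"sum", "min", "max", "mean", "count"}


def _is_supported(func: str) -> tuple[bool, list[str]]:
    return (True, []) if func in _SUPPORTED_AGG_NAMES else (False, [func])


def are_all_agg_funcs_supported(agg_funcs: Iterable[str]) -> tuple[bool, list[str]]:
    all_supported = True
    unsupported_list: list[str] = []
    for func in agg_funcs:
        is_supported, unsupported = _is_supported(func)
        all_supported = all_supported and is_supported
        unsupported_list.extend(unsupported)  # accumulate instead of overwriting
    return all_supported, unsupported_list


def test_reports_every_unsupported_function():
    # Names chosen so they can never map to a Snowflake aggregation.
    result = are_all_agg_funcs_supported(["sum", "NOT_A_FUNC_1", "max", "NOT_A_FUNC_2"])
    assert result == (False, ["NOT_A_FUNC_1", "NOT_A_FUNC_2"])


def test_all_supported():
    assert are_all_agg_funcs_supported(["sum", "max"]) == (True, [])


def test_unsupported_not_last():
    # If the loop reassigned instead of accumulating, the trailing supported
    # function would reset both the flag and the list under this return convention.
    assert are_all_agg_funcs_supported(["NOT_A_FUNC", "sum"]) == (False, ["NOT_A_FUNC"])
```

The last test is the one most likely to catch the overwrite the reviewer suspects, because a passing function comes after the failing one.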
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1918055
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Updates the error raised by `groupby.agg` and `agg` for unsupported aggregation functions to match pandas. It will now raise:
'SeriesGroupBy' object has no attribute 'COUNT'
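The target message names a single function, so the name has to be formatted on its own; interpolating a list directly into the f-string would render as `'['COUNT']'`. A minimal sketch, assuming the first unsupported name is the one reported (as the review thread describes); `unsupported_agg_error` is a hypothetical helper, not part of the PR.

```python
def unsupported_agg_error(object_name: str, unsupported: list[str]) -> AttributeError:
    """Build a pandas-style AttributeError naming the first unsupported function."""
    # Format one name; f"'{unsupported}'" would produce "'['COUNT']'" instead.
    return AttributeError(f"'{object_name}' object has no attribute '{unsupported[0]}'")
```

A caller would `raise unsupported_agg_error("SeriesGroupBy", unsupported_arguments)` only when the list is non-empty.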