-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1952124 decimal coercion #3159
base: main
Are you sure you want to change the base?
Conversation
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses SNOW-1952124 by enhancing the coercion of int and float values into Decimal columns for the fillna and replace DataFrame methods. Key changes include:
- Adding new tests in tests/integ/test_dataframe.py to verify decimal coercion behavior when using the include_decimal flag.
- Updating the implementation in src/snowflake/snowpark/dataframe_na_functions.py to support the include_decimal flag in functions fill and replace.
- Enhancing type utility functions and adding a helper function (null_data4) in tests/utils.py to support the new decimal coercion behavior.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/integ/test_dataframe.py | Added tests verifying decimal coercion with include_decimal flag. |
src/snowflake/snowpark/dataframe_na_functions.py | Updated fill and replace functions, including _is_value_type_matching_for_na_function with a new include_decimal parameter. |
tests/utils.py | Introduced null_data4 to support testing of Decimal columns. |
src/snowflake/snowpark/_internal/type_utils.py | Minor code style and import updates related to type handling. |
CHANGELOG.md | Documented the new behavior in fillna and replace functions. |
Comments suppressed due to low confidence (1)
src/snowflake/snowpark/dataframe_na_functions.py:60
- Consider flattening the tuple for int_types by concatenating DecimalType instead of nesting tuples (e.g., int_types = int_types + (DecimalType,)). This will ensure the isinstance check behaves as expected when include_decimal is True.
int_types = (int_types, DecimalType)
int_types = (int_types, DecimalType) | ||
float_types = (float_types, DecimalType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider flattening the tuple for float_types by concatenating DecimalType instead of nesting tuples (e.g., float_types = float_types + (DecimalType,)). This adjustment will allow isinstance checks to correctly include DecimalType when include_decimal is True.
int_types = (int_types, DecimalType) | |
float_types = (float_types, DecimalType) | |
int_types = int_types + (DecimalType,) | |
float_types = float_types + (DecimalType,) |
Copilot is powered by AI, so mistakes are possible. Review output carefully before use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://docs.python.org/3/library/functions.html#isinstance
If classinfo is a tuple of type objects (or recursively, other such tuples)
So the code should work as expected. Unless a maintainer would like me to change this I won't
# Fill Decimal columns with int | ||
Utils.check_answer( | ||
TestData.null_data4(session).fillna(123, include_decimal=True), | ||
[ | ||
Row(decimal.Decimal(1), decimal.Decimal(123)), | ||
Row(decimal.Decimal(123), 2), | ||
], | ||
sort=False, | ||
) | ||
# Fill Decimal columns with float | ||
Utils.check_answer( | ||
TestData.null_data4(session).fillna(123.0, include_decimal=True), | ||
[ | ||
Row(decimal.Decimal(1), decimal.Decimal(123)), | ||
Row(decimal.Decimal(123), 2), | ||
], | ||
sort=False, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just occurred to me that, for symmetry, we should support filling int/long/float/double columns with decimals... But for some reason Spark does not allow that, so I don't know what the right answer is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's because int/float can be safely fit into decimal, but not the other way around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the type conversion. And spark explicitly mentioned that decimal
can't be the input for replace
api.
@@ -94,6 +94,8 @@ | |||
) | |||
|
|||
if TYPE_CHECKING: | |||
import snowflake.snowpark.column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type checking. My editor was telling me that the type-hint at some part of this file was broken
field.name | ||
if context._should_use_structured_type_semantics() | ||
else quote_name(field.name, keep_case=True), | ||
( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding for better readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My editor has black formatting built-in on save, it was done by it automatically. I can revert this if people would like me to!
# Replace Decimal value with int | ||
Utils.check_answer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems related to coercion in localtesting, you can temporarily skip the local testing by:
def test_replace_with_coercion(session, local_testing_mode):
# existing code
# ...
if local_testing_mode:
# SNOW-xxxx local test gap
return
# your new test code
c6e04b4
to
a6e8153
Compare
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1952124
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
We found a problem where
int
s andfloat
s should be able to fit intoDecimal
columns without problems. We have seen this discrepancy infillna
andreplace
functionalities.In this PR I fix this behavior, but place it behind a flag argument and test these new/old behaviours.