-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: Filter out StringDType
even when the backing array is not NumpyExtensionArray
#10559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I've added the |
object
for pd.StringDtype
in safe_cast_to_index
StringDType
even when the backing array is not NumpyExtensionArray
@@ -7293,7 +7293,7 @@ def from_dataframe(cls, dataframe: pd.DataFrame, sparse: bool = False) -> Self: | |||
arrays = [] | |||
extension_arrays = [] | |||
for k, v in dataframe.items(): | |||
if not is_extension_array_dtype(v) or isinstance( | |||
if not is_allowed_extension_array(v) or isinstance( | |||
v.array, UNSUPPORTED_EXTENSION_ARRAY_TYPES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add UNSUPPORTED_EXTENSION_ARRAY_TYPES
to the new is_allowed_extension_array
function because we do allow them as backing arrays to Index
object, I think. Maybe should get a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should merge these checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to account for internal duck array support i.e., that which allows preserving the dtype
of extension array indices that are in this UNSUPPORTED_EXTENSION_ARRAY_TYPES
whitelist. See the note here:
xarray/xarray/core/extension_array.py
Lines 101 to 103 in 5ce69b2
# This does not use the UNSUPPORTED_EXTENSION_ARRAY_TYPES whitelist because | |
# we do support extension arrays from datetime, for example, that need | |
# duck array support internally via this class. |
Sadly these upstream failures seem related. @kmuehlbauer may be able to help reason through them |
@dcherian I think they're only related in so far as they come from xarray/xarray/core/variable.py Lines 215 to 216 in 5ce69b2
It appears (I have no experience with this, so correct me if I'm wrong), that the "real" type is encoded in dtype("O").metadata for certain netcdf4 variables. But this dtype.metadata is not preserved after a pd.Series call. I added a fix for it, but made it a separate PR as I'd like to ensure this PR does what it advertises first
|
Co-authored-by: Deepak Cherian <[email protected]>
…o ig/fix_string_dtype
The linked issue is resolved by this PR but more broadly, the issue underlying it (letting in string dtypes that are not
NumpyExtensionArray
) was bound to come up at some point so this PR more comprehensively fixes that issue and tries to centralize our "whitelist"whats-new.rst
api.rst