Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to_pandas writable bug for datetime and timedelta types #17913

Open
wants to merge 1 commit into
base: branch-25.04
Choose a base branch
from

Conversation

galipremsagar
Copy link
Contributor

Description

Fixes: #17743

This PR fixes writable flag for numpy arrays. This bug is only specific to these two types.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added bug Something isn't working non-breaking Non-breaking change labels Feb 4, 2025
@galipremsagar galipremsagar self-assigned this Feb 4, 2025
@galipremsagar galipremsagar requested a review from a team as a code owner February 4, 2025 17:44
@github-actions github-actions bot added the Python Affects Python cuDF API. label Feb 4, 2025
@mroeschke
Copy link
Contributor

Curious how #17743 as a bug affects subsequent operations (i.e. is this a bug we really need to worry about?)

pandas.Index is not supposed to be publicly mutable, so the underlying numpy array not being writable doesn't seem that big of a deal, especially since the fix requires making a CPU copy of the data

@galipremsagar
Copy link
Contributor Author

galipremsagar commented Feb 4, 2025

Curious how #17743 as a bug affects subsequent operations (i.e. is this a bug we really need to worry about?)

pandas.Index is not supposed to be publicly mutable, so the underlying numpy array not being writable doesn't seem that big of a deal, especially since the fix requires making a CPU copy of the data

It's not serious. But AFAIK it is because ._data yields a DatetimeArray and that is being mutated in pandas pytests leading to failures in the test suite. If not this fix, we will need to try to address this for cudf.pandas case at least. But since users can run into this issue with cudf classic aswell I went ahead and opened a short-term fix until the arrow bug is fixed.

@mroeschke
Copy link
Contributor

mroeschke commented Feb 5, 2025

But since users can run into this issue with cudf classic

Yeah so I'm curious what subsequent issue a user would run into if a pandas.Index returned from to_pandas() if the underlying numpy array was not writable. (i.e. if no public APIs break when using this Index I'm just tempted to stay this is a won't fix to avoid the CPU copy)

@galipremsagar
Copy link
Contributor Author

But since users can run into this issue with cudf classic

Yeah so I'm curious what subsequent issue a user would run into if a pandas.Index returned from to_pandas() if the underlying numpy array was not writable. (i.e. if no public APIs break when using this Index I'm just tempted to stay this is a won't fix to avoid the CPU copy)

This is the user-facing case I'm talking about:

In [1]: import pandas as pd
i 
In [2]: i = pd.Index([1, 2, 3], dtype="datetime64[ns]")

In [4]: i.to_numpy()
Out[4]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [5]: x = i.to_numpy()

In [6]: x[0] = 1

In [7]: x
Out[7]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [8]: x[0] = 2

In [9]: x
Out[9]: 
array(['1970-01-01T00:00:00.000000002', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [10]: import cudf

In [11]: i = cudf.Index([1, 2, 3], dtype="datetime64[ns]")

In [13]: x = i.to_pandas().to_numpy()

In [14]: x
Out[14]: 
array(['1970-01-01T00:00:00.000000001', '1970-01-01T00:00:00.000000002',
       '1970-01-01T00:00:00.000000003'], dtype='datetime64[ns]')

In [15]: x[0] = 2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 x[0] = 2

ValueError: assignment destination is read-only

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[BUG] DatetimeArray is not writable in cudf.pandas
2 participants