[Bug]: Issue with decoding dataset datetime that include NaN values (cftime and linux bug)
#648
Comments
Thanks for opening this issue @ezekiel-lemur! We'll take a look at it. |
It looks like the data used here does not have standard or conventional time axis. @ezekiel-lemur would it be possible to share your data so one can try reproduce your error? |
Thank you @lee1043 and @tomvothecoder ! |
@ezekiel-lemur thanks for the data. I was able to reproduce the error. I think the issue might be related to the 2-dimensional time coordinate. In most cases xcdat presumes that the DataArray for a coordinate or axis is 1-dimensional (@tomvothecoder please correct me if I am wrong), but your time coordinate has 2 dimensions. In the dataset, it looks like there are different observation points and each of them has a different trajectory (I am not sure whether trajectory here means time or not). I also noticed that the time coordinate repeats values across trajectories, which left me confused about the structure of the data. Can you say a little bit more about your data, e.g., the expected dimensions for each variable? |
Thanks for helping debug, Jiwoo! Yes this is correct, xCDAT expects 1-D coordinates. |
Why can't I reproduce the error (I just tried 0.6.1 and 0.7.0)?
# imports
import xcdat as xc
# I/O
fn = 'subset.nc'
ds = xc.open_dataset(fn, decode_times=False)
dsd = xc.decode_time(ds)
dsd.time.values
|
@pochedls that's interesting... I still get the same error when running your code. My xcdat version is 0.6.1. |
Thank you for looking into it. Indeed, the data is from a Lagrangian backtracking experiment, where observations are made along one particle trajectory (at given times). Do you think it would be possible to convert each dimension separately with xcdat?
|
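On the question of decoding a 2-D time variable: one possible approach, sketched here under stated assumptions, is to flatten the array to 1-D, decode each value, and reshape the result back to the original shape. The sketch uses NumPy and Python's `datetime` as a stand-in for cftime; the helper `decode_flat`, the `(trajectory, obs)` layout, and the "days since 2000-01-01" units are hypothetical and are not part of xCDAT.

```python
from datetime import datetime, timedelta

import numpy as np


def decode_flat(values, epoch):
    """Decode a 1-D sequence of 'days since epoch' offsets (hypothetical helper)."""
    return [epoch + timedelta(days=float(v)) for v in values]


# Hypothetical 2-D time variable with dims (trajectory, obs).
raw = np.array([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]])
epoch = datetime(2000, 1, 1)

# Flatten to 1-D and decode every element.
flat = decode_flat(raw.ravel(), epoch)

# Store the decoded objects in an object array, then restore the 2-D shape.
decoded = np.empty(raw.size, dtype=object)
for i, value in enumerate(flat):
    decoded[i] = value
decoded = decoded.reshape(raw.shape)

print(decoded.shape)
```

This sketch does not handle NaN offsets; those would need to be masked out before the decoding step.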
This is working for me on an M2 Mac (updated xarray and xcdat), but not on Linux. The problem on Linux traces to cftime:
Both Mac and Linux are using cftime 1.6.2. On Linux the error I get is:
# imports
import xcdat as xc
import cftime
# I/O
fn = 'subset.nc'
ds = xc.open_dataset(fn, decode_times=False)
cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)
If someone else could verify this on Mac, then I think this might be isolated to a cftime + linux issue. |
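Since the failure is triggered by NaN entries in the time values, one way to sidestep it is to decode only the finite entries and leave the rest as missing. The following is a minimal sketch of that idea, not cftime's or xCDAT's behavior: it uses Python's `datetime` in place of cftime, and the "days since 2000-01-01" units and the sample array are assumptions made for illustration.

```python
from datetime import datetime, timedelta

import numpy as np

# Hypothetical encoded time offsets ("days since 2000-01-01") with missing values.
raw = np.array([[0.0, 1.5, np.nan], [2.0, np.nan, 3.0]])
epoch = datetime(2000, 1, 1)

# Boolean mask of the entries that can actually be decoded.
valid = np.isfinite(raw)

# Start with an object array full of None, then fill in the valid positions only.
decoded = np.full(raw.shape, None, dtype=object)
for idx in np.argwhere(valid):
    position = tuple(idx)
    decoded[position] = epoch + timedelta(days=float(raw[position]))

print(decoded)
```

Missing entries stay as `None` here; a real workflow might prefer `np.nan` or a NaT-like marker, depending on what downstream code expects.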
I created a fresh environment and ran both versions of the code (here and here) on my M2 Mac. Both worked, although I get a RuntimeWarning:
xcdat.__version__
'0.7.0'
xarray.__version__
'2024.3.0' # Also worked with '2024.2.0' and '2023.11.0'
cftime.__version__
'1.6.3'  # Also worked with '1.6.2'
<ipython-input-2-858b1454e938>:5: RuntimeWarning: invalid value encountered in cast
cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)
array([[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
...,
cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
...,
cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
...,
cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
...,
[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
...,
cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
...,
cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
...,
cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)]],
dtype=object) |
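The `RuntimeWarning: invalid value encountered in cast` above is the message recent NumPy versions emit when a NaN is cast to an integer type. The sketch below reproduces that situation in isolation; it is an illustration with made-up values, not the code path inside cftime. The integer stored for the NaN entry is not defined and may differ between platforms, so the sketch deliberately does not rely on it.

```python
import warnings

import numpy as np

# Hypothetical time offsets containing one missing value.
offsets = np.array([1.0, np.nan, 3.0])

# Record any warnings raised by the float-to-integer cast.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    as_int = offsets.astype(np.int64)

# The finite entries convert as expected; the entry that was NaN holds an
# unspecified integer and should not be used.
print(as_int[[0, 2]])
print([str(w.message) for w in caught])
```

Whether a warning is recorded depends on the NumPy version, which is why the sketch prints the captured messages instead of assuming one.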
I ran the same sets of code from my comment above on Linux (RH7) and I get the same error. I think this confirms it is a cftime issue on Linux. UPDATE: I just opened a ticket with cftime.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File /home/vo13/xCDAT/xcdat/648_qa.py:5
      3 fn = 'subset.nc'
      4 ds = xc.open_dataset(fn, decode_times=False)
----> 5 cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)
File src/cftime/_cftime.pyx:630, in cftime._cftime.num2date()
File src/cftime/_cftime.pyx:499, in cftime._cftime.decode_dates_from_array()
TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType' |
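The traceback shows that the failing operation is an addition whose right-hand operand is `None`. The same class of error can be reproduced with the standard library alone. This is only an analogy using `datetime.datetime`; it makes no claim about how cftime computes its offsets internally.

```python
from datetime import datetime

base = datetime(2000, 1, 1)
offset = None  # stands in for a time offset that could not be computed

try:
    base + offset
except TypeError as err:
    # The message has the same form as the one in the traceback above.
    message = str(err)

print(message)
```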
Thanks for following up with cftime :) |
@ezekiel-lemur Of course! Thanks for working with us through this issue. Hopefully it can be resolved soon. The (inconvenient) alternative is to try your code on a non-Linux based machine for now. |
Just an update about why this bug only happens on Linux (comment):
|
Hey! I see the issue was closed on Nctime, but I am not sure what the solution is for now? Would you be so kind and let me know please :) |
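Hi @ezekiel-lemur, here's a quick, temporary, hacky workaround for this issue on Linux. It involves converting the NaN time values to a placeholder value (0) before decoding, then converting the decoded placeholder back to NaN afterwards.
import cftime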
import numpy as np
import xarray as xr
import xcdat as xc
# Open the dataset.
filepath = "subset.nc"
ds = xr.open_dataset(filepath, decode_times=False)
# _FillValue is `nan`, so we need another value to represent missing data (I chose 0).
print(ds.time.encoding["_FillValue"])
# Make sure no existing values are 0, since 0 will represent missing values.
# If 0's are present, we would inadvertently drop actual time coordinate values
# and would need to use a different fill value instead.
# The print shows the shape is (100, 920), the same as before, so we can
# safely use 0 to represent missing values.
print(ds.time.where(ds.time != 0, drop=True).shape)
# Convert `np.nan` to 0 to represent missing values.
ds["time"] = ds.time.fillna(0)
ds["time"].attrs["axis"] = "time"
ds["time"].attrs["long_name"] = "time"
ds = ds.set_coords(("time"))
# Decode the time coordinates.
ds_decoded = xc.decode_time(ds)
# Convert the cftime.datetime value 0 back to `np.nan`.
fill_value = cftime.num2date([0], ds.time.units, calendar=ds.time.calendar)
ds_decoded["time"] = ds_decoded.time.where(
ds_decoded.time != fill_value, np.nan, drop=False
)
Alternatively, you can wait for a new release of cftime. |
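The workaround above depends on the placeholder not colliding with a real time value. As a rough sketch of that check, the NumPy-only example below picks a placeholder that does not already occur in the data, fills the NaNs with it, and then restores them. The helper `pick_sentinel`, its candidate list, and the sample array are hypothetical and not part of xCDAT or cftime.

```python
import numpy as np


def pick_sentinel(values, candidates=(0.0, -1.0, -9999.0)):
    """Return the first candidate that does not occur among the finite values."""
    finite = values[np.isfinite(values)]
    for candidate in candidates:
        if not np.any(finite == candidate):
            return candidate
    raise ValueError("every candidate sentinel already occurs in the data")


# Hypothetical encoded time offsets; 0.0 is a real value here.
raw = np.array([[0.0, 1.0, np.nan], [2.0, np.nan, 3.0]])

# 0.0 is already taken, so the next candidate is selected.
sentinel = pick_sentinel(raw)

# Replace NaN with the sentinel, then map the sentinel back to NaN.
filled = np.where(np.isnan(raw), sentinel, raw)
restored = np.where(filled == sentinel, np.nan, filled)

print(sentinel)
```

In the real workflow the decoding step would sit between `filled` and `restored`, and the comparison would be made against the decoded form of the sentinel, as in the workaround above.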
Just a note...it's not clear to me that things will work correctly after the cftime PR is merged (I'm not sure how xarray is going to handle masked arrays from cftime). |
I'm also not sure how Xarray handles masked arrays within |
FYI @ezekiel-lemur. I will close this issue now.
My theory above seems to be correct based on @pochedls's assessment:
import numpy as np
import xarray as xr
x = np.array([0, 1, 2, 3, 4, 5])
xm = np.ma.masked_where(x==2, x)
xr.DataArray(data=xm, dims=['x'])
# <xarray.DataArray (x: 6)> Size: 48B
# array([ 0., 1., nan, 3., 4., 5.])
# Dimensions without coordinates: x
|
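If relying on that implicit conversion is a concern, a masked array can also be converted to a plain NaN-filled array explicitly before it is handed to xarray. The following NumPy-only sketch reuses the sample values from the snippet above; it illustrates one option and is not what xarray or cftime do internally.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5])
xm = np.ma.masked_where(x == 2, x)

# Integers cannot hold NaN, so cast to float first, then replace the masked
# entries with NaN to obtain an ordinary ndarray.
filled = xm.astype(float).filled(np.nan)

print(filled)
```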
What happened?
Issue with decoding datetime that include NaN values
What did you expect to happen? Are there any possible answers you came across?
No response
Minimal Complete Verifiable Example (MVCE)
Relevant log output
Anything else we need to know?
No response
Environment
version('xcdat'): '0.7.0'
INSTALLED VERSIONS
commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:43:22) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-513.24.1.el8_9.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.11.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.5
dask: 2023.12.0
distributed: 2023.12.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.20.0
sphinx: None