Describe the bug
When creating a dataset from an xarray source, the code inside of src/anemoi/datasets/create/functions/sources/xarray/grid.py uses np.meshgrid to create latitudes and longitudes (e.g. these lines). Note that the latitudes and longitudes vectors are created the same way, no matter what the dimension order of the underlying data is.
The problem is that this numpy code can create latitudes and longitudes arrays whose ordering does not match the flattened layout of the original dataset. This becomes a problem when forcings get added to the dataset, since they are often computed from the lat/lon coordinate values.
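To make the mismatch concrete, here is a minimal numpy-only sketch. It is not the actual anemoi-datasets code: the coordinate values, the synthetic field, and the plain np.meshgrid call are assumptions chosen only to illustrate what happens when the grid is built without looking at the data's dimension order.

```python
import numpy as np

# Small illustrative grid (values are made up for this sketch)
lat = np.array([10.0, 20.0])        # 2 latitudes
lon = np.array([0.0, 90.0, 180.0])  # 3 longitudes

# A synthetic field whose value encodes its own coordinates: lat + lon / 1000
field_latlon = lat[:, None] + lon[None, :] / 1000.0  # dims (latitude, longitude)
field_lonlat = field_latlon.T                        # dims (longitude, latitude)

# Coordinates built the same way regardless of the data's dimension order
lon2d, lat2d = np.meshgrid(lon, lat)
flat_lat = lat2d.flatten()
flat_lon = lon2d.flatten()

# What each flattened grid point should hold if coordinates and data agree
expected = flat_lat + flat_lon / 1000.0

matches_latlon = np.allclose(field_latlon.flatten(), expected)
matches_lonlat = np.allclose(field_lonlat.flatten(), expected)
print(f"(latitude, longitude) data matches coordinates: {matches_latlon}")
print(f"(longitude, latitude) data matches coordinates: {matches_lonlat}")
```

In this sketch the flattened coordinates line up with the (latitude, longitude) layout only; the transposed data holds the same values, but in a different flattened order than the coordinates describe.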
Version number
v0.5.16
To Reproduce
Use the pull_data.py script to create small local zarr datasets via xarray (this requires the package pooch to get the data, which can be installed with either conda install -c conda-forge pooch or pip install pooch).
The script creates two zarr stores containing the same data: "latlon.zarr", which has dims (time, latitude, longitude), and "lonlat.zarr", which has dims (time, longitude, latitude).
Running the following Python script shows that the latitudes, the longitudes, and the forcings that are functions of them (here just simple functions) are equal between the two anemoi datasets, whereas the original data are not. The underlying data are the same, but they are laid out differently in the anemoi dataset's flattened grid space because of the different dimension order. Since anemoi creates its own latitude and longitude grid, however, the latitudes and longitudes (and any forcings) come out the same in both cases. This means the forcings and coordinate information can be inconsistent with the underlying data.
test.py
    import numpy as np
    import xarray as xr

    if __name__ == "__main__":
        # Open the two anemoi datasets built from the two zarr stores
        lonlat = xr.open_zarr("anemoi.lonlat.zarr")
        latlon = xr.open_zarr("anemoi.latlon.zarr")

        # Compare the coordinate vectors that anemoi generated
        for key in ["latitudes", "longitudes"]:
            isequal = (lonlat[key] == latlon[key]).all().values
            print(f"{key} is equal = {isequal}")
            print(f"First 10 values in {key}")
            print(f"\t in lonlat = {lonlat[key].values[:10]}")
            print(f"\t in latlon = {latlon[key].values[:10]}")

        # Compare the forcings and the original data, variable by variable
        for key in ["cos_latitude", "cos_longitude", "temperature"]:
            ilonlat = lonlat.attrs["variables"].index(key)
            ilatlon = latlon.attrs["variables"].index(key)
            isclose = np.allclose(
                lonlat["data"].sel(variable=ilonlat),
                latlon["data"].sel(variable=ilatlon),
            )
            print(f"{key} is close = {isclose}")
Expected behavior
Because the two datasets use different dimension orders, none of the latitudes, longitudes, forcings, or temperature data should be equal or close between them (we should get "False" for everything).
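One possible way to get this behavior is to build the grid in the same dimension order as the data. The sketch below is only an illustration under assumptions: flattened_coords and its dims argument are hypothetical names, not part of anemoi-datasets, and the coordinate values are made up.

```python
import numpy as np


def flattened_coords(lat, lon, dims):
    """Return flattened (latitudes, longitudes) that follow the data's dim order.

    `dims` is a hypothetical tuple naming the two spatial dimensions in the
    order they appear in the data.
    """
    if tuple(dims) == ("latitude", "longitude"):
        lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    elif tuple(dims) == ("longitude", "latitude"):
        lon2d, lat2d = np.meshgrid(lon, lat, indexing="ij")
    else:
        raise ValueError(f"unsupported spatial dims: {dims}")
    return lat2d.flatten(), lon2d.flatten()


lat = np.array([10.0, 20.0])
lon = np.array([0.0, 90.0, 180.0])

lats_a, lons_a = flattened_coords(lat, lon, ("latitude", "longitude"))
lats_b, lons_b = flattened_coords(lat, lon, ("longitude", "latitude"))

# Synthetic (longitude, latitude) field whose value encodes its coordinates
field_lonlat = (lat[:, None] + lon[None, :] / 1000.0).T

coords_equal = np.array_equal(lats_a, lats_b)
consistent = np.allclose(field_lonlat.flatten(), lats_b + lons_b / 1000.0)
print(f"latitudes equal across layouts: {coords_equal}")
print(f"(longitude, latitude) coordinates match the data: {consistent}")
```

With the grid built this way, the flattened coordinates differ between the two layouts, as described above, and each set stays consistent with its own data.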
Screenshots
Here's the output I get...
Additional context
This requires the fix in #244 in order for the example to work.
Files referenced in this issue: pull_data.py, recipe.lonlat.yaml, recipe.latlon.yaml, test.py