Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential issue with xarray grid creation #245

Open
timothyas opened this issue Mar 21, 2025 · 0 comments · May be fixed by #249
Open

Potential issue with xarray grid creation #245

timothyas opened this issue Mar 21, 2025 · 0 comments · May be fixed by #249

Comments

@timothyas
Copy link
Contributor

timothyas commented Mar 21, 2025

Describe the bug

When creating a dataset from an xarray source, the code inside of src/anemoi/datasets/create/functions/sources/xarray/grid.py uses np.meshgrid to create latitudes and longitudes (e.g. these lines). Note that latitudes and longitudes vectors get created the same way, no matter what the dimension order of the underlying data is.

The problem is that this bit of numpy code might create latitudes and longitudes arrays that could be in the wrong order relative to the original dataset. This becomes a problem when forcings get added to the dataset, since they are often computed based on the lat/lon coordinate values.

Version number

v0.5.16

To Reproduce

  1. Use the following script to create small local zarr datasets via xarray (requires the package pooch to get the data, can be installed with either conda install -c conda-forge pooch or pip install pooch).
    This script will create two zarr stores with the same data: one is the "latlon.zarr" dataset, which has dims (time, latitude, longitude), and the second is the "lonlat.zarr" dataset, which has dims (time, longitude, latitude)
pull_data.py
import xarray as xr
import numpy as np

import pooch # needed for tutorial dataset

if __name__ == "__main__":

    ds = xr.tutorial.load_dataset("air_temperature")

    ds = ds.rename({"lat": "latitude", "lon": "longitude"})

    latlon = ds.air.transpose("time", "latitude", "longitude")
    lonlat = ds.air.transpose("time", "longitude", "latitude")

    latlon.to_dataset(name="temperature").to_zarr("latlon.zarr")
    lonlat.to_dataset(name="temperature").to_zarr("lonlat.zarr")
  1. Using the following two recipe files, create anemoi datasets based on the zarr stores:
recipe.lonlat.yaml
dates:
  start: 2013-01-01T00:00:00
  end: 2013-01-04T18:00:00
  frequency: 6h

input:
  join:
      - xarray-zarr:
          url: "lonlat.zarr"
          param:
              - temperature

      - forcings:
          template: ${input.join.0.xarray-zarr}
          param:
            - cos_latitude
            - cos_longitude
recipe.latlon.yaml
dates:
  start: 2013-01-01T00:00:00
  end: 2013-01-04T18:00:00
  frequency: 6h

input:
  join:
      - xarray-zarr:
          url: "latlon.zarr"
          param:
              - temperature

      - forcings:
          template: ${input.join.0.xarray-zarr}
          param:
            - cos_latitude
            - cos_longitude
$ anemoi-datasets create recipe.lonlat.yaml anemoi.lonlat.zarr --overwrite
$ anemoi-datasets create recipe.latlon.yaml anemoi.latlon.zarr --overwrite
  1. Now we can run the following python script to see that the latitudes, longitudes, and forcings that are functions of latitudes and longitudes (here just simple functions) are equal, whereas the original data are not equal. The actual data is the same, but it is just laid out differently in the anemoi dataset's flattened grid space because of the different dimension order. However, since anemoi creates its own latitude and longitude grid, the latitudes and longitudes (and any forcings) are the same. This means that we could have forcings and coordinate information that is inconsistent with the underlying data.
test.py
import xarray as xr
import numpy as np

if __name__ == "__main__":

    print(f"First 10 values in {key}")
    print(f"\t in lonlat = {lonlat[key].values[:10]}")
    print(f"\t in latlon = {latlon[key].values[:10]}")
    lonlat = xr.open_zarr("anemoi.lonlat.zarr")
    latlon = xr.open_zarr("anemoi.latlon.zarr")

    for key in ["latitudes", "longitudes"]:

        isequal = (lonlat[key] == latlon[key]).all().values
        print(f"{key} is equal = {isequal}")

    for key in ["cos_latitude", "cos_longitude", "temperature"]:

        ilonlat = lonlat.attrs["variables"].index(key)
        ilatlon = latlon.attrs["variables"].index(key)

        isclose = np.allclose(
            lonlat["data"].sel(variable=ilonlat),
            latlon["data"].sel(variable=ilatlon),
        )
        print(f"{key} is close = {isclose}")

Expected behavior

None of the latitudes, longitudes, forcings, or the temperature data should be equal or close due to the dimension orderings (we should get "False" for everything).

Screenshots

Here's the output I get...

Image

Additional context

This requires the fix in #244 in order for the example to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant