Feature Request: Recompute cumulated fields when combining datasets with different frequency #222

Open
mpvginde opened this issue Feb 28, 2025 · 3 comments

@mpvginde
Contributor

If I understand correctly, the current behavior when combining, e.g., a dataset with 6h temporal frequency and one with 3h frequency is that all fields at the intersecting dates (i.e. every 6h) are kept without any additional computation.
This means that the adjusted 3h dataset now has a 6h frequency, but accumulated fields such as tp still represent 3h accumulations.

It would be useful to introduce a feature that automatically recalculates the accumulated fields, so that the adjusted datasets represent the same accumulation periods.
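
To make the mismatch concrete, here is a small sketch with made-up values. It assumes that each 3h tp value is the accumulation over the 3 hours ending at its valid time, and that the 6h dates fall on every second 3h date; neither assumption is confirmed by the dataset itself.

```python
import numpy as np

# Made-up 3h total precipitation (tp) at 03, 06, ..., 24 UTC; each value is
# assumed to be the accumulation over the 3 hours ending at that time.
tp_3h = np.array([1.0, 2.0, 0.5, 0.0, 3.0, 1.5, 0.0, 2.5])

# Current behaviour: keep only the dates shared with the 6h dataset
# (06, 12, 18, 24 UTC). The kept values are still 3h accumulations.
tp_kept = tp_3h[1::2]

# Desired behaviour: each 6h accumulation ending at 06, 12, 18, 24 UTC is the
# sum of the two 3h accumulations it covers.
tp_6h = tp_3h.reshape(-1, 2).sum(axis=1)

# Only the summed version conserves the total precipitation over the period.
print(tp_kept.sum(), tp_6h.sum(), tp_3h.sum())  # 6.0 10.5 10.5
```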

I'm willing to give it a try, but I was wondering what information is present about the fields in the dataset.
Does the dataset know which fields are accumulated and is there information about the accumulation period in the dataset?

BR,
Michiel

@flyIchtus
Contributor

flyIchtus commented Mar 18, 2025

Hello @mpvginde, did you get an answer to your questions?

I am very interested in the topic too, and I am willing to contribute to a PR if you want to start one (maybe with the help of Baudouin).

If the cumulation information is anywhere (layman here), it should be in the dataset's metadata?
It can come either from the earthkit Field objects the dataset uses to build the .zarr (when building the .zarr from source data files), or from the .zarr itself after the build (something like: ds = open_dataset(my_zarr); mtd = ds.metadata(); mtd.keys()).

We probably want this information stored in the .zarr anyway, so that it also works for the use case you describe (merging datasets with different frequencies when they already exist as .zarrs). A starting point might be to check whether the cumulation parameters are already in the .zarr, or else to build a small filter (anemoi-transforms?) that sets them in the recipe.

I'd like to give this a try on local datasets of mine.

Then, for the recomputation, do you already have an entry point? I guess concat.py would do the job if we can add kwargs to recompute some fields (and then provide the cumulation routines under compute).

@mpvginde
Contributor Author

mpvginde commented Mar 19, 2025

Hi @flyIchtus,
I don't have an answer yet on whether the metadata contains information on accumulated fields (my guess is no).
But I had a brief chat with @b8raoult about it. He proposed we start simple.
For example, if you want to merge two .zarrs with different frequencies:

How I would do it:

  • Create a class that sums N consecutive dates
  • Select only the accumulations from the 3h dataset
  • Apply that class to them
  • The resulting dataset should have a frequency of (original frequency * N)

Then join:

  • the 6h dataset
  • the subset of the 3h dataset (without the accumulations) at 6h
  • the summed accumulations with N=2
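
The steps above could look roughly like this on plain NumPy arrays. This is only a sketch, not anemoi-datasets code: the (dates, variables, grid points) layout, the variable names and the helper sum_consecutive are assumptions for illustration, and accumulations are assumed to be labelled with the end of their period, so each summed block takes the last date of the block.

```python
import datetime as dt

import numpy as np


def sum_consecutive(data, n):
    """Hypothetical helper: sum blocks of n consecutive dates along axis 0."""
    if data.shape[0] % n != 0:
        raise ValueError("number of dates must be a multiple of n")
    return data.reshape(data.shape[0] // n, n, *data.shape[1:]).sum(axis=1)


# Made-up 3h dataset: 8 dates (03 to 24 UTC), 2 variables, 4 grid points.
dates_3h = [dt.datetime(2023, 6, 20, 3) + k * dt.timedelta(hours=3) for k in range(8)]
variables_3h = ["2t", "tp"]
data_3h = np.random.default_rng(0).random((8, 2, 4))

accumulated = ["tp"]  # assumed to be known, e.g. from the dataset metadata
n = 2                 # 6h / 3h

acc_idx = [variables_3h.index(v) for v in accumulated]
inst_idx = [i for i, v in enumerate(variables_3h) if v not in accumulated]

# Subset of the 3h dataset (without accumulations) at the 6h dates.
dates_6h = dates_3h[n - 1 :: n]
inst_6h = data_3h[n - 1 :: n][:, inst_idx]

# Accumulations summed over N=2 consecutive dates.
acc_6h = sum_consecutive(data_3h[:, acc_idx], n)

# Both parts now share the 6h dates, so they can be joined along the variable
# axis; the result could then be joined with the original 6h dataset.
variables_6h = [variables_3h[i] for i in inst_idx] + accumulated
data_6h = np.concatenate([inst_6h, acc_6h], axis=1)
print(data_6h.shape, dates_6h[0])  # (4, 2, 4) 2023-06-20 06:00:00
```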

The class that sums the consecutive dates should inherit from Forward, and the following methods should be overloaded:

  • dates
  • missing
  • frequency
  • __getitem__ (probably the hardest)
  • __len__

As inputs, it would need N and the list of fields it needs to sum.
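
A rough shape for such a class is sketched below. It does not subclass the real Forward, whose constructor and internals are not reproduced here; instead it wraps a hypothetical in-memory stand-in that exposes dates, missing (assumed to be a set of date indices), frequency, variables, __getitem__ and __len__. Both class names are hypothetical. Design choices made for the sketch: each block is labelled with its last date, a block is missing if any of its dates is missing, fields not listed for summation take their value at the last date, and an incomplete trailing block is dropped.

```python
import datetime as dt

import numpy as np


class InMemoryDataset:
    """Hypothetical stand-in for a dataset; NOT the anemoi-datasets API."""

    def __init__(self, data, dates, variables, frequency, missing=()):
        self.data = data  # shape (dates, variables, grid points)
        self.dates = list(dates)
        self.variables = list(variables)
        self.frequency = frequency
        self.missing = set(missing)  # indices of missing dates

    def __len__(self):
        return len(self.dates)

    def __getitem__(self, i):
        return self.data[i]


class SumConsecutiveDates:
    """Sketch of the proposed class; a real version would inherit from Forward."""

    def __init__(self, forward, n, fields):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.forward = forward
        self.n = n
        self.variables = forward.variables
        self.fields = [forward.variables.index(f) for f in fields]

    def __len__(self):
        # An incomplete trailing block is dropped.
        return len(self.forward) // self.n

    @property
    def dates(self):
        # Each block is labelled with its last date (end of the accumulation).
        return self.forward.dates[self.n - 1 :: self.n]

    @property
    def frequency(self):
        return self.forward.frequency * self.n

    @property
    def missing(self):
        # A block is missing if any of its underlying dates is missing.
        return {
            i
            for i in range(len(self))
            if any(j in self.forward.missing for j in range(i * self.n, (i + 1) * self.n))
        }

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        first = i * self.n
        block = np.stack([self.forward[j] for j in range(first, first + self.n)])
        result = block[-1].copy()  # fields not summed: value at the last date
        result[self.fields] = block[:, self.fields].sum(axis=0)
        return result


dates = [dt.datetime(2023, 6, 20, 3) + k * dt.timedelta(hours=3) for k in range(8)]
data = np.arange(8 * 2 * 3, dtype=float).reshape(8, 2, 3)
ds_3h = InMemoryDataset(data, dates, ["2t", "tp"], dt.timedelta(hours=3), missing={5})

ds_6h = SumConsecutiveDates(ds_3h, n=2, fields=["tp"])
print(len(ds_6h), ds_6h.frequency, ds_6h.missing)  # 4 6:00:00 {2}
print(ds_6h[0])  # 2t at 06 UTC, tp summed over the 03 and 06 UTC steps
```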

I will probably start working on it at the end of March, but you can go ahead if you want, of course.

@flyIchtus
Contributor

flyIchtus commented Mar 21, 2025

Great, Michiel, I think this is a good starting point.
FYI, when opening a dataset containing the variable 'tp', one has access to:

>>> from anemoi.datasets import open_dataset
>>> ds = open_dataset('aro_nearlyfull.zarr')
>>> mtd = ds.metadata()
>>> vm = mtd['variables_metadata']['tp']
>>> vm
{'mars': {'date': 20230620, 'levtype': 'sfc', 'param': 'tp', 'step': 1, 'time': 0}, 'period': [0, 1], 'process': 'accumulation'}

So I guess this is part of what we want.
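
Building on that, one possible way to find which variables need summing, and with which N, is to read process and period from variables_metadata. This sketch assumes that every accumulated variable carries 'process': 'accumulation' and a 'period' [start, end] in hours, as in the tp entry above; that is inferred from a single example, not a documented schema. The '2t' entry is made up, and both helper names are hypothetical.

```python
import datetime as dt


def accumulated_fields(variables_metadata):
    """Return {variable: accumulation length in hours} for accumulated variables."""
    fields = {}
    for name, meta in variables_metadata.items():
        if meta.get("process") == "accumulation":
            start, end = meta["period"]  # assumed to be in hours
            fields[name] = end - start
    return fields


def accumulation_factor(period_hours, target_frequency):
    """Number N of consecutive accumulations to sum to cover target_frequency."""
    target_hours = target_frequency / dt.timedelta(hours=1)
    n, remainder = divmod(target_hours, period_hours)
    if n < 1 or remainder != 0:
        raise ValueError("target frequency must be a multiple of the period")
    return int(n)


# 'tp' entry copied from the metadata shown above; '2t' entry is made up.
# With a real dataset this dict would be ds.metadata()["variables_metadata"].
variables_metadata = {
    "tp": {
        "mars": {"date": 20230620, "levtype": "sfc", "param": "tp", "step": 1, "time": 0},
        "period": [0, 1],
        "process": "accumulation",
    },
    "2t": {"mars": {"levtype": "sfc", "param": "2t"}},
}

fields = accumulated_fields(variables_metadata)
print(fields)  # {'tp': 1}
print(accumulation_factor(fields["tp"], dt.timedelta(hours=6)))  # 6
```

Summing N consecutive dates only gives the right total when the accumulation period equals the dataset's own frequency; if they differ, consecutive accumulations overlap or leave gaps, so a check like the one above may be worth doing before summing.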
