Feature Request: Recompute cumulated fields when combining datasets with different frequency #222

mpvginde · 2025-02-28T08:54:11Z

If I understand correctly, when combining datasets with different temporal frequencies (e.g. 6h and 3h), the current behavior is to keep all the fields at the intersecting dates (i.e. every 6h) without any additional computation.
This means that the adjusted 3h dataset now has a 6h frequency, but e.g. the tp field still represents a 3h accumulation.

It would be useful to introduce a feature that automatically recomputes the accumulated fields, so that the adjusted datasets all represent the same accumulation period.
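To make the mismatch concrete, here is a small sketch with made-up numbers (the values are illustrative and not taken from a real dataset). It assumes that each 3h value is the accumulation over the 3 hours ending at its valid time, which is an assumption and not something confirmed in this thread.

```python
# Made-up 3-hourly tp accumulations (mm), valid at 03, 06, 09 and 12 UTC.
# Assumption: each value covers the 3 hours ending at its valid time.
tp_3h = [1.0, 2.0, 0.5, 4.0]

# Behaviour described above: keep only the dates shared with the 6h dataset
# (06 and 12 UTC). The kept values are still 3h accumulations.
tp_subsampled = tp_3h[1::2]

# Requested behaviour: sum each pair of consecutive 3h values so that every
# kept date carries a true 6h accumulation.
tp_6h = [tp_3h[i] + tp_3h[i + 1] for i in range(0, len(tp_3h) - 1, 2)]

print(tp_subsampled)  # [2.0, 4.0]
print(tp_6h)          # [3.0, 4.5]
```

The two lists share the same dates but differ in value, which is exactly the inconsistency the feature would remove.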

I'm willing to give it a try, but I was wondering what information is present about the fields in the dataset.
Does the dataset know which fields are accumulated and is there information about the accumulation period in the dataset?

BR,
Michiel


flyIchtus · 2025-03-18T19:46:26Z

Hello @mpvginde, did you get an answer to your questions?

I am very interested in this topic too, and I am willing to contribute to a PR if you want to start one (maybe with the help of Baudouin).

If the accumulation information is stored anywhere (layman here), it should be in the dataset's metadata?
This can come either from the earthkit Field objects the dataset uses to build the .zarr (when building the .zarr from source data files), or from the .zarr itself after the build (something like ds = open_dataset(my_zarr); mtd = ds.metadata(); mtd.keys()).

We probably want this information to live in the .zarr anyway, so that it works for the use case you describe (merging datasets with different frequencies when they already exist as .zarr's). This might be a starting point: either check whether the accumulation parameters are in the .zarr, or build a small filter (anemoi-transforms?) to set them in the recipe.

I'd like to give this a try on local datasets of mine.

Then, for the recomputation, do you already have an entry point? I guess concat.py would do the job, if we can add kwargs to recompute some fields (and then provide the accumulation routines under compute).

mpvginde · 2025-03-19T08:26:42Z

Hi @flyIchtus,
I don't have an answer yet on whether the metadata has information on accumulated fields (my guess is no).
But I had a brief chat with @b8raoult about it. He proposed that we start simple.
For example, if you want to merge two .zarr datasets with different frequencies:

How I would do it:

1. Create a class that sums N consecutive dates
2. Select only the accumulations from the 3h dataset
3. Apply that class

The resulting dataset should have a frequency of (original frequency * N).
Then join:

- the 6h dataset
- the 3h dataset without the accumulations, subsampled to 6h
- the summed accumulations with N=2

The class that sums the consecutive dates should inherit from Forward, and the following methods should be overridden:

- dates
- missing
- frequency
- __getitem__ (probably the hardest)
- __len__

As inputs it would need N and a list of the fields it needs to sum.
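A rough, standalone sketch of what such a class could look like. Everything here is hypothetical: `ToyDataset` is a stand-in for a real dataset, and `SummedDates` is a plain wrapper rather than a subclass of Forward, so that the example runs without anemoi-datasets installed. It assumes the wrapped dataset already contains only the accumulated fields, that `missing` is a set of date indices, and that each value is the accumulation over the period ending at its date.

```python
from datetime import datetime, timedelta


class ToyDataset:
    """Hypothetical stand-in for a dataset: one list of values per date."""

    def __init__(self, start, frequency, rows, missing=()):
        self.frequency = frequency  # timedelta between two dates
        self.dates = [start + i * frequency for i in range(len(rows))]
        self.missing = set(missing)  # indices of missing dates
        self._rows = rows

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, i):
        return self._rows[i]


class SummedDates:
    """Sum each group of n consecutive dates of the wrapped dataset.

    Assumption: output date k is the last date of group k, because each
    value is taken to be accumulated over the period ending at its date.
    """

    def __init__(self, forward, n):
        self.forward = forward
        self.n = n

    def __len__(self):
        # An incomplete trailing group is dropped.
        return len(self.forward) // self.n

    @property
    def frequency(self):
        return self.forward.frequency * self.n

    @property
    def dates(self):
        return [self.forward.dates[(k + 1) * self.n - 1] for k in range(len(self))]

    @property
    def missing(self):
        # A group is missing as soon as one of its members is missing.
        return {i // self.n for i in self.forward.missing if i // self.n < len(self)}

    def __getitem__(self, k):
        if not 0 <= k < len(self):
            raise IndexError(k)
        group = [self.forward[i] for i in range(k * self.n, (k + 1) * self.n)]
        # Sum variable by variable across the n dates of the group.
        return [sum(values) for values in zip(*group)]


# Made-up 3h accumulations for two variables, starting at 03 UTC so that the
# summed dates fall on 06 and 12 UTC.
ds3 = ToyDataset(
    datetime(2023, 6, 20, 3),
    timedelta(hours=3),
    [[1.0, 0.5], [2.0, 0.5], [0.0, 0.0], [4.0, 1.0], [9.0, 9.0]],
    missing=[2],
)
ds6 = SummedDates(ds3, 2)
print(len(ds6), ds6.frequency, ds6.dates, ds6[0], ds6.missing)
```

The sketch leaves out slicing, array shapes, and aligning the first group with the dates of the 6h dataset, which is likely where most of the difficulty in __getitem__ would be.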

I will probably start working on it at the end of March, but you can go ahead if you want, of course.

flyIchtus · 2025-03-21T10:46:11Z

Great Michiel, I think this is a good starting point.
FYI, when opening a dataset containing the variable 'tp', one has access to:

>>> from anemoi.datasets import open_dataset
>>> ds = open_dataset('aro_nearlyfull.zarr')
>>> mtd = ds.metadata()
>>> vm = mtd['variables_metadata']['tp']
>>> vm
{'mars': {'date': 20230620, 'levtype': 'sfc', 'param': 'tp', 'step': 1, 'time': 0}, 'period': [0, 1], 'process': 'accumulation'}

So I guess this is part of what we want.
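As a sketch of how this metadata could be used, the helper below picks out the accumulated variables and their period length from a `variables_metadata`-style dictionary. The function name `accumulated_variables` is hypothetical, the 't2m' entry is made up to show a non-accumulated variable, and reading 'period' as [start, end] in hours is an assumption.

```python
# Hypothetical metadata shaped like the output above. The 't2m' entry is
# made up, only to show a variable without accumulation information.
variables_metadata = {
    "tp": {
        "mars": {"date": 20230620, "levtype": "sfc", "param": "tp", "step": 1, "time": 0},
        "period": [0, 1],
        "process": "accumulation",
    },
    "t2m": {"mars": {"levtype": "sfc", "param": "2t"}},
}


def accumulated_variables(variables_metadata):
    """Map each accumulated variable to the length of its period."""
    result = {}
    for name, meta in variables_metadata.items():
        if meta.get("process") == "accumulation" and "period" in meta:
            start, end = meta["period"]
            result[name] = end - start  # assumed to be in hours
    return result


print(accumulated_variables(variables_metadata))  # {'tp': 1}
```

A list like this could provide the "fields to sum" input discussed above, instead of asking the user to name them by hand.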
