
feat: Kcrps #182

Open · wants to merge 56 commits into base: main
Conversation

ssmmnn11 (Member)

Description

AIFS-CRPS

@mishooax, @MartinLeutbecher, @jakob-schloer, @Rilwan-Adewoyin, @mc4117, @JesperDramsch

To test, use the debug_ens config. Work in progress ...

validation: 8

training:
  forecaster: anemoi.training.train.forecaster.GraphEnsForecaster # refactor to use _target_ etc?
Member

This is called task in the interpolator PR. Thinking about future downscaling, I think task may be the more future-proof name. Thoughts?

Member

How significant is the refactor to use _target_?

Member Author

If everyone is happy with the current form, then we can leave it as is. I mainly wanted other opinions on this.
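For reference, a minimal stdlib sketch of what a `_target_`-style config would buy here: the dotted path is imported and called with the remaining keys as kwargs. Hydra's `hydra.utils.instantiate` does this (plus recursion and interpolation); everything below, including the example config keys, is illustrative only, not the actual anemoi schema:

```python
import importlib

def instantiate(cfg: dict):
    """Simplified sketch of Hydra-style `_target_` instantiation:
    import the dotted path in `_target_`, then call it with the
    remaining keys as keyword arguments."""
    module_path, _, attr = cfg["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return target(**kwargs)

# With such a mechanism the forecaster entry could become, e.g.:
#   model_task:
#     _target_: anemoi.training.train.forecaster.GraphEnsForecaster
# (key name hypothetical). Demonstrated here with a stdlib target:
frac = instantiate({"_target_": "fractions.Fraction",
                    "numerator": 3, "denominator": 4})
```

The appeal is that swapping GraphForecaster for GraphEnsForecaster then becomes a pure config change, with no extra dispatch code.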

@JPXKQX (Member), Mar 24, 2025

Would it make sense to also move the task-specific args under task, e.g. multistep_input?
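To make the suggestion concrete, a hypothetical sketch (all key names and values are illustrative, not the actual schema):

```yaml
# Hypothetical sketch only: task-specific options grouped under the task entry
task:
  _target_: anemoi.training.train.forecaster.GraphEnsForecaster
  multistep_input: 2   # illustrative value
```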

@anaprietonem (Collaborator)

Just to flag that this PR will need some docs updates. The API docs will be generated automatically with Sphinx, but it would be good to add some information explaining the new model interface and the other major changes. Thank you!

"""Provide the model instance."""
kwargs = {
"config": self.config,
"data_indices": self.data_indices,
"graph_data": self.graph_data,
"truncation_data": self.truncation_data,
Collaborator

With this, I guess we make sure the truncation data matrix is available at inference time, right? My other question: in the same way that we track the dataset paths/files in the catalogue, it could be worth tracking these files too. Have you discussed this with Baudouin already, @ssmmnn11? (in terms of keeping full traceability and for backing up the files)

Member Author

Yes, same as the graph data. I have not spoken to Baudouin about that yet.

Member

I think we only need to store this in the supporting_arrays property.
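As a rough illustration of that suggestion, a stand-in class exposing the truncation matrices through a supporting_arrays-style property. The class, key names, and structure here are assumptions for illustration, not the actual anemoi interface:

```python
class EnsForecasterStub:
    """Hypothetical stand-in, not the real GraphEnsForecaster."""

    def __init__(self, truncation_data: dict):
        self.truncation_data = truncation_data

    @property
    def supporting_arrays(self) -> dict:
        # Arrays that should travel with the checkpoint so they are
        # available at inference time, without a second storage path.
        return {f"truncation.{name}": array
                for name, array in self.truncation_data.items()}

stub = EnsForecasterStub({"matrix": [[1.0, 0.0], [0.0, 1.0]]})
arrays = stub.supporting_arrays
```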

forecaster: anemoi.training.train.forecaster.GraphForecaster

# select strategy
strategy:
Collaborator

In the init of the strategy we include the read_group_size; why not specify this at config level?
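i.e. something along these lines (a sketch only; the key placement is an assumption, not the current schema):

```yaml
# Hypothetical: surface read_group_size in the strategy config
strategy:
  read_group_size: 1   # illustrative; currently set in the strategy's init
```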

@anaprietonem changed the title from Kcrps to feat: Kcrps on Mar 14, 2025
Member

A number of the config files seem to be left over from testing and need to be removed; some even contain local paths.

Comment on lines +294 to +296
batch[0] = self.allgather_batch(batch[0])
if len(batch) == 2:
    batch[1] = self.allgather_batch(batch[1])
Collaborator

Under what circumstances do we have len(batch) == 2?
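For context, the quoted branch gathers one batch entry always and a second one conditionally. A minimal sketch of that control flow with an injected stand-in gather function (torch.distributed needs an initialized process group, so this is pure Python; what a second entry actually carries is exactly the open question above):

```python
def allgather_batch_items(batch: list, allgather):
    """Sketch of the quoted logic: always gather the first entry,
    and gather a second entry only if the batch carries one."""
    batch[0] = allgather(batch[0])
    if len(batch) == 2:
        batch[1] = allgather(batch[1])
    return batch

# Stand-in gather: pretend two ranks each contributed a copy.
def fake_gather(x):
    return [x, x]

single = allgather_batch_items([[1, 2]], fake_gather)      # one-entry batch
double = allgather_batch_items([[1], [2]], fake_gather)    # two-entry batch
```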

Project status: Under Review
7 participants