[1037] Cleanup engines (torch.nn.modules and forward functions) #1080

yperugachidiaz · 2025-10-13T12:40:41Z

Description

Clean up the engines.py file by implementing a forward pass for the following classes:

EmbeddingEngine
LocalAssimilationEngine
Local2GlobalAssimilationEngine
GlobalAssimilationEngine
ForecastingEngine

In addition, clean up embeddings.py by implementing a forward pass for the class:

StreamEmbedTransformer

Fix sharding when the modules are called in trainer.py.

Work in progress; the issue should not be closed yet.
Upcoming: a new issue .... to fix the checkpoints that are now broken (the new modules change the checkpoint structure).
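Context for the checkpoint breakage: moving parameters into new submodules changes their names in the model's state dict (the dotted path gains the submodule's attribute name), so an older checkpoint no longer matches key by key. One common remedy is to rename keys before loading. A minimal sketch on plain dicts follows; the function name and all prefixes are hypothetical and not taken from the WeatherGen code.

```python
def remap_checkpoint_keys(state_dict, prefix_map):
    """Return a copy of state_dict with key prefixes renamed via prefix_map.

    prefix_map maps an old prefix to a new one; the first matching prefix wins.
    Keys that match no prefix are copied unchanged.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        # Refuse to silently overwrite a parameter if two keys collide.
        if new_key in remapped:
            raise KeyError(f"two checkpoint keys map to {new_key!r}")
        remapped[new_key] = value
    return remapped


# Hypothetical example: parameters that used to sit directly on the model
# now live inside engine submodules, so their names gain a prefix.
old_ckpt = {
    "embeds.0.weight": 1.0,
    "local_blocks.0.norm.bias": 2.0,
    "pred_heads.0.weight": 3.0,
}
new_ckpt = remap_checkpoint_keys(
    old_ckpt,
    {
        "embeds.": "embed_engine.embeds.",
        "local_blocks.": "local_engine.local_blocks.",
    },
)
```

With PyTorch, the remapped dict would then be passed to `model.load_state_dict(...)`; the actual old and new prefixes would have to be read off the two models' `state_dict().keys()`.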

Issue Number

Closes #1037

cleanup engines

Is this PR a draft? Mark it as draft.

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the Mattermost channels and/or a design doc
- for changes of dependencies: the Mattermost software development channel

clessig · 2025-10-13T13:03:46Z

I understand the issue with the other engines that are in engines.py and that do not have a forward function at all. But here there was a forward function. Was there an issue with it?

yperugachidiaz · 2025-10-13T17:49:47Z

> I understand the issue with the other engines that are in engines.py and that do not have a forward function at all. But here there was a forward function. Was there an issue with it?

Before starting on the engines, I wanted to begin with something smaller. So in embeddings.py I defined forward explicitly by name instead of assigning it through an attribute, which makes it easier to find.
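To illustrate the difference described above, here is a minimal sketch in plain Python. `Module` is a simplified stand-in for `torch.nn.Module` (the real `__call__` also runs registered hooks around `forward`), and all class and method names are hypothetical, not the actual `StreamEmbedTransformer` code.

```python
class Module:
    """Simplified stand-in for torch.nn.Module: calling an instance runs self.forward."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class _EmbedOps(Module):
    """Two alternative (hypothetical) embedding paths shared by both variants."""

    def forward_channels(self, x_in, centroids):
        return [x + c for x, c in zip(x_in, centroids)]

    def forward_columns(self, x_in, centroids):
        return [x * c for x, c in zip(x_in, centroids)]


class EmbedViaAttribute(_EmbedOps):
    """Before: forward is bound in __init__, so 'def forward' appears nowhere."""

    def __init__(self, use_channels):
        # The instance attribute is what __call__ finds as self.forward.
        self.forward = self.forward_channels if use_channels else self.forward_columns


class EmbedExplicit(_EmbedOps):
    """After: an explicit forward that dispatches on a stored flag."""

    def __init__(self, use_channels):
        self.use_channels = use_channels

    def forward(self, x_in, centroids):
        if self.use_channels:
            return self.forward_channels(x_in, centroids)
        return self.forward_columns(x_in, centroids)
```

Both variants return the same result when called; the explicit version simply puts `def forward` in the class body, where a text search or an IDE will find it.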


clessig · 2025-10-16T08:53:31Z

@yperugachidiaz: For the review, it will be important to have before-PR/after-PR loss curves. We previously had issues with gradient checkpointing where the code was formally correct (as far as we could tell) but still behaved differently during training.
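A before/after comparison can be reduced to one number once both runs log a per-step loss. Even with a fixed seed, GPU training is often not bitwise reproducible, so a sketch like the following compares smoothed curves against a tolerance instead of exact values. It assumes two equal-length lists of logged losses; the function names, window size and any acceptance threshold are hypothetical choices for the reviewers to set.

```python
def moving_average(values, window):
    """Trailing moving average; the first window-1 entries average what is available."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out


def max_relative_gap(before, after, window=10):
    """Largest relative difference between two smoothed loss curves of equal length."""
    if len(before) != len(after):
        raise ValueError("loss curves must have the same number of steps")
    sb = moving_average(before, window)
    sa = moving_average(after, window)
    # Guard against division by zero for (unlikely) zero losses.
    return max(abs(a - b) / max(abs(b), 1e-12) for b, a in zip(sb, sa))
```

For example, a refactored run whose loss is uniformly 1% higher yields a gap of about 0.01; whether that counts as "unchanged convergence" is a judgment call.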

sophie-xhonneux

This looks great! I like the changes. I am happy to approve this after we check that convergence is unchanged, because, as Christian mentioned, we have had some surprising differences after functionally equivalent refactors!

sophie-xhonneux · 2025-10-16T09:18:16Z

src/weathergen/model/embeddings.py


        return out.to(torch.float16)

+    def forward(self, x_in, centroids):


nice clean up

clessig · 2025-10-17T14:36:12Z

I think the PR has an incorrect name now: it should be "cleanup engines", not "cleanup embeddings".

clessig

Many thanks for the cleanup. Looks good, but could we add a docstring to all forward functions? This will help us a lot going forward. I am happy to help with filling in some details, but maybe you can start with it. Please also make sure to use our conventions (some of the docstrings in engines.py use a different convention).
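One way to keep the "docstring on every forward" requirement from regressing is a small check that could run in a unit test. This is a sketch only: the helper and the example classes are hypothetical, and the NumPy-style docstring layout is an assumption that should be swapped for whatever convention the project actually uses.

```python
def missing_forward_docstrings(classes):
    """Return the names of classes whose own forward() is missing or undocumented.

    Only forward methods defined directly on each class are checked, so a
    forward bound at runtime (e.g. via self.forward = ...) is reported too.
    """
    missing = []
    for cls in classes:
        fwd = cls.__dict__.get("forward")
        if fwd is None or not (fwd.__doc__ or "").strip():
            missing.append(cls.__name__)
    return missing


class DocumentedEngine:
    def forward(self, tokens):
        """Apply the engine to a batch of tokens.

        Parameters
        ----------
        tokens : list[float]
            Input tokens (placeholder type for this sketch).

        Returns
        -------
        list[float]
            Tokens after the engine has been applied.
        """
        return tokens


class UndocumentedEngine:
    def forward(self, tokens):
        return tokens


class AttributeBoundEngine:
    def __init__(self):
        # forward only exists on instances, so the class-level check flags it.
        self.forward = self._run

    def _run(self, tokens):
        """Documented, but bound at runtime."""
        return tokens
```

In the real test, the list passed in would be the engine classes imported from engines.py and embeddings.py.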

Implement forward method for StreamEmbedTransformer

f8f0897

github-project-automation bot added this to WeatherGen-dev Oct 13, 2025

Merge branch 'develop' into ypd/dev/1037_cleanup_engines

4792e93

yperugachidiaz marked this pull request as draft October 13, 2025 13:27

perugachidiaz1 added 3 commits October 14, 2025 15:14

Implemented forward pass for EmbeddingEngine and refactor model accordingly

bb11053

removed line that shouldn't be there

a63a179

Minor fixes

3cb72d4

clessig self-requested a review October 14, 2025 15:26

perugachidiaz1 and others added 7 commits October 16, 2025 08:32

First step check create function and put everything in init

c93dda3

Implemented forward pass for LocalAssimilationEngine and refactor model accordingly

6a2a9ac

Cleanup and minor fixes

eacf4f7

Implemented forward pass for Local2GlobalAssimilationEngine and refactor model accordingly

0a1a5ac

Cleanup model

297f8cf

Implemented forward pass for GlobalAssimilationEngine and refactor model accordingly

869be8f

Implemented forward pass for ForecastingEngine and refactor model accordingly

1d273dd

sophie-xhonneux reviewed Oct 16, 2025


perugachidiaz1 and others added 2 commits October 16, 2025 13:30

Merge branch 'develop' into ypd/dev/1037_cleanup_engines

8868a46

Fixed sharding when module is called

7234b4b

clessig reviewed Oct 19, 2025


clessig changed the title ~~[1037] cleanup embeddings file~~ [1037] Cleanup engines (torch.nn.modules and forward functions) Oct 19, 2025
