Skip to content

Conversation

yperugachidiaz
Copy link

@yperugachidiaz yperugachidiaz commented Oct 13, 2025

Description

Clean up of engines.py file by implementing forward pass for classes:

  • EmbeddingEngine
  • LocalAssimilationEngine
  • Local2GlobalAssimilationEngine
  • GlobalAssimilationEngine
  • ForecastingEngine

Plus cleanup embeddings.py by implementing forward pass for class:

  • StreamEmbedTransformer

Fixed sharding when modules are called in trainer.py.

Work in progress, issue should not be closed yet.
Upcoming: new issue .... to fix the now broken checkpoints (new modules change the structure).

Issue Number

Closes #1037

  • cleanup engines

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hegdedoc in the github issue with all the configurations and runs for this experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@clessig
Copy link
Collaborator

clessig commented Oct 13, 2025

I understand the issue with the other engines that are in engines.py and that do not have a forward function at all. But here there was a forward function. Was there an issue with it?

@yperugachidiaz yperugachidiaz marked this pull request as draft October 13, 2025 13:27
@yperugachidiaz
Copy link
Author

I understand the issue with the other engines that are in engines.py and that do not have a forward function at all. But here there was a forward function. Was there an issue with it?

Before starting on the engines I wanted to start with something smaller. Therefore, in embeddings, I made the forward definition naming explicit instead of setting it via an attribute. This makes it easier to find.

@clessig clessig self-requested a review October 14, 2025 15:26
@clessig
Copy link
Collaborator

clessig commented Oct 16, 2025

@yperugachidiaz : For the review, it will be important to have some before-PR/after-PR loss curves. We had some issues before with gradient checkpointing where the code was formally correct (as far as we could tell) but still behaved differently under training.

Copy link
Contributor

@sophie-xhonneux sophie-xhonneux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I like the changes. I am happy to approve this after we check that convergence is unchanged because as Christian mentioned we have had some surprising differences after functionally equivalent refactors!


return out.to(torch.float16)

def forward(self, x_in, centroids):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice clean up

@clessig
Copy link
Collaborator

clessig commented Oct 17, 2025

I think the PR has an incorrect name now: it should be cleanup engines and not cleanup embeddings

Copy link
Collaborator

@clessig clessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the cleanup. Looks good but could we add a docstring to all forward functions. This will help us a lot going forward. I am happy to help with filling in some details but maybe you can start with it. Please also make sure to use our conventions (some of the docstring in engines.py use a different convention).

@clessig clessig changed the title [1037] cleanup embeddings file [1037] Cleanup engines (torch.nn.modules and fowards functions) Oct 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Clean up of engines

3 participants