
fix: Kcrps fixes for Pydantic schemas #189

Open · wants to merge 21 commits into base: kcrps

Conversation

anaprietonem (Collaborator)

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Issue Number

Code Compatibility

  • I have performed a self-review of my code

Code Performance and Testing

  • I have added tests that prove my fix is effective or that my feature works
  • I ran the complete Pytest test suite locally, and it passes
  • I have tested the changes on a single GPU
  • I have tested the changes on multiple GPUs / multi-node setups
  • I have run the Benchmark Profiler against the old version of the code
  • If the new feature introduces modifications at the config level, I have made sure to update Pydantic Schemas and default configs accordingly

Dependencies

  • I have ensured that the code is still pip-installable and still runs after the changes
  • I have tested that new dependencies themselves are pip-installable.
  • I have not introduced new dependencies in the inference portion of the pipeline

Documentation

  • My code follows the style guidelines of this project
  • I have updated the documentation and docstrings to reflect the changes
  • I have added comments to my code, particularly in hard-to-understand areas

Additional Notes

@anaprietonem changed the title from "fixes to default config and shemas" to "fix: Kcrps fixes for Pydantic schemas" on Mar 14, 2025
@anaprietonem marked this pull request as ready for review on March 20, 2025 08:52
@theissenhelen self-requested a review on March 24, 2025 11:05
@theissenhelen previously approved these changes on Mar 28, 2025
num_gpus_per_model: PositiveInt = Field(example=2)
"Number of GPUs per model."
# TODO(Ana): check why the read_group_size is not passed in the config
kwargs: dict[str, Any] = Field(default_factory=dict)
Collaborator
Just as a note, we'll have to keep in mind that the kwargs will not be validated. We might want to look at including them in the config validation at a later stage.
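
To make the note concrete, here is a minimal sketch (Pydantic v2; the schema and key names are hypothetical stand-ins, not the project's) of the behaviour being flagged: an open dict[str, Any] field accepts any keys and values, while promoting the known keys to a nested model would catch mistakes at config-load time.

from typing import Any

from pydantic import BaseModel, ConfigDict, Field, PositiveInt


# Hypothetical stand-in for the current schema: kwargs is an open dict,
# so Pydantic accepts any keys and values without complaint.
class StrategySchema(BaseModel):
    num_gpus_per_model: PositiveInt = 2
    kwargs: dict[str, Any] = Field(default_factory=dict)


StrategySchema(kwargs={"read_group_sise": "oops"})  # validates silently


# One option for validating kwargs at a later stage: promote the known
# keys to a nested model and forbid unknown ones.
class StrategyKwargs(BaseModel):
    model_config = ConfigDict(extra="forbid")
    read_group_size: PositiveInt = 1


class ValidatedStrategySchema(BaseModel):
    num_gpus_per_model: PositiveInt = 2
    kwargs: StrategyKwargs = Field(default_factory=StrategyKwargs)


# ValidatedStrategySchema(kwargs={"read_group_sise": "oops"})  # raises ValidationError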

"Sanity check runs n batches of val before starting the training routine."
gradient_clip: GradientClip
"Config for gradient clipping."
-forecaster: Any  # TODO: Fix this
+forecaster: Any  # TODO(Simon): Fix this
Collaborator
Is there a plan to create a schema for the forecaster at some point?
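
Not part of this change, but as a hedged sketch of what replacing forecaster: Any with a typed model might look like (the field names are illustrative assumptions, not the project's):

from pydantic import BaseModel, NonNegativeInt, PositiveInt


# Illustrative only: a typed model here would let Pydantic validate the
# forecaster block instead of accepting anything via Any.
class ForecasterSchema(BaseModel):
    rollout: PositiveInt = 1
    "Number of rollout steps during training."
    warmup_steps: NonNegativeInt = 0
    "Number of steps before the rollout schedule starts."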

@theissenhelen dismissed their stale review on March 28, 2025 13:56

Test failing.

@@ -25,7 +27,7 @@ model:
grid_skip: 0 # Which of the input indices to use as residual connection, null if none.

training:
-  task: anemoi.training.train.interpolator.GraphInterpolator
+  model_class: anemoi.training.train.interpolator.GraphInterpolator
Member
I think model_task is clearer, would need changing across multiple configs.
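
If the rename goes ahead, one transitional option (a sketch under assumptions, not the project's API; the schema name and default shown are illustrative) is a Pydantic validation alias so configs using the old key keep validating during the migration:

from pydantic import AliasChoices, BaseModel, Field


# Sketch of a transitional schema: accept the old keys ("task",
# "model_class") and the proposed "model_task" for the same field.
class TrainingSchema(BaseModel):
    model_task: str = Field(
        default="anemoi.training.train.interpolator.GraphInterpolator",
        validation_alias=AliasChoices("model_task", "model_class", "task"),
    )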

@@ -58,7 +58,7 @@ processor:
mlp_hidden_ratio: 4 # GraphTransformer or Transformer only
num_heads: 16 # GraphTransformer or Transformer only
cpu_offload: ${model.cpu_offload}
-  qk_norm: False
+  qk_norm: True
Member
Why change the behaviour?

anaprietonem (Collaborator, Author)

Need to check this with Simon and Helen. Looking at the settings, qk_norm is kept true for the processor in both the transformer (ens and single) and the graph_transformer (ens), so I thought this was an error that needed correcting. I will check with them before merging.
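
Whichever value is agreed, this ties back to the checklist item above: the Pydantic schema default and the YAML configs should stay in sync. A minimal sketch, assuming True is the intended behaviour (the class name is hypothetical):

from pydantic import BaseModel


# Sketch only: pinning the agreed default in the schema keeps configs
# that omit qk_norm consistent with the ens/single transformer settings.
class ProcessorSchema(BaseModel):
    qk_norm: bool = True
    "Whether to apply query-key normalisation in attention."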

Projects
Status: Now In Progress

4 participants