
fix: cpu memory savings of sharded dataloader #83

Merged (1 commit) on Jan 30, 2025

Conversation

@japols japols (Member) commented Jan 22, 2025

This PR brings back the CPU memory savings from sharded dataloading (#76). At 9 km resolution, maximum CPU memory usage drops from ~70% to ~30%.

The sharded reading clashed with the recent changes that support masking of unconnected nodes (LAM). Since anemoi-datasets currently doesn't support slicing and indexing in a single operation, e.g.

x = self.data[start : end : self.timeincrement, :, :, grid_shard_indices]

the new version checks the type of grid_shard_indices and performs the read in two steps if necessary.
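
A minimal sketch of that two-step read, using a NumPy array as a stand-in for the anemoi-datasets object; the function name and signature are illustrative assumptions, not the actual code in this PR:

```python
import numpy as np

def read_time_window(data, start, end, timeincrement, grid_shard_indices=None):
    """Read a time window, optionally restricted to a grid shard.

    Assumes the backing dataset cannot combine a time slice and an index
    array in one subscript, so the grid selection is applied in a second
    step whenever grid_shard_indices is an index array.
    """
    time_slice = slice(start, end, timeincrement)
    if grid_shard_indices is None or isinstance(grid_shard_indices, slice):
        # A plain slice (or no sharding) can be combined into a single read.
        grid_sel = slice(None) if grid_shard_indices is None else grid_shard_indices
        return data[time_slice, :, :, grid_sel]
    # Index arrays (e.g. shard indices derived from a LAM mask) need two steps:
    # read the time window first, then select the shard's grid points.
    x = data[time_slice, :, :, :]
    return x[..., grid_shard_indices]


# Toy usage with a NumPy stand-in shaped (time, variables, ensemble, grid):
data = np.random.rand(10, 2, 1, 8)
shard = np.array([0, 2, 5])  # hypothetical grid-shard indices
window = read_time_window(data, 0, 6, 2, shard)
print(window.shape)  # (3, 2, 1, 3)
```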

@Rilwan-Adewoyin @JPXKQX @b8raoult @floriankrb

@japols japols self-assigned this Jan 22, 2025
@JPXKQX JPXKQX (Member) left a comment


LAM works.

@HCookie HCookie (Member) left a comment


Go for the merge.

Status: Done
3 participants