Function for spatiotemporal train/validation/test split from a large xarray dataset #10841

CoenvdE · 2025-10-10T09:56:46Z

Hi everyone,

I’d like to start a discussion about adding functionality to create static spatiotemporal train, validation, and test splits from a single large xarray dataset (e.g., global-scale data).

The goal is to generate these splits based on:
• The spatial and temporal size of each sample
• The stride between samples
• A list of user-defined validation and test regions (as static spatiotemporal holdouts)

The expected output would be, for each split, a list of dictionaries containing the coordinates (start and end) of each slice. This structure would make it straightforward to iterate over samples in a dataloader.
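
To make the proposal concrete, here is a minimal sketch of how such an index could be built. It is not an existing xarray function: `make_split_index` and its parameters are hypothetical names, windows are expressed as half-open integer index ranges rather than coordinate labels, and holdout regions are assumed not to overlap each other. One further assumption: a sample that only partly overlaps a holdout region is dropped, so no training sample leaks into validation or test.

```python
from itertools import product


def window_starts(length, size, stride):
    """Start indices of all full windows of `size` along an axis of `length`."""
    return range(0, length - size + 1, stride)


def inside(window, region):
    """True if the window lies fully within the region on every dim the region names."""
    return all(lo <= window[d][0] and window[d][1] <= hi
               for d, (lo, hi) in region.items())


def overlaps(window, region):
    """True if the window intersects the region on every dim the region names."""
    return all(window[d][0] < hi and lo < window[d][1]
               for d, (lo, hi) in region.items())


def make_split_index(sizes, sample_size, stride, holdouts):
    """Hypothetical helper: return {split: [{dim: (start, stop)}, ...]}.

    sizes       -- dim -> axis length (for a dataset, dict(ds.sizes))
    sample_size -- dim -> window length of one sample
    stride      -- dim -> step between consecutive window starts
    holdouts    -- split name -> list of regions, each {dim: (start, stop)}
    """
    dims = list(sample_size)
    splits = {"train": [], **{name: [] for name in holdouts}}
    axes = [window_starts(sizes[d], sample_size[d], stride[d]) for d in dims]
    for starts in product(*axes):
        window = {d: (s, s + sample_size[d]) for d, s in zip(dims, starts)}
        target = "train"
        for name, regions in holdouts.items():
            if any(inside(window, r) for r in regions):
                target = name  # fully inside a static holdout region
                break
        else:
            # Not inside any holdout: drop it if it still touches one (assumed rule).
            if any(overlaps(window, r) for rs in holdouts.values() for r in rs):
                target = None
        if target is not None:
            splits[target].append(window)
    return splits


# Toy example: 10 time steps on a 4x4 grid; the last 4 time steps are held out.
index = make_split_index(
    sizes={"time": 10, "y": 4, "x": 4},
    sample_size={"time": 2, "y": 2, "x": 2},
    stride={"time": 2, "y": 2, "x": 2},
    holdouts={"val": [{"time": (6, 8)}], "test": [{"time": (8, 10)}]},
)
```

In a dataloader, each entry `w` could then be turned into a sample with something like `ds.isel({d: slice(a, b) for d, (a, b) in w.items()})`. Because the holdout regions are fixed inputs, changing the stride or sample size changes which windows are produced but not where the validation and test data come from.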

Keeping the validation and test regions static is important to ensure consistent model evaluation and comparability across experiments.

I might already be working on a prototype for this, but I wanted to open the discussion first to gather feedback on the design and identify possible integration points.

Replies: 0 comments
