anemoi-dataloader-microbenchmark

A small codebase that extracts only the dataloader logic from anemoi training and adds wrapper code to benchmark it.

Running the benchmark

Currently, graphs must be stored in the following layout:

graphs/$RES.graph
# e.g. graphs/n320.graph
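
Before launching, it can help to confirm that the graph file is where the layout above says it should be. The following is a minimal sketch, not part of the repository: `graph_path` and `check_graph` are hypothetical helper names, and the default `graphs` directory is taken from the example path above.

```python
from pathlib import Path


def graph_path(resolution: str, graph_dir: str = "graphs") -> Path:
    """Build the expected graph location, e.g. graphs/n320.graph."""
    return Path(graph_dir) / f"{resolution}.graph"


def check_graph(resolution: str, graph_dir: str = "graphs") -> bool:
    """Return True if the graph file for this resolution exists on disk."""
    return graph_path(resolution, graph_dir).is_file()
```

For example, `check_graph("n320")` returns True only if `graphs/n320.graph` exists relative to the current working directory.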

python main.py

Several benchmarking tests are defined in main.py (e.g. 'single worker, varying rollout' and 'increasing workers').

Currently the benchmark only supports a single driver process, which is equivalent to running on a single node with 1 GPU. You can still run the code over multiple processes or nodes by launching it with mpirun. This changes the IO behaviour, because all processes are currently added to the same read group and the batch is split across them.

srun --mem=0 --qos=np -N 1 --ntasks-per-node=4 --cpus-per-task=32 --time=2:00:00 --pty bash
mpirun -np 4 python main.py
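
To picture what "the batch is split across processes" in a read group could mean, the sketch below divides one axis of a batch into contiguous slices, one per rank. This is an illustration only: `shard_bounds` is a hypothetical function, and the contiguous, near-equal split is an assumption rather than a description of how anemoi actually shards data.

```python
def shard_bounds(num_points: int, rank: int, group_size: int) -> tuple[int, int]:
    """Return the [start, stop) slice of a length-num_points axis owned by rank.

    Assumption: contiguous shards whose sizes differ by at most one element.
    """
    if not 0 <= rank < group_size:
        raise ValueError("rank must be in [0, group_size)")
    base, extra = divmod(num_points, group_size)
    # The first `extra` ranks each take one additional element.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

Under this scheme each of the 4 ranks in the mpirun example would read roughly a quarter of every batch, which is why the per-process IO pattern differs from the single-process case.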

By default, the memory monitor runs in the background for each benchmark, and a plot is produced under 'out/'.

Example output

Creating Dataloader (batch size: 1, num_workers: 2, prefetch_factor 1, pin_mem=True, seed=1742057691)...
Dataloader created.
Proc 0: Starting 8 loads with r=1, bs=1, pf=1, nw=2, pm=True
Iteration 0
p0: 3.7GB batch loaded in 26.68s (0.00s in barrier). 
p0: dataloader throughput: 143.0MB/s
Iteration 1
p0: 3.7GB batch loaded in 0.00s (0.00s in barrier). 
Iteration 2
p0: 3.7GB batch loaded in 29.03s (0.00s in barrier). 
p0: dataloader throughput: 131.4MB/s
Iteration 3
p0: 3.7GB batch loaded in 8.32s (8.07s in barrier). 
p0: dataloader throughput: 458.6MB/s
Iteration 4
p0: 3.7GB batch loaded in 27.35s (4.75s in barrier). 
p0: dataloader throughput: 139.5MB/s
Iteration 5
p0: 3.7GB batch loaded in 9.60s (9.60s in barrier). 
p0: dataloader throughput: 397.5MB/s
Iteration 6
p0: 3.7GB batch loaded in 27.20s (11.18s in barrier). 
p0: dataloader throughput: 140.2MB/s
Iteration 7
p0: 3.7GB batch loaded in 9.55s (9.55s in barrier). 
p0: dataloader throughput: 399.4MB/s
Av time: 17.21s, Total time for 8 runs: 137.72s
per-worker BW: 110.8MB/s,  per-process BW: 221.6MB/s, global BW: 443.1MB/s
Compute throughput must be >= 0.058it/s (221.6MB/s / 3.7GB) to avoid starvation
Est. latency to load the initial batch: 34.43s
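
The threshold in the summary is the per-process bandwidth divided by the batch size. The sketch below repeats that arithmetic with the rounded values printed above. The function name is hypothetical, and the 1024 MB per GB conversion is an assumption, chosen because it reproduces the printed 0.058it/s.

```python
MB_PER_GB = 1024  # assumption: binary units, which match the printed 0.058it/s


def sustainable_iterations_per_second(per_process_bw_mb_s: float,
                                      batch_size_gb: float) -> float:
    """Batches per second that one process can load at the given bandwidth."""
    return per_process_bw_mb_s / (batch_size_gb * MB_PER_GB)


# Rounded values taken from the example output above.
rate = sustainable_iterations_per_second(221.6, 3.7)
print(f"{rate:.3f}it/s")
```

Because the inputs are rounded, the inverse of this rate (about 17 s per batch) only approximately matches the reported average time.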

Errors

If you encounter this error:

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "No permission" (-17) instead of "Success" (0)

you can fix it with

export PMIX_MCA_gds=hash
