These examples are based on the ROCm container provided to you at:
```
- /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif
+ /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
```

To avoid running into storage issues, we recommend running the examples from a folder you create in the scratch file system, e.g.:
@@ -31,7 +31,7 @@ The difference is that it gives you a mechanism to just allocate the nodes witho
With the allocation and container set, we can do a quick smoke test to make sure Pytorch can detect the GPUs available in a node:
```
srun singularity exec \
- /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+ /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
    bash -c '$WITH_CONDA ; \
    python -c "import torch; print(torch.cuda.device_count())"'
```
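The one-liner above can be grown into a slightly richer check. The sketch below is only a suggestion: it assumes the container's PyTorch build, and the fallback branch is ours so the snippet also runs (with a note) outside the container:

```shell
# Hedged expansion of the smoke test: prints the device count and, per
# device, its name. The import guard is ours, not part of the original.
if python3 -c "import torch" 2>/dev/null; then
  python3 - <<'EOF'
import torch

count = torch.cuda.device_count()  # ROCm GPUs surface through the CUDA API
print(f"GPUs visible: {count}")
for i in range(count):
    print(f"  device {i}: {torch.cuda.get_device_name(i)}")
EOF
else
  echo "PyTorch not installed in this environment"
fi
```

On a full LUMI-G node one would expect a count of 8, since each of the four MI250x modules is exposed as two logical GPUs.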
@@ -58,7 +58,7 @@ mkdir -p torch-cache hf-cache

srun -n1 singularity exec \
    -B .:/workdir \
- /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+ /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
    bash -c '$WITH_CONDA ; cd /workdir ; \
      HIP_VISIBLE_DEVICES=0 \
      TORCH_HOME=/workdir/torch-cache \
@@ -86,7 +86,7 @@ squeue --me
```
* Start an interactive parallel session:
```
- srun --jobid 7100665 --interactive --pty /bin/bash
+ srun --jobid 7100665 --overlap --pty /bin/bash
```
* Use `rocm-smi` to monitor GPU activity:
```
@@ -118,7 +118,7 @@ So, running the following:
```
srun -n1 singularity exec \
    -B .:/workdir \
- /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+ /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
    bash -c '$WITH_CONDA ; cd /workdir ; \
      HIP_VISIBLE_DEVICES=0 \
      AMD_LOG_LEVEL=4 \
@@ -148,7 +148,7 @@ Another way to check for GPU activity is to use a profiler. There is a GPU profi
```
srun -n1 singularity exec \
    -B .:/workdir \
- /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+ /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
    rocprof --help
```
Given that Pytorch uses the HIP runtime in its implementation, one of the most relevant options is `--hip-trace`, which instructs the profiler to collect HIP runtime activity. Another convenient option is `--stats`, which generates statistics on the usage of the GPU.
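Combined, the two flags would wrap a run along these lines; this is a sketch only (`train.py` is a placeholder script, and the `command -v` guard is ours so the snippet can be tried outside the ROCm container):

```shell
# Sketch: trace HIP runtime calls (--hip-trace) and emit usage statistics
# (--stats) for a placeholder script. Run this inside the container.
if command -v rocprof >/dev/null 2>&1; then
  rocprof --hip-trace --stats python train.py
else
  echo "rocprof is only available inside the ROCm container"
fi
```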
@@ -171,7 +171,7 @@ Now we can just run the profiler by preceding our original command with `rocprof
```
srun -n1 singularity exec \
    -B .:/workdir \
- /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+ /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
    bash -c '$WITH_CONDA ; cd /workdir ; \
      HIP_VISIBLE_DEVICES=0 \
      TORCH_HOME=/workdir/torch-cache \
@@ -219,7 +219,7 @@ Run as before:
```
srun -n1 singularity exec \
    -B .:/workdir \
- /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+ /appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
    bash -c '$WITH_CONDA ; cd /workdir ; \
      HIP_VISIBLE_DEVICES=0 \
      TORCH_HOME=/workdir/torch-cache \