Commit 17d6fba

Use pytorch 2.6.0 for lesson 4.
1 parent ea1da16 commit 17d6fba

File tree

  • 04_Understanding_GPU_activity_and_checking_jobs

1 file changed (+8, -8 lines)


04_Understanding_GPU_activity_and_checking_jobs/README.md

Lines changed: 8 additions & 8 deletions
````diff
@@ -2,7 +2,7 @@
 
 These examples are based on the ROCm container provided to you at:
 ```
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
 ```
 
 To avoid running into any storage issues, we recomment running the examples from a folder you create in the scratch file system, e.g.:
````
````diff
@@ -31,7 +31,7 @@ The difference is that it gives you a mechanism to just allocate the nodes witho
 With the allocation and container set we can do a quick smoke test to make sure Pytorch can detect the GPUs available in a node:
 ```
 srun singularity exec \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
 bash -c '$WITH_CONDA ; \
 python -c "import torch; print(torch.cuda.device_count())"'
 ```
````
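As a side note for readers of this lesson: the smoke test in the hunk above only prints the device count. A slightly more informative variant is sketched below; it assumes `torch` is available (as it is inside the container) but degrades gracefully when it is not, so you can tell an import failure apart from a machine with zero visible GPUs.

```python
# Sketch of a richer smoke test than the one-liner in the diff; assumes
# torch is installed inside the container, degrades gracefully without it.
def report_gpus() -> str:
    try:
        import torch
    except ImportError:
        return "visible GPUs: unknown (torch not installed)"
    count = torch.cuda.device_count()
    lines = [f"visible GPUs: {count}"]
    lines += [f"  device {i}: {torch.cuda.get_device_name(i)}" for i in range(count)]
    return "\n".join(lines)

if __name__ == "__main__":
    print(report_gpus())
```

Run it in place of the `python -c` one-liner to also see each device's name when GPUs are detected.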
````diff
@@ -58,7 +58,7 @@ mkdir -p torch-cache hf-cache
 
 srun -n1 singularity exec \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif\
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif\
 bash -c '$WITH_CONDA ; cd /workdir ; \
 HIP_VISIBLE_DEVICES=0 \
 TORCH_HOME=/workdir/torch-cache \
````
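The `TORCH_HOME` setting in this hunk redirects model downloads into the bind-mounted scratch folder instead of the small home quota. The mechanism is just an environment variable with a home-directory fallback; a minimal sketch (the paths mirror the diff, the `resolve_cache` helper is hypothetical):

```python
import os
from pathlib import Path

def resolve_cache(env_var: str, default: str) -> Path:
    # Cache-using libraries typically honour an env var and fall back to a
    # default under $HOME; on LUMI, pointing the variable at /workdir
    # (the bind-mounted scratch folder) keeps downloads off the home quota.
    return Path(os.environ.get(env_var, default)).expanduser()

# Hypothetical illustration, mirroring the TORCH_HOME setting in the diff:
os.environ["TORCH_HOME"] = "/workdir/torch-cache"
print(resolve_cache("TORCH_HOME", "~/.cache/torch"))  # /workdir/torch-cache
```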
````diff
@@ -86,7 +86,7 @@ squeue --me
 ```
 * Start interactive parallel session:
 ```
-srun --jobid 7100665 --interactive --pty /bin/bash
+srun --jobid 7100665 --overlap --pty /bin/bash
 ```
 * Use `rocm-smi` to monitor GPU activity:
 ```
````
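When watching `rocm-smi` from the overlapping session, it can help to turn its textual output into numbers you can log or threshold. A small sketch of that idea, assuming output resembling the `rocm-smi --showuse` format ("GPU[0] : GPU use (%): 87"); the exact layout varies between ROCm versions, so adjust the pattern to what you actually see:

```python
import re

def parse_gpu_use(smi_text: str) -> dict[int, int]:
    # Extract per-GPU utilisation from `rocm-smi --showuse`-style lines;
    # the line format is an assumption - verify against your ROCm version.
    pattern = re.compile(r"GPU\[(\d+)\]\s*:\s*GPU use \(%\):\s*(\d+)")
    return {int(g): int(u) for g, u in pattern.findall(smi_text)}

sample = """\
GPU[0] : GPU use (%): 87
GPU[1] : GPU use (%): 0
"""
print(parse_gpu_use(sample))  # {0: 87, 1: 0}
```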
````diff
@@ -118,7 +118,7 @@ So, running the following:
 ```
 srun -n1 singularity exec \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif\
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif\
 bash -c '$WITH_CONDA ; cd /workdir ; \
 HIP_VISIBLE_DEVICES=0 \
 AMD_LOG_LEVEL=4 \
````
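`AMD_LOG_LEVEL=4` makes the HIP runtime print every API call and kernel dispatch, which is far too verbose to read directly, so it is common to filter the captured log. A sketch of such a filter; the "ShaderName" marker is an assumption based on typical ROCm debug logs, so substitute whatever marker your log actually uses:

```python
def grep_kernel_launches(log_text: str) -> list[str]:
    # Keep only lines that look like kernel dispatches; "ShaderName" as the
    # marker is an assumption - check a few lines of your own log first.
    return [ln for ln in log_text.splitlines() if "ShaderName" in ln]

sample = (
    ":3:hip_memory.cpp :: hipMemcpy ( ... )\n"
    ":3:devprogram.cpp :: ShaderName : elementwise_kernel\n"
)
print(grep_kernel_launches(sample))  # one matching line
```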
````diff
@@ -148,7 +148,7 @@ Another way to check for GPU activity is to use a profiler. There is a GPU profi
 ```
 srun -n1 singularity exec \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif\
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif\
 rocprof --help
 ```
 Given that Pytorch uses the HIP runtime in its implementation, one of the most relevant options is `--hip-trace` to instruct the profiler to collect the HIP runtime activity. Another option that is convinient is `--stats` that generates some statistics on the usage of the GPU.
````
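The `--stats` option mentioned in this hunk writes a per-kernel summary as a CSV file. A sketch of how one might rank kernels by total time from that file; the column names ("Name", "TotalDurationNs") are assumed from typical rocprof v1 output, so verify them against the header of the file you actually get:

```python
import csv
import io

def top_kernels(stats_csv: str, n: int = 3) -> list[tuple[str, float]]:
    # Rank kernels by total duration from a rocprof --stats CSV; the
    # "Name"/"TotalDurationNs" column names are assumptions - check your file.
    rows = list(csv.DictReader(io.StringIO(stats_csv)))
    rows.sort(key=lambda r: float(r["TotalDurationNs"]), reverse=True)
    return [(r["Name"], float(r["TotalDurationNs"])) for r in rows[:n]]

sample = """\
"Name","Calls","TotalDurationNs","AverageNs","Percentage"
"gemm_kernel","10","5000","500","50.0"
"copy_kernel","20","3000","150","30.0"
"""
print(top_kernels(sample, 1))  # [('gemm_kernel', 5000.0)]
```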
````diff
@@ -171,7 +171,7 @@ Now we can just run the profiler by preceding our original command with `rocprof
 ```
 srun -n1 singularity exec \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif\
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif\
 bash -c '$WITH_CONDA ; cd /workdir ; \
 HIP_VISIBLE_DEVICES=0 \
 TORCH_HOME=/workdir/torch-cache \
````
````diff
@@ -219,7 +219,7 @@ Run as before:
 ```
 srun -n1 singularity exec \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif\
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif\
 bash -c '$WITH_CONDA ; cd /workdir ; \
 HIP_VISIBLE_DEVICES=0 \
 TORCH_HOME=/workdir/torch-cache \
````

0 commit comments

Comments
 (0)