Commit 7086752

Use Pytorch 2.6.0 for lesson 9.
1 parent 17d6fba commit 7086752

1 file changed: +9 -9 lines changed
09_Extreme_scale_AI/README.md

Lines changed: 9 additions & 9 deletions
@@ -2,7 +2,7 @@
 
 These examples are based on the ROCm container provided to you at:
 ```
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
 ```
 
 The examples also assume there is an allocation in place to be used for one or more nodes. That could be accomplished with, e.g.:
@@ -13,7 +13,7 @@ The examples also assume there is an allocation in place to be used for one or m
 With the allocation and container set we can do a quick smoke test to make sure Pytorch can detect the GPUs available in a node:
 ```
 srun singularity exec \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
 bash -c '$WITH_CONDA ; \
 python -c "import torch; print(torch.cuda.device_count())"'
 ```
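Review note: since this commit pins the container to PyTorch 2.6.0, the smoke test above could also report which PyTorch build is actually loaded. The following is a minimal sketch and not part of the README. The helper name `gpu_report` is hypothetical, and the sketch assumes it is run with the container's Python (for example through the same `srun singularity exec ... bash -c '$WITH_CONDA ; python ...'` pattern).

```python
import importlib.util


def gpu_report() -> dict:
    """Collect basic PyTorch and GPU facts (hypothetical helper, not in the repo).

    Returns {"torch_available": False} when torch cannot be imported,
    so the script does not crash outside the container.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_available": False}

    import torch

    return {
        "torch_available": True,
        # Version string of the loaded PyTorch build.
        "torch_version": torch.__version__,
        # HIP/ROCm version string on ROCm builds, None on other builds.
        "hip_version": torch.version.hip,
        # Number of GPUs visible to this process.
        "device_count": torch.cuda.device_count(),
    }


print(gpu_report())
```

Comparing `torch_version` against the version in the image name is a quick way to confirm that the intended container is the one in use.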
@@ -110,7 +110,7 @@ srun -N1 -n8 --gpus 8 \
 --cpu-bind=mask_cpu=0x00fe000000000000,0xfe00000000000000,0x0000000000fe0000,0x00000000fe000000,0x00000000000000fe,0x000000000000fe00,0x000000fe00000000,0x0000fe0000000000\
 singularity exec \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
 /workdir/run.sh \
 python -u /workdir/GPT-neo-IMDB-finetuning-mp.py \
 --model-name gpt-imdb-model \
@@ -128,7 +128,7 @@ srun -N2 -n16 --gpus 16 \
 -B /opt/cray \
 -B /usr/lib64/libcxi.so.1 \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif\
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif\
 /workdir/run.sh \
 python -u /workdir/GPT-neo-IMDB-finetuning-mp.py \
 --model-name gpt-imdb-model \
@@ -162,7 +162,7 @@ srun -N2 -n16 --gpus 16 \
 -B /opt/cray \
 -B /usr/lib64/libcxi.so.1 \
 -B .:/workdir \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
 /workdir/run-profile.sh \
 python -u /workdir/GPT-neo-IMDB-finetuning-mp.py \
 --model-name gpt-imdb-model \
@@ -225,7 +225,7 @@ srun -N $N -n $((N*8)) --gpus $((N*8)) \
 -B /usr/lib64/libcxi.so.1 \
 -B .:/workdir \
 -B /flash -B /pfs \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
 /workdir/run.sh \
 python -u /workdir/cv_example.py \
 -a resnet50 \
@@ -280,7 +280,7 @@ https://github.com/microsoft/DeepSpeedExamples/raw/master/training/imagenet/conf
 Parse the files to create some understanding of the differences.
 
 ### 2. Running DeepSpeed with required dependencies
-This container has DeepSpeed already installed so we will leverage it: `/appl/local/containers/sif-images/lumi-pytorch-rocm-6.1.3-python-3.12-pytorch-v2.4.1.sif`.
+This container has DeepSpeed already installed so we will leverage it: `/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif`.
 
 You can run the example like the following, however some dependencies might be missing. Can you install those? Can you setup the `spawn` multiprocessing mode?
 ```
@@ -294,7 +294,7 @@ srun -N $N -n $((N*8)) --gpus $((N*8)) \
 -B /usr/lib64/libcxi.so.1 \
 -B .:/workdir \
 -B /flash -B /pfs \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
 /workdir/run.sh \
 python -u /workdir/cv_example_ds.py \
 --deepspeed \
@@ -336,7 +336,7 @@ srun -N $N -n $((N*8)) --gpus $((N*8)) \
 -B /usr/lib64/libcxi.so.1 \
 -B .:/workdir \
 -B /flash -B /pfs \
-/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.7.1.sif \
+/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \
 /workdir/run.sh \
 python -u /workdir/cv_example.py \
 -a resnet50 \
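
Review note: the change replaces the same image path in nine places, and one of them (the DeepSpeed section) previously pointed at a different image than the others. A small consistency check can help catch a stale reference after edits like this. The sketch below is illustrative only: the function names `find_container_images` and `is_consistent` and the regular expression are assumptions, not something the repository provides.

```python
import re
from collections import Counter

# Assumed pattern for image paths under the directory used throughout the README.
SIF_PATTERN = re.compile(r"/appl/local/containers/sif-images/[\w.\-]+\.sif")


def find_container_images(text: str) -> Counter:
    """Count how often each distinct .sif image path appears in the text."""
    return Counter(SIF_PATTERN.findall(text))


def is_consistent(text: str) -> bool:
    """True when the text references at most one distinct image path."""
    return len(find_container_images(text)) <= 1


# Small inline sample built from two paths that appear in this diff.
sample = (
    "/appl/local/containers/sif-images/"
    "lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif \\\n"
    "`/appl/local/containers/sif-images/"
    "lumi-pytorch-rocm-6.1.3-python-3.12-pytorch-v2.4.1.sif`.\n"
)
for path, count in find_container_images(sample).items():
    print(count, path)
print("consistent:", is_consistent(sample))
```

To check the real file, its text could be read with `pathlib.Path("09_Extreme_scale_AI/README.md").read_text()` and passed to `is_consistent`.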
