Commit 5c2de47

Add instructions for Scylla (#733)
Scylla is in a preliminary state and so needs some specific instructions to help install SmartSim with CUDA support. The directions included here are preliminary and will be updated as needed. [ committed by @ashao ] [ reviewed by @MattToast @amandarichardsonn ]
1 parent 342a67f commit 5c2de47

3 files changed: 101 additions & 0 deletions

doc/changelog.md

Lines changed: 15 additions & 0 deletions
@@ -9,6 +9,21 @@ Jump to:

## SmartSim

### Develop

To be released at some point in the future

Description

- Add instructions for installing SmartSim on PML's Scylla

Detailed Notes
- PML's Scylla is still under development. The usual SmartSim
  build instructions do not apply because the GPU dependencies
  have yet to be installed at a system-wide level. Scylla has
  its own entry in the documentation.
  ([SmartSim-PR733](https://github.com/CrayLabs/SmartSim/pull/733))

### 0.8.0

Released on 27 September, 2024

doc/installation_instructions/platform.rst

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,8 @@ that SmartSim may be used on.

.. include:: platform/olcf-summit.rst

.. include:: platform/pml-scylla.rst

.. _site_installation:

.. include:: site-install.rst

doc/installation_instructions/platform/pml-scylla.rst

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
PML Scylla
==========

.. warning::
    As of September 2024, the software stack on Scylla is still being finalized.
    Therefore, please consider these instructions as preliminary for now.

One-time Setup
--------------

To install SmartSim on Scylla, follow these steps:

**Step 1:** Create and activate a Python virtual environment for SmartSim:

.. code:: bash

    module use /scyllapfs/hpe/ashao/smartsim_dependencies/modulefiles
    module load cudatoolkit cudnn git
    python -m venv /scyllafps/scratch/$USER/venvs/smartsim
    source /scyllafps/scratch/$USER/venvs/smartsim/bin/activate
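
Before installing anything into the environment, it can help to confirm that the activation in Step 1 took effect. The following is a minimal sketch, not part of the original instructions. The function name ``venv_status`` is hypothetical, and the check relies only on the ``VIRTUAL_ENV`` variable that a standard ``venv`` activation script exports.

```shell
# Hypothetical helper (not part of SmartSim): report whether a Python
# virtual environment is active in the current shell.
venv_status() {
    # A standard venv "activate" script exports VIRTUAL_ENV.
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

venv_status
```

If this prints ``inactive``, re-running the ``source`` line from Step 1 should activate the environment again.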

**Step 2:** Build the SmartRedis C++ and Fortran libraries:

.. code:: bash

    git clone https://github.com/CrayLabs/SmartRedis.git
    cd SmartRedis
    make lib-with-fortran
    pip install .
    cd ..

**Step 3:** Install SmartSim in the virtual environment:

.. code:: bash

    pip install git+https://github.com/CrayLabs/SmartSim.git

**Step 4:** Build Redis, RedisAI, the backends, and all the Python packages:

.. code:: bash

    export TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0" # Workaround for a PyTorch problem
    smart build --device=cuda-12
    module unload cudnn # Workaround for a PyTorch problem

.. note::
    The first workaround is needed because, for reasons not yet understood,
    the autodetection of CUDA architectures is inconsistent with one of
    PyTorch's dependencies. This appears to be unique to this machine; we do
    not see it on other platforms.

    The second workaround is needed because PyTorch 2.3 (and possibly 2.2)
    will attempt to load the version of cuDNN that is in ``LD_LIBRARY_PATH``
    instead of the version shipped with PyTorch itself. This results in
    unresolved symbols.
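
To check whether the second workaround has taken effect, one option is to look for cuDNN libraries in the directories listed in ``LD_LIBRARY_PATH``. The sketch below is an illustration only. The function name ``list_cudnn_dirs`` is hypothetical, and it assumes the cuDNN shared libraries follow the usual ``libcudnn*.so*`` naming.

```shell
# Hypothetical helper: print every entry of a colon-separated library path
# (default: LD_LIBRARY_PATH) that contains a cuDNN shared library.
# Assumes the libraries are named libcudnn*.so*. Prints nothing if none match.
list_cudnn_dirs() {
    # Split the colon-separated list into one directory per line.
    printf '%s\n' "${1:-${LD_LIBRARY_PATH:-}}" | tr ':' '\n' | while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        # ls fails when the glob matches no file in this directory.
        if ls "$dir"/libcudnn*.so* >/dev/null 2>&1; then
            printf '%s\n' "$dir"
        fi
    done
}

cudnn_dirs=$(list_cudnn_dirs)
if [ -n "$cudnn_dirs" ]; then
    echo "cuDNN is still visible on LD_LIBRARY_PATH in:"
    echo "$cudnn_dirs"
else
    echo "no cuDNN found on LD_LIBRARY_PATH"
fi
```

After ``module unload cudnn``, the expectation is that no directory is listed, so PyTorch falls back to the cuDNN it ships with.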

**Step 5:** Check that SmartSim has been installed and built correctly:

.. code:: bash

    srun -n 1 -p gpu --gpus=1 --pty smart validate --device gpu

The following output indicates a successful install:

.. code:: bash

    [SmartSim] INFO Verifying Tensor Transfer
    [SmartSim] INFO Verifying Torch Backend
    [SmartSim] INFO Verifying ONNX Backend
    [SmartSim] INFO Verifying TensorFlow Backend
    16:26:35 login SmartSim[557020:MainThread] INFO Success!
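
When the validation is run non-interactively, the same check can be scripted by searching a saved log for the final ``Success!`` line. This is a rough sketch under stated assumptions. The function name ``validate_succeeded`` is hypothetical, and the sample log below simply copies two lines from the expected output above.

```shell
# Hypothetical helper: succeed if a saved "smart validate" log contains the
# "Success!" line that marks a successful install.
validate_succeeded() {
    grep -q 'Success!' "$1"
}

# Demonstration on a sample log copied from the expected output.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[SmartSim] INFO Verifying TensorFlow Backend
16:26:35 login SmartSim[557020:MainThread] INFO Success!
EOF

if validate_succeeded "$sample_log"; then
    echo "validation log reports success"
else
    echo "validation log does not report success"
fi
rm -f "$sample_log"
```

On Scylla the log would come from the Step 5 command. These instructions do not say whether the messages go to standard output or standard error, so capturing both (``2>&1``) is the safer assumption.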

Post-installation
-----------------

After completing the above steps to install SmartSim in a virtual environment,
you can reactivate that environment in a new shell by running the following
commands:

.. code:: bash

    module load cudatoolkit/12.4.1 git # cudnn should NOT be loaded
    source /scyllafps/scratch/$USER/venvs/smartsim/bin/activate
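
For repeated use, the two commands can be wrapped in a shell function, for example in ``~/.bashrc``. This is only a convenience sketch. The function name ``smartsim_env`` is hypothetical, and it assumes the ``module`` command is available in the shell where it is called.

```shell
# Hypothetical convenience function; the name is illustrative only.
# It wraps the two post-installation commands and nothing else.
smartsim_env() {
    # cudnn should NOT be loaded (see the note in Step 4).
    module load cudatoolkit/12.4.1 git || return 1
    # "." is the portable spelling of "source".
    . "/scyllafps/scratch/$USER/venvs/smartsim/bin/activate"
}
```

Calling ``smartsim_env`` in a new shell should then restore the environment. If ``module load`` cannot find the modules, the ``module use`` line from Step 1 may need to be repeated first; the instructions do not say either way.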
