|
| 1 | +PML Scylla |
| 2 | +========== |
| 3 | + |
| 4 | +.. warning:: |
| 5 | + As of September 2024, the software stack on Scylla is still being finalized. |
| 6 | + Please treat these instructions as preliminary. |
| 7 | + |
| 8 | +One-time Setup |
| 9 | +-------------- |
| 10 | + |
| 11 | +To install SmartSim on Scylla, follow these steps: |
| 12 | + |
| 13 | +**Step 1:** Create and activate a Python virtual environment for SmartSim: |
| 14 | + |
| 15 | +.. code:: bash |
| 16 | +
|
| 17 | + module use /scyllapfs/hpe/ashao/smartsim_dependencies/modulefiles |
| 18 | + module load cudatoolkit cudnn git |
| 19 | + python -m venv /scyllafps/scratch/$USER/venvs/smartsim |
| 20 | + source /scyllafps/scratch/$USER/venvs/smartsim/bin/activate |
| 21 | +
|
| 22 | +**Step 2:** Build the SmartRedis C++ and Fortran libraries: |
| 23 | + |
| 24 | +.. code:: bash |
| 25 | +
|
| 26 | + git clone https://github.com/CrayLabs/SmartRedis.git |
| 27 | + cd SmartRedis |
| 28 | + make lib-with-fortran |
| 29 | + pip install . |
| 30 | + cd .. |
| 31 | +
|
| 32 | +**Step 3:** Install SmartSim in the virtual environment: |
| 33 | + |
| 34 | +.. code:: bash |
| 35 | +
|
| 36 | + pip install git+https://github.com/CrayLabs/SmartSim.git |
| 37 | +
|
| 38 | +**Step 4:** Build Redis, RedisAI, the backends, and all the Python packages: |
| 39 | + |
| 40 | +.. code:: bash |
| 41 | +
|
| 42 | + export TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0" # Workaround for a PyTorch problem |
| 43 | + smart build --device=cuda-12 |
| 44 | + module unload cudnn # Workaround for a PyTorch problem |
| 45 | +
|
| 46 | +
|
| 47 | +.. note:: |
| 48 | + The first workaround is needed because the autodetection of CUDA |
| 49 | + architectures is not internally consistent with one of PyTorch's |
| 50 | + dependencies. This appears to be unique to this machine; we have not |
| 51 | + seen it on other platforms. |
| 52 | + |
| 53 | + The second workaround is needed because PyTorch 2.3 (and possibly 2.2) |
| 54 | + attempts to load the version of cuDNN found in ``LD_LIBRARY_PATH`` |
| 55 | + instead of the version shipped with PyTorch itself, which results in |
| 56 | + unresolved symbols. |
| 57 | + |
| 58 | +**Step 5:** Check that SmartSim has been installed and built correctly: |
| 59 | + |
| 60 | +.. code:: bash |
| 61 | +
|
| 62 | + srun -n 1 -p gpu --gpus=1 --pty smart validate --device gpu |
| 63 | +
|
| 64 | +The following output indicates a successful install: |
| 65 | + |
| 66 | +.. code:: bash |
| 67 | +
|
| 68 | + [SmartSim] INFO Verifying Tensor Transfer |
| 69 | + [SmartSim] INFO Verifying Torch Backend |
| 70 | + [SmartSim] INFO Verifying ONNX Backend |
| 71 | + [SmartSim] INFO Verifying TensorFlow Backend |
| 72 | + 16:26:35 login SmartSim[557020:MainThread] INFO Success! |
| 73 | +
|
| 74 | +Post-installation |
| 75 | +----------------- |
| 76 | + |
| 77 | +After completing the steps above, you can reactivate the SmartSim virtual |
| 78 | +environment in a new shell by running the following commands: |
| 79 | + |
| 80 | +.. code:: bash |
| 81 | +
|
| 82 | + module load cudatoolkit/12.4.1 git # cudnn should NOT be loaded |
| 83 | + source /scyllafps/scratch/$USER/venvs/smartsim/bin/activate |
| 84 | +
|
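Because cuDNN must not be loaded after installation, it can help to check ``LD_LIBRARY_PATH`` before running SmartSim. The following is a minimal sketch, not part of the SmartSim tooling. It assumes that the cuDNN module's library directory contains the string ``cudnn`` somewhere in its path, which is an assumption about this system's module layout. ``find_cudnn_entries`` is a hypothetical helper name.

```shell
# Hypothetical helper: print every entry of a colon-separated path list
# whose name mentions "cudnn" (case-insensitive), one entry per line.
# Prints nothing when no entry matches.
find_cudnn_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -i 'cudnn' || true
}

# Warn if the current environment still exposes a cuDNN directory.
# ASSUMPTION: the module-provided cuDNN path contains "cudnn".
hits="$(find_cudnn_entries "${LD_LIBRARY_PATH:-}")"
if [ -n "$hits" ]; then
    printf 'Warning: cuDNN directories found in LD_LIBRARY_PATH:\n%s\n' "$hits" >&2
    printf 'Consider running: module unload cudnn\n' >&2
fi
```

If the check prints a warning, unloading the ``cudnn`` module as in Step 4 should remove the entry. If the module installs cuDNN under a path that does not contain ``cudnn``, this heuristic will miss it.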