---
id: use-gpus
title: Use GPUs on Nebari
description: Overview of using GPUs on Nebari including server setup, environment setup, and validation.
---

# Using GPUs on Nebari

## Introduction

This tutorial covers using GPUs on Nebari, including starting a GPU server, creating GPU-compatible environments, and validating the setup.

## 1. Starting a GPU server

Follow steps 1 to 3 in the [Authenticate and launch JupyterLab][login-with-keycloak] tutorial. The UI will show a list of profiles (also known as instances, servers, or machines).



Your administrator pre-configures these options, as described in the [profile configuration documentation][profile-configuration].

Select an appropriate GPU instance and click **Start**.

### Understanding the GPU setup on the server

The following steps describe how to get CUDA-related information from the server.

1. Once your server starts, you will be redirected to the JupyterLab home page.
2. Click on the **Terminal** icon.
3. Run the command `nvidia-smi`. The top right corner of the command's output shows the highest CUDA version supported by the installed driver.


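
For reference, the line to look for is the banner at the top of the output; the values below are illustrative only and will differ on your server:

```text
| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4 |
```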

If you get the error `nvidia-smi: command not found`, you are most likely on a non-GPU server. Shut down your server and start a GPU-enabled one.

**Compatible environments for this server must contain CUDA versions at or below the version the server supports. For example, the server in this case supports CUDA 12.4, so all environments used on this server must contain packages built with CUDA <= 12.4.**
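
A quick way to check this compatibility from inside a running environment is to print the CUDA version that the environment's PyTorch build targets (a minimal sketch; assumes PyTorch is installed in the active environment):

```python
import torch

# CUDA version this PyTorch build was compiled against, e.g. "12.1".
# It must be less than or equal to the CUDA version reported by nvidia-smi.
print(torch.version.cuda)
```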

## 2. Creating environments

### Build a GPU-compatible environment

By default, `conda-store` builds CPU-compatible packages. To build GPU-compatible packages, there are two options:
1. **Create the environment specification using `CONDA_OVERRIDE_CUDA` (recommended)**:

   Conda-store provides a mechanism to enable GPU environments by setting an environment variable, as explained in the [conda-store docs](https://conda.store/conda-store-ui/tutorials/create-envs#set-environment-variables).
   While creating a new environment, click on the **GUI <-> YAML** toggle to edit the YAML config:
   ```yaml
   channels:
     - pytorch
     - conda-forge
   dependencies:
     - pytorch
     - ipykernel
   variables:
     CONDA_OVERRIDE_CUDA: "12.1"
   ```
   Alternatively, you can apply the same configuration using the UI:

   

   Add the `CONDA_OVERRIDE_CUDA` variable to the **Variables** section to tell conda-store to build a GPU-compatible environment.
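
   For context, `CONDA_OVERRIDE_CUDA` works by overriding conda's `__cuda` virtual package, so the solver selects CUDA builds even though the conda-store build machine has no GPU attached. Outside conda-store, a roughly equivalent command-line invocation would be the following sketch, where `environment.yaml` is a hypothetical file holding the spec above:

   ```bash
   # Override the __cuda virtual package so the solver resolves CUDA 12.1 builds
   CONDA_OVERRIDE_CUDA="12.1" conda env create -f environment.yaml
   ```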

:::note
At the time of writing, the latest CUDA version available was `12.1`. Follow the steps below to determine the latest value for the `CONDA_OVERRIDE_CUDA` environment variable.

Ensure that the version you choose from the PyTorch documentation is not greater than the highest supported version in the `nvidia-smi` output (captured above).
:::

2. **Create the environment specification based on recommendations from the PyTorch documentation**:

   Check the [PyTorch documentation](https://pytorch.org/get-started/locally/) for a quick list of the necessary CUDA-specific packages.
   Select the following options to get the latest CUDA version:

   - PyTorch Build = Stable
   - Your OS = Linux
   - Package = Conda
   - Language = Python
   - Compute Platform = 12.1 (select a version that is less than or equal to the `nvidia-smi` output on your server, see above)

   

   The resulting `conda install` command is:

   ```bash
   conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
   ```
   The corresponding YAML config would be:

   ```yaml
   channels:
     - pytorch
     - nvidia
     - conda-forge
   dependencies:
     - pytorch
     - pytorch-cuda==12.1
     - torchvision
     - torchaudio
     - ipykernel
   variables: {}
   ```
   :::note
   The order of the channels is respected by conda, so keep `pytorch` at the top, followed by `nvidia`, then `conda-forge`.

   You can use the **GUI <-> YAML** toggle to edit the config.
   :::

## 3. Validating the setup

You can check that your GPU server is compatible with your conda environment by opening a Jupyter Notebook, loading the environment, and running the following code:

```python
import torch

print(f"GPU available: {torch.cuda.is_available()}")
print(f"Number of GPUs available: {torch.cuda.device_count()}")
print(f"ID of current GPU: {torch.cuda.current_device()}")
print(f"Name of first GPU: {torch.cuda.get_device_name(0)}")
```

Your output should look something like this:


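
As a final check, you can confirm that computation actually runs on the GPU by allocating a tensor on the device (a minimal sketch, run in the same notebook):

```python
import torch

# Allocate a small tensor directly on the GPU and run a trivial op.
x = torch.ones(3, device="cuda")
print((x * 2).device)  # expected: cuda:0
```
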
<!-- Internal links -->

[profile-configuration]: /docs/explanations/profile-configuration
[login-with-keycloak]: /docs/tutorials/login-keycloak