---
title: "Using Host libraries: GPU drivers and OpenMPI BTLs"
category: recipes
permalink: tutorial-gpu-drivers-open-mpi-mtls
---

Singularity does a fantastic job of isolating you from the host, so you don't
have to muck about with `LD_LIBRARY_PATH`; you get exactly the library
versions you want. In some situations, however, you need library versions that
match the host exactly. Two common cases are the NVIDIA GPU driver's user-space
libraries and OpenMPI's transport drivers for high-performance networking.
There are many ways to solve these problems; some people build a container and
copy the host's versions of the libraries into it.

{% include toc.html %}

## What we will learn today
This document describes how to use a bind mount, symlinks, and `ldconfig` so that
the container does not need to be rebuilt when the host libraries are updated.
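
The mechanism can be sketched in plain shell (the temp paths here are illustrative stand-ins for the real directory names, which come from the def file below): the host's `/lib64` is bind-mounted into the container, and the image itself contains only symlinks into that mount, so a host driver update changes what the symlinks resolve to without rebuilding the image.

```shell
# Sketch only (hypothetical paths): /tmp/hostlibs-demo stands in for the
# container filesystem; the real setup uses /all_hostlibs (the bind-mounted
# host /lib64) and /desired_hostlibs (symlinks baked into the image).
root=/tmp/hostlibs-demo
mkdir -p "$root/all_hostlibs" "$root/desired_hostlibs"

# Stand-in for a host driver library that appears through the bind mount.
touch "$root/all_hostlibs/libnvidia-ml.so.1"

# Symlink each desired library into the bind-mounted directory; after a host
# driver update the symlink target changes but the image does not.
for lib in libnvidia-ml.so.1; do
    ln -sf "../all_hostlibs/$lib" "$root/desired_hostlibs/$lib"
done

ls -l "$root/desired_hostlibs"
```

In the real image, `ldconfig` is then run over the symlink directory so the dynamic loader can find these names.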

**Note:** this tutorial was tested with Singularity <a href="https://github.com/singularityware/singularity/commit/945c6ee343a1e6101e22396a90dfdb5944f442b6" target="_blank">commit 945c6ee343a1e6101e22396a90dfdb5944f442b6</a>,
which is part of the (current) development branch, so it should also work with version 2.3
when that is released. The version of OpenMPI used is 2.1.0 (versions above 2.1 should work).

## Environment

In our environment we run CentOS 7 hosts with:

 1. Slurm installed in `/opt/slurm-<version>`, with the Slurm user `slurm`
 2. Mellanox network cards with drivers installed to `/opt/mellanox`
    (specifically, we run a RoCEv1 network for Lustre and MPI communications)
 3. NVIDIA GPUs with drivers installed to `/lib64`
 4. OpenMPI (by default) for MPI processes

## Creating your image
Since we are building an Ubuntu image, it may be easiest to create an Ubuntu VM
in which to build it. Alternatively, you can follow the recipe
<a href="/building-ubuntu-rhel-host" target="_blank">here</a>.

Use the following def file to create the image.

{% include gist.html username='l1ll1' id='89b3f067d5b790ace6e6767be5ea2851' file='hostlibs.def' %}

The mysterious `wget` line fetches a list of all the libraries in the CentOS
host's `/lib64` that *we* think it is safe to use in the container;
specifically, these are things like the NVIDIA drivers.

{% include gist.html username='l1ll1' id='89b3f067d5b790ace6e6767be5ea2851' file='desired_hostlibs.txt' %}
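
To see how such a list acts as a filter, here is a hypothetical sketch (the fake directory and the grep pattern are assumptions for illustration; the real `desired_hostlibs.txt` is a curated list): only driver-style names are kept, because core libraries like `libc` must come from the container, not the host.

```shell
# Demonstration with a fake /lib64 so the filtering is visible; on a real
# host you would inspect /lib64 itself.
fake=/tmp/fake-lib64
mkdir -p "$fake"
touch "$fake/libnvidia-ml.so.1" "$fake/libcuda.so.1" "$fake/libc.so.6"

# Keep only driver-related names (pattern is an assumption); libc and
# friends are deliberately excluded.
ls "$fake" | grep -E '^lib(nvidia|cuda)' > /tmp/desired_hostlibs.txt
cat /tmp/desired_hostlibs.txt
```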

Also note:

1. In `hostlibs.def` we create a `slurm` user. Obviously, if your `SlurmUser` is different you should change this name.
2. We make directories for `/opt` and `/usr/local/openmpi`. We're going to bind-mount these from the host so we get all the bits of OpenMPI, Mellanox, and Slurm that we need.
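The corresponding `%post` steps in the def file look roughly like this (a sketch, not the exact gist contents; adjust the user name and the directory set for your site):

```
%post
    # Create the Slurm user; change 'slurm' if your SlurmUser differs.
    useradd --system slurm
    # Mount points for the host's /opt and OpenMPI tree, plus the directory
    # that will hold symlinks to the safe host libraries.
    mkdir -p /opt /usr/local/openmpi /desired_hostlibs /all_hostlibs
```
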
## Executing your image
On our system we do:

```
SINGULARITYENV_LD_LIBRARY_PATH=/usr/local/openmpi/2.1.0-gcc4/lib:/opt/munge-0.5.11/lib:/opt/slurm-16.05.4/lib:/opt/slurm-16.05.4/lib/slurm:/desired_hostlibs:/opt/mellanox/mxm/lib/
export SINGULARITYENV_LD_LIBRARY_PATH
```
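
The `SINGULARITYENV_` prefix is how Singularity passes environment variables into the container: it strips the prefix and sets the remainder inside. A small shell sketch of that convention (using a shortened value for readability):

```shell
# Set the variable as you would on the host.
SINGULARITYENV_LD_LIBRARY_PATH=/desired_hostlibs
export SINGULARITYENV_LD_LIBRARY_PATH

# Mimic what Singularity does at container start: strip the prefix and
# export the rest, so the container sees LD_LIBRARY_PATH.
name=SINGULARITYENV_LD_LIBRARY_PATH
inner_name=${name#SINGULARITYENV_}
eval "inner_value=\$$name"
echo "$inner_name=$inner_value"   # prints LD_LIBRARY_PATH=/desired_hostlibs
```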

then

```
srun singularity exec -B /usr/local/openmpi:/usr/local/openmpi -B /opt:/opt -B /lib64:/all_hostlibs hostlibs.img <path to binary>
```
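
Wrapped up as a Slurm batch script, the whole invocation might look like this (a sketch only; the resource flags and the binary name `./my_mpi_program` are hypothetical, and the library versions must match your hosts):

```
#!/bin/bash
#SBATCH --ntasks=2

SINGULARITYENV_LD_LIBRARY_PATH=/usr/local/openmpi/2.1.0-gcc4/lib:/opt/munge-0.5.11/lib:/opt/slurm-16.05.4/lib:/opt/slurm-16.05.4/lib/slurm:/desired_hostlibs:/opt/mellanox/mxm/lib/
export SINGULARITYENV_LD_LIBRARY_PATH

srun singularity exec \
    -B /usr/local/openmpi:/usr/local/openmpi \
    -B /opt:/opt \
    -B /lib64:/all_hostlibs \
    hostlibs.img ./my_mpi_program   # hypothetical binary
```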