The purpose of this repository is to document best practices for running Julia on HPC systems (i.e., "supercomputers"). At the moment, it collects information relevant to both supercomputer operators and users. There is no guarantee that the information is permanent or up to date, nor that the issues are usefully ordered or categorized.
According to this Discourse post, the difference between compiling Julia from source with architecture-specific optimizations and using the official Julia binaries is negligible. This has been confirmed by Ludovic Räss for an Nvidia DGX-1 system at CSCS, where no performance differences were found between a Spack-installed version and the official binaries either (April 2022).
Since installing from source using, e.g., Spack can sometimes be cumbersome, the general recommendation is to go with the pre-built binaries unless benchmarks on the target system show a relevant difference.
- This is also the current approach on NERSC's systems
Last update: April 2022
When using Julia on a system that uses an environment-variable-based module system (such as modules or Lmod), the `LD_LIBRARY_PATH` variable might be filled with entries pointing to different packages and libraries. To avoid issues from Julia loading another library instead of the ones packaged with Julia, make sure that Julia's `lib` directory is always the first directory in `LD_LIBRARY_PATH`.
One possibility to achieve this is to create a wrapper shell script that modifies `LD_LIBRARY_PATH` before calling the Julia executable. Inspired by a script from UCL's Owain Kenway:
```bash
#!/usr/bin/env bash
# This wrapper makes sure the julia binary distribution picks up the GCC
# libraries provided with it correctly, meaning that it does not rely on
# the gcc-libs version.
# Dr Owain Kenway, 20th of July, 2021
# Source: https://github.com/UCL-RITS/rcps-buildscripts/blob/04b2e2ccfe7e195fd0396b572e9f8ff426b37f0e/files/julia/julia.sh

location=$(readlink -f "$0")
directory=$(readlink -f "$(dirname "${location}")/..")

export LD_LIBRARY_PATH=${directory}/lib/julia:${LD_LIBRARY_PATH}
exec "${directory}/bin/julia" "$@"
```
Note that using `readlink` might not be optimal from a performance perspective if used in a massively parallel environment. Alternatively, hard-code the Julia path or set an environment variable accordingly, e.g., as in the sketch below.
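For illustration, a minimal sketch of such a hard-coded variant (the installation root `/opt/apps/julia/1.7.2` is a made-up placeholder and must be adapted to the local installation):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper variant with a hard-coded Julia root, avoiding the
# readlink calls of the script above in massively parallel launches.
JULIA_ROOT=/opt/apps/julia/1.7.2  # placeholder: adapt to the local installation

export LD_LIBRARY_PATH=${JULIA_ROOT}/lib/julia:${LD_LIBRARY_PATH}
exec "${JULIA_ROOT}/bin/julia" "$@"
```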
Also note that fixing the `LD_LIBRARY_PATH` variable does not seem to be a hard requirement, since this practice is not universal (e.g., it is not necessary on NERSC's systems).
Last update: April 2022
There is no clear consensus on where the Julia depot folder (by default on Unix-like systems: `~/.julia`) should be located. On some systems with good I/O connectivity, it resides in the user's home directory, e.g., at NERSC. On other systems, e.g., at CSCS, it is put on a scratch file system. At the time of writing (April 2022), there does not seem to be reliable performance data available that could help to make a data-based decision.
If the depot path, which can be controlled via the `JULIA_DEPOT_PATH` environment variable, is located on a scratch/workspace file system with automatic deletion of unused files, it must be ensured that there is a mechanism (either operator-provided, or documented and in userspace) to prevent the deletion of files; one possible userspace approach is sketched below.
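As a rough illustration only, a minimal userspace sketch, assuming the purge policy is based on file timestamps and a single depot directory (both assumptions are site-specific and not taken from any particular center's documentation):

```bash
# Hypothetical workaround: periodically refresh the timestamps of all depot
# files (e.g., from a cron job or a recurring scheduler job) so that a
# time-based purger does not consider them unused.
# Assumes a single depot directory and a modification-time-based policy.
DEPOT="${JULIA_DEPOT_PATH:-$HOME/.julia}"
find "$DEPOT" -type f -exec touch {} +
```

Whether this is an acceptable use of the scratch system is a policy question; when in doubt, ask the operators.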
In case multiple platforms share a single home directory, it might make sense to make the depot path platform-dependent by setting the `JULIA_DEPOT_PATH` environment variable appropriately, e.g.,

```
prepend-path JULIA_DEPOT_PATH $env(HOME)/.julia/$platform
```

where `$platform` contains the current system name (source).
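For users who cannot modify the module files themselves, a similar effect can be approximated in a shell startup file. The hostname-based derivation of `platform` below is only one hypothetical way to obtain a per-system identifier:

```bash
# Hypothetical ~/.bashrc snippet: use a platform-specific Julia depot.
platform=${HOSTNAME%%[0-9]*}  # e.g., strip trailing node numbers; site-specific
export JULIA_DEPOT_PATH="${HOME}/.julia/${platform}"
```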
On the NERSC systems, there is a pre-built MPI.jl for each programming environment, which is loaded through a settings module. More information on the NERSC module file setup can be found here.
It seems to be generally advisable to set the environment variable `JULIA_CUDA_USE_BINARYBUILDER=false` in the module files when loading Julia on a system with GPUs. Otherwise, Julia will try to download its own BinaryBuilder.jl-provided CUDA stack, which is typically not what you want on a production HPC system. Instead, you should make sure that Julia finds the local CUDA installation by setting relevant environment variables (see also the CUDA.jl docs).
Johannes Blaschke provides scripts and templates to set up module files for Julia on some of NERSC's systems: https://gitlab.blaschke.science/nersc/julia/-/tree/main/modulefiles
There are a number of environment variables that should be considered for setting through the module mechanism (a consolidated sketch follows below):
- `JULIA_DEPOT_PATH`: ensure the depot path is on the correct file system
- `JULIA_CUDA_USE_BINARYBUILDER`: use the system-provided CUDA stack
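Putting these together, a consolidated sketch of what a Julia module might effectively export (all paths are hypothetical placeholders):

```bash
# Hypothetical consolidated environment exported by a Julia module.
export JULIA_DEPOT_PATH=/scratch/$USER/.julia  # depot on the intended file system
export JULIA_CUDA_USE_BINARYBUILDER=false      # use the system-provided CUDA stack
```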
Samuel Omlin and colleagues from CSCS provide their Easybuild configuration files used for Piz Daint online at https://github.com/eth-cscs/production/tree/master/easybuild/easyconfigs/j/Julia. For example, there are configurations available for Julia 1.7.2 and for Julia 1.7.2 with CUDA support. Looking at these files also helps when deciding which environment variables are useful to set.
- There is a lengthy discussion on the Julia Discourse about how to set up a centralized Julia installation. Some of it is probably dated by now, but it gives a good overview of best practices and of approaches that work (and some that do not). In particular, the summary from CSCS is very helpful: https://discourse.julialang.org/t/how-does-one-set-up-a-centralized-julia-installation/13922/32
- NERSC's Johannes Blaschke has a nice repository set up with lots of scripts and helpful information on setting up Julia on Cori and Perlmutter: https://gitlab.blaschke.science/nersc/julia/-/tree/main
The following is an (incomplete) list of HPC systems that provide a Julia installation and/or support for using Julia to its users:
| Center | System | Installation | Support | Interactive | Architecture | Accelerators | Documentation |
|---|---|---|---|---|---|---|---|
| CSCS | Piz Daint | yes | ? | yes | Intel Xeon Broadwell + Haswell | Nvidia Tesla P100 | 1 |
| NERSC | Cori | yes | ? | ? | Intel Xeon Haswell | Intel Xeon Phi | 1 |
| NERSC | Perlmutter | yes | yes | ? | AMD EPYC Milan | Nvidia Ampere A100 | 1, 2 |
| PC², U Paderborn | Noctua 1 | yes | ? | yes | Intel Xeon Skylake | Intel Stratix 10 | 1 |
| PC², U Paderborn | Noctua 2 | ? | ? | ? | AMD EPYC Milan | Nvidia Ampere A100, Xilinx Alveo U280 | 1 |
Nomenclature:
- Center: The HPC center's name
- System: The compute system's "marketing" name
- Installation: Is there a pre-installed Julia configuration available?
- Support: Is Julia officially supported on the system?
- Interactive: Is interactive computing with Julia supported?
- Architecture: The main CPU used in the system
- Accelerators: The main accelerator (if any) in the system
- Documentation: Links to documentation for Julia users
- Michael Schlottke-Lakemper (University of Stuttgart, Germany)
These people have provided valuable input to this repository via private communication:
- Johannes Blaschke (@jblaschke)
- Valentin Churavy (@vchuravy)
- Mosè Giordano (@giordano)
- Ludovic Räss (@luraess)
- Samuel Omlin (@omlins)
Everything is provided as is and without warranty. Use at your own risk!