Log in to the Graphcore login node from your local machine. Once you are on the login node, SSH to one of the Graphcore nodes.
local > ssh [email protected]
# or
local > ssh [email protected]
login-01.ai.alcf.anl.gov > ssh gc-poplar-02.ai.alcf.anl.gov
# or
login-01.ai.alcf.anl.gov > ssh gc-poplar-03.ai.alcf.anl.gov
# or
login-01.ai.alcf.anl.gov > ssh gc-poplar-04.ai.alcf.anl.gov
mkdir -p ~/venvs/graphcore
virtualenv ~/venvs/graphcore/poptorch33_env
source ~/venvs/graphcore/poptorch33_env/bin/activate
pip install $POPLAR_SDK_ROOT/poptorch-3.3.0+113432_960e9c294b_ubuntu_20_04-cp38-cp38-linux_x86_64.whl
virtualenv ~/venvs/graphcore/tensorflow2_33_env
source ~/venvs/graphcore/tensorflow2_33_env/bin/activate
pip install $POPLAR_SDK_ROOT/tensorflow-2.6.3+gc3.3.0+251580+08d96978c7f+amd_znver1-cp38-cp38-linux_x86_64.whl
pip install $POPLAR_SDK_ROOT/keras-2.6.0+gc3.3.0+251582+a3785372-py2.py3-none-any.whl
mkdir -p ~/tmp
export TF_POPLAR_FLAGS=--executable_cache_path=~/tmp
export PYTHONPATH=/software/graphcore/poplar_sdk/3.3.0/poplar-ubuntu_20_04-3.3.0+7857-b67b751185/python:$PYTHONPATH
This hands-on uses examples from the Graphcore Examples repository. Clone the repository and check out the tag that matches the SDK version.
mkdir -p ~/graphcore
cd ~/graphcore
git clone https://github.com/graphcore/examples.git
cd examples
git tag
git checkout v3.3.0
ALCF's Graphcore POD64 system uses Slurm for job submission and queueing. Below are some of the important commands for using Slurm.
- srun: The Slurm command srun can be used to run individual Python scripts. Use the --ipus= option to specify the number of IPUs required for the run, e.g. srun --ipus=1 python mnist_poptorch.py
- sbatch: Jobs can be submitted to the Slurm workload manager through a batch script by using the sbatch command.
- squeue: The squeue command provides information about jobs located in the Slurm scheduling queue.
- scancel: The scancel command is used to signal or cancel jobs, job arrays, or job steps.
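The batch workflow described above can be sketched as follows. This is a minimal example under stated assumptions, not a site-approved template: the script name submit-mnist-poptorch-job.sh and the output file name are hypothetical, the virtual environment path is the one created earlier in this guide, and it assumes sbatch accepts the same --ipus option that srun does. The submit step is skipped when Slurm is not installed, so the snippet can be tried anywhere.

```shell
# Write a minimal batch script (the file name is a hypothetical choice).
cat > submit-mnist-poptorch-job.sh <<'EOF'
#!/bin/bash
# Assumption: the PopTorch virtual environment from the setup steps exists.
source ~/venvs/graphcore/poptorch33_env/bin/activate
python mnist_poptorch.py
EOF
chmod +x submit-mnist-poptorch-job.sh

# Check the script for shell syntax errors without running it.
bash -n submit-mnist-poptorch-job.sh && echo "batch script syntax ok"

# Submit only when Slurm is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    # Assumption: --ipus is accepted by sbatch as it is by srun.
    sbatch --ipus=1 --output=mnist-output.txt submit-mnist-poptorch-job.sh
    # List your queued and running jobs.
    squeue -u "$USER"
else
    echo "sbatch not found; run this on a Graphcore node"
fi
```

A submitted job can be cancelled with scancel followed by the job ID that sbatch prints.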
Refer to the respective instructions below.
Note: Precompiled artifacts for the above models are available at /software/graphcore/projects/models_compile.
Copy them to your ~/tmp directory and set export POPTORCH_CACHE_DIR=~/tmp to skip the compile process.
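The note above can be turned into a short sketch. The layout under models_compile is not described in this guide, so the example copies the whole directory, which is an assumption; narrow the source path if you only need one model. The variable names ARTIFACTS and CACHE_DIR are illustrative.

```shell
# Assumption: everything under models_compile is copied because the
# per-model layout of that directory is not documented here.
ARTIFACTS=/software/graphcore/projects/models_compile
CACHE_DIR="$HOME/tmp"

mkdir -p "$CACHE_DIR"
if [ -d "$ARTIFACTS" ]; then
    # Copy the directory contents (including hidden files) into the cache.
    cp -r "$ARTIFACTS"/. "$CACHE_DIR"/
else
    echo "No precompiled artifacts at $ARTIFACTS; models will compile on first run."
fi

# Point PopTorch at the cache before launching the model.
export POPTORCH_CACHE_DIR="$CACHE_DIR"
echo "POPTORCH_CACHE_DIR=$POPTORCH_CACHE_DIR"
```

Using $HOME rather than ~ inside the quoted assignment avoids relying on tilde expansion.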
We will produce profiles for the PopVision Graph Analyzer and the PopVision System Analyzer.
To generate a profile for PopVision Graph Analyzer, run the executable with the following prefix:
$ POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.directory":"./graph_profile", "profiler.includeFlopEstimates": "true"}' python mnist_poptorch.py
This generates all the graph profiling reports, along with FLOP estimates, and saves the output to the graph_profile directory.
To visualize the profiles, download the generated profiles to a local machine and open them using PopVision Graph Analyzer.
To generate a profile for PopVision System Analyzer, run the executable with the following prefix:
$ PVTI_OPTIONS='{"enable":"true", "directory": "./system_profile"}' python mnist_poptorch.py
This generates all the system profiling reports and saves the output to the system_profile directory.
To visualize the profiles, download the generated profiles to a local machine and open them using PopVision System Analyzer.
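Downloading the profile directories means copying them through the login node. The sketch below, meant to be run on the local machine, only prints the scp commands so they can be reviewed before use. Every value in it is a placeholder or an assumption: ALCF_USER, LOGIN_HOST, and the remote run directory are hypothetical and must be replaced with your own, and the helper name fetch_cmd is illustrative. It assumes an OpenSSH client that supports the ProxyJump option.

```shell
# Hypothetical placeholders: replace with your own values.
ALCF_USER="your_alcf_username"
LOGIN_HOST="login-node.example"          # placeholder for the Graphcore login node
GC_NODE="gc-poplar-02.ai.alcf.anl.gov"   # the Graphcore node you ran on

# Print an scp command that copies a remote directory via the login node.
# $1 = remote profile directory, $2 = local destination
fetch_cmd() {
    printf 'scp -r -o ProxyJump=%s@%s %s@%s:%s %s\n' \
        "$ALCF_USER" "$LOGIN_HOST" "$ALCF_USER" "$GC_NODE" "$1" "$2"
}

# The remote paths are placeholders for wherever mnist_poptorch.py was run.
fetch_cmd "path/to/run_dir/graph_profile" "./graph_profile"
fetch_cmd "path/to/run_dir/system_profile" "./system_profile"
```

Once the printed commands look right, paste them into the local terminal, then open graph_profile in PopVision Graph Analyzer and system_profile in PopVision System Analyzer.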
- ALCF Graphcore Documentation
- Graphcore Documentation
- Graphcore Examples Repository
- Graphcore SDK Path: