Merge pull request #302 from Exabyte-io/docs/SOF-7534

timurbazhirov · web-flow · commit 21a41166136b · 2025-01-21T09:42:37.000-08:00
SOF-7534: QE GPU tutorial
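
As a quick sanity check of the speedup quoted in the tutorial added by this PR (`qe-gpu.md`), the sketch below converts the two `PWSCF ... WALL` times from the sample outputs (50.77s with the GPU, 18m25.33s CPU-only) into seconds and takes their ratio. The `to_seconds` helper is hypothetical, written only for this illustration, and it handles just the `XmY.YYs` and `Y.YYs` forms that appear in those outputs.

```shell
#!/bin/sh
# Convert a Quantum ESPRESSO wall-time string such as "18m25.33s" or "50.77s"
# into seconds. Hour fields (e.g. "1h 2m") are not handled in this sketch.
to_seconds() {
    printf '%s\n' "$1" | awk '{
        t = $0
        sub(/s$/, "", t)              # drop the trailing "s"
        n = split(t, parts, "m")      # separate minutes from seconds, if any
        if (n == 2) print parts[1] * 60 + parts[2]
        else        print parts[1]
    }'
}

# Wall times copied from the sample outputs shown in the tutorial.
gpu_wall=$(to_seconds "50.77s")       # GPU run (pw.cuo.gpu.scf.out)
cpu_wall=$(to_seconds "18m25.33s")    # CPU-only run

# Ratio of CPU-only to GPU wall time, rounded to one decimal place.
speedup=$(awk -v c="$cpu_wall" -v g="$gpu_wall" 'BEGIN { printf "%.1f", c / g }')
echo "GPU: ${gpu_wall}s  CPU: ${cpu_wall}s  speedup: ${speedup}x"
```

With the values above, the ratio comes out at 21.8, which matches the "about 20 times longer" figure given in the tutorial text and video captions.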
diff --git a/.github/workflows/build-tests.yml b/.github/workflows/build-tests.yml
@@ -15,7 +15,6 @@ jobs:
       matrix:
         os: ["ubuntu-24.04"]
         python-version:
-          - "3.8"
           - "3.9"
           - "3.10"
           - "3.11"
@@ -45,22 +44,22 @@ jobs:
     if: (github.repository != 'Exabyte-io/template-definitions-js-py') && (github.ref_name == 'master')
 
     steps:
-      -   name: Checkout this repository
-          uses: actions/checkout@v4
-          with:
-            lfs: true
+      - name: Checkout this repository
+        uses: actions/checkout@v4
+        with:
+          lfs: true
 
-      -   name: Checkout actions repository
-          uses: actions/checkout@v4
-          with:
-            repository: Exabyte-io/actions
-            token: ${{ secrets.BOT_GITHUB_TOKEN }}
-            path: actions
+      - name: Checkout actions repository
+        uses: actions/checkout@v4
+        with:
+          repository: Exabyte-io/actions
+          token: ${{ secrets.BOT_GITHUB_TOKEN }}
+          path: actions
 
-      -   name: Publish python release
-          uses: ./actions/py/publish
-          with:
-            python-version: 3.9.x
-            github-token: ${{ secrets.BOT_GITHUB_TOKEN }}
-            publish-tag: 'true'
-            publish-to-pypi: 'false'
+      - name: Publish python release
+        uses: ./actions/py/publish
+        with:
+          python-version: "3.10"
+          github-token: ${{ secrets.BOT_GITHUB_TOKEN }}
+          publish-tag: "true"
+          publish-to-pypi: "false"
diff --git a/.github/workflows/s3-deploy.yml b/.github/workflows/s3-deploy.yml
@@ -3,9 +3,9 @@ name: Update S3 deploy
 on:
   push:
     branches:
-      - 'master'
+      - "master"
   schedule:
-    - cron: '0 0 1 1 *'
+    - cron: "0 0 1 1 *"
   workflow_dispatch:
 
 jobs:
@@ -26,7 +26,7 @@ jobs:
       - name: Set python 3 version
         uses: actions/setup-python@v5
         with:
-          python-version: "3.8"
+          python-version: "3.10"
 
       - name: Build pages
         uses: Exabyte-io/action-mkdocs-build@main
diff --git a/README.md b/README.md
@@ -6,7 +6,7 @@
 
 For a quick installation:
 
-1. Install dependencies: python 3 (tested on Python `3.8`-`3.13`), `pip`, `curl`, [`virtualenv`](https://virtualenv.pypa.io/en/latest/installation/), git, [git-lfs](https://git-lfs.github.com/).
+1. Install dependencies: python 3 (tested on Python `3.9`-`3.13`), `pip`, `curl`, [`virtualenv`](https://virtualenv.pypa.io/en/latest/installation/), git, [git-lfs](https://git-lfs.github.com/).
 
 2. Clone this repository:
 
diff --git a/images/jobs-cli/open-web-terminal.webp b/images/jobs-cli/open-web-terminal.webp
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:72b0af010b58c47d3932739bb7ee2be551784ec40de2716622bd93d54abdcb34
+size 27512
diff --git a/lang/en/docs/tutorials/jobs-cli/qe-gpu.json b/lang/en/docs/tutorials/jobs-cli/qe-gpu.json
@@ -0,0 +1,174 @@
+{
+    "descriptionLinks": [
+        "Accelerate Quantum ESPRESSO simulation with GPUs: https://docs.mat3ra.com/tutorials/jobs-cli/qe-gpu/"
+    ],
+    "description": "We walk through a step-by-step example of running a Quantum ESPRESSO job on a GPU-enabled node. We see a significant performance improvement by using the CUDA/GPU-enabled build of Quantum ESPRESSO.",
+    "tags": [
+        {
+            "...": "../../metadata/general.json#/tags"
+        },
+        {
+            "...": "../../models-directory/dft.json#/tags"
+        },
+        {
+            "...": "../../software-directory/modeling/quantum-espresso.json#/tags"
+        },
+        "CUDA",
+        "GPU",
+        "NVIDIA"
+    ],
+    "title": "Mat3ra Tutorial: Accelerate Quantum ESPRESSO simulation with GPUs",
+    "youTubeCaptions": [
+        {
+            "text": "Hello, and welcome to the matera tutorial series.",
+            "startTime": "00:00:00.000",
+            "endTime": "00:00:03.000"
+        },
+        {
+            "text": "In today's tutorial, we will go through a step-by-step example of running a Quantum ESPRESSO simulation on one of our GPU enabled compute nodes.",
+            "startTime": "00:00:04.000",
+            "endTime": "00:00:14.000"
+        },
+        {
+            "text": "We will see how we can dramatically improve the performance of our simulation using GPUs.",
+            "startTime": "00:00:15.000",
+            "endTime": "00:00:20.000"
+        },
+        {
+            "text": "At the moment, GPU build of Quantum ESPRESSO is only available via our command line interface, and soon it will be made available in the web interface.",
+            "startTime": "00:00:21.000",
+            "endTime": "00:00:30.000"
+        },
+        {
+            "text": "Let's connect to the login node using SSH.",
+            "startTime": "00:00:31.000",
+            "endTime": "00:00:34.000"
+        },
+        {
+            "text": "You can use your terminal application and type S S H, your username at login dot matera dot com and press enter.",
+            "startTime": "00:00:35.000",
+            "endTime": "00:00:41.000"
+        },
+        {
+            "text": "If you need help on how to set up S S H, please visit our documentation site at docs dot matera dot com, and search S S H.",
+            "startTime": "00:00:42.000",
+            "endTime": "00:00:51.000"
+        },
+        {
+            "text": "Here you will find step by step guide to setup S S H key for seamless authentication.",
+            "startTime": "00:00:52.000",
+            "endTime": "00:00:57.000"
+        },
+        {
+            "text": "Note that it is also possible to connect to the login node from our web platform using the web terminal.",
+            "startTime": "00:00:58.000",
+            "endTime": "00:01:04.000"
+        },
+        {
+            "text": "Besides, <break time='0.5'/> it is also possible to run a command line job via bash workflow in our web platform.",
+            "startTime": "00:01:05.000",
+            "endTime": "00:01:12.000"
+        },
+        {
+            "text": "Create a new workflow. Select shell script as application.",
+            "startTime": "00:01:13.000",
+            "endTime": "00:01:16.000"
+        },
+        {
+            "text": "Add an execution unit and write your job script.",
+            "startTime": "00:01:17.000",
+            "endTime": "00:01:20.000"
+        },
+        {
+            "text": "For now, let's focus on the command line part.",
+            "startTime": "00:01:22.000",
+            "endTime": "00:01:24.000"
+        },
+        {
+            "text": "The example calculation we are going to demonstrate is available in our github repository C L I job examples.",
+            "startTime": "00:01:25.000",
+            "endTime": "00:01:33.000"
+        },
+        {
+            "text": "Please browse under espresso, then gpu, where you will find required input and reference output files.",
+            "startTime": "00:01:34.000",
+            "endTime": "00:01:39.000"
+        },
+        {
+            "text": "Once connected to the login node, let's navigate to your working directory, and clone our example repository.",
+            "startTime": "00:01:40.000",
+            "endTime": "00:01:47.000"
+        },
+        {
+            "text": "After cloning the repository, we also need to sync the L F S objects with git L F S pull.",
+            "startTime": "00:01:50.000",
+            "endTime": "00:01:56.000"
+        },
+        {
+            "text": "Let's navigate to our GPU example.",
+            "startTime": "00:01:57.000",
+            "endTime": "00:02:00.000"
+        },
+        {
+            "text": "Let's examine the P B S job script.",
+            "startTime": "00:02:03.000",
+            "endTime": "00:02:05.000"
+        },
+        {
+            "text": "We will run our job in GPU enabled G O F queue, we will request one node which has eight CPUs.",
+            "startTime": "00:02:07.000",
+            "endTime": "00:02:13.000"
+        },
+        {
+            "text": "To run quantum espresso jobs in GPUs, we need to load the CUDA build of quantum espresso.",
+            "startTime": "00:02:14.000",
+            "endTime": "00:02:19.000"
+        },
+        {
+            "text": "We set eight open M P threads and 1 M P I per GPU.",
+            "startTime": "00:02:20.000",
+            "endTime": "00:02:24.000"
+        },
+        {
+            "text": "We can also set parallelization options for k point and matrix diagonalization.",
+            "startTime": "00:02:25.000",
+            "endTime": "00:02:30.000"
+        },
+        {
+            "text": "Finally, we can submit our job with Q sub command. We can find the status of job with Q stat.",
+            "startTime": "00:02:31.000",
+            "endTime": "00:02:37.000"
+        },
+        {
+            "text": "Once the job is completed, we can examine the output file.",
+            "startTime": "00:02:38.000",
+            "endTime": "00:02:41.000"
+        },
+        {
+            "text": "We will see that the GPU acceleration was enabled for the calculation.",
+            "startTime": "00:02:44.000",
+            "endTime": "00:02:49.000"
+        },
+        {
+            "text": "If we scroll to the bottom of the file, we will see the total time taken by the program. The wall time for this job was slightly less than a minute.",
+            "startTime": "00:02:50.000",
+            "endTime": "00:02:58.000"
+        },
+        {
+            "text": "For comparison, we ran the same job using eight CPUs but without GPU acceleration, <break time='0.5'/> it took about 20 times longer.",
+            "startTime": "00:03:02.000",
+            "endTime": "00:03:10.000"
+        },
+        {
+            "text": "Now you may test different combination of M P I and open M P threads, different parallelization option, and see what gives you the best performance.",
+            "startTime": "00:03:11.000",
+            "endTime": "00:03:20.000"
+        },
+        {
+            "text": "Thank you for watching this tutorial and using our platform.",
+            "startTime": "00:03:21.000",
+            "endTime": "00:03:24.000"
+        }
+    ],
+    "youTubeId": "trLDEwWc3ho"
+}
diff --git a/lang/en/docs/tutorials/jobs-cli/qe-gpu.md b/lang/en/docs/tutorials/jobs-cli/qe-gpu.md
@@ -0,0 +1,90 @@
+---
+tags:
+  - GPU
+  - CUDA
+hide:
+  - tags
+---
+# Accelerate Quantum ESPRESSO simulation with GPUs
+
+We will walk through a step-by-step example of running a Quantum ESPRESSO job on
+GPUs. As of the time of writing, the GPU (CUDA) build of Quantum ESPRESSO is
+only available via the Command Line Interface (CLI). We will see that we can
+dramatically speed up our Quantum ESPRESSO simulation by using GPUs.
+
+1. First, connect to the login node via an [SSH client](../../remote-connection/ssh.md)
+or the [web terminal](../../remote-connection/web-terminal.md). Note that it is also
+possible to run CLI jobs by creating a [bash workflow](
+../../software-directory/scripting/shell/overview.md).
+
+    ![Web Terminal](../../images/jobs-cli/open-web-terminal.webp)
+
+2. The example job that we are going to run is available in the git repository
+[exabyte-io/cli-job-examples](https://github.com/exabyte-io/cli-job-examples).
+You may clone the repository to your working directory:
+```bash
+git clone https://github.com/exabyte-io/cli-job-examples
+cd cli-job-examples
+git lfs pull
+cd espresso/gpu
+```
+
+3. You will find all required input files and the job script under `espresso/gpu`.
+Please review the input files and the PBS job script, and update the project name
+and other parameters as necessary.
+
+4. We will use the [GOF](../../infrastructure/clusters/aws.md#hardware-specifications)
+queue, whose nodes each provide 8 CPUs and 1 NVIDIA V100 GPU.
+
+5. Since our compute node contains 8 CPUs with 1 GPU, we will run 1 MPI process
+with 8 OpenMP threads.
+```bash
+module load espresso/7.4-cuda-12.4-cc-70
+export OMP_NUM_THREADS=8
+mpirun -np 1 pw.x -npool 1 -ndiag 1 -in pw.cuo.scf.in > pw.cuo.gpu.scf.out
+```
+
+6. Finally, we can submit our job using:
+```bash
+qsub job.gpu.pbs
+```
+
+7. Once the job is completed, we can inspect the output file `pw.cuo.gpu.scf.out`.
+We will see that the GPU was used, and the job took just under 1 minute of wall time.
+```
+Parallel version (MPI & OpenMP), running on       8 processor cores
+Number of MPI processes:                 1
+Threads/MPI process:                     8
+...
+
+GPU acceleration is ACTIVE.  1 visible GPUs per MPI rank
+GPU-aware MPI enabled
+...
+
+Parallel routines
+
+PWSCF        :     37.94s CPU     50.77s WALL
+```
+
+8. For comparison, we ran the same calculation using only CPUs, and it took
+about 20 times longer (roughly 18.5 minutes of wall time versus 51 seconds).
+```
+Parallel version (MPI), running on     8 processors
+
+MPI processes distributed on     1 nodes
+...
+
+Parallel routines
+
+PWSCF        :  18m 0.56s CPU  18m25.33s WALL
+```
+
+You may experiment with different combinations of MPI processes and OpenMP threads, various
+[parallelization options](https://www.quantum-espresso.org/Doc/user_guide/node20.html),
+and find what gives you the best performance.
+
+## Step-by-step screenshare video
+
+<div class="video-wrapper">
+<iframe class="gifffer" width="100%" height="100%" src="https://www.youtube.com/embed/trLDEwWc3ho" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -153,6 +153,7 @@ nav:
             - Overview:                              tutorials/jobs-cli/overview.md
             - Create + run a CLI Job:                tutorials/jobs-cli/job-cli-example.md
             - Import a CLI Job to Web Interface:     tutorials/jobs-cli/cli-job-import.md
+            - QE GPU Job:                            tutorials/jobs-cli/qe-gpu.md
         - Templating:
             - Overview:                              tutorials/templating/overview.md
             - Flags by Elemental Composition:        tutorials/templating/set-flag-by-composition.md
diff --git a/netlify.toml b/netlify.toml
@@ -3,5 +3,5 @@
   publish = "site/"
 
 [build.environment]
-  PYTHON_VERSION = "3.8"
+  PYTHON_VERSION = "3.10"
   NODE_VERSION = "20"
diff --git a/requirements.txt b/requirements.txt