
Commit b46f15c

MPS feature

Signed-off-by: Mike McKiernan <[email protected]>

1 parent 351299e commit b46f15c

10 files changed: +643 −14 lines

gpu-operator/gpu-sharing-mps.rst

Lines changed: 512 additions & 0 deletions
Large diffs are not rendered by default.

gpu-operator/gpu-sharing.rst

Lines changed: 32 additions & 14 deletions
@@ -54,21 +54,35 @@ and not modify nodes with other GPU models.
 You can combine the two approaches by applying a cluster-wide default configuration
 and then label nodes so that those nodes receive a node-specific configuration.
 
-Comparison: Time-Slicing and Multi-Instance GPU
-===============================================
+.. _comparison-ts-mps-mig:
 
-The latest generations of NVIDIA GPUs provide an operation mode called
-Multi-Instance GPU (MIG). MIG allows you to partition a GPU
-into several smaller, predefined instances, each of which looks like a
+Comparison: Time-Slicing, Multi-Process Service, and Multi-Instance GPU
+=======================================================================
+
+Time-slicing, Multi-Process Service (MPS), and Multi-Instance GPU (MIG) each
+enable sharing a physical GPU with more than one workload.
+
+NVIDIA A100 and newer GPUs provide an operation mode called MIG.
+MIG enables you to partition a GPU into *slices*.
+A slice is a smaller, predefined GPU instance that looks like a
 mini-GPU that provides memory and fault isolation at the hardware layer.
 You can share access to a GPU by running workloads on one of
 these predefined instances instead of the full native GPU.
 
 MIG support was added to Kubernetes in 2020. Refer to `Supporting MIG in Kubernetes <https://www.google.com/url?q=https://docs.google.com/document/d/1mdgMQ8g7WmaI_XVVRrCvHPFPOMCm5LQD5JefgAh6N8g/edit&sa=D&source=editors&ust=1655578433019961&usg=AOvVaw1F-OezvM-Svwr1lLsdQmu3>`_
 for details on how this works.
 
-Time-slicing trades the memory and fault-isolation that is provided by MIG
-for the ability to share a GPU by a larger number of users.
+NVIDIA V100 and newer GPUs support MPS.
+MPS enables dividing a physical GPU into *replicas* and assigning workloads to a replica.
+While MIG provides fault isolation in hardware, MPS uses software to divide the GPU into replicas.
+Each replica receives an equal portion of memory and thread percentage.
+For example, if you configure two replicas, each replica has access to 50% of GPU memory and 50% of compute capacity.
+
+Time-slicing is available with all GPUs supported by the Operator.
+Unlike MIG, time-slicing provides no memory or fault isolation.
+Like MPS, time-slicing uses the term *replica*; however, the GPU is not divided between workloads.
+Instead, the GPU performs a context switch and swaps resources on and off the GPU when a workload is scheduled.
+
 Time-slicing also offers a way to provide shared access to a GPU for
 older generation GPUs that do not support MIG.
 However, you can combine MIG and time-slicing to provide shared access to
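
The replica arithmetic described in the added text is visible on the node itself: with 4 replicas configured per GPU on a 4-GPU node, Kubernetes advertises a capacity of 16 nvidia.com/gpu, as the verification output later in this diff shows. A quick spot-check, assuming cluster access; worker-0 is a placeholder node name:

    $ kubectl get node worker-0 -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
    # Expected to print 16 for the 4-GPU, 4-replica example (node name is a placeholder).
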
@@ -234,15 +248,15 @@ The following table describes the key fields in the config map.
 Applying One Cluster-Wide Configuration
 =======================================
 
-Perform the following steps to configure GPU time-slicing if you already installed the GPU operator
+Perform the following steps to configure GPU time-slicing if you already installed the GPU Operator
 and want to apply the same time-slicing configuration on all nodes in the cluster.
 
 #. Create a file, such as ``time-slicing-config-all.yaml``, with contents like the following example:
 
    .. literalinclude:: ./manifests/input/time-slicing-config-all.yaml
      :language: yaml
 
-#. Add the config map to the same namespace as the GPU operator:
+#. Add the config map to the same namespace as the GPU Operator:
 
    .. code-block:: console
 
@@ -284,7 +298,7 @@ control which configuration is applied to which nodes.
    .. literalinclude:: ./manifests/input/time-slicing-config-fine.yaml
      :language: yaml
 
-#. Add the config map to the same namespace as the GPU operator:
+#. Add the config map to the same namespace as the GPU Operator:
 
    .. code-block:: console
 
@@ -339,9 +353,9 @@ Configuring Time-Slicing Before Installing the NVIDIA GPU Operator
 You can enable time-slicing with the NVIDIA GPU Operator by passing the
 ``devicePlugin.config.name=<config-map-name>`` parameter during installation.
 
-Perform the following steps to configure time-slicing before installing the operator:
+Perform the following steps to configure time-slicing before installing the Operator:
 
-#. Create the namespace for the operator:
+#. Create the namespace for the Operator:
 
    .. code-block:: console
 
@@ -418,15 +432,17 @@ Perform the following steps to verify that the time-slicing configuration is app
 * The ``nvidia.com/gpu.count`` label reports the number of physical GPUs in the machine.
 * The ``nvidia.com/gpu.product`` label includes a ``-SHARED`` suffix to the product name.
 * The ``nvidia.com/gpu.replicas`` label matches the reported capacity.
+* The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``.
 
 .. code-block:: output
-   :emphasize-lines: 3,4,5,7
+   :emphasize-lines: 3-6,8
 
    ...
    Labels:
     nvidia.com/gpu.count=4
     nvidia.com/gpu.product=Tesla-T4-SHARED
     nvidia.com/gpu.replicas=4
+    nvidia.com/gpu.sharing-strategy=time-slicing
   Capacity:
     nvidia.com/gpu: 16
    ...
@@ -441,15 +457,17 @@ Perform the following steps to verify that the time-slicing configuration is app
 * The ``nvidia.com/gpu`` capacity reports ``0``.
 * The ``nvidia.com/gpu.shared`` capacity equals the number of physical GPUs multiplied by the
   specified number of GPU replicas to create.
+* The ``nvidia.com/gpu.sharing-strategy`` label is set to ``time-slicing``.
 
 .. code-block:: output
-   :emphasize-lines: 3,7,8
+   :emphasize-lines: 3,8,9
 
    ...
    Labels:
     nvidia.com/gpu.count=4
     nvidia.com/gpu.product=Tesla-T4
     nvidia.com/gpu.replicas=4
+    nvidia.com/gpu.sharing-strategy=time-slicing
   Capacity:
     nvidia.com/gpu: 0
     nvidia.com/gpu.shared: 16
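
The label and capacity excerpts above are taken from node descriptions. Assuming cluster access, output of this shape typically comes from:

    $ kubectl describe node <node-name>
    # <node-name> is a placeholder for one of the nodes with shared GPUs.
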

gpu-operator/index.rst

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@
    :hidden:
 
    Multi-Instance GPU <gpu-operator-mig.rst>
+   MPS GPU Sharing <gpu-sharing-mps.rst>
    Time-Slicing GPUs <gpu-sharing.rst>
    gpu-operator-rdma.rst
    Outdated Kernels <install-gpu-operator-outdated-kernels.rst>
gpu-operator/manifests/input/mps-config-all.yaml

Lines changed: 12 additions & 0 deletions

@@ -0,0 +1,12 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: mps-config-all
+data:
+  mps-any: |-
+    version: v1
+    sharing:
+      mps:
+        resources:
+        - name: nvidia.com/gpu
+          replicas: 4
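
The rendered gpu-sharing-mps.rst page is omitted from this diff, but judging from the Operator's existing time-slicing flow, applying this cluster-wide MPS configuration likely follows the same pattern. A sketch, assuming the file is saved as mps-config-all.yaml and the Operator runs in the gpu-operator namespace:

    $ kubectl create -n gpu-operator -f mps-config-all.yaml
    # Point the device plugin at the config map; names mirror this manifest.
    $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \
        -n gpu-operator --type merge \
        -p '{"spec": {"devicePlugin": {"config": {"name": "mps-config-all", "default": "mps-any"}}}}'
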
gpu-operator/manifests/input/mps-config-fine.yaml

Lines changed: 22 additions & 0 deletions

@@ -0,0 +1,22 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: mps-config-fine
+data:
+  mps-four: |-
+    version: v1
+    sharing:
+      mps:
+        renameByDefault: false
+        resources:
+        - name: nvidia.com/gpu
+          replicas: 4
+  mps-two: |-
+    version: v1
+    sharing:
+      mps:
+        renameByDefault: false
+        resources:
+        - name: nvidia.com/gpu
+          replicas: 2
+
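
With the fine-grained config map, individual nodes presumably opt in to one of the named entries. A sketch, assuming the device plugin's usual nvidia.com/device-plugin.config node label selects the entry:

    $ kubectl label node <node-name> nvidia.com/device-plugin.config=mps-four
    # mps-four matches a data key in the config map above; <node-name> is a placeholder.
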
gpu-operator/manifests/input/mps-verification.yaml

Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: mps-verification
+  labels:
+    app: mps-verification
+spec:
+  replicas: 5
+  selector:
+    matchLabels:
+      app: mps-verification
+  template:
+    metadata:
+      labels:
+        app: mps-verification
+    spec:
+      tolerations:
+      - key: nvidia.com/gpu
+        operator: Exists
+        effect: NoSchedule
+      hostPID: true
+      containers:
+      - name: cuda-sample-vector-add
+        image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
+        command: ["/bin/bash", "-c", "--"]
+        args:
+        - while true; do /cuda-samples/vectorAdd; done
+        resources:
+          limits:
+            nvidia.com/gpu: 1
+      nodeSelector:
+        nvidia.com/gpu.sharing-strategy: mps
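
To exercise this Deployment the way the sample outputs further down suggest, a plausible sequence (the file name mps-verification.yaml is assumed):

    $ kubectl apply -f mps-verification.yaml
    $ kubectl get pods -l app=mps-verification
    # The five pods share GPUs via MPS, one nvidia.com/gpu replica each.
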

gpu-operator/manifests/input/time-slicing-verification.yaml

Lines changed: 2 additions & 0 deletions
@@ -28,3 +28,5 @@ spec:
         resources:
           limits:
             nvidia.com/gpu: 1
+      nodeSelector:
+        nvidia.com/gpu.sharing-strategy: time-slicing
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
LAST SEEN TYPE REASON OBJECT MESSAGE
2+
38s Normal SuccessfulDelete daemonset/nvidia-device-plugin-daemonset Deleted pod: nvidia-device-plugin-daemonset-l86fw
3+
38s Normal SuccessfulDelete daemonset/gpu-feature-discovery Deleted pod: gpu-feature-discovery-shj2m
4+
38s Normal Killing pod/gpu-feature-discovery-shj2m Stopping container gpu-feature-discovery
5+
38s Normal Killing pod/nvidia-device-plugin-daemonset-l86fw Stopping container nvidia-device-plugin
6+
37s Normal Scheduled pod/nvidia-device-plugin-daemonset-lcklx Successfully assigned gpu-operator/nvidia-device-plugin-daemonset-lcklx to worker-1
7+
37s Normal SuccessfulCreate daemonset/gpu-feature-discovery Created pod: gpu-feature-discovery-pgx9l
8+
37s Normal Scheduled pod/gpu-feature-discovery-pgx9l Successfully assigned gpu-operator/gpu-feature-discovery-pgx9l to worker-0
9+
37s Normal SuccessfulCreate daemonset/nvidia-device-plugin-daemonset Created pod: nvidia-device-plugin-daemonset-lcklx
10+
36s Normal Created pod/nvidia-device-plugin-daemonset-lcklx Created container config-manager-init
11+
36s Normal Pulled pod/nvidia-device-plugin-daemonset-lcklx Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0" already present on machine
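
These events record the device plugin and GPU feature discovery pods restarting after a sharing-configuration change. Assuming the default namespace layout, a listing of this shape comes from:

    $ kubectl get events -n gpu-operator --sort-by='.lastTimestamp'
    # Namespace and sort order are assumptions; the columns match kubectl's default event output.
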
Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+NAME                                READY   STATUS    RESTARTS   AGE
+mps-verification-86c99b5666-hczcn   1/1     Running   0          3s
+mps-verification-86c99b5666-sj8z5   1/1     Running   0          3s
+mps-verification-86c99b5666-tnjwx   1/1     Running   0          3s
+mps-verification-86c99b5666-82hxj   1/1     Running   0          3s
+mps-verification-86c99b5666-9lhh6   1/1     Running   0          3s
Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
+Found 5 pods, using pod/mps-verification-86c99b5666-tnjwx
+[Vector addition of 50000 elements]
+Copy input data from the host memory to the CUDA device
+CUDA kernel launch with 196 blocks of 256 threads
+Copy output data from the CUDA device to the host memory
+Test PASSED
+Done
+[Vector addition of 50000 elements]
+Copy input data from the host memory to the CUDA device
+CUDA kernel launch with 196 blocks of 256 threads
+Copy output data from the CUDA device to the host memory
+Test PASSED
+...
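
The "Found 5 pods, using pod/..." header indicates the logs were requested at the Deployment level, which makes kubectl pick one pod. A command that produces output of this shape, assuming the mps-verification Deployment above:

    $ kubectl logs deploy/mps-verification
    # kubectl selects one of the Deployment's pods and streams its vectorAdd loop.
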
