adding dra docs #162
base: main
Conversation
Signed-off-by: Abigail McCarthy <[email protected]>
Documentation preview
About the NVIDIA GPU DRA Driver
*******************************

The NVIDIA GPU DRA Driver leverages the Kubernetes Dynamic Resource Allocation (DRA) API to support NVIDIA IMEX channels available in GH200 and GB200 GPUs.
Maybe we should add a bit about IMEX channels. E.g.
.... to support NVIDIA IMEX (Internode Memory Exchange/Management Service) channels available in GH200 and GB200 systems that allow the GPUs to directly read/write each other's memory over a high-bandwidth NVLink.
defined in line 13 and 14
The NVIDIA GPU DRA Driver creates and manages IMEX channels through the creation of a ComputeDomain custom resource. Use this custom resource to define your resource templates, and then reference the templates within your workload specs.

An IMEX channel is a construct that allows a set of GPUs to directly read and write each other's memory over a high-bandwidth NVLink.
The NVLink connection may either be directly between GPUs on the same node or between GPUs on separate nodes connected by an NVSwitch.
oh you defined it here.
gpu-operator/dra-driver.rst
Outdated
The NVLink connection may either be directly between GPUs on the same node or between GPUs on separate nodes connected by an NVSwitch.
Once an IMEX channel has been established for a set of GPUs, they are free to read and write each other's memory via extensions to the CUDA memory call APIs.

The ability to support IMEX channels on GH200 and GB200 systems is essential, as they have been designed specifically to exploit the use of IMEX channels to turn a rack of GPU machines (each with a small number of GPUs) into a giant supercomputer with up to 72 GPUs communicating at full NVLink bandwidth.
I think it is important to first summarize what is to follow: that IMEX channels are meant for multi-node communication.
IMEX channel (by definition) is a resource that spans multiple nodes, hence the ability to support IMEX channels on GH200 and GB200 systems is essential. These GH200 and GB200 systems are designed specifically to leverage the use of IMEX channels to turn a rack of GPU machines (each with a small number of GPUs) into a giant supercomputer with up to 72 GPUs communicating at full NVLink bandwidth.
gpu-operator/dra-driver.rst
Outdated
Kubernetes Dynamic Resource Allocation (DRA), available as beta in Kubernetes v1.32, is an API for requesting and sharing resources between pods and containers inside a pod.
This feature treats specialized hardware as a definable and reusable object and provides the necessary primitives to support cross-node resources such as IMEX channels.
Along with the NVIDIA GPU DRA Driver, you are able to use DRA to define IMEX channel resources that can be managed by Kubernetes.
NVIDIA GPU DRA Driver uses DRA features to define IMEX channel resources that can be managed by Kubernetes.
gpu-operator/dra-driver.rst
Outdated
Drivers must be pre-installed on all GPU nodes before installing the NVIDIA GPU Operator as operator managed drivers are not supported at this time.

- IMEX packages installed on GPU nodes with systemd service disabled.
  The IMEX package versions must match the installed driver version.
Refer to the release notes for the exact version of the driver and IMEX to install.
@guptaNswati can you share a link to the release notes you are referring to? thanks!
This is the official way to install NVIDIA drivers https://www.nvidia.com/en-us/drivers/
gpu-operator/dra-driver.rst
Outdated
$ curl -fsSL -o /tmp/nvidia-imex-${IMEX_DRIVER_VERSION}_${DRIVER_VERSION}-1_${TARGETARCH}.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/nvidia-imex-${IMEX_DRIVER_VERSION}_${DRIVER_VERSION}-1_${TARGETARCH}.deb && dpkg -i /tmp/nvidia-imex-${IMEX_DRIVER_VERSION}_${DRIVER_VERSION}-1_${TARGETARCH}.deb && \
    nvidia-imex --version && \
    ls /etc/nvidia-imex && \
don't need this ls
gpu-operator/dra-driver.rst
Outdated
The following example shows how to install IMEX drivers.

.. code-block:: console
These assume you have already installed the NVIDIA drivers:
DRIVER_VERSION=$(nvidia-smi -i 0 --query-gpu=driver_version --format=csv,noheader,nounits)
IMEX_DRIVER_VERSION="${DRIVER_VERSION%%.*}"
TARGETARCH=$(uname -m | sed -E 's/^x86_64$/amd64/; s/^(aarch64|arm64)$/arm64/')
yes, line 32 defines the second prerequisite for installing drivers
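Putting the two snippets together, a minimal end-to-end sketch for an Ubuntu 22.04 arm64 (sbsa) node with the NVIDIA driver already installed could look like the following; the repository path comes from the excerpt above, and the systemd unit name nvidia-imex.service is an assumption that should be verified against the installed package:

$ DRIVER_VERSION=$(nvidia-smi -i 0 --query-gpu=driver_version --format=csv,noheader,nounits)
$ IMEX_DRIVER_VERSION="${DRIVER_VERSION%%.*}"
$ TARGETARCH=$(uname -m | sed -E 's/^x86_64$/amd64/; s/^(aarch64|arm64)$/arm64/')
$ curl -fsSL -o /tmp/nvidia-imex.deb \
    https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/nvidia-imex-${IMEX_DRIVER_VERSION}_${DRIVER_VERSION}-1_${TARGETARCH}.deb
$ sudo dpkg -i /tmp/nvidia-imex.deb
$ nvidia-imex --version
$ sudo systemctl disable --now nvidia-imex.service   # prerequisite: IMEX systemd service disabled (unit name assumed)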
gpu-operator/dra-driver.rst
Outdated
Once all daemons have been fully started, the DRA driver unblocks each worker, injects its IMEX channel into the worker and allows it to start running.

View the Compute Domain resources on your cluster
we should also add the MPI nvbandwidth example.
It's also described here: NVIDIA/k8s-dra-driver-gpu#249
will add that!
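For the "View the Compute Domain resources on your cluster" step, a minimal sketch of the commands might be (assuming the computedomains kind in the resource.nvidia.com API group, as shown in the manifests in this PR):

$ kubectl get computedomains.resource.nvidia.com -A
$ kubectl get resourceclaimtemplates -A
$ kubectl get resourceclaims -A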
gpu-operator/dra-driver.rst
Outdated
Node and Pod Affinity Strategies
================================

The ComputeDomain object is not tied directly to the notion of an IMEX channel.
I would rephrase it:
A ComputeDomain isn’t strictly about IMEX channels—it’s about running workloads across a group of compute nodes. This means even if some nodes are not IMEX capable, they can still be part of the same ComputeDomain. To control where your workloads run, you should use NodeAffinity and PodAffinity rules.
PR Overview
This pull request introduces documentation-related changes for the DRA driver while also providing configuration manifests for enabling dynamic resource allocation and related compute domain functionality.
- Adds a kubeadm initialization configuration to enable the DynamicResourceAllocation feature.
- Introduces a ComputeDomain resource and a pod manifest for resource channel injection.
- Provides an additional ComputeDomain definition in a separate file.
Reviewed Changes
File | Description |
---|---|
gpu-operator/manifests/input/kubeadm-init-config.yaml | Defines kubeadm ClusterConfiguration and KubeletConfiguration with dynamic resource allocation settings. |
gpu-operator/manifests/input/imex-channel-injection.yaml | Contains ComputeDomain and Pod definitions for testing resource channel injection. |
gpu-operator/manifests/input/dra-compute-domain-crd.yaml | Provides an alternative ComputeDomain resource definition that duplicates functionality in imex-channel-injection.yaml. |
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
gpu-operator/manifests/input/dra-compute-domain-crd.yaml:1
- The ComputeDomain resource defined here duplicates the one in imex-channel-injection.yaml. Consider consolidating the definitions to avoid potential conflicts during deployment.
apiVersion: resource.nvidia.com/v1beta1
Signed-off-by: Abigail McCarthy <[email protected]>
******************************************************************
Run a Multi-node nvbandwidth Test Requiring IMEX Channels with MPI
I need to do some more testing to clean up this example and add some explanations around what's happening.
Signed-off-by: Abigail McCarthy <[email protected]>
SPDX-License-Identifier: Apache-2.0

############################################################
Install the NVIDIA GPU DRA Driver and Configure IMEX Support
I think what we're supporting from a customer perspective is "Multi-Node NVLink" and how we're supporting it is through IMEX channels. The IMEX channels are an implementation detail from the user perspective.
I also believe that the DRA driver is no longer called the IMEX driver, but the Compute Domain DRA Driver
Quick feedback on naming.
I am OK with "NVIDIA GPU DRA Driver".
We need consistency. Naming is hard.
Here are my preferences, also after talking to Kevin:
- NVIDIA DRA Driver for GPUs
- Kubernetes DRA driver for NVIDIA GPUs
The latter is most precise and intuitive I think. But maybe too long, and too k8sy.
I trust that a certain decision has already been made, and we are late in the game with naming discussion. Yet, it's an important discussion.
For the Helm chart (public) listing I picked "NVIDIA DRA Driver for GPUs", see
https://catalog.ngc.nvidia.com/orgs/nvidia/helm-charts/nvidia-dra-driver-gpu
Here is how we should think about it:
The DRA driver for NVIDIA GPUs enables multi-node GPU workloads in Kubernetes environments
That's the very high-level.
I agree with Evan that anything IMEX is an (important) implementation detail.
I also believe that the DRA driver is no longer called the IMEX driver
And yes yes yes to this. We should think that it never was named IMEX driver. This was an organizational brainfart. :)
After Slack discussion: for now, we decided to use "NVIDIA DRA Driver for GPUs", like we did for nSpect and also the Helm chart registry. Kevin also prefers this:
I usually prefer NVIDIA DRA Driver for GPUs so there’s not 3 all caps acronyms right next to each other
This page details more information about the GPU DRA Driver, including how to install it and examples of deploying workloads using IMEX channels.

*******************************
About the NVIDIA GPU DRA Driver
What is the "GPU DRA Driver" in this context? Is it the driver that is managing the lifecycle of compute domains or access to GPUs themselves?
What is the "GPU DRA Driver" in this context?
The "NVIDIA GPU DRA Driver" in this document always refers to the component developed in https://github.com/NVIDIA/k8s-dra-driver-gpu
Is it the driver that is managing the lifecycle of compute domains
in that sense: yes
As a general comment, we need to call out what that driver enables -- which is to use Multi-Node NVLink -- and not how this is enabled. The UX for ComputeDomains has been designed to explicitly make minimal references to IMEX since this is an implementation detail. It is required to enable MNNVL, but should not be something that a user is concerned with. I'll make another pass at reviewing this soon.
Pull Request Overview
This PR introduces draft manifests for the DRA driver documentation and related resources, outlining initial configurations and resource definitions.
- Added a kubeadm initialization configuration file with feature gate settings for dynamic resource allocation.
- Introduced a ComputeDomain resource manifest.
- Provided an injection manifest that combines a ComputeDomain definition with a Pod using the resource.
Reviewed Changes
Copilot reviewed 4 out of 7 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
gpu-operator/manifests/input/kubeadm-init-config.yaml | New kubeadm init config with extraArgs for enabling dynamic resource allocation. |
gpu-operator/manifests/input/dra-compute-domain-crd.yaml | New ComputeDomain resource definition with basic specs. |
gpu-operator/manifests/input/imex-channel-injection.yaml | Injection manifest combining a ComputeDomain and a Pod definition. |
Files not reviewed (3)
- gpu-operator/index.rst: Language not supported
- gpu-operator/manifests/output/compute-domain-channel-injection-crd.txt: Language not supported
- gpu-operator/manifests/output/imex-logs.txt: Language not supported
Comments suppressed due to low confidence (1)
gpu-operator/manifests/input/imex-channel-injection.yaml:2
- The ComputeDomain resource is defined in both this file and in dra-compute-domain-crd.yaml. Confirm if this duplication is intentional or consider consolidating these definitions to avoid potential deployment conflicts.
apiVersion: resource.nvidia.com/v1beta1
- name: "feature-gates" | ||
value: "DynamicResourceAllocation=true" | ||
- name: "runtime-config" | ||
value: "resource.k8s.io/v1alpha3=true" | ||
controllerManager: | ||
extraArgs: | ||
- name: "feature-gates" | ||
value: "DynamicResourceAllocation=true" | ||
scheduler: | ||
extraArgs: | ||
- name: "feature-gates" | ||
value: "DynamicResourceAllocation=true" |
The indentation for 'value' under the extraArgs list item appears inconsistent with standard YAML formatting. Consider aligning it to 2 additional spaces relative to the '-' line (e.g., 4 spaces instead of 5) to ensure proper parsing.
- name: "feature-gates" | |
value: "DynamicResourceAllocation=true" | |
- name: "runtime-config" | |
value: "resource.k8s.io/v1alpha3=true" | |
controllerManager: | |
extraArgs: | |
- name: "feature-gates" | |
value: "DynamicResourceAllocation=true" | |
scheduler: | |
extraArgs: | |
- name: "feature-gates" | |
value: "DynamicResourceAllocation=true" | |
- name: "feature-gates" | |
value: "DynamicResourceAllocation=true" | |
- name: "runtime-config" | |
value: "resource.k8s.io/v1alpha3=true" | |
controllerManager: | |
extraArgs: | |
- name: "feature-gates" | |
value: "DynamicResourceAllocation=true" | |
scheduler: | |
extraArgs: | |
- name: "feature-gates" | |
value: "DynamicResourceAllocation=true" |
An IMEX channel is a construct that allows a set of GPUs to directly read and write each other's memory over a high-bandwidth NVLink.
The NVLink connection may either be directly between GPUs on the same node or between GPUs on separate nodes connected by an NVSwitch.
Once an IMEX channel has been established for a set of GPUs, they are free to read and write each other's memory via extensions to the CUDA memory call APIs.
My hunch is that we should not go into this detail here but just link to reference documentation. It's very easy to say something wrong/misleading here.
@jgehrcke do you happen to know where that documentation lives? I have included a lot of this info b/c I was unable to find a place where we can link out to :)
Reference docs are here, for example: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MALLOC__ASYNC.html#group__CUDA__MALLOC__ASYNC_1g8aa4c143dbc20293659cd883232b95f2
There is the beautiful statement of
When exporter and importer CUDA processes have been granted access to the same IMEX channel, they can securely share memory.
And that's already a better description because of the emphasis on securely.
Maybe we can borrow that statement, and otherwise just link to refdocs.
    --namespace nvidia-dra-driver-gpu \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set nvidiaCtkPath=/usr/local/nvidia/toolkit/nvidia-ctk \
    --set resources.gpus.enabled=false
Thank you for following this ❤️
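For context, a hedged sketch of the full install command that this excerpt is presumably part of; the chart name and repository follow the NGC Helm listing linked earlier in this thread, while the release name and the chart version placeholder are assumptions:

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
$ helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
    --version=<chart-version> \
    --create-namespace \
    --namespace nvidia-dra-driver-gpu \
    --set nvidiaDriverRoot=/run/nvidia/driver \
    --set nvidiaCtkPath=/usr/local/nvidia/toolkit/nvidia-ctk \
    --set resources.gpus.enabled=false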
* - ``nvidiaDriverRoot``
  - Specifies the driver root on the host.
    For Operator managed drivers, use ``/run/nvidia/driver``.
    For pre-installed drivers, use ``/``.
Can we provide a recommended mode of operation?
Something like "Use Operator-managed drivers if in doubt".
* - ``nvidiaCtkPath``
  - Specifies the path of The NVIDIA Container Tool Kit binary (nvidia-ctk) on the host, as it should appear in the the generated CDI specification.
    The exact path depends on the system that runs on the node.
Maybe here we should also say sth like "use the Operator-managed nvidia-ctk if in doubt"
CC @elezar
I think we can do something like we do above for the nvidiaDriverRoot and call out the two options with their default values:
- /usr/bin/nvidia-ctk for a pre-installed NVIDIA Container Toolkit
- /usr/local/nvidia/toolkit/nvidia-ctk for a GPU Operator-installed NVIDIA Container Toolkit
Prerequisites
=============

- GH200 and GB200 GPUs with Mulit-Node NVLink connections between GPUs.
Multi
and maybe we can also introduce the abbreviation "(MNNVL)" here and re-use it below if we can. It's seemingly (as I am learning about this, too) a very common abbreviation in this growing ecosystem.
Note that the Compute Domain driver does not require a GH200 or GB200 system. The driver will function on systems that do not have MNNVL.
@elezar what do you mean by function? like it will deploy and not throw errors, or will it actually create resources on the cluster, even though there is no MNNVL
This means even if some nodes are not IMEX capable, they can still be part of the same ComputeDomain.
You must apply NodeAffinity and PodAffinity rules to make sure your workloads run on IMEX capable nodes.

For example you could set PodAffinity with a preferred topologyKey set to ``nvidia.com/gpu.clique`` for workloads to span multiple NVLink domains but want them packed as tightly as possible. Or use a required topologyKey set to ``nvidia.con/gpu.clique`` when you require all workloads deployed into the same NVLink domain, but don't care which one.
I want to add my understanding here and then we can think about potentially rewording/reordering, simplifying.
Technically, when requiring the value of nvidia.com/gpu.clique to be the same among all jobs then this is the same as saying: all jobs must land in the same domain&clique. And that means: all jobs are mutually reachable (all-to-all communication is possible). For a MNNVL setup this is what's typically wanted, and maybe we should lead with that.
So, I consider this use case to be more advanced or exotic:
for workloads to span multiple NVLink domains but want them packed as tightly as possible.
But maybe I don't know enough.
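To make the affinity discussion concrete, here is a minimal sketch of a required pod affinity rule using the clique label as the topology key; the label selector app: my-mnnvl-job is a hypothetical placeholder:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-mnnvl-job   # hypothetical workload label
      topologyKey: nvidia.com/gpu.clique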
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: imex-channel-injection
@klueska do you have ideas about another name for this?
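For reference, a complete version of this manifest, along the lines of the imex-channel-injection example in NVIDIA/k8s-dra-driver-gpu, might look like the following sketch; the numNodes value and the resource claim template name imex-channel-0 are illustrative:

apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: imex-channel-injection
spec:
  numNodes: 1                 # illustrative; set to the number of nodes in the domain
  channel:
    resourceClaimTemplate:
      name: imex-channel-0    # name of the ResourceClaimTemplate the driver creates (assumed)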
Install the NVIDIA GPU DRA Driver and Configure IMEX Support
############################################################

THe NVIDIA GPU DRA Driver is an additional component you can install after the GPU Operator that enables you to use the Kubernetes DRA feature to define IMEX channel resources that are managed by Kubernetes.
What about:
THe NVIDIA GPU DRA Driver is an additional component you can install after the GPU Operator that enables you to use the Kubernetes DRA feature to define IMEX channel resources that are managed by Kubernetes.
The NVIDIA GPU DRA Driver is an additional component you can install alongside the GPU Operator that enables you to use the Kubernetes DRA feature to support Multi-Node NVLink in distributed applications.
We may also want to link to the Kubernetes DRA documentation? https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
About the NVIDIA GPU DRA Driver
*******************************

The NVIDIA GPU DRA Driver leverages the Kubernetes Dynamic Resource Allocation (DRA) API to support NVIDIA IMEX channels available in GH200 and GB200 GPUs.
Does something like the following better capture the intent of the COMPUTE DOMAIN DRIVER:
The NVIDIA GPU DRA Driver provides a Compute Domain abstraction (as a Kubernetes CRD) that allows distributed applications to make use of technologies such as Multi-node NVLink if available. The underlying "connectivity" (not sure about this word) is managed by the NVIDIA GPU DRA Driver to ensure portability of workloads.
Note that this does not mention IMEX channels.
Install the NVIDIA GPU DRA Driver
*********************************

The GPU DRA Driver is an addiitonal component that can be installed after you've installed the GPU Operator on your clsuter.
I don't think it's strictly speaking required that the GPU Operator be installed first, but may make things simpler.
Also a typo:
The GPU DRA Driver is an addiitonal component that can be installed after you've installed the GPU Operator on your clsuter.
The GPU DRA Driver is an addiitonal component that can be installed after you've installed the GPU Operator on your k8s cluster.
- GH200 and GB200 GPUs with Mulit-Node NVLink connections between GPUs.

- Kubernetes v1.32 multi-node cluster with the DynamitcResourceAllocation feature gate enabled.
- Kubernetes v1.32 multi-node cluster with the DynamitcResourceAllocation feature gate enabled.
- A Kubernetes v1.32 cluster with the `DynamitcResourceAllocation` feature gate enabled and the `resource.k8s.io` API group enabled.
- Kubernetes v1.32 multi-node cluster with the DynamitcResourceAllocation feature gate enabled.

The following is a sample for enabling DRA feature gates.
The following is a sample for enabling DRA feature gates.
The following is a sample for enabling the required feature gates and API groups.
   :language: yaml
   :caption: Sample Kubeadm Init Config with DRA Feature Gates Enabled

- The NVIDIA GPU Operator v25.3.0 or later installed with CDI enabled on all nodes and NVIDIA GPU Driver 565 or later.
The v565 driver is only for GH200 / GB200 systems where MNNVL is supported. For other systems older driver versions should be sufficient.
    --version=${version} \
    --set cdi.enabled=true

Note if you want to install the GPU DRA Driver using pre-installed drivers, you must install NVIDIA GPU Driver 565 or later, the corresponding IMEX packages on GPU nodes, and disable the IMEX systemd service before installing the GPU Operator.
Note if you want to install the GPU DRA Driver using pre-installed drivers, you must install NVIDIA GPU Driver 565 or later, the corresponding IMEX packages on GPU nodes, and disable the IMEX systemd service before installing the GPU Operator.
Note if you want to install the NVIDIA GPU DRA Driver using pre-installed drivers, you must install NVIDIA GPU Driver 565 or later, the corresponding IMEX packages on GPU nodes, and disable the IMEX systemd service before installing the GPU Operator.
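For readers who want the surrounding command, a sketch of the GPU Operator install that this excerpt appears to belong to; the repository URL and namespace are the usual ones for the GPU Operator chart, and ${version} is left as a placeholder as in the excerpt:

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
$ helm install gpu-operator nvidia/gpu-operator \
    --create-namespace \
    --namespace gpu-operator \
    --version=${version} \
    --set cdi.enabled=true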
* - ``nvidiaDriverRoot``
  - Specifies the driver root on the host.
    For Operator managed drivers, use ``/run/nvidia/driver``.
For Operator managed drivers, use ``/run/nvidia/driver``.
For Operator-managed drivers, use ``/run/nvidia/driver``.
  - ``/``

* - ``nvidiaCtkPath``
  - Specifies the path of The NVIDIA Container Tool Kit binary (nvidia-ctk) on the host, as it should appear in the the generated CDI specification.
- Specifies the path of The NVIDIA Container Tool Kit binary (nvidia-ctk) on the host, as it should appear in the the generated CDI specification.
- Specifies the path of The NVIDIA Container Toolkit CLI binary (nvidia-ctk) on the host.
The "generated CDI specification" is probably an implementation detail that should not be relevant in this case.
node3   nvidia.com/gpu.clique   1fbed3a8-bd74-4c83-afcb-cfb75ebc9304.1
node4   nvidia.com/gpu.clique   1fbed3a8-bd74-4c83-afcb-cfb75ebc9304.1

The GPU DRA Driver adds a Clique ID to each GPU node.
The GPU DRA Driver does not add this label. This label is added by GPU Feature Discovery.
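A simple way to produce output like the excerpt above, assuming GPU Feature Discovery has applied the nvidia.com/gpu.clique label:

$ kubectl get nodes -L nvidia.com/gpu.clique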
The NVIDIA GPU DRA Driver introduces a new custom resource called ComputeDomain, which creates a DRA ResourceClaimTemplate that you can reference in workloads.
The ComputeDomain resource also creates a unique ResourceClaim for each worker that links it back to the ComputeDomain where the ResourceClaimTemplate is defined.

If a subset of the nodes associated with a ComputeDomain are capable of communicating over IMEX, the NVIDIA Kubernetes DRA will set up a one-off IMEX domain to allow GPUs to communicate over their multi-node NVLink connections. Multiple IMEX domains will be created as necessary depending on the number (and availability) of nodes allocated to the ComputeDomain.
Suggestion: IMEX -> Multi-Node NVLink (MNNVL)
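To illustrate how a workload consumes the ResourceClaimTemplate that a ComputeDomain creates, here is a hedged sketch of a Pod spec; the template name imex-channel-0, the pod name, and the image are placeholders, and the resourceClaims fields follow the Kubernetes v1.32 DRA API:

apiVersion: v1
kind: Pod
metadata:
  name: imex-channel-injection-pod   # hypothetical name
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: imex-channel           # consume the claim defined below
  resourceClaims:
  - name: imex-channel
    resourceClaimTemplateName: imex-channel-0   # template created by the ComputeDomain (assumed name)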
  - None

* - ``numNodes`` (required)
  - Specifies the number of nodes in the IMEX domain.
- Specifies the number of nodes in the IMEX domain.
- Specifies the number of nodes in the Compute Domain.
This is a draft version of the dra driver documentation.
To be decided: