GPU Operator 25.3.0 release docs #165
Signed-off-by: Abigail McCarthy <[email protected]>
gpu-operator/gpu-operator-rdma.rst
Outdated
command-line argument for Helm.
In GPU Operator v25.3.0 and later, the ``driver.kernelModuleType`` default is ``auto``, for the supported driver versions.
This configuration allows the GPU Operator to choose the recommended driver kernel module type depending on the driver branch and the GPU devices available.
Newer driver versions will use an open kernel module by default, however to make sure you are using an open model, include ``--set driver.kernelModuleType=open`` command-line arugment in your Operator install command.
- Newer driver versions will use an open kernel module by default, however to make sure you are using an open model, include ``--set driver.kernelModuleType=open`` command-line arugment in your Operator install command.
+ Newer driver versions will use an open kernel module by default, however to make sure you are using the open kernel module, include ``--set driver.kernelModuleType=open`` command-line argument in your Operator helm install command.
gpu-operator/release-notes.rst
Outdated

* Improved security by removing unnecessary permissions in the GPU Operator ClusterRole.

* Improved GPU Operator metrics to include a `operatorMetricsNamespace` field that sets the metrcis namespace to `gpu_operator`.
- * Improved GPU Operator metrics to include a `operatorMetricsNamespace` field that sets the metrcis namespace to `gpu_operator`.
+ * Improved GPU Operator metrics to include a `operatorMetricsNamespace` field that sets the metrics namespace to `gpu_operator`.
- Specifies the type of the NVIDIA GPU Kernel modules to use.
Valid values are ``auto`` (default), ``proprietary``, and ``open``.

``Auto`` means that the recommended kernel module type is chosen based on the GPU devices on the host and the driver branch used.
Question: Do we want to include a note (like we do in the clusterpolicy table) about which driver container versions support `auto`?
gpu-operator/gpu-operator-rdma.rst
Outdated
@@ -138,11 +138,11 @@ To use DMA-BUF and network device drivers that are installed on the host:
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=${version} \
- --set driver.useOpenKernelModules=true \
+ --set driver.kernelModuleType=open \
Let's be consistent across the two install commands presented here in this section. Either both commands should specify `--set driver.kernelModuleType=open` or both commands should omit setting this field. I would be in favor of omitting this field as `auto` should take care of installing the open modules on supported systems. We can highlight that setting `driver.kernelModuleType=open` is only needed for older driver container versions where `auto` is not supported -- either in the text below or in a note.
I would be in favor of removing this field altogether since it is optional for users to configure it. The default setting, `auto`, should be sufficient as called out in the text.
I see you have already implemented my suggestion in the "Installing the GPU Operator and Enabling GPUDirect Storage" section :)
updated the sample here to match the sample in the other section!
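For illustration, the two variants discussed in this thread can be sketched as below. This is a rough sketch, not the text from the docs: the `helm install --wait --generate-name` prefix and the fallback value for `${version}` are assumptions (the hunk only shows the lines from `-n gpu-operator` onward), and the hypothetical `build_install_cmd` helper only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints the helm command instead of running it.
# Assumptions (not shown in the diff): the "--wait --generate-name" options
# and the fallback chart version used when $version is unset.
version="${version:-v25.3.0}"

build_install_cmd() {
  # $1: optional kernel module type; when empty, the flag is omitted and
  # the chart default (auto) applies.
  cmd="helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=${version}"
  if [ -n "${1:-}" ]; then
    cmd="${cmd} --set driver.kernelModuleType=${1}"
  fi
  printf '%s\n' "${cmd}"
}

build_install_cmd        # variant 1: omit the field and rely on auto
build_install_cmd open   # variant 2: pin the open kernel module explicitly
```

Omitting the field keeps both samples identical and leaves the choice to `auto`; the explicit `open` form would then only need a mention in a note for driver container versions where `auto` is not supported.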
Valid values include:

* ``auto``: Default and recommended option. Use the default kernel module type (open or proprietary) based on the GPU Operator and driver containers used.
* ``open``: Use the NVIDIA Open GPU Kernel module driver.
Question: Should `Kernel` be capitalized here?

* ``auto``: Default and recommended option. Use the default kernel module type (open or proprietary) based on the GPU Operator and driver containers used.
* ``open``: Use the NVIDIA Open GPU Kernel module driver.
* ``proprietary``: Use the NVIDIA Proprietary GPU Kernel module driver.
Question: Should `Kernel` be capitalized here?