GPU Operator 25.3.0 release docs #165

Open
wants to merge 8 commits into
base: main

a-mccarthy
Collaborator

No description provided.

Verified

This commit was signed with the committer’s verified signature.
a-mccarthy Abigail McCarthy
Signed-off-by: Abigail McCarthy <[email protected]>


Documentation preview

https://nvidia.github.io/cloud-native-docs/review/pr-165

@a-mccarthy a-mccarthy marked this pull request as draft March 14, 2025 03:19

@a-mccarthy a-mccarthy marked this pull request as ready for review March 17, 2025 19:57

command-line argument for Helm.
In GPU Operator v25.3.0 and later, the ``driver.kernelModuleType`` default is ``auto``, for the supported driver versions.
This configuration allows the GPU Operator to choose the recommended driver kernel module type depending on the driver branch and the GPU devices available.
Newer driver versions will use an open kernel module by default, however to make sure you are using an open model, include ``--set driver.kernelModuleType=open`` command-line arugment in your Operator install command.
Contributor

Suggested change
- Newer driver versions will use an open kernel module by default, however to make sure you are using an open model, include ``--set driver.kernelModuleType=open`` command-line arugment in your Operator install command.
+ Newer driver versions will use an open kernel module by default, however to make sure you are using the open kernel module, include ``--set driver.kernelModuleType=open`` command-line argument in your Operator helm install command.
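For reference, an install command that pins the open kernel module, as the text under review describes, might look like the sketch below. The namespace, chart name, `--version=${version}` placeholder, and `--set` value come from the diff in this PR; the `--wait` and `--generate-name` flags, the default version string, and the `gpu_operator_install_cmd` helper are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: prints the Helm command so it can be reviewed before running it.
# Assumption: ${version} holds the chart version; the default here is illustrative.
: "${version:=v25.3.0}"

# Hypothetical helper, not part of the GPU Operator or Helm.
gpu_operator_install_cmd() {
  echo "helm install --wait --generate-name" \
    "-n gpu-operator --create-namespace" \
    "nvidia/gpu-operator" \
    "--version=${version}" \
    "--set driver.kernelModuleType=open"
}

gpu_operator_install_cmd
```

Once the printed command looks right, it can be run directly or piped to `sh`.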


* Improved security by removing unnecessary permissions in the GPU Operator ClusterRole.

* Improved GPU Operator metrics to include a `operatorMetricsNamespace` field that sets the metrcis namespace to `gpu_operator`.
Contributor

Suggested change
- * Improved GPU Operator metrics to include a `operatorMetricsNamespace` field that sets the metrcis namespace to `gpu_operator`.
+ * Improved GPU Operator metrics to include a `operatorMetricsNamespace` field that sets the metrics namespace to `gpu_operator`.

- Specifies the type of the NVIDIA GPU Kernel modules to use.
Valid values are ``auto`` (default), ``proprietary``, and ``open``.

``Auto`` means that the recommended kernel module type is chosen based on the GPU devices on the host and the driver branch used.
Contributor

Question: Do we want to include a note (like we do in the clusterpolicy table) about which driver container versions support auto?

@@ -138,11 +138,11 @@ To use DMA-BUF and network device drivers that are installed on the host:
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--version=${version} \
- --set driver.useOpenKernelModules=true \
+ --set driver.kernelModuleType=open \
Contributor

Let's be consistent across the two install commands presented here in this section. Either both commands should specify --set driver.kernelModuleType=open or both commands should omit setting this field. I would be in favor of omitting this field as auto should take care of installing the open modules on supported systems. We can highlight that setting driver.kernelModuleType=open is only needed for older driver container versions where auto is not supported -- either in the text below or in a note.

I would be in favor of removing this field altogether since it is optional for users to configure it. The default setting, auto, should be sufficient as called out in the text.

Contributor

I see you have already implemented my suggestion in the Installing the GPU Operator and Enabling GPUDirect Storage section :)

Collaborator Author

updated the sample here to match the sample in the other section!

Valid values include:

* ``auto``: Default and recommended option. Use the default kernel module type (open or proprietary) based on the GPU Operator and driver containers used.
* ``open``: Use the NVIDIA Open GPU Kernel module driver.
Contributor

Question: Should Kernel be capitalized here?


* ``auto``: Default and recommended option. Use the default kernel module type (open or proprietary) based on the GPU Operator and driver containers used.
* ``open``: Use the NVIDIA Open GPU Kernel module driver.
* ``proprietary``: Use the NVIDIA Proprietary GPU Kernel module driver.
Contributor

Question: Should Kernel be capitalized here?
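
As a rough illustration of how the three valid values in the diff above relate to the Helm flag, the sketch below maps each value to the `--set` argument it would need. The `kernel_module_flag` helper is hypothetical and not part of the GPU Operator; it assumes, as the review discussion suggests, that `auto` is the default on supported driver versions and therefore needs no flag.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel module type to the Helm flag that selects it.
kernel_module_flag() {
  case "$1" in
    auto)
      # Assumption: auto is the default in v25.3.0 and later, so nothing is printed.
      ;;
    open|proprietary)
      echo "--set driver.kernelModuleType=$1"
      ;;
    *)
      echo "invalid kernel module type: $1 (expected auto, open, or proprietary)" >&2
      return 1
      ;;
  esac
}
```

The output can be appended to an install command, for example by adding `$(kernel_module_flag open)` after the chart name; any other value is rejected before Helm is called.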

3 participants