aryangorwade
Collaborator

Helm

The operator now generates the ValidatingWebhookConfiguration in the operator code itself (called from cmd/main.go). It also detects the orchestrator type in main.go. The ValidatingWebhookConfiguration has been removed from the Helm charts.

internal/webhook/apps/v1alpha1/configuration.go contains this generation.

In deployments/helm/k8s-nim-operator/templates/manager-rbac.yaml, added the create and update verbs to the RBAC rules.
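For readers following along, here is a minimal sketch of the create-or-update flow that the new create and update RBAC verbs support. It is not the code in internal/webhook/apps/v1alpha1/configuration.go: webhookConfig, configStore, memStore and ensureWebhook are hypothetical stand-ins that use only the Go standard library, so the shape of the reconciliation can be shown without a cluster.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// webhookConfig is a simplified, hypothetical stand-in for the real
// ValidatingWebhookConfiguration type; the fields are illustrative only.
type webhookConfig struct {
	Name        string
	Annotations map[string]string
	ServiceName string
	ServicePath string
}

var errNotFound = errors.New("not found")

// configStore is a hypothetical abstraction over the API server.
type configStore interface {
	Get(name string) (webhookConfig, error)
	Create(cfg webhookConfig) error
	Update(cfg webhookConfig) error
}

// memStore is an in-memory configStore used to exercise the sketch.
type memStore struct{ items map[string]webhookConfig }

func newMemStore() *memStore { return &memStore{items: map[string]webhookConfig{}} }

func (m *memStore) Get(name string) (webhookConfig, error) {
	cfg, ok := m.items[name]
	if !ok {
		return webhookConfig{}, errNotFound
	}
	return cfg, nil
}

func (m *memStore) Create(cfg webhookConfig) error { m.items[cfg.Name] = cfg; return nil }
func (m *memStore) Update(cfg webhookConfig) error { m.items[cfg.Name] = cfg; return nil }

// ensureWebhook creates the config when it is absent and updates it when the
// stored copy differs from the desired one. It reports which action it took.
func ensureWebhook(s configStore, desired webhookConfig) (string, error) {
	current, err := s.Get(desired.Name)
	if errors.Is(err, errNotFound) {
		if err := s.Create(desired); err != nil {
			return "", fmt.Errorf("create %q: %w", desired.Name, err)
		}
		return "created", nil
	}
	if err != nil {
		return "", fmt.Errorf("get %q: %w", desired.Name, err)
	}
	if reflect.DeepEqual(current, desired) {
		return "unchanged", nil
	}
	if err := s.Update(desired); err != nil {
		return "", fmt.Errorf("update %q: %w", desired.Name, err)
	}
	return "updated", nil
}

func main() {
	store := newMemStore()
	desired := webhookConfig{
		Name:        "example-validating-webhook", // hypothetical name
		ServiceName: "example-webhook-service",    // hypothetical name
		ServicePath: "/validate",
	}
	// Running the reconciliation twice shows that it is idempotent.
	for i := 0; i < 2; i++ {
		result, err := ensureWebhook(store, desired)
		fmt.Println(result, err)
	}
}
```

The create path needs the create verb and the drift path needs the update verb, which is why both were added to the RBAC rules.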

OLM

The ValidatingWebhookConfiguration (called webhookdefinitions in OLM) has been removed from the CSV and moved to the operator code (same as above). The non-Helm-specific changes (changes outside deployments) are common to OLM as well, including the new ENV variables.

In OLM, "OLM manages the webhook" means the CSV must include webhookdefinitions. OLM manages the webhook lifecycle (including cert volume injection) only if webhookdefinitions are present in the CSV. If you remove them and generate the ValidatingWebhookConfiguration in your operator, OLM stops managing the webhook, and cert injection has to be handled another way.

To work around this:

  • Removed webhookdefinitions from the CSV.
  • Added RBAC for validatingwebhookconfigurations in the CSV (this adds a new apiGroup).
  • Added the OpenShift CA-injection annotation to the webhook: "service.beta.openshift.io/inject-cabundle": "true"
  • Created a new file, bundle/manifests/k8s-nim-operator.webhookservice.yaml, so that the certificate and key get created. It carries the OpenShift Service annotation service.beta.openshift.io/serving-cert-secret-name: k8s-nim-operator-webhook-server-cert
  • Added the cert under the manager container's volumeMounts and under the deployment's volumes.
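As a rough illustration of how the orchestrator type could drive these annotations, here is a small sketch. The detection rule (looking for the config.openshift.io API group) is an assumption and not necessarily what main.go does; orchestrator, detectOrchestrator, webhookAnnotations and serviceAnnotations are hypothetical names. Only the two annotation keys and the secret name come from this PR.

```go
package main

import "fmt"

// orchestrator is a hypothetical enum for the detected platform.
type orchestrator string

const (
	orchestratorKubernetes orchestrator = "kubernetes"
	orchestratorOpenShift  orchestrator = "openshift"
)

// Annotation keys taken from the PR description.
const (
	injectCABundleAnnotation    = "service.beta.openshift.io/inject-cabundle"
	servingCertSecretAnnotation = "service.beta.openshift.io/serving-cert-secret-name"
)

// detectOrchestrator guesses the platform from the API groups the cluster
// serves. Assumption: seeing config.openshift.io means OpenShift.
func detectOrchestrator(apiGroups []string) orchestrator {
	for _, group := range apiGroups {
		if group == "config.openshift.io" {
			return orchestratorOpenShift
		}
	}
	return orchestratorKubernetes
}

// webhookAnnotations returns the annotations to put on the generated
// ValidatingWebhookConfiguration for the given platform.
func webhookAnnotations(o orchestrator) map[string]string {
	if o == orchestratorOpenShift {
		return map[string]string{injectCABundleAnnotation: "true"}
	}
	return map[string]string{}
}

// serviceAnnotations returns the annotations to put on the webhook Service
// so that OpenShift writes the serving cert and key into secretName.
func serviceAnnotations(o orchestrator, secretName string) map[string]string {
	if o == orchestratorOpenShift {
		return map[string]string{servingCertSecretAnnotation: secretName}
	}
	return map[string]string{}
}

func main() {
	groups := []string{"apps", "config.openshift.io"}
	o := detectOrchestrator(groups)
	fmt.Println(o)
	fmt.Println(webhookAnnotations(o))
	fmt.Println(serviceAnnotations(o, "k8s-nim-operator-webhook-server-cert"))
}
```

On plain Kubernetes both helpers return an empty map, so the cert and CA bundle would have to come from somewhere else.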


copy-pr-bot bot commented Sep 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aryangorwade
Collaborator Author

Continuation of discussion:

@varunrsekar

Thanks for the change @aryangorwade! One comment: Can we add another option to generate self-signed certs as part of an init-container or application startup? This will help eliminate any dependencies for easy-install.

@aryangorwade

@varunrsekar It will most likely not be used in production or by enterprise customers, but it would help individual users avoid having to generate certs and create an additional secret for them. @shivamerla is looking into its use case.
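To make the self-signed option concrete, here is a minimal sketch of generating a cert and key at startup with only the Go standard library. It is an illustration of the idea and not code from this PR: generateSelfSignedCert, the validDays parameter, and the example DNS name are all hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// generateSelfSignedCert returns a PEM-encoded self-signed certificate and its
// ECDSA private key. The certificate is marked as a CA so the same PEM could
// also serve as the webhook's CA bundle.
func generateSelfSignedCert(dnsNames []string, validDays int) (certPEM, keyPEM []byte, err error) {
	if len(dnsNames) == 0 {
		return nil, nil, fmt.Errorf("at least one DNS name is required")
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("generate key: %w", err)
	}
	// Random 128-bit serial number.
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
	if err != nil {
		return nil, nil, fmt.Errorf("generate serial: %w", err)
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          serial,
		Subject:               pkix.Name{CommonName: dnsNames[0]},
		DNSNames:              dnsNames,
		NotBefore:             now.Add(-time.Minute), // tolerate small clock skew
		NotAfter:              now.AddDate(0, 0, validDays),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	// Self-signed: the template is both the subject and the issuer.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, fmt.Errorf("create certificate: %w", err)
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, fmt.Errorf("marshal key: %w", err)
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

func main() {
	// The service DNS name below is a placeholder, not the operator's real one.
	certPEM, keyPEM, err := generateSelfSignedCert([]string{"webhook-service.example-ns.svc"}, 365)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cert: %d bytes, key: %d bytes\n", len(certPEM), len(keyPEM))
}
```

Whether this runs in an init-container or in the manager at startup, the generated cert PEM would still need to be set as the caBundle on the webhook configuration.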

@varunrsekar

Thanks. To add to the use case for an init-container: we can also use it to validate the input cert when using a SECRET.
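A sketch of what validating a user-supplied cert could look like, again with only the Go standard library. The specific checks (matching key pair, current validity window, expected DNS name) are assumptions about what is worth validating, and validateServingCert is a hypothetical name.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"time"
)

// validateServingCert checks that certPEM and keyPEM form a usable pair, that
// the leaf certificate is currently within its validity window, and that it
// covers dnsName.
func validateServingCert(certPEM, keyPEM []byte, dnsName string) error {
	pair, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return fmt.Errorf("certificate and key are not a valid pair: %w", err)
	}
	leaf, err := x509.ParseCertificate(pair.Certificate[0])
	if err != nil {
		return fmt.Errorf("parse leaf certificate: %w", err)
	}
	now := time.Now()
	if now.Before(leaf.NotBefore) || now.After(leaf.NotAfter) {
		return fmt.Errorf("certificate is not valid at %s (valid %s to %s)",
			now.Format(time.RFC3339),
			leaf.NotBefore.Format(time.RFC3339),
			leaf.NotAfter.Format(time.RFC3339))
	}
	if err := leaf.VerifyHostname(dnsName); err != nil {
		return fmt.Errorf("certificate does not cover %q: %w", dnsName, err)
	}
	return nil
}

func main() {
	// Garbage input is rejected before any webhook traffic depends on it.
	err := validateServingCert([]byte("not a cert"), []byte("not a key"), "webhook-service.example-ns.svc")
	fmt.Println("validation error:", err)
}
```

Failing fast here would surface a bad secret at startup instead of as TLS errors on admission requests.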

// EnsureValidatingWebhook creates or updates the ValidatingWebhookConfiguration
// that used to be templated by Helm. It is a best-effort reconciliation and
// returns an error only when we cannot make the desired state match the spec.
func EnsureValidatingWebhook(
Collaborator

Could this be done using the renderAndSyncResource pattern we follow in the controllers?

Collaborator Author

VWC is cluster-scoped. Kubernetes doesn't allow setting a namespaced owner as controller of a cluster-scoped object. The current helper calls SetControllerReference, which would fail in that case.

	if err = controllerutil.SetControllerReference(nimService, resource, r.GetScheme()); err != nil {
		logger.Error(err, "failed to set owner", conditionType, namespacedName)
		statusError := r.updater.SetConditionsFailed(ctx, nimService, reason, err.Error())
		if statusError != nil {
			logger.Error(statusError, "failed to update status", "nimservice", nimService.Name)
		}
		return err
	}

To use this exact pattern for the cluster-scoped VWC, we'd have to either:

  • Add a variant that skips SetControllerReference (and handle cleanup via explicit delete/finalizer), or
  • Manage the webhook from an operator-owned, cluster-scoped controller rather than from a namespaced CR
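For context, the scope rule behind this can be sketched as below. This is a simplified stand-in for the check that, as far as I understand, controller-runtime performs before setting an owner reference, not its actual code; object and validateOwner are hypothetical names, and an empty Namespace stands for a cluster-scoped object.

```go
package main

import "fmt"

// object is a minimal stand-in for a Kubernetes object's metadata.
// An empty Namespace means the object is cluster-scoped.
type object struct {
	Namespace string
	Name      string
}

// validateOwner reports whether owner may be set as the owner of obj under
// the scope rules described above.
func validateOwner(owner, obj object) error {
	if owner.Namespace == "" {
		// A cluster-scoped owner may own cluster-scoped or namespaced objects.
		return nil
	}
	if obj.Namespace == "" {
		return fmt.Errorf("cluster-scoped resource %q must not have a namespace-scoped owner (owner namespace %q)",
			obj.Name, owner.Namespace)
	}
	if owner.Namespace != obj.Namespace {
		return fmt.Errorf("cross-namespace owner references are disallowed: owner %q, object %q",
			owner.Namespace, obj.Namespace)
	}
	return nil
}

func main() {
	nimService := object{Namespace: "nim", Name: "example-nimservice"} // hypothetical CR
	vwc := object{Name: "example-validating-webhook"}                  // cluster-scoped
	fmt.Println(validateOwner(nimService, vwc))
}
```

Both options listed above avoid this check: the first by never setting the reference, the second by making the owner cluster-scoped as well.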

Collaborator

@visheshtanksale visheshtanksale left a comment

Can you please add code to call EnsureValidatingWebhook?

@aryangorwade aryangorwade force-pushed the wvc-in-operator-code branch 4 times, most recently from fb6af6d to 4d07047 Compare September 10, 2025 19:43
@aryangorwade
Collaborator Author

Fixed. main.go now calls EnsureValidatingWebhook.

@shivamerla shivamerla marked this pull request as draft October 10, 2025 17:15