@omerap12
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implements Helm-managed MutatingWebhookConfiguration with automatic TLS certificate generation using kube-webhook-certgen. This replaces the application's self-registration logic.

Which issue(s) this PR fixes:

Related to #8587

Special notes for your reviewer:

ingress-nginx is dead, so I’m not sure about the future of kube-webhook-certgen, which is part of the old nginx stack (registry.k8s.io/ingress-nginx/kube-webhook-certgen). Given that, should we still use it?

Right now, the hook simply creates a Secret containing all the certificates, CA bundles, and the mutating webhook configuration. To rotate or update those certificates, we would only need to add a Job that deletes the Secret beforehand.
This is just an initial proposal; I’d like to hear other opinions on this setup.

Also, I noticed some RBAC problems and fixed them.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/needs-area labels Nov 29, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omerap12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2025
@k8s-ci-robot k8s-ci-robot added area/vertical-pod-autoscaler size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/needs-area labels Nov 29, 2025
@omerap12
Member Author

/cc @adrianmoisey

Contributor

@iamzili iamzili left a comment

Hey Omer, I checked out your branch and I'm having trouble getting the admission controller to work. I would expect the following command to deploy everything properly:

helm upgrade vpa \
   /home/zili/Repos/autoscaler/vertical-pod-autoscaler/charts/vertical-pod-autoscaler \
   --install \
   --version 0.6.0 \
   --namespace vpa

What I have found so far is that the following settings seem to be incorrect:

  1. The --webhook-service value in the admission controller's Deployment.
  2. The service name in the MutatingWebhookConfiguration object.

I'm also seeing an error in the admission controller when it attempts to perform actuation:

2025/12/04 13:37:32 http: TLS handshake error from 10.244.0.1:25304: remote error: tls: bad certificate

metadata:
  name: {{ include "vertical-pod-autoscaler.admissionController.certGen.fullname" . }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade
Contributor

Is there a specific reason why this resource defines both pre and post hooks? I see the same pattern in several others, such as ClusterRoleBinding, Role, and RoleBinding. I think pre-install,pre-upgrade would be sufficient.

Member Author

Overall, you’re right - pre-install and pre-upgrade are enough, but it really depends on how we want to handle things going forward. As mentioned above, the hook currently creates a Secret containing all certificates, CA bundles, and the mutating webhook configuration. Because this Secret already exists during an upgrade, kube-webhook-certgen will not rotate the certificate values.

To address this, we may want to add post-install and post-upgrade hooks to delete the Secret, ensuring that on the next upgrade kube-webhook-certgen generates a fresh one.

Contributor

@iamzili iamzili Dec 4, 2025

But I assume using post-install and post-upgrade hooks to delete the Secret is still not a good idea. Helm executes post-install and post-upgrade hooks after all non-hook resources have been deployed to the cluster, so after every Helm install or upgrade there would be no Secret object in the cluster.

If we want to delete the certificate before running a Helm upgrade or install, then we need to do it in pre-install and pre-upgrade hooks, right?

Contributor

One more thing: kube-webhook-certgen generates certificates that expire after 100 years, so I assume we don't need to rotate them. We could just ignore the "secret already exists" log message when kube-webhook-certgen create runs.
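
For reference, a minimal sketch of what a create Job limited to pre-install and pre-upgrade hooks could look like. This is not the template from this PR. The Job name, ServiceAccount name, Service name (`vpa-webhook`), and Secret name (`vpa-tls-certs`) are assumptions for illustration, and the image tag is deliberately left unspecified. The `create` arguments follow the way the ingress-nginx chart invokes kube-webhook-certgen, as far as I know.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: vpa-certgen-create            # hypothetical name
  namespace: vpa
  annotations:
    # Run only before install/upgrade, so the Secret exists before the
    # admission controller Deployment is rolled out.
    "helm.sh/hook": pre-install,pre-upgrade
    # Remove the previous hook Job before re-creating it, and clean up on success.
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: OnFailure
      # Hypothetical ServiceAccount; it needs RBAC to get and create Secrets
      # and must itself exist before this Job runs.
      serviceAccountName: vpa-certgen
      containers:
        - name: create
          image: "registry.k8s.io/ingress-nginx/kube-webhook-certgen:<tag>"  # tag intentionally unspecified
          args:
            - create
            # SANs for the certificate; they must cover the Service DNS name
            # that the webhook configuration points at (assumed names).
            - --host=vpa-webhook,vpa-webhook.vpa.svc
            - --namespace=vpa
            - --secret-name=vpa-tls-certs
```

With this shape, a re-run on upgrade finds the existing Secret and leaves it alone, which matches the "ignore the log message" option above.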

@iamzili
Contributor

iamzili commented Dec 4, 2025

Also I'm not sure if we need to bump the chart version every time at this stage of the chart's development here:

vertical-pod-autoscaler/charts/vertical-pod-autoscaler/Chart.yaml

@omerap12
Member Author

omerap12 commented Dec 4, 2025

> Hey Omer, I checked out your branch and I'm having trouble getting the admission controller to work. I would expect the following command to deploy everything properly:
>
> helm upgrade vpa \
>    /home/zili/Repos/autoscaler/vertical-pod-autoscaler/charts/vertical-pod-autoscaler \
>    --install \
>    --version 0.6.0 \
>    --namespace vpa
>
> What I have found so far is that the following settings seem to be incorrect:
>
>   1. The --webhook-service value in the admission controller's Deployment.
>   2. The service name in the MutatingWebhookConfiguration object.
>
> I'm also seeing an error in the admission controller when it attempts to perform actuation:
>
> 2025/12/04 13:37:32 http: TLS handshake error from 10.244.0.1:25304: remote error: tls: bad certificate

Thanks for the review! I'll check that out.

@omerap12
Member Author

omerap12 commented Dec 4, 2025

> Also I'm not sure if we need to bump the chart version every time at this stage of the chart's development here:
>
> vertical-pod-autoscaler/charts/vertical-pod-autoscaler/Chart.yaml

Yeah, we already discussed it. If I remember correctly, we have to do it because of the pre-commit.
@adrianmoisey might remember.

@adrianmoisey
Member

> Yeah, we already discussed it. if I remember correctly we have to do it because of the pre-commit.
> @adrianmoisey might remember.

I can't remember. Your memory may be correct, since cluster-autoscaler does it.

However, I think that may be broken now?

webhooks:
  - admissionReviewVersions:
      - v1
    clientConfig:
Member

I believe this clientConfig needs to be populated with the CA certificate (the caBundle field).
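
A minimal sketch of a webhook entry with `clientConfig` fully populated, to illustrate the point. The object name, webhook name, Service name, and rule set are assumptions for illustration, not the values from this PR. The `caBundle` placeholder stands for the base64-encoded PEM CA that signed the serving certificate.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: vpa-webhook-config            # hypothetical name
webhooks:
  - name: vpa.k8s.io                  # assumed; must be a fully qualified name
    admissionReviewVersions:
      - v1
    sideEffects: None
    clientConfig:
      # Base64-encoded PEM CA bundle the API server uses to verify the
      # webhook's serving certificate. Placeholder, not a real value.
      caBundle: "<base64-encoded PEM CA bundle>"
      service:
        # Must match the Service in front of the admission controller and a
        # SAN in the serving certificate (assumed names).
        name: vpa-webhook
        namespace: vpa
        port: 443
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
```

If `caBundle` is empty or does not match the CA in the Secret, the API server rejects the webhook's certificate, which would be consistent with the "tls: bad certificate" error reported earlier. As far as I know, kube-webhook-certgen also provides a `patch` subcommand meant to write the CA from the Secret into an existing webhook configuration, so the field would not have to be templated by hand.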

@iamzili
Contributor

iamzili commented Dec 4, 2025

Let me add my thoughts regarding kube-webhook-certgen:

  1. Personally I think we can start using kube-webhook-certgen and see if it fully meets our needs (I believe it will). Since the project is becoming unmaintained soon, let's keep an eye on whether anyone decides to fork it and start maintaining it. Side note: based on the commit history, it seems the previous maintainers mostly performed Go version bumps.
  2. I would prefer not to roll out our own solution for creating and renewing self-signed certificates like Kyverno does:
  3. I also checked how the folks at Gatekeeper handle certificate management, and they use the github.com/open-policy-agent/cert-controller library (which btw KEDA also uses).

@omerap12
Member Author

omerap12 commented Dec 5, 2025

> I would prefer not to roll out our own solution for creating and renewing self-signed certificates like Kyverno does:

I agree that we shouldn’t build our own mechanism for generating and renewing self-signed certificates.
KEDA’s (and similar) approaches rely on code-based solutions, but IMHO the better approach is to avoid handling this in code and instead delegate it to Helm, which can manage it for us - similar to what I attempted in this PR.
