
Conversation

@iamzili (Contributor) commented Sep 29, 2025

What type of PR is this?

/kind documentation
/kind feature
/area vertical-pod-autoscaler

What this PR does / why we need it:

Autoscaling Enhancement Proposal (AEP) for pod-level resources support in VPA.

Related ticket from which this AEP originated: Issue

More details about pod-level resources can be found here.

I'd love to hear your thoughts on this feature.

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/feature Categorizes issue or PR as related to a new feature. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 29, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iamzili
Once this PR has been reviewed and has the lgtm label, please assign adrianmoisey for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 29, 2025
@k8s-ci-robot (Contributor)

Hi @iamzili. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 29, 2025

## Summary

Starting with Kubernetes version 1.34, it is now possible to specify CPU and memory `resources` for Pods at the pod level in addition to the existing container-level `resources` specifications. For example:
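
A minimal illustrative manifest (arbitrary values), using the `spec.resources` stanza introduced by KEP-2837:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-resources-demo
spec:
  resources:            # pod-level requests/limits (beta in Kubernetes 1.34)
    requests:
      cpu: "1"
      memory: 100Mi
    limits:
      cpu: "2"
      memory: 200Mi
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```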
Member:

It may be worth linking the KEP here

Contributor Author:

I'm linking the KEP and the official blog post a little further down: here


This section describes how VPA reacts based on where resources are defined (pod level, container level or both).

Before this AEP, the recommender computes recommendations only at the container level, and VPA applies changes only to container-level fields. With this proposal, the recommender also computes pod-level recommendations in addition to container-level ones. Pod-level recommendations are derived from per-container usage and recommendations, typically by aggregating container recommendations. Container-level policy still influences pod-level output: setting `mode: Off` in `spec.resourcePolicy.containerPolicies` excludes a container from recommendations, and `minAllowed`/`maxAllowed` bounds continue to apply.
@adrianmoisey (Member) commented Oct 6, 2025:

Just want to sanity check this a little.

> typically by aggregating container recommendations

From what I can tell, the metrics that metrics-server provides are per-container.

So the idea is to leave the recommender as is, making per-container recommendations based on its per-container metric, and let the updater/admission-controller use an aggregated value for the Pod resources.

Is my understanding here right?

Contributor Author:

Partially - the recommender will calculate the pod-level recommendations (from your comment, it seems you expected the updater/admission controller to do that). My plan is to continue relying on the current approach for collecting and aggregating container-level metrics, as well as for generating per-container recommendations.

The difference introduced by this AEP is that if a pod-level resources stanza is defined at the workload API level, the recommender will also calculate pod-level recommendations, which are simply the sum of the container recommendations. The pod-level recommendations will be stored in the `status.recommendation.podRecommendation` stanza of the VPA object (new!).

The updater and the admission controller will read from `status.recommendation.podRecommendation` (and of course from `status.recommendation.containerRecommendations`) to perform their actions - the updater will evict pods or perform in-place container-level updates, while the admission controller will modify pod specs on the fly.
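
For illustration only, a minimal sketch of that summation (helper name and wiring are assumptions, not the actual implementation):

```go
import (
	corev1 "k8s.io/api/core/v1"
	vpa_types "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
)

// sumContainerRecommendations derives a pod-level target by summing the
// per-container targets from the VPA status (sketch only).
func sumContainerRecommendations(recs []vpa_types.RecommendedContainerResources) corev1.ResourceList {
	total := corev1.ResourceList{}
	for _, rec := range recs {
		for name, qty := range rec.Target {
			sum := total[name] // zero Quantity if not yet present
			sum.Add(qty)
			total[name] = sum
		}
	}
	return total
}
```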

Member:

> Partially - the recommender will calculate the pod-level recommendations (from your comment, it seems you expected the updater/admission controller to do that).

I was just making an assumption. If I'm hearing you right, you want the recommender to create the pod recommendation and store it in the VPA resource, which makes more sense than my assumption.

> The difference introduced by this AEP is that if a pod-level resources stanza is defined at the workload API level, the recommender will also calculate pod-level recommendations, which are simply the sum of the container recommendations. The pod-level recommendations will be stored in the `status.recommendation.podRecommendation` stanza of the VPA object (new!).
>
> The updater and the admission controller will read from `status.recommendation.podRecommendation` (and of course from `status.recommendation.containerRecommendations`) to perform their actions - the updater will evict pods or perform in-place container-level updates, while the admission controller will modify pod specs on the fly.

Makes sense!

Comment on lines +201 to +207
- Extend the VPA object:
1. Add a new `spec.resourcePolicy.podPolicies` stanza. This stanza is user-modifiable and allows setting constraints for pod-level recommendations:
- `controlledResources`: Specifies which resource types are recommended (and possibly applied). Valid values are `cpu`, `memory`, or both. If not specified, both resource types are controlled by VPA.
- `controlledValues`: Specifies which resource values are controlled. Valid values are `RequestsAndLimits` and `RequestsOnly`. The default is `RequestsAndLimits`.
- `minAllowed`: Specifies the minimum resources that will be recommended for the Pod. The default is no minimum.
   - `maxAllowed`: Specifies the maximum resources that will be recommended for the Pod. The default is no maximum. To ensure per-container recommendations do not exceed the Pod's defined maximum, apply the formula proposed by @omerap12 to adjust the container recommendations (see [discussion](https://github.com/kubernetes/autoscaler/issues/7147#issuecomment-2515296024)). This field takes precedence over the global Pod maximum set by the new flags (see "Global Pod maximums").
2. Add a new `status.recommendation.podRecommendation` stanza. This field is not user-modifiable, it is populated by the VPA recommender and stores the Pod-level recommendations. The updater and admission controller use this stanza to read Pod-level recommendations. The updater may evict Pods to apply the recommendation, the admission controller applies the recommendation when the Pod is recreated.
Member:

Would it be possible to have an example Go Type here?
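
For illustration, one possible shape for the new stanzas (a sketch only; field names mirror the proposal text and are not a final API):

```go
import (
	corev1 "k8s.io/api/core/v1"
)

// PodResourcePolicy (hypothetical) is the type behind
// spec.resourcePolicy.podPolicies.
type PodResourcePolicy struct {
	// ControlledResources lists the resource types VPA recommends for the Pod.
	// Defaults to cpu and memory when unset.
	ControlledResources *[]corev1.ResourceName `json:"controlledResources,omitempty"`
	// ControlledValues selects RequestsAndLimits (default) or RequestsOnly,
	// reusing the existing ContainerControlledValues enum.
	ControlledValues *ContainerControlledValues `json:"controlledValues,omitempty"`
	// MinAllowed and MaxAllowed bound the pod-level recommendation.
	MinAllowed corev1.ResourceList `json:"minAllowed,omitempty"`
	MaxAllowed corev1.ResourceList `json:"maxAllowed,omitempty"`
}

// RecommendedPodResources (hypothetical) is the type behind
// status.recommendation.podRecommendation, written by the recommender.
type RecommendedPodResources struct {
	Target     corev1.ResourceList `json:"target,omitempty"`
	LowerBound corev1.ResourceList `json:"lowerBound,omitempty"`
	UpperBound corev1.ResourceList `json:"upperBound,omitempty"`
}
```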


## Proposal

- Add a new feature flag named `PodLevelResources`. Because this proposal introduces new code paths across all three VPA components, this flag will be added to each component.
Member:

Is this a feature flag to assist with GAing the feature, or is it a flag to enable/disable the feature?

Contributor Author:

My intention is to use the flag to enable or disable the feature. In other words, the feature should be disabled by default at first, and once the feature matures, it can be enabled by default starting from a specific VPA version.

Could you please clarify what you mean by using the flag for GAing the feature?

Member:

The normal pattern for Kubernetes is to use a feature gate to introduce a new feature. Normally it works like this across many releases:

  1. First release - add a feature gate as alpha - defaulted to off
  2. Second release - promote to beta - default to on
  3. Third release - promote to GA - locked to on
  4. A few releases later (3 I think) - remove feature gate logic completely

This is mostly for the Kubernetes components to handle roll forward/back gracefully.
I think the main thing it protects against is a user starting to use the feature in beta mode: if they roll back one release, the feature continues to work (i.e. the APIs remain valid) since the logic already exists in the alpha release.

Contributor Author:

Thanks for the explanation - I appreciate it! Based on your comment, the feature flags (there will be a new one for each component) will serve both purposes, i.e. GAing and enabling/disabling the feature.

Member:

Right, the point of feature gates in Kubernetes is to eventually remove them. Enabling/disabling the feature should be driven by the API.
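
For illustration, enabling such a gate would presumably follow the existing VPA feature-gate flag pattern on each component (the gate name comes from this proposal and does not exist yet):

```yaml
# Hypothetical: alpha stage, off by default, opted into per component
# (recommender, updater, admission-controller)
args:
  - --feature-gates=PodLevelResources=true
```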


For workloads that define only pod-level resources, VPA will control resources at the pod level. At the time of writing, [in-place pod-level resource resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize) is not available for pod-level fields, so applying pod-level recommendations requires evicting Pods.

When [in-place pod-level resource resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize) becomes available, VPA should attempt to apply pod-level recommendations in place first and fall back to eviction if in-place updates fail, mirroring the current `InPlaceOrRecreate` behavior used for container-level updates.
Contributor:

Because this AEP has a dependency on the functionality described in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize, can we restate the language as if https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize is already implemented, and then add a note that we won't approve this AEP until post-1.35 (when in-place resizing of pod-level resources has been implemented)?

Member:

I thought that this AEP isn't dependent on that feature, it's calling out that we can't do in-place resizing until that KEP is ready

Member:

Let's remove this section; there is no connection between the current AEP and the in-place feature.
This AEP should focus on pod-level resources only.

@iamzili (Contributor Author) commented Oct 8, 2025:

As @omerap12 suggested, I removed the parts mentioning in-place pod-level resource resize and kept only a note stating that we should leverage it once it becomes available.

Feel free to resolve the conversation if applicable.

@omerap12 (Member) left a comment:

Really, thanks for the hard work here, Erik!
In my opinion we should choose option 2 as the default (control both pod-level and initially set container-level resources) here.
I left a couple of notes throughout the proposal.
Can we please remove the in-place feature from this AEP?
This AEP should focus only on pod-level resources, so cons like "Applying both pod-level and container-level recommendations requires eviction because is not yet available" are redundant.

- `controlledResources`: Specifies which resource types are recommended (and possibly applied). Valid values are `cpu`, `memory`, or both. If not specified, both resource types are controlled by VPA.
- `controlledValues`: Specifies which resource values are controlled. Valid values are `RequestsAndLimits` and `RequestsOnly`. The default is `RequestsAndLimits`.
- `minAllowed`: Specifies the minimum resources that will be recommended for the Pod. The default is no minimum.
- `maxAllowed`: Specifies the maximum resources that will be recommended for the Pod. The default is no maximum. To ensure per-container recommendations do not exceed the Pod's defined maximum, apply the formula proposed by @omerap12 to adjust the container recommendations (see [discussion](https://github.com/kubernetes/autoscaler/issues/7147#issuecomment-2515296024)). This field takes precedence over the global Pod maximum set by the new flags (see "Global Pod maximums").
Member:

Thanks for catching that! (I forgot I wrote that, TBH) :)

Member:

My formula should be correct, but what happens if, after normalizing the container[i] resources, we get a value that is smaller/bigger than minAllowed/maxAllowed?
I thought we could do something like this:

- If adjusted[i] < container.minAllowed[i]: set to minAllowed[i]
- If adjusted[i] > container.maxAllowed[i]: set to maxAllowed[i]

And then we need to re-check the pod limits after the container policy adjustments (since the sum might be bigger).
If we are still exceeding pod limits, what do we want to do here? (see the sketch below)
cc @adrianmoisey

Sorry if I wasn't clear enough :)
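
To make the sequencing concrete, a rough sketch of the proportional scale-down followed by per-container clamping (plain integers stand in for resource quantities; names and flow are illustrative, not the proposed implementation):

```go
// adjustToPodMax scales per-container recommendations so their sum fits under
// podMax, then clamps each value to its container policy. A final pass must
// re-check the pod-level sum, since clamping up to minAllowed can re-introduce
// an excess - that leftover case is exactly the open question above.
func adjustToPodMax(recs, minAllowed, maxAllowed []int64, podMax int64) []int64 {
	var sum int64
	for _, r := range recs {
		sum += r
	}
	out := make([]int64, len(recs))
	for i, r := range recs {
		v := r
		if sum > podMax {
			v = r * podMax / sum // proportional scale-down
		}
		if v < minAllowed[i] {
			v = minAllowed[i]
		}
		if maxAllowed[i] > 0 && v > maxAllowed[i] {
			v = maxAllowed[i]
		}
		out[i] = v
	}
	return out
}
```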

@iamzili (Contributor Author) commented Oct 8, 2025:

An individual container limit can't be larger than the pod-level limit, but the aggregated container-level limits can exceed the pod-level limit - Ref.

So, when a new pod-level recommendation is calculated and the limit is set proportionally at the pod level, we also need to check the container-level limits. If a container-level limit is greater than the pod-level limit, it should be set to the same value as the pod-level limit, and the calculated container-level recommendation should be reduced proportionally as well, to maintain the original request-to-limit ratio (similar to how it works when a LimitRange API object is in place).
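
For illustration, a sketch of that capping step (assumed helper; uses `k8s.io/apimachinery` quantities, and milli-value math that can overflow for very large memory values):

```go
import "k8s.io/apimachinery/pkg/api/resource"

// capAtPodLimit caps a container limit at the pod-level limit and scales the
// request down proportionally to preserve the request-to-limit ratio,
// mirroring the existing LimitRange handling (sketch only).
func capAtPodLimit(request, limit, podLimit resource.Quantity) (newRequest, newLimit resource.Quantity) {
	if limit.Cmp(podLimit) <= 0 {
		return request, limit // container limit already fits under the pod limit
	}
	scaled := resource.NewMilliQuantity(
		request.MilliValue()*podLimit.MilliValue()/limit.MilliValue(),
		request.Format,
	)
	return *scaled, podLimit
}
```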


### Test Plan

TODO
Member:

In order for this AEP to be merged, this has to be filled in (I know it's a WIP, but just a reminder) :)

Contributor Author:

address the missing test plan

Co-authored-by: Adrian Moisey <[email protected]>
@iamzili (Contributor Author) commented Oct 7, 2025

> Really, thanks for the hard work here, Erik! In my opinion we should choose option 2 as the default (control both pod-level and initially set container-level resources) here. I left a couple of notes throughout the proposal. Can we please remove the in-place feature from this AEP? This AEP should focus only on pod-level resources, so cons like "Applying both pod-level and container-level recommendations requires eviction because is not yet available" are redundant.

I would also prefer option 2 (control both pod-level and initially set container-level resources). BTW when do you think a decision will be made to go with this option? We will need to update the AEP to reflect the chosen approach. Once the decision is final, I also plan to add more details.

Furthermore, why are you suggesting that the parts related to in-place resizing of pod-level resources should be removed from this AEP? Since this AEP focuses on the pod-level resources stanza, how it can be mutated (or not) seems relevant from the VPA's perspective.

@omerap12 (Member) commented Oct 8, 2025

>> Really, thanks for the hard work here, Erik! In my opinion we should choose option 2 as the default (control both pod-level and initially set container-level resources) here. I left a couple of notes throughout the proposal. Can we please remove the in-place feature from this AEP? This AEP should focus only on pod-level resources, so cons like "Applying both pod-level and container-level recommendations requires eviction because is not yet available" are redundant.
>
> I would also prefer option 2 (control both pod-level and initially set container-level resources). BTW when do you think a decision will be made to go with this option? We will need to update the AEP to reflect the chosen approach. Once the decision is final, I also plan to add more details.
>
> Furthermore, why are you suggesting that the parts related to in-place resizing of pod-level resources should be removed from this AEP? Since this AEP focuses on the pod-level resources stanza, how it can be mutated (or not) seems relevant from the VPA's perspective.

I see your point - it’s not completely independent, since once Kubernetes supports in-place updates for pod-level resources, the VPA will likely extend that support as well (similar to what we already do for container-level in-place updates).

But, the main scope of this AEP is to define how we provide recommendations for pod-level resources. The actual application of those recommendations - whether in-place or through eviction - is more of an implementation detail and doesn’t directly affect the design decisions in this proposal.
We can add a short note about the current state of in-place updates for pod-level resources (KEP-5419) and mention that future VPA enhancements will align once that functionality is available.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 10, 2025
@jackfrancis (Contributor)

/release-note-none

@k8s-ci-robot (Contributor)

@jackfrancis: you can only set the release note label to release-note-none if the release-note block in the PR body text is empty or "none".

In response to this:

> /release-note-none

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jackfrancis (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 10, 2025
Co-authored-by: Adrian Moisey <[email protected]>

This section describes how VPA reacts based on where resources are defined (pod level, container level or both).

Before this AEP, the recommender computes recommendations only at the container level, and VPA applies changes only to container-level fields. With this proposal, the recommender also computes pod-level recommendations in addition to container-level ones. Pod-level recommendations are derived from per-container usage and recommendations, typically by aggregating container recommendations. Container-level policy still influences pod-level output: setting `mode: Off` in `spec.resourcePolicy.containerPolicies` excludes a container from recommendations, and `minAllowed`/`maxAllowed` bounds continue to apply.
Member:

I have some questions on this part:

  1. Do we include or exclude sidecar containers in this? Currently VPA doesn't handle sidecar containers
  2. What happens if a new container is added to a Pod, what values will the recommender set for the Pod?
  3. What happens if a container is removed from a Pod, does its recommendation still get included in the Pod level recommendation?

There is some ongoing work for points 2 and 3 here: #6745

cc @jkyros

@iamzili (Contributor Author) commented Oct 12, 2025:

  1. Since the latest VPA doesn't support initContainers, I think we shouldn't implement support for them in this AEP (a new AEP should be created for that IMHO). At the same time, other non-native sidecar containers defined in the Pod spec should be included by default in the calculation of pod-level recommendations when a pod-level resources stanza is present.

  2. If a new container is added to the Pod, then in the next recommender loop, recommendations will be calculated for the new container, which will trigger a recalculation at the pod level (default behavior). The updater may then evict the Pod if the new pod-level recommendation deviates significantly from the current one.

  3. If a container is removed from a Pod, the next recommender loop should calculate a new pod-level recommendation. However, based on #6745, it seems this requires some additional work, as in the latest VPA version the recommender continues to use stale container aggregates for a period of time even after a container is removed - is that correct?

Member:

  1. Yup, makes sense. I think the AEP needs to be clear about what or what isn't included in the Pod calculation.
  2. There seems to be a chicken-and-egg situation though. When the new container is added to a Deployment, new Pods will be created prior to the new recommendation. What value will the Pod resources get here?
  3. Yes, possibly. This makes me wonder if the Pod-level recommendation shouldn't be stored in the VPA resource, but rather created on the fly at the moment any recommendation needs to be applied, using the recommendations available in the VPA resource along with the current containers in said Pod.

For what it's worth, I believe all these points need to be documented on the AEP too.

Contributor Author:

I agree with you that these points need to be added to the AEP. I will add them after we find the best approach of course.

I kind of like your approach in the 3rd point, where you mention that the Pod recommendation might be created on the fly instead of by the recommender (that is, saved to the VPA object). Your approach would make things easier when the user adds or removes a container from a workload API object such as a Deployment.

Here I propose the workflow for when the user adds a container with a container-level resources stanza (both requests and limits) to the Deployment:

  1. A Pod re-creation is triggered by the Deployment controller.
  2. The admission-controller intercepts the request and calculates the pod-level recommendation by summing the container-level recommendations, which are still read from the VPA object. It also includes the container-level requests and limits set by the user for the new container in the calculation, to avoid violating the PodSpec validation rules. The admission controller then calculates the patches and applies them, and the Pod starts successfully with the new container.
  3. In the following recommender loop, the recommender component re-calculates the container-level recommendations (including for the new container) and saves the results to the VPA object.
  4. The updater fetches the container recommendations from the VPA object and calculates the pod-level recommendations on the fly by summing the container-level ones. If they differ significantly from the actual resource allocation, it may evict the Pod.
  5. Same as step 2, but "just" using the container recommendations from the VPA object, since no new container was added by the user.

When the user adds a new container WITHOUT a container-level resources stanza:

  1. A Pod re-creation is triggered by the Deployment controller.
  2. The admission-controller calculates the pod-level recommendations based on the container recommendations specified in the VPA object. Patches are then applied accordingly. If the Pod already contains the most up-to-date recommendations from the VPA object, there will be no change, as at this point we do not yet know the recommendation for the newly added container.

Same as steps 3, 4 and 5 above.

I'm not entirely sure what we can do when a container is removed from a Pod, as this results in stale recommendations in the VPA object for some amount of time. That in turn could cause the admission-controller to admit the Pod with a recommendation for a container that no longer exists. Maybe in this AEP we could modify the updater and the admission-controller so they verify that containers still exist in the workload API object? Or, of course, we could wait until #6745 is completed.
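
For illustration, a sketch of that on-the-fly derivation (hypothetical helper; the fallback to user-set requests covers containers that do not have a recommendation yet):

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// podLevelRequest sums, for one resource, the per-container targets from the
// VPA object over the containers actually present in the incoming Pod,
// falling back to the user-specified request when a container has no
// recommendation yet (sketch only).
func podLevelRequest(pod *corev1.Pod, recs map[string]corev1.ResourceList, res corev1.ResourceName) resource.Quantity {
	var total resource.Quantity
	for _, c := range pod.Spec.Containers {
		if rl, ok := recs[c.Name]; ok {
			q := rl[res]
			total.Add(q)
		} else if q, ok := c.Resources.Requests[res]; ok {
			total.Add(q)
		}
	}
	return total
}
```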

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Oct 16, 2025