AEP-8818: InPlace Update Mode #8818
# AEP-8818: Non-Disruptive In-Place Updates in VPA

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Design Details](#design-details)
  - [Resize Status Handling](#resize-status-handling)
  - [Behavior when Feature Gate is Disabled](#behavior-when-feature-gate-is-disabled)
- [Risk Mitigation](#risk-mitigation)
  - [Memory Limit Downsize Risk](#memory-limit-downsize-risk)
  - [Mitigation Strategies](#mitigation-strategies)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
  - [Alpha](#alpha)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Upgrade](#upgrade)
  - [Downgrade](#downgrade)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [How can this feature be enabled / disabled in a live cluster?](#how-can-this-feature-be-enabled--disabled-in-a-live-cluster)
- [Kubernetes version compatibility](#kubernetes-version-compatibility)
- [Implementation History](#implementation-history)
<!-- /toc -->
## Summary

[AEP-4016](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support) introduced the `InPlaceOrRecreate` update mode, which attempts in-place updates first but falls back to pod eviction if the in-place update fails. However, for certain workloads any disruption is unacceptable, and users would prefer to retry in-place updates indefinitely rather than evict and recreate pods.

This proposal introduces a new update mode that only attempts in-place updates and retries on failure, without ever falling back to eviction.
## Motivation

There are several use cases where pod disruption should be avoided at all costs:

- Stateful workloads: pods managing critical state, where a restart would cause data loss or lengthy recovery.
- Long-running computations: jobs or services performing computations that cannot be checkpointed and would need to restart from the beginning.
- Strict SLO requirements: services with stringent availability requirements, where even brief disruptions are unacceptable.

In these scenarios, users would prefer:

- To operate with current (potentially suboptimal) resource allocations until an in-place update becomes feasible
- To receive clear signals when updates cannot be applied
- To have VPA continuously retry updates as cluster conditions change
## Goals

- Provide a non-disruptive VPA update mode that never evicts pods
- Allow VPA to eventually apply updates when cluster conditions improve
- Respect the existing in-place update infrastructure from AEP-4016

## Non-Goals

- Guarantee that all updates will eventually succeed (node capacity constraints may prevent this)
- Provide mechanisms to automatically increase node capacity to accommodate updates
- Change the behavior of existing update modes (Off, Recreate, InPlaceOrRecreate)
- Eliminate all possible disruption scenarios (see [Risk Mitigation](#risk-mitigation) for details on memory limit downsizing risks)
## Proposal

Add a new supported value of `UpdateMode`: `InPlace`.

This mode will:

- Apply recommendations during pod admission (like all other modes)
- Attempt in-place updates for running pods under the same conditions as `InPlaceOrRecreate`
- Never add pods to `podsForEviction` if in-place updates fail
- Continuously retry failed in-place updates
> **Member:** Should we have a backoff policy for retrying, or do we think linear retry is sufficient if we keep failing?
>
> **Author:** I have another idea - let me know what you think. The main drawback is that the updater now has to remember which recommendation was last applied for each pod, which means some extra memory use and more bookkeeping in the code.
>
> **Member:** I think what you are proposing makes sense, but the existing resize conditions should tell us this information without the extra bookkeeping: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources#resize-status. I think this already exists in the VPA code.
>
> **Author:** Exactly, with deferred we will just skip that pod and do nothing (the kubelet will do the hard job for us). Agree. So to sum up: [...]
>
> **Member:** Regarding both of these cases, what happens if the recommendation changes? (Omer already mentioned this earlier in the thread.) Should the updater check if recommendations != spec.resources, and if they aren't the same, resize again? It's possible that the new recommendation could be smaller, allowing for the pod to be resized.
>
> **Author:** I think we should, since we don't know how long the pod will remain in deferred mode and we don't want to miss recommendations.
>
> **Member:** That makes sense +1
## Design Details

Add `UpdateModeInPlace` to the VPA types:

```golang
// In pkg/apis/autoscaling.k8s.io/v1/types.go
const (
	// ... existing modes ...
	// UpdateModeInPlace means that VPA will only attempt to update pods in-place
	// and will never evict them. If an in-place update fails, VPA will rely on
	// Kubelet's automatic retry mechanism.
	UpdateModeInPlace UpdateMode = "InPlace"
)
```
### Resize Status Handling

The `InPlace` mode handles the different resize statuses with distinct behaviors. Critically, before checking the resize status, VPA first compares the current recommendation against the pod's `spec.resources`. If they differ, VPA attempts to apply the new recommendation regardless of the current resize status, as the new recommendation may be feasible (e.g., a smaller resource request that fits on the node). A sketch of this comparison follows the status list below.

- `Deferred`: When the resize status is `Deferred` and the recommendation matches the spec, VPA waits and lets kubelet handle it. Kubelet is waiting to apply the resize, and VPA should not interfere.
- `Infeasible`: When the resize status is `Infeasible` and the recommendation matches the spec, VPA defers action. The node cannot accommodate the current resize, but if the recommendation changes, VPA will attempt the new resize.
- `InProgress`: When the resize status is `InProgress` and the recommendation matches the spec, VPA waits for completion. The resize is actively being applied by kubelet.
- `Error`: When the resize status is `Error`, VPA retries the operation. An error occurred during the resize, and retrying may succeed.
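To make the comparison concrete, here is an illustrative helper (the function name and the shape of the recommendations map are assumptions for this sketch, not actual updater code) that reports whether any container's requests have drifted from the recommendation:

```golang
package updater

import (
	apiv1 "k8s.io/api/core/v1"
)

// recommendationDiffersFromSpec returns true if any container's requests in
// the pod spec differ from the recommended values, in which case a new resize
// attempt is warranted even while a previous resize is Deferred or Infeasible.
func recommendationDiffersFromSpec(pod *apiv1.Pod, recommendations map[string]apiv1.ResourceList) bool {
	for _, c := range pod.Spec.Containers {
		rec, ok := recommendations[c.Name]
		if !ok {
			continue // no recommendation for this container
		}
		for name, want := range rec {
			have, ok := c.Resources.Requests[name]
			if !ok || have.Cmp(want) != 0 {
				return true
			}
		}
	}
	return false
}
```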
Modify `CanInPlaceUpdate` to accommodate the new update mode:
```golang
// CanInPlaceUpdate checks if a pod can be safely updated in-place.
func (ip *PodsInPlaceRestrictionImpl) CanInPlaceUpdate(pod *apiv1.Pod, updateMode vpa_types.UpdateMode) utils.InPlaceDecision {
	switch updateMode {
	case vpa_types.UpdateModeInPlaceOrRecreate:
		if !features.Enabled(features.InPlaceOrRecreate) {
			return utils.InPlaceEvict
		}
	case vpa_types.UpdateModeInPlace:
		if !features.Enabled(features.InPlace) {
			return utils.InPlaceDeferred
		}
	case vpa_types.UpdateModeAuto:
		// Auto mode is deprecated but still supports in-place updates
		// when the feature gate is enabled.
		if !features.Enabled(features.InPlaceOrRecreate) {
			return utils.InPlaceEvict
		}
	default:
		// UpdateModeOff, UpdateModeInitial, UpdateModeRecreate, etc.
		return utils.InPlaceEvict
	}

	cr, present := ip.podToReplicaCreatorMap[getPodID(pod)]
	if present {
		singleGroupStats, present := ip.creatorToSingleGroupStatsMap[cr]
		if pod.Status.Phase == apiv1.PodPending {
			return utils.InPlaceDeferred
		}
		if present {
			if isInPlaceUpdating(pod) {
				resizeStatus := getResizeStatus(pod)
				// For InPlace mode: wait on Deferred, retry on Infeasible (no backoff for alpha).
				if updateMode == vpa_types.UpdateModeInPlace {
					switch resizeStatus {
					case utils.ResizeStatusInfeasible:
						// Infeasible means the node can't accommodate the resize.
						// For alpha, retry with no backoff.
						klog.V(4).InfoS("In-place update infeasible, will retry", "pod", klog.KObj(pod))
						return utils.InPlaceInfeasible
					case utils.ResizeStatusDeferred:
						// Deferred means kubelet is waiting to apply the resize.
						// Do nothing; wait for kubelet to proceed.
						klog.V(4).InfoS("In-place update deferred by kubelet, waiting", "pod", klog.KObj(pod))
						return utils.InPlaceDeferred
					case utils.ResizeStatusInProgress:
						// The resize is actively being applied; wait for completion.
						klog.V(4).InfoS("In-place update in progress, waiting for completion", "pod", klog.KObj(pod))
						return utils.InPlaceDeferred
					case utils.ResizeStatusError:
						// An error occurred during the resize; retry.
						klog.V(4).InfoS("In-place update error, will retry", "pod", klog.KObj(pod))
						return utils.InPlaceInfeasible
					default:
						klog.V(4).InfoS("In-place update status unknown, waiting", "pod", klog.KObj(pod), "status", resizeStatus)
						return utils.InPlaceDeferred
					}
				}
				// For InPlaceOrRecreate mode, check the timeout.
				canEvict := CanEvictInPlacingPod(pod, singleGroupStats, ip.lastInPlaceAttemptTimeMap, ip.clock)
				if canEvict {
					return utils.InPlaceEvict
				}
				return utils.InPlaceDeferred
			}
			if singleGroupStats.isPodDisruptable() {
				return utils.InPlaceApproved
			}
		}
	}
	klog.V(4).InfoS("Can't in-place update pod, but not falling back to eviction. Waiting for next loop", "pod", klog.KObj(pod))
	return utils.InPlaceDeferred
}
```
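For context, a hedged sketch of the caller side (the loop structure and helper name are assumptions, not actual updater code): only approved pods are queued for an in-place resize, and under `InPlace` mode nothing is ever appended to `podsForEviction`.

```golang
// classifyPods is an illustrative helper showing how the decision above is
// consumed; it reuses the types from the snippet it follows.
func classifyPods(ip *PodsInPlaceRestrictionImpl, pods []*apiv1.Pod, updateMode vpa_types.UpdateMode) (podsForInPlace, podsForEviction []*apiv1.Pod) {
	for _, pod := range pods {
		switch ip.CanInPlaceUpdate(pod, updateMode) {
		case utils.InPlaceApproved:
			// Safe to resize now.
			podsForInPlace = append(podsForInPlace, pod)
		case utils.InPlaceEvict:
			// Unreachable for UpdateModeInPlace; other modes may fall back.
			podsForEviction = append(podsForEviction, pod)
		default:
			// InPlaceDeferred / InPlaceInfeasible: skip for now and
			// re-evaluate on the next updater loop.
		}
	}
	return podsForInPlace, podsForEviction
}
```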
### Behavior when Feature Gate is Disabled

- When the `InPlace` feature gate is disabled and a VPA is configured with `UpdateMode: InPlace`, the updater will skip processing that VPA entirely (it will not fall back to eviction).
> **Member:** Just want to check: it won't evict and it won't in-place update? Also, what does the admission-controller do when the feature gate is disabled but a pod is set to InPlace?
>
> **Author:** The admission controller will deny the request (ref).
>
> **Author:** That's what I assumed, because if someone wants to use in-place mode only, it likely means the workload can't be evicted. In that case, I think the correct action is to do nothing.
>
> **Member:** Well, what if someone does this: [...] Does the admission-controller: [...]
>
> **Author:** TBH I didn't test it, but it should be 1.
>
> **Author:** Just checked, we set the resources as per the recommendation.
>
> **Member:** Cool, that's worth clarifying here.
>
> **Author:** Done in: 13f1fa7
- In contrast, `InPlaceOrRecreate` with its feature gate disabled will fall back to eviction mode.

This design ensures that `InPlace` mode truly guarantees no evictions, even in misconfiguration scenarios.

## Risk Mitigation

### Memory Limit Downsize Risk

While `InPlace` mode prevents pod eviction and eliminates the disruption associated with pod recreation, it is still subject to the behavior of Kubernetes' `InPlacePodVerticalScaling` feature.
When a memory limit is decreased in-place, there is a small but non-zero risk of an `OOMKill` if the container's current memory usage exceeds the new, lower limit at the moment the resize is applied. For example, if a container is using 900Mi and VPA lowers its memory limit from 1Gi to 800Mi, the container may be OOM-killed when the new limit takes effect.
This is an inherent limitation of in-place resource updates documented in [KEP-1287](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/1287-in-place-update-pod-resources/README.md), not a VPA-specific behavior.
This risk may be unacceptable for workloads with strict SLO requirements where even brief disruptions (including `OOMKills`) cannot be tolerated.

### Mitigation Strategies

For workloads where even unintended OOMKills are unacceptable, users should implement one or more of the following strategies (see the sketch after this list):

- Disable memory limits for critical containers: configure your VPA's `ResourcePolicy` to prevent VPA from managing memory limits entirely.
- Use conservative memory limit recommendations: if memory limits must be managed by VPA, configure generous bounds and safety buffers.
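As a sketch of the first strategy, using the existing VPA API types (the `RequestsOnly` controlled-values option is already part of the v1 API; only its pairing with the proposed `InPlace` mode is new here):

```golang
package main

import (
	vpa_types "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
)

// withRequestsOnly restricts VPA to managing requests, leaving limits
// untouched so an in-place resize can never lower a memory limit below usage.
func withRequestsOnly(vpa *vpa_types.VerticalPodAutoscaler) {
	controlled := vpa_types.ContainerControlledValuesRequestsOnly
	vpa.Spec.ResourcePolicy = &vpa_types.PodResourcePolicy{
		ContainerPolicies: []vpa_types.ContainerResourcePolicy{{
			ContainerName:    "*", // apply to all containers
			ControlledValues: &controlled,
		}},
	}
}
```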
## Test Plan

The following scenarios will be added to the e2e tests for the `InPlace` mode:

- Basic In-Place Update: pod successfully updated in-place with `InPlace` mode
- No Eviction on Failure: update fails due to node capacity; verify no eviction occurs and the pod remains running
- Feature Gate Disabled: verify `InPlace` mode is rejected when the feature gate is disabled
- Indefinite Wait for In-Progress Updates: update is in progress for an extended period; verify no timeout/eviction occurs (unlike `InPlaceOrRecreate`)
- Failed Update, Retry Success: update fails initially, conditions improve; verify a successful retry
- Infeasible Resize Handling: pod resize marked as infeasible; verify the pod is deferred (not evicted)
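Unit coverage of the status-to-decision mapping can be table-driven. A minimal, self-contained sketch (the helper below is a stand-in that mirrors the switch in `CanInPlaceUpdate`, not the real updater code):

```golang
package updater

import "testing"

type inPlaceDecision string

const (
	decisionDeferred   inPlaceDecision = "InPlaceDeferred"
	decisionInfeasible inPlaceDecision = "InPlaceInfeasible"
)

// decisionForResizeStatus mirrors the InPlace-mode switch in CanInPlaceUpdate:
// retry on Infeasible/Error, wait on Deferred/InProgress/unknown.
func decisionForResizeStatus(status string) inPlaceDecision {
	switch status {
	case "Infeasible", "Error":
		return decisionInfeasible
	default: // Deferred, InProgress, unknown
		return decisionDeferred
	}
}

func TestInPlaceDecisionForResizeStatus(t *testing.T) {
	cases := map[string]inPlaceDecision{
		"Infeasible": decisionInfeasible,
		"Error":      decisionInfeasible,
		"Deferred":   decisionDeferred,
		"InProgress": decisionDeferred,
		"Bogus":      decisionDeferred,
	}
	for status, want := range cases {
		if got := decisionForResizeStatus(status); got != want {
			t.Errorf("status %q: got %v, want %v", status, got, want)
		}
	}
}
```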
## Upgrade / Downgrade Strategy

### Upgrade

On upgrade to VPA 1.6.0 (tentative release version), users can opt into the new `InPlace` mode by enabling the alpha feature gate (which defaults to disabled), passing `--feature-gates=InPlace=true` to the updater and admission-controller components, and setting their VPA's `UpdateMode` to `InPlace`.
Existing VPAs will continue to work as before.

### Downgrade

On downgrade of VPA from 1.6.0 (tentative release version), nothing will change. VPAs will continue to work as before, unless the user had enabled the feature gate, in which case the downgrade could break VPAs that use `InPlace`.

## Graduation Criteria

### Alpha

- Feature gate `InPlace` is disabled by default
- Basic functionality implemented: the `InPlace` update mode is accepted by the admission-controller; the updater attempts in-place updates and never evicts pods; retry behavior for the `Infeasible` status with no backoff; deferred behavior for the `Deferred`, `InProgress`, and unknown statuses
- Unit tests covering core logic
- E2E tests for basic scenarios
- Documentation updated
## Feature Enablement and Rollback

### How can this feature be enabled / disabled in a live cluster?

- Feature gate name: `InPlace`
- Components depending on the feature gate:
  - admission-controller
  - updater

Disabling the `InPlace` feature gate will cause the following to happen:

- admission-controller will:
  - Reject new VPA objects being created with `InPlace` configured
> **Member:** Just want to clarify: it will reject new VPAs with InPlace, but existing VPAs with InPlace can still be modified, right?
>
> **Author:** Yes - I need to double check on that, but yes, just like [...]
  - A descriptive error message should be returned to the user, letting them know that they are using a feature-gated feature
  - Continue to apply recommendations at pod admission time for existing VPAs configured with `InPlace` mode (behaving similarly to `Initial` mode)
    - This ensures that when pods are deleted and recreated (e.g., by a deployment rollout or manual deletion), they receive the latest resource recommendations
    - Only the in-place update functionality is disabled; admission-time updates remain functional
- updater will:
  - Skip processing VPAs with `InPlace` mode (no in-place updates or evictions will be attempted)
  - Effectively treat these VPAs as if they were in `Initial` or `Off` mode for running pods

Enabling the `InPlace` feature gate will cause the following to happen:

- admission-controller will accept new VPA objects being created with `InPlace` configured
- updater will attempt to perform an in-place **only** adjustment for VPAs configured with `InPlace`
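For reference, a hedged sketch of how the gate could be declared, following the `component-base/featuregate` pattern the VPA components already use for `InPlaceOrRecreate` (the exact file layout and variable names here are assumptions):

```golang
package features

import (
	"k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

const (
	// InPlace gates the in-place-only update mode proposed in this AEP.
	// alpha: VPA 1.6.0 (tentative)
	InPlace featuregate.Feature = "InPlace"
)

// MutableFeatureGate is populated from --feature-gates on the updater and
// admission-controller (e.g. --feature-gates=InPlace=true).
var MutableFeatureGate featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	runtime.Must(MutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		InPlace: {Default: false, PreRelease: featuregate.Alpha},
	}))
}

// Enabled reports whether the named feature gate is enabled.
func Enabled(f featuregate.Feature) bool {
	return MutableFeatureGate.Enabled(f)
}
```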
## Kubernetes version compatibility

`InPlace` is being built assuming that it will run on a Kubernetes version of at least 1.33, with the beta version of [KEP-1287: In-Place Update of Pod Resources](https://github.com/kubernetes/enhancements/issues/1287) enabled.
If these conditions are not met, VPA will not be able to scale your workload at all.

## Implementation History

- 2025-11-15: initial version
> **Member** (on lines +254 to +256): I know we didn't write anything down for AEP-4016 in terms of graduation criteria, but since we went through the process of graduating that one from alpha to beta, I'm wondering if we should have some sort of idea for this one. I don't know if we need a formal process, but judging from the last graduation, I think it makes sense to keep this in alpha for one release cycle to allow early adoption, and if no graduation bugs/blockers come up in the issues, then we are okay to graduate to beta.
>
> **Author:** Added Graduation Criteria section.
> Do we think we should have some small note that this update mode is subject to the behavior of the `InPlacePodVerticalScaling` gate, such that it's possible (but improbable) that a resize can cause an OOMKill during a memory limit downsize? Though I don't actually know the probability of this happening if a limit gets resized close to the usage, I think it may be useful to call out, since we emphasize that brief disruptions are unacceptable. I think to mitigate risk here we may want to recommend that if you absolutely cannot tolerate disruption (i.e. an unintended OOMKill), then you can either: [...] Though this may or may not be better for our docs, instead of getting into it in the AEP here. Thoughts? cc @adrianmoisey

> I think you're right. I was thinking a similar thought on the "Provide a truly non-disruptive VPA update mode that never evicts pods" goal. I think it may be worth softening the language in the AEP (since we can't make guarantees that resizes are non-disruptive). I also agree that most of what you suggested may be good for the docs.

> Related: #8805

> Yeah, that sounds very reasonable. I think we can have this in both our docs and the AEP. lemme know what you think of this: ba9514a

> Looks good to me, thanks for that 👍