feat(kubernetes): support HA gateway rebalancing #1868
Open
TaylorMutch wants to merge 6 commits into
24c1003 to
3e590e6
Compare
elezar
reviewed
Jun 12, 2026
- op: add
  path: /deploy/helm/releases/0/valuesFiles/-
  value: ci/values-high-availability.yaml
- name: ha-envoy
Member
When reading the docs initially, it was not clear that ha-envoy included the high-availability profile. Could we call this out explicitly (perhaps by renaming the profile), or make the two composable?
Collaborator
Author
I'm collapsing the two since they are used together in practice.
fb46193 to
a60c79c
Compare
e93a30d to
493f3da
Compare
493f3da to
a831deb
Compare
This was referenced Jun 25, 2026
Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
a831deb to
236c8ab
Compare
Summary
Adds HA gateway rebalancing support for Kubernetes deployments so client and supervisor traffic can survive gateway replica scale-up, scale-down, and pod rotation.
This PR targets main directly. The reconciler lease work from #1577 has already landed, so this PR now focuses on peer authentication/routing, supervisor relay handoff, Kubernetes ownership behavior, Helm/Skaffold HA wiring, CLI retry hardening, and HA validation.
How it works
The Gateway now exposes a peer Service to let gateway replicas discover and call each other to reach supervisors. When a client request lands on a replica that does not currently own the target sandbox supervisor session, that gateway resolves the owning replica and relays the supervisor traffic to the peer instead of failing the request. Kubernetes lease/reconciler ownership keeps sandbox supervision coordinated as gateway pods scale, roll, or disappear, while the CLI retries transient sync probes during those handoffs.
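The routing decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `OwnershipView`, `Route`, and the in-memory maps are hypothetical stand-ins for the lease-backed ownership lookup and peer Service discovery, and the sketch assumes that an unknown owner or an unknown peer address is treated as a transient condition the caller may retry.

```rust
use std::collections::HashMap;

/// Where a request for a sandbox's supervisor session should be handled.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// This replica owns the supervisor session; serve the request directly.
    Local,
    /// Another replica owns the session; relay traffic to that peer's address.
    RelayToPeer(String),
    /// No owner is known, or the owner's address is unknown (for example
    /// during a handoff); the caller may retry.
    Unavailable,
}

/// Hypothetical in-memory snapshot of ownership as seen by one replica.
/// The real system would back this with Kubernetes leases and a peer Service.
struct OwnershipView {
    /// Identity of the replica evaluating the request.
    self_id: String,
    /// Sandbox id -> id of the replica that owns its supervisor session.
    owners: HashMap<String, String>,
    /// Replica id -> address at which that peer can be reached.
    peers: HashMap<String, String>,
}

impl OwnershipView {
    /// Decide how to handle a request targeting `sandbox_id`.
    fn route(&self, sandbox_id: &str) -> Route {
        match self.owners.get(sandbox_id) {
            // We own the session: no relay needed.
            Some(owner) if *owner == self.self_id => Route::Local,
            // A peer owns it: relay if we know how to reach that peer.
            Some(owner) => match self.peers.get(owner) {
                Some(addr) => Route::RelayToPeer(addr.clone()),
                None => Route::Unavailable,
            },
            // Nobody is recorded as the owner right now.
            None => Route::Unavailable,
        }
    }
}
```

A replica would call `route` for each incoming request and either serve it, relay it to the returned peer address, or surface a retryable error, which is the case the CLI retry hardening is meant to absorb.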
Related Issue
Closes #1021
Related: #1012, #1429, #1577, #1731, #1488
Changes
Testing
cargo fmt --all -- --check
mise run helm:lint
cargo check -p openshell-server --features test-support
mise run pre-commit
test:e2e-kubernetes
Checklist