feat(kubernetes): support HA gateway rebalancing #1868
base: main
Changes from all commits
23d8620
cd865bd
9ac25ce
2fd12c9
236c8ab
98a9372
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -132,10 +132,9 @@ Common findings: | |
| helm -n openshell status openshell | ||
| helm -n openshell get values openshell | ||
| kubectl -n openshell get deployment,statefulset,pod,svc,pvc | ||
| kubectl -n openshell logs deployment/openshell -c openshell-gateway --tail=200 | ||
| kubectl -n openshell logs statefulset/openshell -c openshell-gateway --tail=200 | ||
| kubectl -n openshell rollout status deployment/openshell | ||
| kubectl -n openshell rollout status statefulset/openshell | ||
| WORKLOAD="$(kubectl -n openshell get deployment openshell >/dev/null 2>&1 && echo deployment/openshell || echo statefulset/openshell)" | ||
| kubectl -n openshell logs "${WORKLOAD}" -c openshell-gateway --tail=200 | ||
| kubectl -n openshell rollout status "${WORKLOAD}" | ||
| ``` | ||
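The inline `WORKLOAD=...` one-liner recurs in several snippets of this guide. A minimal sketch of the same detection as a reusable function is shown here; `gateway_workload` and the `KUBECTL` override are hypothetical names introduced for illustration and are not part of the chart or this PR.

```bash
# Hypothetical helper (not part of the chart or this PR): print the gateway
# workload reference, preferring a Deployment and falling back to a StatefulSet.
# KUBECTL can be overridden so the function can be dry-run without a cluster.
gateway_workload() {
  local ns="${1:-openshell}" name="${2:-openshell}"
  if "${KUBECTL:-kubectl}" -n "${ns}" get deployment "${name}" >/dev/null 2>&1; then
    echo "deployment/${name}"
  else
    # Like the inline one-liner, this falls back whenever the Deployment
    # lookup fails, including on API or authorization errors.
    echo "statefulset/${name}"
  fi
}
```

With that function sourced, `WORKLOAD="$(gateway_workload)"` would replace the inline expression, and the later `logs` and `rollout status` commands stay unchanged.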
|
|
||
| Use the log and rollout commands for the workload kind that exists in the | ||
|
|
@@ -153,6 +152,32 @@ kubectl -n openshell get deployment,service,pod -l app.kubernetes.io/name=opensh | |
| kubectl -n openshell logs deployment/openshell-e2e-postgres --tail=200 | ||
| ``` | ||
|
|
||
| For multi-replica gateway installs, supervisor and client session traffic may | ||
| be served by a non-owner gateway replica and relayed to the current supervisor | ||
| owner over the internal `PeerRelay` RPC. Check the headless peer Service, | ||
| projected peer ServiceAccount token volume, and TokenReview RBAC: | ||
|
|
||
| ```bash | ||
| kubectl -n openshell get svc openshell-peer -o wide | ||
| kubectl -n openshell get endpoints openshell-peer | ||
| kubectl -n openshell get pod -l app.kubernetes.io/instance=openshell \ | ||
|
Member
Question: Does naming the envvar
||
| -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{.spec.volumes[?(@.name=="gateway-peer-token")]}{"\n"}{.spec.containers[0].env[?(@.name=="OPENSHELL_PEER_SERVICE_ACCOUNT_TOKEN_FILE")]}{"\n"}{.spec.containers[0].env[?(@.name=="OPENSHELL_PEER_ENDPOINT")]}{"\n"}{end}' | ||
| kubectl auth can-i create tokenreviews.authentication.k8s.io \ | ||
| --as=system:serviceaccount:openshell:openshell | ||
| kubectl auth can-i get pods -n openshell \ | ||
| --as=system:serviceaccount:openshell:openshell | ||
| kubectl -n openshell logs "${WORKLOAD}" --tail=200 | grep -E 'gateway peer|PeerRelay|supervisor owner|owner relay' | ||
| ``` | ||
|
|
||
| Expected gateway startup logs include | ||
| `gateway peer ServiceAccount TokenReview authentication enabled`. If peer relay | ||
| calls fail with `Unauthenticated`, verify the `gateway-peer-token` projected | ||
| volume has audience `openshell-gateway-peer` and that the receiving gateway can | ||
| create TokenReviews. If they fail with `PermissionDenied`, verify the gateway | ||
| ServiceAccount name, release namespace, pod UID, and Helm selector labels match | ||
| the live gateway pods. Deployment-backed gateway pods should also publish | ||
| `OPENSHELL_PEER_ENDPOINT` from their pod IP. | ||
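For the `Unauthenticated` case, the audience on the projected token can be read straight from the pod spec. The sketch below assumes the `gateway-peer-token` volume uses a standard Kubernetes `serviceAccountToken` projection; the function names and the `KUBECTL` override are hypothetical and exist only for this example.

```bash
# Hypothetical helper: print the audience(s) configured on a pod's
# gateway-peer-token projected volume. Assumes a standard serviceAccountToken
# projection. KUBECTL is overridable for dry runs.
peer_token_audience() {
  local pod="$1" ns="${2:-openshell}"
  "${KUBECTL:-kubectl}" -n "${ns}" get pod "${pod}" \
    -o jsonpath='{.spec.volumes[?(@.name=="gateway-peer-token")].projected.sources[*].serviceAccountToken.audience}'
}

# Succeeds only when the audience equals the value named in this guide.
check_peer_token_audience() {
  [ "$(peer_token_audience "$@")" = "openshell-gateway-peer" ]
}
```

A possible use is `check_peer_token_audience <gateway-pod-name> || echo "unexpected audience"`, run once per gateway pod.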
|
|
||
| Check required Helm deployment secrets: | ||
|
|
||
| ```bash | ||
|
|
@@ -199,8 +224,8 @@ label, supervisor env vars `OPENSHELL_K8S_SA_TOKEN_FILE` and | |
| Check the image references currently used by the gateway deployment: | ||
|
|
||
| ```bash | ||
| kubectl -n openshell get deployment openshell -o jsonpath="{.spec.template.spec.containers[*].image}{\"\n\"}{.spec.template.spec.containers[*].env[?(@.name==\"OPENSHELL_SUPERVISOR_IMAGE\")].value}{\"\n\"}" | ||
| kubectl -n openshell get statefulset openshell -o jsonpath="{.spec.template.spec.containers[*].image}{\"\n\"}{.spec.template.spec.containers[*].env[?(@.name==\"OPENSHELL_SUPERVISOR_IMAGE\")].value}{\"\n\"}" | ||
| WORKLOAD="$(kubectl -n openshell get deployment openshell >/dev/null 2>&1 && echo deployment/openshell || echo statefulset/openshell)" | ||
| kubectl -n openshell get "${WORKLOAD}" -o jsonpath="{.spec.template.spec.containers[*].image}{\"\n\"}{.spec.template.spec.containers[*].env[?(@.name==\"OPENSHELL_SUPERVISOR_IMAGE\")].value}{\"\n\"}" | ||
| helm -n openshell get values openshell | grep -E 'repository|tag|supervisorImage|workload' | ||
| ``` | ||
|
|
||
|
|
@@ -244,8 +269,8 @@ If the gateway is healthy but sandbox creation fails: | |
| ```bash | ||
| kubectl -n openshell get pods | ||
| kubectl -n openshell get events --sort-by=.lastTimestamp | tail -n 50 | ||
| kubectl -n openshell logs deployment/openshell -c openshell-gateway --tail=200 | ||
| kubectl -n openshell logs statefulset/openshell -c openshell-gateway --tail=200 | ||
| WORKLOAD="$(kubectl -n openshell get deployment openshell >/dev/null 2>&1 && echo deployment/openshell || echo statefulset/openshell)" | ||
| kubectl -n openshell logs "${WORKLOAD}" -c openshell-gateway --tail=200 | ||
| ``` | ||
|
|
||
| Check the configured sandbox namespace: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -66,10 +66,31 @@ generates mTLS secrets on first install. Envoy Gateway opt-in; see the Optional | |
|
|
||
| The gateway Service uses ClusterIP. Access is via Envoy Gateway (port `8080`) or `kubectl port-forward`. | ||
|
|
||
| **HA test deploy** (two gateway replicas + external PostgreSQL Secret): uncomment | ||
| `#- ci/values-high-availability.yaml` in `deploy/helm/openshell/skaffold.yaml`, | ||
| create the Secret named `openshell-ha-pg` with a `uri` key, then run | ||
| `mise run helm:skaffold:run` or `mise run helm:skaffold:dev`. | ||
| The Skaffold profile for HA reverse-proxy development is available from | ||
| `deploy/helm/openshell/`: | ||
|
|
||
| ```bash | ||
| # Two gateway replicas + external PostgreSQL Secret + Envoy Gateway + Gateway API route. | ||
| KUBECONFIG=../../../kubeconfig skaffold run -p high-availability | ||
| ``` | ||
|
|
||
| The `high-availability` profile expects a Secret named `openshell-ha-pg` in the `openshell` | ||
| namespace with a `uri` key. For local manual testing, either create your own | ||
| PostgreSQL Secret or use the e2e PostgreSQL fixture manifest in | ||
| `e2e/kubernetes/postgres-fixture.yaml`. | ||
|
|
||
| For the `high-availability` profile, return to the repository root and apply the | ||
| GatewayClass and BackendTrafficPolicy manifest after Skaffold has installed | ||
| Envoy Gateway: | ||
|
|
||
| ```bash | ||
| KUBECONFIG=kubeconfig mise run helm:gateway:apply | ||
| ``` | ||
|
|
||
| The BackendTrafficPolicy disables Envoy request and stream-duration timeouts for | ||
| OpenShell's `GRPCRoute`. Keep that policy in `deploy/kube/manifests/envoy-gateway-openshell.yaml`, | ||
| not in the Helm chart; it is required for long-lived gRPC create/watch/exec/relay | ||
| streams during gateway rollouts and scale events. | ||
|
|
||
| ### TLS behaviour | ||
|
|
||
|
|
@@ -139,23 +160,76 @@ but will point to a deleted cluster — safe to ignore or clean up manually. | |
|
|
||
| ## Optional Add-ons | ||
|
|
||
| Each add-on requires uncommenting the corresponding `valuesFiles` entry in | ||
| `deploy/helm/openshell/skaffold.yaml` before running `helm:skaffold:dev` or `helm:skaffold:run`. | ||
| Some add-ons can be enabled by uncommenting values in `skaffold.yaml`, but prefer | ||
| the dedicated Skaffold profiles when they exist. Profiles avoid leaving local | ||
| manual edits in the worktree. | ||
|
|
||
| ### Envoy Gateway (Gateway API / GRPCRoute) | ||
|
|
||
| Envoy Gateway is already installed by Skaffold (the `envoy-gateway` Helm release in | ||
| `skaffold.yaml`). To activate routing: | ||
| Use the `high-availability` Skaffold profile for HA reverse-proxy testing: | ||
|
|
||
| 1. Uncomment `#- values-gateway.yaml` in `skaffold.yaml` | ||
| 2. Redeploy: `mise run helm:skaffold:run` | ||
| 3. Apply the GatewayClass: `mise run helm:gateway:apply` | ||
| 4. Access: `http://127.0.0.1:8080` | ||
| ```bash | ||
| cd deploy/helm/openshell | ||
| KUBECONFIG=../../../kubeconfig skaffold run -p high-availability | ||
| cd ../../.. | ||
| KUBECONFIG=kubeconfig mise run helm:gateway:apply | ||
| ``` | ||
|
|
||
| `values-gateway.yaml` creates a `Gateway` (listener on port 80, class `eg`) and | ||
| `GRPCRoute` in the `openshell` namespace. The `high-availability` profile installs the | ||
| Envoy Gateway Helm chart and layers both `values-high-availability.yaml` and | ||
| `values-gateway.yaml` onto the OpenShell release. | ||
|
|
||
| `deploy/kube/manifests/envoy-gateway-openshell.yaml` creates: | ||
|
|
||
| `values-gateway.yaml` creates a `Gateway` (listener on port 80, class `eg`) and a | ||
| `GRPCRoute` in the `openshell` namespace. Envoy Gateway provisions a LoadBalancer | ||
| service for the proxy; klipper-lb binds it to hostPort 80, reachable via the | ||
| `8080:80` load balancer port mapping. | ||
| - `GatewayClass/eg` | ||
| - `BackendTrafficPolicy/openshell-grpc-timeouts` | ||
|
|
||
| The Envoy Gateway proxy Service is usually exposed through the k3d load balancer | ||
| at `http://127.0.0.1:8080`. If the cluster was created with a different | ||
| `HELM_K3S_LB_HOST_PORT`, use that host port instead. | ||
|
|
||
| For manual tests against an existing cluster, prefer forwarding the Envoy proxy | ||
| Service rather than `svc/openshell`. That keeps client traffic on the same path | ||
| as a real reverse proxy while gateway pods rotate behind it: | ||
|
|
||
| ```bash | ||
| KUBECONFIG=kubeconfig kubectl get svc -A \ | ||
| -l gateway.envoyproxy.io/owning-gateway-name=openshell | ||
| KUBECONFIG=kubeconfig kubectl -n <envoy-service-namespace> port-forward \ | ||
| svc/<envoy-service-name> 8080:80 | ||
| openshell gateway add http://127.0.0.1:8080 --name openshell --local | ||
| ``` | ||
|
|
||
| When running e2e tests manually through Envoy, register gateway metadata (as | ||
| above) instead of relying only on `OPENSHELL_GATEWAY_ENDPOINT`; some tests call | ||
| `openshell gateway info` and expect metadata for the active gateway. | ||
|
|
||
| ### Kubernetes E2E Notes | ||
|
|
||
| Use `mise run e2e:kubernetes` for the standard Helm-backed Kubernetes suite. | ||
| The kube e2e wrapper creates only one port-forward, to `svc/openshell`; it no | ||
| longer forwards the unauthenticated health listener or runs a `/readyz` e2e | ||
| target. `/readyz` remains covered by server unit/integration tests. | ||
|
|
||
| Use `mise run e2e:kubernetes:ha-rebalancing` for full-suite HA coverage. The | ||
| task creates an external PostgreSQL fixture, installs Envoy Gateway, applies | ||
| `deploy/kube/manifests/envoy-gateway-openshell.yaml`, enables the chart | ||
| `GRPCRoute`, and runs the full Kubernetes e2e suite, including | ||
| `kubernetes_ha_rebalancing`. That coverage validates sandbox create/watch and | ||
| exec through the Envoy proxy while gateway replicas scale up, scale down, and | ||
| rotate. | ||
|
Comment on lines +219 to +221
Member
Does it also run a longer-running sandbox and then perform the operations? Are those relevant / in scope?
||
|
|
||
| If you reuse an existing Skaffold cluster for the full kube suite, make sure the | ||
| cluster has the Docker Desktop host-gateway alias configured for host-gateway | ||
| tests. The e2e wrapper sets this on chart installs; manual reuse may require: | ||
|
Comment on lines +223 to +225
Member
Question: Why is Docker Desktop relevant here? I thought Skaffold uses k3s behind the scenes?
||
|
|
||
| ```bash | ||
| KUBECONFIG=kubeconfig helm upgrade openshell deploy/helm/openshell \ | ||
| --namespace openshell --reuse-values \ | ||
| --set server.hostGatewayIP=192.168.65.254 \ | ||
| --wait --timeout 5m | ||
| ``` | ||
|
|
||
| ### Keycloak OIDC | ||
|
|
||
|
|
@@ -253,6 +327,6 @@ for dependencies still declared in `Chart.yaml`. | |
| | `deploy/helm/openshell/ci/values-spire.yaml` | SPIFFE/SPIRE provider token grant overlay | | ||
| | `deploy/helm/openshell/ci/values-spire-stack.yaml` | SPIRE hardened chart values for local dev | | ||
| | `deploy/helm/openshell/ci/values-tls-disabled.yaml` | Lint-only: TLS + auth disabled (reverse-proxy edge termination) | | ||
| | `deploy/kube/manifests/envoy-gateway-openshell.yaml` | GatewayClass for Envoy Gateway (`mise run helm:gateway:apply`) | | ||
| | `deploy/kube/manifests/envoy-gateway-openshell.yaml` | GatewayClass and BackendTrafficPolicy for Envoy Gateway (`mise run helm:gateway:apply`) | | ||
| | `tasks/scripts/helm-k3s-local.sh` | k3d cluster create/delete/start/stop/status | | ||
| | `tasks/scripts/keycloak-k8s-setup.sh` | Keycloak deploy + realm import | | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -32,6 +32,21 @@ on: | |
| required: false | ||
| type: string | ||
| default: "" | ||
| test-name: | ||
| description: "Rust e2e test target to run (sets OPENSHELL_E2E_KUBE_TEST)" | ||
| required: false | ||
| type: string | ||
| default: "" | ||
| kubernetes-features: | ||
| description: "Cargo feature list for the Kubernetes e2e crate" | ||
| required: false | ||
| type: string | ||
| default: "" | ||
| use-envoy-gateway: | ||
| description: "Install Envoy Gateway and run the e2e command through the chart GRPCRoute" | ||
| required: false | ||
| type: boolean | ||
| default: false | ||
| mise-version: | ||
| description: "mise version to install on the bare Kubernetes e2e runner" | ||
| required: false | ||
|
|
@@ -117,6 +132,9 @@ jobs: | |
| OPENSHELL_E2E_KUBE_CONTEXT: kind-${{ env.KIND_CLUSTER_NAME }} | ||
| OPENSHELL_E2E_KUBE_EXTRA_VALUES: ${{ inputs.extra-helm-values }} | ||
| OPENSHELL_E2E_KUBE_EXTERNAL_POSTGRES_SECRET: ${{ inputs.external-postgres-secret }} | ||
| OPENSHELL_E2E_KUBE_TEST: ${{ inputs.test-name }} | ||
| OPENSHELL_E2E_KUBERNETES_FEATURES: ${{ inputs.kubernetes-features }} | ||
|
Member
Question: Why
||
| OPENSHELL_E2E_KUBE_USE_ENVOY: ${{ inputs.use-envoy-gateway }} | ||
| IMAGE_TAG: ${{ inputs.image-tag }} | ||
| OPENSHELL_REGISTRY: ghcr.io/nvidia/openshell | ||
| run: mise run --no-deps --skip-deps e2e:kubernetes | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -91,6 +91,36 @@ authenticated sandbox ID with any sandbox ID or name resolved from the request. | |
| Supervisor control and relay streams require a matching sandbox principal before | ||
| the gateway registers the session or bridges relay bytes. | ||
|
|
||
| ## HA Supervisor Ownership | ||
|
|
||
| In multi-replica Kubernetes deployments, every gateway pod can accept client | ||
| RPCs, but a sandbox supervisor maintains one active stream to one gateway | ||
| replica at a time. The connected replica publishes a short-lived supervisor | ||
| owner record in the shared Postgres object store with its replica id, peer DNS | ||
| endpoint, supervisor instance id, and connection epoch. Heartbeats renew the | ||
|
Member
So this means that a supervisor can switch owner when a heartbeat occurs?
||
| record, and reconnects from the same supervisor instance with a newer epoch can | ||
| supersede the previous owner before the TTL expires. | ||
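One way to read the supersede rule above is as a small decision function. This is a sketch under stated assumptions, not the gateway's implementation: the argument names are hypothetical, the treatment of expired records is an assumption, and heartbeat renewals by the current owner are a separate path that is not modeled.

```bash
# Sketch only: decide whether a new owner claim may replace the stored record.
# Arguments are hypothetical fields, not the gateway's real schema:
#   cur_instance cur_epoch cur_expires_at new_instance new_epoch now
# Prints "accept" or "reject".
owner_claim_decision() {
  local cur_instance="$1" cur_epoch="$2" cur_expires_at="$3"
  local new_instance="$4" new_epoch="$5" now="$6"
  if [ "${now}" -ge "${cur_expires_at}" ]; then
    # Assumption: an expired record no longer blocks a new owner.
    echo accept
  elif [ "${new_instance}" = "${cur_instance}" ] && [ "${new_epoch}" -gt "${cur_epoch}" ]; then
    # From the text: the same supervisor instance reconnecting with a newer
    # epoch supersedes the previous owner before the TTL expires.
    echo accept
  else
    echo reject
  fi
}
```

For example, `owner_claim_decision sup-a 3 100 sup-a 4 50` models a reconnect at time 50 with epoch 4 against a record at epoch 3 that expires at time 100.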
|
|
||
| Session-bound operations such as exec, TCP forwarding, file sync, and sandbox | ||
| service routing first check the local session registry. If the supervisor is | ||
| owned by another gateway replica, the serving gateway opens an internal | ||
| `PeerRelay` stream to that owner and asks it to open the supervisor relay. This | ||
| keeps client traffic working when a Kubernetes Service routes the client to a | ||
| non-owner gateway pod. If a peer owner is stale or unreachable during a rollout, | ||
| the serving gateway retries ownership lookup until the normal relay wait | ||
| deadline. | ||
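The retry behavior described above happens inside the gateway. The shell sketch below only illustrates the control flow of re-running a lookup until it succeeds or a deadline passes; `retry_until_deadline` is a hypothetical name and the timing values are placeholders, not the gateway's real relay wait deadline.

```bash
# Sketch: re-run a command until it succeeds or a deadline passes.
# usage: retry_until_deadline <deadline_seconds> <interval_seconds> <command...>
retry_until_deadline() {
  local deadline_s="$1" interval_s="$2"
  shift 2
  local start now
  start="$(date +%s)"
  while :; do
    # Each attempt stands in for a fresh ownership lookup plus relay open.
    if "$@"; then
      return 0
    fi
    now="$(date +%s)"
    if [ $((now - start)) -ge "${deadline_s}" ]; then
      return 1
    fi
    sleep "${interval_s}"
  done
}
```

The command is always attempted at least once, so a deadline of `0` means a single attempt with no retries.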
|
Comment on lines +109 to +111
Member
It's not clear what this means if the owner is lost / non-recoverable. Does "ownership lookup" mean that a new owner may be selected if we retry after a heartbeat?
||
|
|
||
| File upload and download use tar-over-SSH through the same relay path. A gateway | ||
| pod termination drops the active SSH proxy byte stream, so the CLI retries the | ||
| whole sync operation with a fresh SSH session instead of attempting mid-stream | ||
| resume. | ||
|
|
||
| Gateway peer RPCs authenticate with Kubernetes ServiceAccount identity rather | ||
| than a shared secret. Helm mounts a projected, pod-bound token with audience | ||
| `openshell-gateway-peer`; the receiving gateway validates it through | ||
| TokenReview, checks the live pod UID and chart selector labels, and authorizes | ||
| only the internal peer relay method. | ||
|
|
||
| ## API Surface | ||
|
|
||
| The gateway API is organized around platform objects and operational streams: | ||
|
|
||
naming nit: In the context of OpenShell, I have been using "workload" to mean something that a user is trying to run. Although not critical, seeing `WORKLOAD` here may be a little confusing. Would `DEPLOYMENT` make more sense here, even though it may be represented by a StatefulSet?