Commit c4ca283

refactor(helm): require external postgres for ha (#1844)
1 parent 70acbaf commit c4ca283

23 files changed

Lines changed: 256 additions & 360 deletions

.agents/skills/debug-openshell-cluster/SKILL.md

Lines changed: 6 additions & 5 deletions
@@ -138,13 +138,14 @@ kubectl -n openshell rollout status statefulset/openshell
 
 Look for failed installs, unexpected values, missing namespace, wrong image tag, TLS settings that do not match the registered endpoint, and scheduling failures.
 
-For HA or PostgreSQL-backed installs, also check the service-binding Secret and
-bundled PostgreSQL workload:
+For HA or PostgreSQL-backed installs, also check the external database Secret
+referenced by `server.externalDbSecret` and the PostgreSQL workload if the test
+or operator deployed one in-cluster:
 
 ```bash
-kubectl -n openshell get secret -l app.kubernetes.io/instance=openshell
-kubectl -n openshell get statefulset,pod,pvc -l app.kubernetes.io/instance=openshell
-kubectl -n openshell logs statefulset/openshell-postgres --tail=200
+kubectl -n openshell get secret openshell-ha-pg -o yaml
+kubectl -n openshell get deployment,service,pod -l app.kubernetes.io/name=openshell-e2e-postgres
+kubectl -n openshell logs deployment/openshell-e2e-postgres --tail=200
 ```
 
 Check required Helm deployment secrets:

.agents/skills/helm-dev-environment/SKILL.md

Lines changed: 4 additions & 3 deletions
@@ -66,9 +66,10 @@ generates mTLS secrets on first install. Envoy Gateway opt-in; see the Optional
 
 The gateway Service uses ClusterIP. Access is via Envoy Gateway (port `8080`) or `kubectl port-forward`.
 
-**HA test deploy** (two gateway replicas + bundled PostgreSQL): uncomment
+**HA test deploy** (two gateway replicas + external PostgreSQL Secret): uncomment
 `#- ci/values-high-availability.yaml` in `deploy/helm/openshell/skaffold.yaml`,
-then run `mise run helm:skaffold:run` or `mise run helm:skaffold:dev`.
+create the Secret named `openshell-ha-pg` with a `uri` key, then run
+`mise run helm:skaffold:run` or `mise run helm:skaffold:dev`.
 
 ### TLS behaviour
 
@@ -203,7 +204,7 @@ mise run helm:k3s:status
 | `deploy/helm/openshell/ci/values-skaffold.yaml` | Dev overrides (image pull policy, TLS disabled for local Skaffold) |
 | `deploy/helm/openshell/ci/values-cert-manager.yaml` | cert-manager PKI overlay (opt-in; disables pkiInitJob) |
 | `deploy/helm/openshell/ci/values-gateway.yaml` | Envoy Gateway GRPCRoute + Gateway overlay |
-| `deploy/helm/openshell/ci/values-high-availability.yaml` | HA test overlay (`replicaCount: 2` with bundled PostgreSQL) |
+| `deploy/helm/openshell/ci/values-high-availability.yaml` | HA test overlay (`replicaCount: 2` with external PostgreSQL Secret) |
 | `deploy/helm/openshell/ci/values-keycloak.yaml` | Keycloak OIDC overlay |
 | `deploy/helm/openshell/ci/values-tls-disabled.yaml` | Lint-only: TLS + auth disabled (reverse-proxy edge termination) |
 | `deploy/kube/manifests/envoy-gateway-openshell.yaml` | GatewayClass for Envoy Gateway (`mise run helm:gateway:apply`) |

.github/actions/release-helm-oci/action.yml

Lines changed: 0 additions & 8 deletions
@@ -71,14 +71,6 @@ runs:
           exit 1
         fi
 
-    - name: Build chart dependencies
-      env:
-        CHART_DIR: ${{ steps.prep.outputs.chart_dir }}
-      shell: bash
-      run: |
-        set -euo pipefail
-        helm dependency build "${CHART_DIR}"
-
     - name: Package Helm chart
       env:
         CHART_DIR: ${{ steps.prep.outputs.chart_dir }}

.github/workflows/branch-e2e.yml

Lines changed: 1 addition & 0 deletions
@@ -122,6 +122,7 @@ jobs:
       image-tag: ${{ github.sha }}
       job-name: Kubernetes HA E2E (Rust smoke)
       extra-helm-values: deploy/helm/openshell/ci/values-high-availability.yaml
+      external-postgres-secret: openshell-ha-pg
 
   core-e2e-result:
     name: Core E2E result

.github/workflows/e2e-kubernetes-test.yml

Lines changed: 6 additions & 0 deletions
@@ -27,6 +27,11 @@ on:
         required: false
         type: string
         default: ""
+      external-postgres-secret:
+        description: "Create an ephemeral external PostgreSQL fixture and write its URI to this Secret"
+        required: false
+        type: string
+        default: ""
       mise-version:
         description: "mise version to install on the bare Kubernetes e2e runner"
         required: false
@@ -111,6 +116,7 @@ jobs:
         env:
           OPENSHELL_E2E_KUBE_CONTEXT: kind-${{ env.KIND_CLUSTER_NAME }}
           OPENSHELL_E2E_KUBE_EXTRA_VALUES: ${{ inputs.extra-helm-values }}
+          OPENSHELL_E2E_KUBE_EXTERNAL_POSTGRES_SECRET: ${{ inputs.external-postgres-secret }}
           IMAGE_TAG: ${{ inputs.image-tag }}
           OPENSHELL_REGISTRY: ghcr.io/nvidia/openshell
         run: mise run --no-deps --skip-deps e2e:kubernetes

architecture/compute-runtimes.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -86,7 +86,9 @@ runtime still owns GPU device injection.
8686

8787
## Deployment Shape
8888

89-
Kubernetes deployments use the Helm chart under `deploy/helm/openshell`.
89+
Kubernetes deployments use the Helm chart under `deploy/helm/openshell`. The
90+
chart deploys the gateway and sandbox runtime integration, but HA deployments
91+
must point `server.externalDbSecret` at an operator-managed PostgreSQL database.
9092
Standalone local deployments start the gateway with a selected runtime such as
9193
Docker, Podman, or VM. The CLI can register multiple gateways and switch between
9294
them without changing the sandbox architecture.

deploy/helm/openshell/Chart.lock

Lines changed: 0 additions & 6 deletions
This file was deleted.

deploy/helm/openshell/Chart.yaml

Lines changed: 0 additions & 6 deletions
@@ -11,9 +11,3 @@ type: application
 # empty), so a released chart automatically pulls the matching gateway and supervisor images.
 version: 0.0.0
 appVersion: "0.0.0"
-dependencies:
-  - name: postgresql
-    version: 18.6.7
-    repository: oci://registry-1.docker.io/bitnamicharts
-    condition: postgres.enabled
-    alias: postgres

deploy/helm/openshell/README.md

Lines changed: 7 additions & 22 deletions
@@ -56,7 +56,7 @@ See [`values.yaml`](values.yaml) for source defaults. Selected overlays:
 - [`ci/values-gateway.yaml`](ci/values-gateway.yaml) - gateway-only configuration
 - [`ci/values-cert-manager.yaml`](ci/values-cert-manager.yaml) - cert-manager integration
 - [`ci/values-keycloak.yaml`](ci/values-keycloak.yaml) - Keycloak OIDC integration
-- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - HA gateway test overlay with bundled PostgreSQL
+- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - CI overlay for multi-replica external PostgreSQL testing
 
 ### Database backend
 
@@ -65,12 +65,15 @@ By default, OpenShell uses SQLite:
 ```yaml
 server:
   dbUrl: "sqlite:/var/openshell/openshell.db"
-postgres:
-  enabled: false
 ```
 
 #### External PostgreSQL
 
+Use external PostgreSQL when the gateway should connect to a database managed
+outside this chart. The OpenShell chart does not deploy a database; install
+PostgreSQL separately using the chart, operator, or managed service that fits
+your environment, then pass the connection URI through a Secret.
+
 Create a Secret containing the PostgreSQL connection URI if one does not
 already exist:
 
@@ -87,18 +90,6 @@ helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <vers
   --set server.externalDbSecret=my-pg-credentials
 ```
 
-#### Bundled PostgreSQL
-
-Deploy a PostgreSQL instance alongside the gateway using the bundled
-Bitnami subchart. A random password is generated automatically:
-
-```bash
-helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <version> \
-  --set postgres.enabled=true
-```
-
-To set an explicit password, add `--set postgres.auth.password=my-secret-password`.
-
 #### OpenShift
 
 Append these flags to any of the PostgreSQL commands above for OpenShift:
@@ -159,12 +150,6 @@ JWT signing Secret.
 | podLabels | object | `{}` | Extra labels to add to the gateway pod. |
 | podLifecycle.terminationGracePeriodSeconds | int | `5` | Grace period, in seconds, before Kubernetes terminates the gateway pod. |
 | podSecurityContext.fsGroup | int | `1000` | fsGroup assigned to the gateway pod. |
-| postgres.auth.database | string | `"openshell"` | |
-| postgres.auth.password | string | `""` | |
-| postgres.auth.username | string | `"openshell"` | |
-| postgres.enabled | bool | `false` | Deploy the bundled Bitnami PostgreSQL subchart. |
-| postgres.primary.persistence.enabled | bool | `true` | |
-| postgres.serviceBindings.enabled | bool | `true` | |
 | probes.liveness.failureThreshold | int | `3` | Liveness probe failure threshold before the container is restarted. |
 | probes.liveness.initialDelaySeconds | int | `2` | Liveness probe initial delay, in seconds. |
 | probes.liveness.periodSeconds | int | `5` | Liveness probe period, in seconds. |
@@ -176,7 +161,7 @@ JWT signing Secret.
 | probes.startup.failureThreshold | int | `30` | Startup probe failure threshold before the container is killed. |
 | probes.startup.periodSeconds | int | `2` | Startup probe period, in seconds. |
 | probes.startup.timeoutSeconds | int | `1` | Startup probe timeout, in seconds. |
-| replicaCount | int | `1` | Number of OpenShell gateway replicas. |
+| replicaCount | int | `1` | Number of OpenShell gateway replicas. Values greater than 1 require server.externalDbSecret because the default SQLite backend is per pod. |
 | resources | object | `{}` | Gateway pod resource requests and limits. |
 | sandboxServiceAccount.annotations | object | `{}` | Annotations to add to the generated sandbox service account. |
 | sandboxServiceAccount.create | bool | `true` | Create a service account for sandbox pods. |

deploy/helm/openshell/README.md.gotmpl

Lines changed: 6 additions & 15 deletions
@@ -56,7 +56,7 @@ See [`values.yaml`](values.yaml) for source defaults. Selected overlays:
 - [`ci/values-gateway.yaml`](ci/values-gateway.yaml) - gateway-only configuration
 - [`ci/values-cert-manager.yaml`](ci/values-cert-manager.yaml) - cert-manager integration
 - [`ci/values-keycloak.yaml`](ci/values-keycloak.yaml) - Keycloak OIDC integration
-- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - HA gateway test overlay with bundled PostgreSQL
+- [`ci/values-high-availability.yaml`](ci/values-high-availability.yaml) - CI overlay for multi-replica external PostgreSQL testing
 
 ### Database backend
 
@@ -65,12 +65,15 @@ By default, OpenShell uses SQLite:
 ```yaml
 server:
   dbUrl: "sqlite:/var/openshell/openshell.db"
-postgres:
-  enabled: false
 ```
 
 #### External PostgreSQL
 
+Use external PostgreSQL when the gateway should connect to a database managed
+outside this chart. The OpenShell chart does not deploy a database; install
+PostgreSQL separately using the chart, operator, or managed service that fits
+your environment, then pass the connection URI through a Secret.
+
 Create a Secret containing the PostgreSQL connection URI if one does not
 already exist:
 
@@ -87,18 +90,6 @@ helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <vers
   --set server.externalDbSecret=my-pg-credentials
 ```
 
-#### Bundled PostgreSQL
-
-Deploy a PostgreSQL instance alongside the gateway using the bundled
-Bitnami subchart. A random password is generated automatically:
-
-```bash
-helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version <version> \
-  --set postgres.enabled=true
-```
-
-To set an explicit password, add `--set postgres.auth.password=my-secret-password`.
-
 #### OpenShift
 
 Append these flags to any of the PostgreSQL commands above for OpenShift:
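
Reviewer note: the HA setup described across these hunks (a Secret named `openshell-ha-pg` with a `uri` key, referenced through `server.externalDbSecret`, with more than one gateway replica) might look roughly like the sketch below. This is an illustration only. The connection URI is a placeholder, the `openshell` namespace is taken from the debug commands in this diff, and the values overlay is an assumption about what `ci/values-high-availability.yaml` contains, since that file is not shown here.

```yaml
# File 1 (hypothetical): Kubernetes Secret holding the PostgreSQL connection URI.
# The Secret name and the `uri` key come from the SKILL.md text in this commit;
# the URI itself is a placeholder and must be replaced with real credentials.
apiVersion: v1
kind: Secret
metadata:
  name: openshell-ha-pg
  namespace: openshell
type: Opaque
stringData:
  uri: "postgresql://USER:PASSWORD@HOST:5432/DBNAME"
---
# File 2 (hypothetical): Helm values overlay, NOT a Kubernetes manifest.
# Assumed shape only; check ci/values-high-availability.yaml for the real keys.
replicaCount: 2                       # more than 1 requires an external database
server:
  externalDbSecret: openshell-ha-pg   # name of the Secret defined above
```

The two documents are separate files shown together for brevity: the first would be applied to the cluster, the second passed to Helm as a values file.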
