Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 18 additions & 13 deletions deployments/scripts/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ SPDX-License-Identifier: Apache-2.0

# OSMO Deployment Scripts

End-to-end deployer for OSMO 6.3 across multiple Kubernetes flavors and storage backends. The single entry point is `deploy-osmo-minimal.sh`; everything else (Terraform, KAI install, GPU Operator, MinIO, storage credential wiring, smoke tests) is invoked as a phase.
End-to-end deployer for OSMO 6.3 across multiple Kubernetes flavors and storage backends. The single entry point is `deploy-osmo-minimal.sh`; everything else (Terraform, KAI install, GPU Operator, MinIO or RustFS, storage credential wiring, smoke tests) is invoked as a phase.

## Quick Start

Expand Down Expand Up @@ -50,16 +50,17 @@ Three orthogonal axes:

Cells show which auth methods are valid for each `(provider, storage-backend)` pair:

| ↓ Provider \ Storage → | `minio` | `azure-blob` | `s3` | `byo` |
|------------------------|--------------|----------------------|------------|----------------------|
| `azure` (AKS) | static | static, WI | static | static, WI |
| `aws` (EKS) | static | static | static | static, WI (IRSA) |
| `microk8s` (single-node) | static | — | — | static |
| `byo` (any K8s) | static | static, WI* | static | static, WI* |
| ↓ Provider \ Storage → | `minio` | `rustfs` | `azure-blob` | `s3` | `byo` |
|------------------------|--------------|--------------|----------------------|------------|----------------------|
| `azure` (AKS) | static | static | static, WI | static | static, WI |
| `aws` (EKS) | static | static | static | static | static, WI (IRSA) |
| `microk8s` (single-node) | static | static | — | — | static |
| `byo` (any K8s) | static | static | static, WI* | static | static, WI* |

\* `workload-identity` on `byo` requires the cluster's K8s API server to have the appropriate OIDC issuer + the cloud-side trust set up by the caller.

Notes:
- `rustfs` is an in-cluster, S3-compatible object store ([rustfs.com](https://rustfs.com)) — a drop-in alternative to `minio`. The two are **mutually exclusive**: selecting `rustfs` never installs MinIO and never enables the MicroK8s `minio` addon (a MinIO that's already installed is left untouched — it isn't uninstalled). Like `minio` it has no cloud-identity path (self-hosted), so only `static` auth is valid.
- `s3` does **not** support `workload-identity` directly — use `--backend byo --auth-method workload-identity` with IRSA instead. `s3.sh` errors out with this guidance.
- `microk8s` deliberately has no cloud-identity path — it's a single-node dev/eval flow.
- Cross-cloud combinations (e.g. AKS pointing at S3) are valid for `static` auth.
Expand All @@ -70,6 +71,7 @@ Notes:
|------------|------------------------|--------|
| `azure` | `azure-blob` / static | ✅ |
| `microk8s` | `minio` / static | ✅ |
| `microk8s` | `rustfs` / static | ⏳ |
| `byo` | `minio` / static | ✅ |
| `aws` | `s3` / static | ✅ |
| `azure` | `azure-blob` / WI | ⏳ |
Expand All @@ -85,9 +87,10 @@ scripts/
├── common.sh # Shared logging, OSMO CLI install, helm helpers
├── install-kai-scheduler.sh # KAI Scheduler (idempotent, CRD-detected)
├── install-gpu-operator.sh # NVIDIA GPU Operator (multi-signal auto-skip)
├── install-minio.sh # In-cluster MinIO (bitnami; auto-skips if addon/release present)
├── install-minio.sh # In-cluster MinIO (auto-skips if addon/release present)
├── install-rustfs.sh # In-cluster RustFS via helm (alternative to MinIO; mutually exclusive)
├── configure-storage.sh # 6.3 storage wiring: K8s Secrets + values fragment
├── storage/ # Per-backend storage logic (minio, azure-blob, s3, byo)
├── storage/ # Per-backend storage logic (minio, rustfs, azure-blob, s3, byo)
├── port-forward.sh # One-shot or watchdog kubectl port-forward
├── verify.sh # End-to-end smoke tests (hello + GPU workflows)
├── azure/terraform.sh # Azure Terraform driver
Expand Down Expand Up @@ -116,6 +119,7 @@ When invoked, the entry-point runs these phases in order. Each is idempotent and
- `install-kai-scheduler.sh` (CRD-detected: `podgroups.scheduling.run.ai`)
- `install-gpu-operator.sh` (skipped under `--no-gpu`; multi-signal detection: addon, helm release, CR, DaemonSet)
- `install-minio.sh` (only when `--storage-backend minio`; skipped if addon/release present)
- `install-rustfs.sh` (only when `--storage-backend rustfs`; standalone helm install, sets `RUSTFS_OBS_ENVIRONMENT=production` + `RUSTFS_OBS_LOGGER_LEVEL=warn`, no resource limits)
3. **Storage credential wiring**
- `configure-storage.sh --backend X --auth-method Y` writes K8s Secrets (`osmo-workflow-{data,log,app}-cred`) and emits `values/.storage-values.yaml` for the helm install to merge
4. **OSMO Helm install** (`deploy-k8s.sh`)
Expand All @@ -136,7 +140,7 @@ Main entry point — see `--help` for the full flag list. Orchestrates all phase
| Flag | Purpose |
|------|---------|
| `--provider {azure,aws,microk8s,byo}` | Required. Selects bootstrap path. |
| `--storage-backend {auto,minio,azure-blob,s3,byo,none}` | Default `auto`: chooses based on provider (azure→azure-blob, aws→s3, microk8s→minio, byo→error). |
| `--storage-backend {auto,minio,rustfs,azure-blob,s3,byo,none}` | Default `auto`: chooses based on provider (azure→azure-blob, aws→s3, microk8s→minio, byo→error). `rustfs` installs the in-cluster RustFS S3 store instead of MinIO (mutually exclusive). |
| `--auth-method {static,workload-identity}` | Default `static`. See [Deployment Combinations](#deployment-combinations) for what's supported per backend. |
| `--workload-identity-client-id ID` | Azure UAMI client ID (azure-blob + WI). |
| `--workload-identity-role-arn ARN` | AWS IAM role ARN (byo + WI / IRSA). |
Expand Down Expand Up @@ -204,14 +208,15 @@ Each is idempotent — safe to invoke on a cluster where the target component al
|--------|---------|---------------------|
| `install-kai-scheduler.sh` | KAI Scheduler v0.14.0 (gang scheduling) | CRD `podgroups.scheduling.run.ai` |
| `install-gpu-operator.sh` | NVIDIA GPU Operator (drivers + container toolkit) | microk8s `nvidia` addon, helm release in any ns, `clusterpolicies.nvidia.com` CR (covers NVAIE), or `nvidia-device-plugin` DaemonSet |
| `install-minio.sh` | Bitnami MinIO chart | microk8s `minio` addon or existing `minio` service in `minio-operator` ns |
| `configure-storage.sh` | 6.3 storage wiring: K8s Secrets + helm values fragment for `services.configs.workflow.workflow_*.credential.secretName`. Dispatcher → `storage/{minio,azure-blob,s3,byo}.sh`. | n/a — backend chosen via `--backend` |
| `install-minio.sh` | Single-pod MinIO (plain manifests) | microk8s `minio` addon or existing `minio` service in `minio-operator` ns |
| `install-rustfs.sh` | RustFS helm chart (`https://charts.rustfs.com`), standalone mode. Always sets `RUSTFS_OBS_ENVIRONMENT=production` + `RUSTFS_OBS_LOGGER_LEVEL=warn` (perf-critical) and runs with no resource limits. Never installs/adds MinIO; an already-installed MinIO is left untouched (warns only). | existing `rustfs` helm release or ready `rustfs` Deployment |
| `configure-storage.sh` | 6.3 storage wiring: K8s Secrets + helm values fragment for `services.configs.workflow.workflow_*.credential.secretName`. Dispatcher → `storage/{minio,rustfs,azure-blob,s3,byo}.sh`. | n/a — backend chosen via `--backend` |
| `port-forward.sh` | One-shot or `--watchdog` PF, tagged `osmo-pf-watchdog:<svc>` for cleanup with `pkill -f 'osmo-pf-watchdog:'`. Watchdog readiness waits up to `OSMO_PF_HEALTH_TIMEOUT_SECONDS` (default 300). | Reuses live PF if context+namespace match |
| `verify.sh` | Submits `workflows/verify-hello.yaml` + `verify-gpu.yaml`; polls until terminal state, dumps logs on failure. `SKIP_GPU=1` to skip GPU test. | n/a |

### `microk8s/install.sh`

Single-node MicroK8s bootstrap, used only by `--provider microk8s`. Installs snapd → microk8s 1.31/stable → kubectl/helm/helmfile → core addons (`dns`, `hostpath-storage`, `helm3`, `rbac`, `minio`) → optional `nvidia` addon → containerd Docker Hub creds patch (when `~/.docker/config.json` exists) → kubeconfig export. Run as root: `sudo ./microk8s/install.sh [--gpu]`. Idempotent.
Single-node MicroK8s bootstrap, used only by `--provider microk8s`. Installs snapd → microk8s 1.31/stable → kubectl/helm/helmfile → core addons (`dns`, `hostpath-storage`, `helm3`, `rbac`) → the `minio` addon **only** for the `minio`/`auto` storage backends (skipped for `rustfs` and others; pass `--storage-backend X` to control this) → optional `nvidia` addon → containerd Docker Hub creds patch (when `~/.docker/config.json` exists) → kubeconfig export. Run as root: `sudo ./microk8s/install.sh [--gpu] [--storage-backend X]`. Idempotent.

### `azure/terraform.sh`, `aws/terraform.sh`

Expand Down
12 changes: 8 additions & 4 deletions deployments/scripts/configure-storage.sh
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
# configure-storage.sh [options]
#
# Options:
# --backend {auto|minio|azure-blob|byo|none} Backend (default: auto)
# --backend {auto|s3|minio|rustfs|azure-blob|byo|none} Backend (default: auto)
# --auth-method {static|workload-identity} Auth mode (default: static)
# --namespace NS OSMO namespace (default: osmo-minimal)
# --output-values PATH Where to write the values fragment
Expand All @@ -44,6 +44,8 @@
# auto — Probe live signals: BYO env vars → microk8s minio addon →
# helm-installed minio service → osmo Azure TF output → fail
# minio — Read MinIO root creds; create osmo-workflow-* Secrets
# rustfs — Read RustFS creds; create osmo-workflow-* Secrets (in-cluster
# S3, mutually exclusive with minio)
# azure-blob — Read STORAGE_ACCOUNT/STORAGE_KEY (env or osmo TF) → connection string
# byo — Read all values from env vars (S3-compatible)
# none — Skip storage configuration entirely (caller will configure later)
Expand Down Expand Up @@ -192,6 +194,8 @@ if [[ "$BACKEND" == "auto" ]]; then
ERROR: no storage backend detected. Pick one explicitly with --backend:

--backend minio — in-cluster MinIO (microk8s addon or helm-installed)
--backend rustfs — in-cluster RustFS S3 store (helm-installed;
mutually exclusive with minio)
--backend s3 — AWS S3; set STORAGE_BUCKET / STORAGE_ACCESS_KEY_ID /
STORAGE_ACCESS_KEY (or use the osmo AWS TF outputs
when s3_bucket_enabled = true)
Expand Down Expand Up @@ -293,9 +297,9 @@ will fail at runtime with 401/403 from S3. There is no safety net here.${NC}

EOF
;;
minio)
log_error "Workload identity is not supported for the minio backend (no cloud-vendor IdP)."
log_error "Use --auth-method static for minio, or switch to azure-blob / byo."
minio|rustfs)
log_error "Workload identity is not supported for the $BACKEND backend (no cloud-vendor IdP)."
log_error "Use --auth-method static for $BACKEND, or switch to azure-blob / byo."
exit 2
;;
esac
Expand Down
18 changes: 14 additions & 4 deletions deployments/scripts/deploy-osmo-minimal.sh
Original file line number Diff line number Diff line change
Expand Up @@ -136,11 +136,15 @@ General Options:
pods pull anonymously, works for public images only.
Set explicitly to reference a pre-created secret
(e.g. AKS-managed "imagepullsecret").
--storage-backend X Storage backend: auto|minio|s3|azure-blob|byo|none (default: auto)
--storage-backend X Storage backend: auto|minio|rustfs|s3|azure-blob|byo|none (default: auto)
rustfs installs the in-cluster RustFS S3 store
(rustfs.com) instead of MinIO; the two are mutually
exclusive (selecting rustfs skips MinIO, including the
MicroK8s minio addon).
--auth-method X Storage auth: static|workload-identity (default: static)
workload-identity REQUIRES caller-provisioned cloud
identity (UAMI for Azure, IAM role for AWS) + RBAC.
Not valid for --storage-backend minio.
Not valid for --storage-backend minio or rustfs.
--workload-identity-client-id ID
Azure UAMI client ID (required for azure-blob + WI)
--workload-identity-role-arn ARN
Expand Down Expand Up @@ -668,6 +672,9 @@ bootstrap_microk8s() {
log_info "Bootstrapping MicroK8s..."
local args=()
[[ "$ENABLE_MICROK8S_GPU" == "true" ]] && args+=(--gpu)
# Pass the storage backend so the minio addon is only enabled for the
# minio/auto backends — rustfs (or any other) must not bring up MinIO.
args+=(--storage-backend "$STORAGE_BACKEND")
sudo "$SCRIPT_DIR/microk8s/install.sh" "${args[@]}"
fi
# Stub the `nvidia` RuntimeClass when running CPU-only. Older chart versions
Expand Down Expand Up @@ -701,8 +708,11 @@ install_cluster_dependencies() {
NO_GPU="$NO_GPU" bash "$SCRIPT_DIR/install-kai-scheduler.sh"
NO_GPU="$NO_GPU" bash "$SCRIPT_DIR/install-gpu-operator.sh"

# MinIO is only installed if the user actually selected it as the backend.
if [[ "$STORAGE_BACKEND" == "minio" ]] || [[ "$STORAGE_BACKEND" == "auto" && "$PROVIDER" == "microk8s" ]]; then
# In-cluster object stores are only installed when explicitly selected.
# MinIO and RustFS are mutually exclusive — never install both.
if [[ "$STORAGE_BACKEND" == "rustfs" ]]; then
bash "$SCRIPT_DIR/install-rustfs.sh"
elif [[ "$STORAGE_BACKEND" == "minio" ]] || [[ "$STORAGE_BACKEND" == "auto" && "$PROVIDER" == "microk8s" ]]; then
bash "$SCRIPT_DIR/install-minio.sh"
fi

Expand Down
Loading
Loading