diff --git a/deployments/scripts/README.md b/deployments/scripts/README.md
index ce15b77e7..f957862a7 100644
--- a/deployments/scripts/README.md
+++ b/deployments/scripts/README.md
@@ -18,7 +18,7 @@ SPDX-License-Identifier: Apache-2.0
 
 # OSMO Deployment Scripts
 
-End-to-end deployer for OSMO 6.3 across multiple Kubernetes flavors and storage backends. The single entry point is `deploy-osmo-minimal.sh`; everything else (Terraform, KAI install, GPU Operator, MinIO, storage credential wiring, smoke tests) is invoked as a phase.
+End-to-end deployer for OSMO 6.3 across multiple Kubernetes flavors and storage backends. The single entry point is `deploy-osmo-minimal.sh`; everything else (Terraform, KAI install, GPU Operator, MinIO or RustFS, storage credential wiring, smoke tests) is invoked as a phase.
 
 ## Quick Start
 
@@ -50,16 +50,17 @@ Three orthogonal axes:
 
 Cells show which auth methods are valid for each `(provider, storage-backend)` pair:
 
-| ↓ Provider \ Storage → | `minio`      | `azure-blob`         | `s3`       | `byo`                |
-|------------------------|--------------|----------------------|------------|----------------------|
-| `azure` (AKS)          | static       | static, WI           | static     | static, WI           |
-| `aws` (EKS)            | static       | static               | static     | static, WI (IRSA)    |
-| `microk8s` (single-node) | static       | —                    | —          | static               |
-| `byo` (any K8s)        | static       | static, WI*          | static     | static, WI*          |
+| ↓ Provider \ Storage → | `minio`      | `rustfs`     | `azure-blob`         | `s3`       | `byo`                |
+|------------------------|--------------|--------------|----------------------|------------|----------------------|
+| `azure` (AKS)          | static       | static       | static, WI           | static     | static, WI           |
+| `aws` (EKS)            | static       | static       | static               | static     | static, WI (IRSA)    |
+| `microk8s` (single-node) | static       | static       | —                    | —          | static               |
+| `byo` (any K8s)        | static       | static       | static, WI*          | static     | static, WI*          |
 
 \* `workload-identity` on `byo` requires the cluster's K8s API server to have the appropriate OIDC issuer + the cloud-side trust set up by the caller.
 
 Notes:
+- `rustfs` is an in-cluster, S3-compatible object store ([rustfs.com](https://rustfs.com)) — a drop-in alternative to `minio`. The two are **mutually exclusive**: selecting `rustfs` never installs MinIO and never enables the MicroK8s `minio` addon (a MinIO that's already installed is left untouched — it isn't uninstalled). Like `minio` it has no cloud-identity path (self-hosted), so only `static` auth is valid.
 - `s3` does **not** support `workload-identity` directly — use `--backend byo --auth-method workload-identity` with IRSA instead. `s3.sh` errors out with this guidance.
 - `microk8s` deliberately has no cloud-identity path — it's a single-node dev/eval flow.
 - Cross-cloud combinations (e.g. AKS pointing at S3) are valid for `static` auth.
@@ -70,6 +71,7 @@ Notes:
 |------------|------------------------|--------|
 | `azure`    | `azure-blob` / static  | ✅     |
 | `microk8s` | `minio` / static       | ✅     |
+| `microk8s` | `rustfs` / static      | ⏳     |
 | `byo`      | `minio` / static       | ✅     |
 | `aws`      | `s3` / static          | ✅     |
 | `azure`    | `azure-blob` / WI      | ⏳     |
@@ -85,9 +87,10 @@ scripts/
 ├── common.sh                 # Shared logging, OSMO CLI install, helm helpers
 ├── install-kai-scheduler.sh  # KAI Scheduler (idempotent, CRD-detected)
 ├── install-gpu-operator.sh   # NVIDIA GPU Operator (multi-signal auto-skip)
-├── install-minio.sh          # In-cluster MinIO (bitnami; auto-skips if addon/release present)
+├── install-minio.sh          # In-cluster MinIO (auto-skips if addon/release present)
+├── install-rustfs.sh         # In-cluster RustFS via helm (alternative to MinIO; mutually exclusive)
 ├── configure-storage.sh      # 6.3 storage wiring: K8s Secrets + values fragment
-├── storage/                  # Per-backend storage logic (minio, azure-blob, s3, byo)
+├── storage/                  # Per-backend storage logic (minio, rustfs, azure-blob, s3, byo)
 ├── port-forward.sh           # One-shot or watchdog kubectl port-forward
 ├── verify.sh                 # End-to-end smoke tests (hello + GPU workflows)
 ├── azure/terraform.sh        # Azure Terraform driver
@@ -116,6 +119,7 @@ When invoked, the entry-point runs these phases in order. Each is idempotent and
    - `install-kai-scheduler.sh` (CRD-detected: `podgroups.scheduling.run.ai`)
    - `install-gpu-operator.sh` (skipped under `--no-gpu`; multi-signal detection: addon, helm release, CR, DaemonSet)
    - `install-minio.sh` (only when `--storage-backend minio`; skipped if addon/release present)
+   - `install-rustfs.sh` (only when `--storage-backend rustfs`; standalone helm install, sets `RUSTFS_OBS_ENVIRONMENT=production` + `RUSTFS_OBS_LOGGER_LEVEL=warn`, no resource limits)
 3. **Storage credential wiring**
    - `configure-storage.sh --backend X --auth-method Y` writes K8s Secrets (`osmo-workflow-{data,log,app}-cred`) and emits `values/.storage-values.yaml` for the helm install to merge
 4. **OSMO Helm install** (`deploy-k8s.sh`)
@@ -136,7 +140,7 @@ Main entry point — see `--help` for the full flag list. Orchestrates all phase
 | Flag | Purpose |
 |------|---------|
 | `--provider {azure,aws,microk8s,byo}` | Required. Selects bootstrap path. |
-| `--storage-backend {auto,minio,azure-blob,s3,byo,none}` | Default `auto`: chooses based on provider (azure→azure-blob, aws→s3, microk8s→minio, byo→error). |
+| `--storage-backend {auto,minio,rustfs,azure-blob,s3,byo,none}` | Default `auto`: chooses based on provider (azure→azure-blob, aws→s3, microk8s→minio, byo→error). `rustfs` installs the in-cluster RustFS S3 store instead of MinIO (mutually exclusive). |
 | `--auth-method {static,workload-identity}` | Default `static`. See [Deployment Combinations](#deployment-combinations) for what's supported per backend. |
 | `--workload-identity-client-id ID` | Azure UAMI client ID (azure-blob + WI). |
 | `--workload-identity-role-arn ARN` | AWS IAM role ARN (byo + WI / IRSA). |
@@ -204,14 +208,15 @@ Each is idempotent — safe to invoke on a cluster where the target component al
 |--------|---------|---------------------|
 | `install-kai-scheduler.sh` | KAI Scheduler v0.14.0 (gang scheduling) | CRD `podgroups.scheduling.run.ai` |
 | `install-gpu-operator.sh` | NVIDIA GPU Operator (drivers + container toolkit) | microk8s `nvidia` addon, helm release in any ns, `clusterpolicies.nvidia.com` CR (covers NVAIE), or `nvidia-device-plugin` DaemonSet |
-| `install-minio.sh` | Bitnami MinIO chart | microk8s `minio` addon or existing `minio` service in `minio-operator` ns |
-| `configure-storage.sh` | 6.3 storage wiring: K8s Secrets + helm values fragment for `services.configs.workflow.workflow_*.credential.secretName`. Dispatcher → `storage/{minio,azure-blob,s3,byo}.sh`. | n/a — backend chosen via `--backend` |
+| `install-minio.sh` | Single-pod MinIO (plain manifests) | microk8s `minio` addon or existing `minio` service in `minio-operator` ns |
+| `install-rustfs.sh` | RustFS helm chart (`https://charts.rustfs.com`), standalone mode. Always sets `RUSTFS_OBS_ENVIRONMENT=production` + `RUSTFS_OBS_LOGGER_LEVEL=warn` (perf-critical) and runs with no resource limits. Never installs/adds MinIO; an already-installed MinIO is left untouched (warns only). | existing `rustfs` helm release or ready `rustfs` Deployment |
+| `configure-storage.sh` | 6.3 storage wiring: K8s Secrets + helm values fragment for `services.configs.workflow.workflow_*.credential.secretName`. Dispatcher → `storage/{minio,rustfs,azure-blob,s3,byo}.sh`. | n/a — backend chosen via `--backend` |
 | `port-forward.sh` | One-shot or `--watchdog` PF, tagged `osmo-pf-watchdog:` for cleanup with `pkill -f 'osmo-pf-watchdog:'`. Watchdog readiness waits up to `OSMO_PF_HEALTH_TIMEOUT_SECONDS` (default 300). | Reuses live PF if context+namespace match |
 | `verify.sh` | Submits `workflows/verify-hello.yaml` + `verify-gpu.yaml`; polls until terminal state, dumps logs on failure. `SKIP_GPU=1` to skip GPU test. | n/a |
 
 ### `microk8s/install.sh`
 
-Single-node MicroK8s bootstrap, used only by `--provider microk8s`. Installs snapd → microk8s 1.31/stable → kubectl/helm/helmfile → core addons (`dns`, `hostpath-storage`, `helm3`, `rbac`, `minio`) → optional `nvidia` addon → containerd Docker Hub creds patch (when `~/.docker/config.json` exists) → kubeconfig export. Run as root: `sudo ./microk8s/install.sh [--gpu]`. Idempotent.
+Single-node MicroK8s bootstrap, used only by `--provider microk8s`. Installs snapd → microk8s 1.31/stable → kubectl/helm/helmfile → core addons (`dns`, `hostpath-storage`, `helm3`, `rbac`) → the `minio` addon **only** for the `minio`/`auto` storage backends (skipped for `rustfs` and others; pass `--storage-backend X` to control this) → optional `nvidia` addon → containerd Docker Hub creds patch (when `~/.docker/config.json` exists) → kubeconfig export. Run as root: `sudo ./microk8s/install.sh [--gpu] [--storage-backend X]`. Idempotent.
 
 ### `azure/terraform.sh`, `aws/terraform.sh`
 
diff --git a/deployments/scripts/configure-storage.sh b/deployments/scripts/configure-storage.sh
index 55f5a6fd2..4371b5519 100755
--- a/deployments/scripts/configure-storage.sh
+++ b/deployments/scripts/configure-storage.sh
@@ -30,7 +30,7 @@
 # configure-storage.sh [options]
 #
 # Options:
-#   --backend {auto|minio|azure-blob|byo|none}  Backend (default: auto)
+#   --backend {auto|s3|minio|rustfs|azure-blob|byo|none}  Backend (default: auto)
 #   --auth-method {static|workload-identity}    Auth mode (default: static)
 #   --namespace NS                              OSMO namespace (default: osmo-minimal)
 #   --output-values PATH                        Where to write the values fragment
@@ -44,6 +44,8 @@
 #   auto       — Probe live signals: BYO env vars → microk8s minio addon →
 #                helm-installed minio service → osmo Azure TF output → fail
 #   minio      — Read MinIO root creds; create osmo-workflow-* Secrets
+#   rustfs     — Read RustFS creds; create osmo-workflow-* Secrets (in-cluster
+#                S3, mutually exclusive with minio)
 #   azure-blob — Read STORAGE_ACCOUNT/STORAGE_KEY (env or osmo TF) → connection string
 #   byo        — Read all values from env vars (S3-compatible)
 #   none       — Skip storage configuration entirely (caller will configure later)
@@ -192,6 +194,8 @@ if [[ "$BACKEND" == "auto" ]]; then
 ERROR: no storage backend detected. Pick one explicitly with --backend:
 
   --backend minio      — in-cluster MinIO (microk8s addon or helm-installed)
+  --backend rustfs     — in-cluster RustFS S3 store (helm-installed;
+                         mutually exclusive with minio)
   --backend s3         — AWS S3; set STORAGE_BUCKET / STORAGE_ACCESS_KEY_ID /
                          STORAGE_ACCESS_KEY (or use the osmo AWS TF outputs
                          when s3_bucket_enabled = true)
@@ -293,9 +297,9 @@ will fail at runtime with 401/403 from S3. There is no safety net here.${NC}
 
 EOF
       ;;
-    minio)
-      log_error "Workload identity is not supported for the minio backend (no cloud-vendor IdP)."
-      log_error "Use --auth-method static for minio, or switch to azure-blob / byo."
+    minio|rustfs)
+      log_error "Workload identity is not supported for the $BACKEND backend (no cloud-vendor IdP)."
+      log_error "Use --auth-method static for $BACKEND, or switch to azure-blob / byo."
       exit 2
       ;;
   esac
diff --git a/deployments/scripts/deploy-osmo-minimal.sh b/deployments/scripts/deploy-osmo-minimal.sh
index b0e845a61..d65735770 100755
--- a/deployments/scripts/deploy-osmo-minimal.sh
+++ b/deployments/scripts/deploy-osmo-minimal.sh
@@ -136,11 +136,15 @@ General Options:
                             pods pull anonymously, works for public images only.
                             Set explicitly to reference a pre-created secret
                             (e.g. AKS-managed "imagepullsecret").
-  --storage-backend X       Storage backend: auto|minio|s3|azure-blob|byo|none (default: auto)
+  --storage-backend X       Storage backend: auto|minio|rustfs|s3|azure-blob|byo|none (default: auto)
+                            rustfs installs the in-cluster RustFS S3 store
+                            (rustfs.com) instead of MinIO; the two are mutually
+                            exclusive (selecting rustfs skips MinIO, including the
+                            MicroK8s minio addon).
   --auth-method X           Storage auth: static|workload-identity (default: static)
                             workload-identity REQUIRES caller-provisioned cloud
                             identity (UAMI for Azure, IAM role for AWS) + RBAC.
-                            Not valid for --storage-backend minio.
+                            Not valid for --storage-backend minio or rustfs.
   --workload-identity-client-id ID
                             Azure UAMI client ID (required for azure-blob + WI)
   --workload-identity-role-arn ARN
@@ -668,6 +672,9 @@ bootstrap_microk8s() {
     log_info "Bootstrapping MicroK8s..."
     local args=()
     [[ "$ENABLE_MICROK8S_GPU" == "true" ]] && args+=(--gpu)
+    # Pass the storage backend so the minio addon is only enabled for the
+    # minio/auto backends — rustfs (or any other) must not bring up MinIO.
+    args+=(--storage-backend "$STORAGE_BACKEND")
     sudo "$SCRIPT_DIR/microk8s/install.sh" "${args[@]}"
   fi
   # Stub the `nvidia` RuntimeClass when running CPU-only. Older chart versions
@@ -701,8 +708,11 @@ install_cluster_dependencies() {
   NO_GPU="$NO_GPU" bash "$SCRIPT_DIR/install-kai-scheduler.sh"
   NO_GPU="$NO_GPU" bash "$SCRIPT_DIR/install-gpu-operator.sh"
 
-  # MinIO is only installed if the user actually selected it as the backend.
-  if [[ "$STORAGE_BACKEND" == "minio" ]] || [[ "$STORAGE_BACKEND" == "auto" && "$PROVIDER" == "microk8s" ]]; then
+  # In-cluster object stores are only installed when explicitly selected.
+  # MinIO and RustFS are mutually exclusive — never install both.
+  if [[ "$STORAGE_BACKEND" == "rustfs" ]]; then
+    bash "$SCRIPT_DIR/install-rustfs.sh"
+  elif [[ "$STORAGE_BACKEND" == "minio" ]] || [[ "$STORAGE_BACKEND" == "auto" && "$PROVIDER" == "microk8s" ]]; then
     bash "$SCRIPT_DIR/install-minio.sh"
   fi
 
diff --git a/deployments/scripts/install-rustfs.sh b/deployments/scripts/install-rustfs.sh
new file mode 100755
index 000000000..39f764e8b
--- /dev/null
+++ b/deployments/scripts/install-rustfs.sh
@@ -0,0 +1,202 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+###############################################################################
+# Install RustFS (in-cluster S3 backend for OSMO workflow storage)
+#
+# Used when --storage-backend rustfs is selected. RustFS (rustfs.com) is a
+# self-hosted, S3-compatible object store and a drop-in alternative to MinIO.
+# Installed via the official Helm chart (https://charts.rustfs.com) in
+# standalone mode (single pod, single PVC) — the right shape for the
+# single-node / eval clusters this deployer targets.
+#
+# Critical performance settings (set unconditionally — see chart configmap):
+#   config.rustfs.obs_environment -> RUSTFS_OBS_ENVIRONMENT  = "production"
+#   config.rustfs.log_level       -> RUSTFS_OBS_LOGGER_LEVEL = "warn"
+# Leaving these at the chart defaults ("development" / "info") makes RustFS log
+# verbosely on the hot path and degrades throughput significantly.
+#
+# RustFS runs without resource limits — its throughput is sensitive to CPU
+# throttling, and the chart's tiny defaults (200m CPU / 512Mi mem limits) would
+# hobble it. We emit `resources: {}` so neither requests nor limits are set.
+#
+# MinIO exclusivity: RustFS and MinIO are mutually exclusive, but this script
+# never uninstalls an existing MinIO — it simply doesn't install or add one.
+# microk8s/install.sh skips enabling the `minio` addon for --storage-backend
+# rustfs, so a fresh bootstrap never brings it up in the first place.
+#
+# Skips when:
+#   - a rustfs helm release already exists in the target namespace
+#   - a rustfs-svc Service and a ready rustfs Deployment already exist there
+###############################################################################
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+RUSTFS_NAMESPACE="${RUSTFS_NAMESPACE:-rustfs}"
+RUSTFS_RELEASE="${RUSTFS_RELEASE:-rustfs}"
+RUSTFS_CHART_REPO_NAME="${RUSTFS_CHART_REPO_NAME:-rustfs}"
+RUSTFS_CHART_REPO_URL="${RUSTFS_CHART_REPO_URL:-https://charts.rustfs.com}"
+# Pin the chart version for reproducible installs. Empty = latest in the repo.
+RUSTFS_CHART_VERSION="${RUSTFS_CHART_VERSION:-}"
+# Optional image tag override (chart appVersion default when unset).
+RUSTFS_IMAGE_TAG="${RUSTFS_IMAGE_TAG:-}"
+RUSTFS_STORAGE_SIZE="${RUSTFS_STORAGE_SIZE:-20Gi}"
+RUSTFS_LOG_STORAGE_SIZE="${RUSTFS_LOG_STORAGE_SIZE:-1Gi}"
+# StorageClass for the RustFS PVCs. Empty = use the cluster default, falling
+# back to the first StorageClass found (same logic as install-minio.sh).
+RUSTFS_STORAGE_CLASS="${RUSTFS_STORAGE_CLASS:-}"
+# The chart rejects the well-known default credentials (rustfsadmin/rustfsadmin)
+# unless secret.allowInsecureDefaults=true, so both keys must be non-default.
+RUSTFS_ACCESS_KEY="${RUSTFS_ACCESS_KEY:-osmoadmin}"
+RUSTFS_SECRET_KEY="${RUSTFS_SECRET_KEY:-}"
+RUSTFS_ROLLOUT_TIMEOUT="${RUSTFS_ROLLOUT_TIMEOUT:-5m}"
+
+KUBECTL="${KUBECTL:-kubectl}"
+HELM="${HELM:-helm}"
+
+detect_existing_rustfs() {
+  if $HELM status "$RUSTFS_RELEASE" -n "$RUSTFS_NAMESPACE" &>/dev/null; then
+    echo "helm release $RUSTFS_RELEASE/$RUSTFS_NAMESPACE"
+    return 0
+  fi
+  if $KUBECTL get svc "$RUSTFS_RELEASE-svc" -n "$RUSTFS_NAMESPACE" &>/dev/null \
+    && [[ "$($KUBECTL get deployment "$RUSTFS_RELEASE" -n "$RUSTFS_NAMESPACE" -o jsonpath='{.status.availableReplicas}' 2>/dev/null)" -ge 1 ]]; then
+    echo "deployment $RUSTFS_RELEASE/$RUSTFS_NAMESPACE"
+    return 0
+  fi
+  return 1
+}
+
+main() {
+  check_command "$KUBECTL"
+  check_command "$HELM"
+
+  # Note: if MinIO is already installed we leave it alone — RustFS and MinIO
+  # are mutually exclusive, but we never uninstall a pre-existing MinIO.
+  if microk8s_addon_enabled minio; then
+    log_warning "MicroK8s 'minio' addon is enabled. RustFS is being installed alongside it;"
+    log_warning "OSMO will use RustFS for storage. To remove MinIO, run 'microk8s disable minio' yourself."
+  fi
+
+  local detection
+  if detection=$(detect_existing_rustfs); then
+    log_warning "RustFS already provided by: $detection — skipping"
+    return 0
+  fi
+
+  if [[ -z "$RUSTFS_SECRET_KEY" ]]; then
+    check_command openssl
+    RUSTFS_SECRET_KEY=$(openssl rand -base64 24 | tr -d '/+=' | head -c 32)
+    log_info "Generated RustFS secret key (set RUSTFS_SECRET_KEY to override)"
+  fi
+
+  # Resolve PVC StorageClass: explicit override -> cluster default -> first SC.
+  # The chart writes storageClassName verbatim into the PVCs, so an empty
+  # value would disable the default-class fallback and leave them Pending.
+  if [[ -z "$RUSTFS_STORAGE_CLASS" ]]; then
+    RUSTFS_STORAGE_CLASS="$($KUBECTL get storageclass \
+      -o jsonpath='{range .items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")]}{.metadata.name}{"\n"}{end}' \
+      2>/dev/null | head -n1)"
+  fi
+  if [[ -z "$RUSTFS_STORAGE_CLASS" ]]; then
+    RUSTFS_STORAGE_CLASS="$($KUBECTL get storageclass \
+      -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)"
+  fi
+  if [[ -n "$RUSTFS_STORAGE_CLASS" ]]; then
+    log_info "Using StorageClass: $RUSTFS_STORAGE_CLASS"
+  else
+    log_warning "No StorageClass found; RustFS PVCs may stay Pending"
+  fi
+
+  log_info "Installing RustFS into namespace $RUSTFS_NAMESPACE (standalone mode)"
+
+  $HELM repo add "$RUSTFS_CHART_REPO_NAME" "$RUSTFS_CHART_REPO_URL" --force-update >/dev/null
+  $HELM repo update "$RUSTFS_CHART_REPO_NAME" >/dev/null
+
+  local values_file
+  values_file="$(mktemp)"
+  trap 'rm -f "$values_file"' RETURN
+
+  cat > "$values_file" <> "$values_file"
+  fi
+
+  if [[ -n "$RUSTFS_IMAGE_TAG" ]]; then
+    cat >> "$values_file" <= 525
+# sudo ./install.sh                             # CPU-only, minio addon enabled
+# sudo ./install.sh --gpu                       # GPU instance, NVIDIA driver >= 525
+# sudo ./install.sh --storage-backend rustfs    # skip the minio addon (RustFS path)
 
 set -euo pipefail
 
 CHANNEL="${MICROK8S_CHANNEL:-1.31/stable}"
 ENABLE_GPU=false
+# Storage backend the OSMO deploy will use. The `minio` addon is only enabled
+# for the minio/auto backends; selecting rustfs (or any other backend) skips it
+# so MinIO isn't present even as an addon. Defaults to auto to preserve the
+# standalone `sudo ./install.sh` behavior (auto -> minio on single-node).
+STORAGE_BACKEND="${STORAGE_BACKEND:-auto}"
 REAL_USER="${SUDO_USER:-${USER:-ubuntu}}"
 REAL_HOME=$(getent passwd "$REAL_USER" | cut -d: -f6)
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 
-for arg in "$@"; do [[ "$arg" == "--gpu" ]] && ENABLE_GPU=true; done
+# Preserve the original invocation for the run-as-root error message before the
+# parse loop consumes the positional parameters.
+ORIG_ARGS="$*"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --gpu) ENABLE_GPU=true; shift ;;
+    --storage-backend) STORAGE_BACKEND="${2:?--storage-backend requires a value}"; shift 2 ;;
+    *) shift ;;
+  esac
+done
 
 # ── Preflight ─────────────────────────────────────────────────────────────────
 PASS=true
 if [[ "$EUID" -ne 0 ]]; then
-  echo "ERROR: must be run as root — use: sudo $0 $*"
+  echo "ERROR: must be run as root — use: sudo $0 $ORIG_ARGS"
   PASS=false
 fi
 
@@ -134,7 +150,20 @@ fi
 # Note: `registry` is intentionally NOT enabled — OSMO doesn't use a local
 # image registry. Add it if your workflow needs `localhost:32000`.
 echo "==> Enabling addons"
-microk8s enable dns hostpath-storage helm3 rbac minio
+microk8s enable dns hostpath-storage helm3 rbac
+
+# The `minio` addon is only for the minio/auto storage backends. For rustfs (or
+# any other backend) it must NOT be enabled — MinIO and RustFS are mutually
+# exclusive, so MinIO mustn't be present even as an addon.
+case "$STORAGE_BACKEND" in
+  minio|auto)
+    echo "==> Enabling minio addon (storage backend: $STORAGE_BACKEND)"
+    microk8s enable minio
+    ;;
+  *)
+    echo "==> Skipping minio addon (storage backend: $STORAGE_BACKEND)"
+    ;;
+esac
 
 # ── 5. GPU addon ─────────────────────────────────────────────────────────────
 # Symlink workaround needed when host driver is pre-installed (vs container-
diff --git a/deployments/scripts/storage/BUILD b/deployments/scripts/storage/BUILD
new file mode 100644
index 000000000..f1d5ed252
--- /dev/null
+++ b/deployments/scripts/storage/BUILD
@@ -0,0 +1,22 @@
+"""
+SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
+
+sh_test(
+    name = "minio_addressing_style_test",
+    srcs = ["tests/minio_addressing_style_test.sh"],
+    data = [
+        "common.sh",
+        "minio.sh",
+    ],
+)
+
+sh_test(
+    name = "rustfs_addressing_style_test",
+    srcs = ["tests/rustfs_addressing_style_test.sh"],
+    data = [
+        "common.sh",
+        "rustfs.sh",
+    ],
+)
diff --git a/deployments/scripts/storage/rustfs.sh b/deployments/scripts/storage/rustfs.sh
new file mode 100755
index 000000000..b1128379c
--- /dev/null
+++ b/deployments/scripts/storage/rustfs.sh
@@ -0,0 +1,133 @@
+#!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# RustFS storage backend for configure-storage.sh.
+# Expects KUBECTL, NAMESPACE, OUTPUT_VALUES in env (set by the dispatcher).
+#
+# RustFS (rustfs.com) is a self-hosted, S3-compatible object store installed by
+# install-rustfs.sh via the official Helm chart. This helper mirrors minio.sh:
+# it discovers credentials, ensures the `osmo-workflows` bucket exists, writes
+# the 3 workflow credential Secrets, and emits the Helm values fragment.
+#
+# Discovers RustFS credentials in priority order:
+#   1. RUSTFS_ACCESS_KEY + RUSTFS_SECRET_KEY env vars
+#   2. chart secret (rustfs/-secret, keys RUSTFS_ACCESS_KEY/RUSTFS_SECRET_KEY)
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+KUBECTL="${KUBECTL:-kubectl}"
+NAMESPACE="${NAMESPACE:?NAMESPACE not set}"
+OUTPUT_VALUES="${OUTPUT_VALUES:?OUTPUT_VALUES not set}"
+AUTH_METHOD="${AUTH_METHOD:-static}"
+NGC_SECRET_NAME="${NGC_SECRET_NAME:-}"
+
+# RustFS is a self-hosted in-cluster S3 — no cloud-vendor identity provider.
+# Workload identity is meaningless here (same as minio).
+if [[ "$AUTH_METHOD" == "workload-identity" ]]; then
+  cat >&2 <<'MSG'
+[ERROR] --auth-method workload-identity is not supported for the `rustfs` backend.
+
+RustFS is self-hosted; there is no cloud-vendor identity provider to federate
+against. Use --auth-method static (default) for RustFS, or switch to
+--storage-backend azure-blob / byo to use Azure WI / AWS IRSA.
+MSG
+  exit 2
+fi
+
+RUSTFS_RELEASE="${RUSTFS_RELEASE:-rustfs}"
+RUSTFS_BUCKET="${RUSTFS_BUCKET:-${OSMO_WORKFLOW_BUCKET:-osmo-workflows}}"
+RUSTFS_NAMESPACE="${RUSTFS_NAMESPACE:-rustfs}"
+RUSTFS_ADDRESSING_STYLE="${RUSTFS_ADDRESSING_STYLE:-${STORAGE_ADDRESSING_STYLE:-path}}"
+validate_addressing_style "$RUSTFS_ADDRESSING_STYLE"
+RUSTFS_SVC_DNS="${RUSTFS_RELEASE}-svc.${RUSTFS_NAMESPACE}.svc.cluster.local"
+# Discover the Service's "endpoint" port; fall back to the chart default, 9000.
+RUSTFS_SVC_PORT=$($KUBECTL get svc "${RUSTFS_RELEASE}-svc" -n "$RUSTFS_NAMESPACE" \
+  -o jsonpath='{.spec.ports[?(@.name=="endpoint")].port}' 2>/dev/null || true)
+RUSTFS_SVC_PORT="${RUSTFS_SVC_PORT:-9000}"
+RUSTFS_ENDPOINT_URL="http://${RUSTFS_SVC_DNS}:${RUSTFS_SVC_PORT}"
+
+read_creds_from_chart_secret() {
+  # install-rustfs.sh deploys the rustfs chart, which writes credentials to
+  # secret `-secret` with keys RUSTFS_ACCESS_KEY / RUSTFS_SECRET_KEY.
+  local secret_name="${RUSTFS_RELEASE}-secret"
+  RUSTFS_USER=$($KUBECTL get secret "$secret_name" -n "$RUSTFS_NAMESPACE" \
+    -o jsonpath='{.data.RUSTFS_ACCESS_KEY}' 2>/dev/null | base64 -d 2>/dev/null || echo "")
+  RUSTFS_PASS=$($KUBECTL get secret "$secret_name" -n "$RUSTFS_NAMESPACE" \
+    -o jsonpath='{.data.RUSTFS_SECRET_KEY}' 2>/dev/null | base64 -d 2>/dev/null || echo "")
+  [[ -n "$RUSTFS_USER" && -n "$RUSTFS_PASS" ]]
+}
+
+# 1. Discover credentials
+if [[ -n "${RUSTFS_ACCESS_KEY:-}" && -n "${RUSTFS_SECRET_KEY:-}" ]]; then
+  RUSTFS_USER="$RUSTFS_ACCESS_KEY"
+  RUSTFS_PASS="$RUSTFS_SECRET_KEY"
+  echo "[INFO] Using RustFS credentials from env vars"
+elif read_creds_from_chart_secret; then
+  echo "[INFO] Using RustFS credentials from chart secret ${RUSTFS_RELEASE}-secret"
+else
+  echo "[ERROR] Could not discover RustFS credentials. Set RUSTFS_ACCESS_KEY + RUSTFS_SECRET_KEY" >&2
+  exit 1
+fi
+
+# 2. Create the bucket via the AWS CLI running as a one-shot pod inside the
+# cluster. RustFS is 100% S3-compatible, so we use the vendor-neutral
+# `aws s3api` (Apache-2.0) rather than MinIO's `mc` (AGPLv3) — pulling the
+# MinIO client to bootstrap a MinIO alternative would be both ironic and a
+# license mismatch for this Apache-2.0 repo. Path-style addressing is forced
+# to match how OSMO talks to the in-cluster endpoint (a bare Service DNS name
+# isn't virtual-host addressable). head-bucket makes this idempotent without
+# parsing provider-specific "already exists" error strings. A unique
+# per-invocation pod name avoids collisions with a prior run's helper pod
+# stuck Terminating; `--rm` reaps it; `timeout` guards against stuck image
+# pull / Pending-forever scheduling.
+BUCKET_SETUP_TIMEOUT="${BUCKET_SETUP_TIMEOUT:-300}"
+# Pinned to an immutable, multi-arch (amd64+arm64) manifest-list digest rather
+# than the floating :latest, so the bootstrap image is reproducible. The tag is
+# kept for readability; containerd resolves by digest.
+AWS_CLI_IMAGE="${AWS_CLI_IMAGE:-amazon/aws-cli:2.31.10@sha256:c3545440ffb85aac40c104d7fe5cb885d0ed26e91d95a433094a9dba9ddfacd6}"
+BUCKET_SETUP_POD="rustfs-bucket-setup-$RANDOM-$RANDOM"
+echo "[INFO] Ensuring RustFS bucket $RUSTFS_BUCKET exists (helper pod: $BUCKET_SETUP_POD)"
+# Credentials and connection details are passed as pod env vars (--env), never
+# interpolated into the /bin/sh command string — the script body is single-
+# quoted and reads everything from the environment. $KUBECTL is left unquoted
+# so a multi-word override (e.g. "microk8s kubectl") splits into command + args.
+timeout "$BUCKET_SETUP_TIMEOUT" \
+  $KUBECTL run "$BUCKET_SETUP_POD" --rm -i --restart=Never \
+  --namespace="$RUSTFS_NAMESPACE" \
+  --image="$AWS_CLI_IMAGE" \
+  --env="AWS_ACCESS_KEY_ID=$RUSTFS_USER" \
+  --env="AWS_SECRET_ACCESS_KEY=$RUSTFS_PASS" \
+  --env="AWS_DEFAULT_REGION=us-east-1" \
+  --env="RUSTFS_ENDPOINT_URL=$RUSTFS_ENDPOINT_URL" \
+  --env="RUSTFS_BUCKET=$RUSTFS_BUCKET" \
+  --command -- \
+  /bin/sh -c '
+    set -e
+    mkdir -p "$HOME/.aws"
+    printf "[default]\ns3 =\n    addressing_style = path\n" > "$HOME/.aws/config"
+    if aws --endpoint-url "$RUSTFS_ENDPOINT_URL" s3api head-bucket --bucket "$RUSTFS_BUCKET" 2>/dev/null; then
+      echo "Bucket already exists: $RUSTFS_BUCKET"
+    else
+      aws --endpoint-url "$RUSTFS_ENDPOINT_URL" s3api create-bucket --bucket "$RUSTFS_BUCKET"
+      echo "Bucket ready: $RUSTFS_BUCKET"
+    fi
+  ' || { echo "[ERROR] aws-cli bucket setup failed" >&2; exit 1; }
+
+# 3. Create 3 K8s Secrets, one per workflow_* credential reference.
+create_workflow_cred_secrets \
+  "$RUSTFS_USER" "$RUSTFS_PASS" "s3://$RUSTFS_BUCKET" "us-east-1" "$RUSTFS_ENDPOINT_URL" \
+  "$RUSTFS_ADDRESSING_STYLE"
+
+# 4. Emit Helm values fragment.
+emit_static_values_fragment rustfs "s3://$RUSTFS_BUCKET"
+
+echo "[INFO] RustFS storage configured:"
+echo "  bucket:     s3://$RUSTFS_BUCKET"
+echo "  endpoint:   $RUSTFS_ENDPOINT_URL"
+echo "  addressing: $RUSTFS_ADDRESSING_STYLE"
+echo "  secrets:    osmo-workflow-{data,log,app}-cred in $NAMESPACE"
+echo "  values:     $OUTPUT_VALUES"
diff --git a/deployments/scripts/storage/tests/minio_addressing_style_test.sh b/deployments/scripts/storage/tests/minio_addressing_style_test.sh
new file mode 100755
index 000000000..823799c19
--- /dev/null
+++ b/deployments/scripts/storage/tests/minio_addressing_style_test.sh
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+set -euo pipefail
+
+if [[ -n "${RUNFILES_DIR:-}" && -f "$RUNFILES_DIR/_main/deployments/scripts/storage/minio.sh" ]]; then
+  MINIO_SCRIPT="$RUNFILES_DIR/_main/deployments/scripts/storage/minio.sh"
+else
+  SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+  MINIO_SCRIPT="$SCRIPT_DIR/minio.sh"
+fi
+TEST_DIR="${TEST_TMPDIR:-$(mktemp -d)}"
+KUBECTL_LOG="$TEST_DIR/kubectl.log"
+FAKE_BIN="$TEST_DIR/bin"
+mkdir -p "$FAKE_BIN"
+
+cat > "$FAKE_BIN/kubectl" <<'EOF'
+#!/usr/bin/env bash
+set -euo pipefail
+echo "$*" >> "$KUBECTL_LOG"
+case "$1 $2" in
+  "get svc")
+    echo 9000
+    ;;
+  "run minio-bucket-setup-"*)
+    ;;
+  "create secret")
+    printf 'apiVersion: v1\nkind: Secret\n'
+    ;;
+  "apply -f")
+    cat >/dev/null
+    ;;
+  *)
+    echo "unexpected kubectl args: $*" >&2
+    exit 1
+    ;;
+esac
+EOF
+chmod +x "$FAKE_BIN/kubectl"
+
+cat > "$FAKE_BIN/timeout" <<'EOF'
+#!/usr/bin/env bash
+set -euo pipefail
+shift
+"$@"
+EOF
+chmod +x "$FAKE_BIN/timeout"
+
+run_minio() {
+  local addressing_style="${1:-}"
+  : > "$KUBECTL_LOG"
+  if [[ -n "$addressing_style" ]]; then
+    env \
+      PATH="$FAKE_BIN:$PATH" \
+      KUBECTL=kubectl \
+      KUBECTL_LOG="$KUBECTL_LOG" \
+      NAMESPACE=osmo \
+      OUTPUT_VALUES="$TEST_DIR/values.yaml" \
+      MINIO_ROOT_USER=minio \
+      MINIO_ROOT_PASSWORD=password \
+      MINIO_ADDRESSING_STYLE="$addressing_style" \
+      bash "$MINIO_SCRIPT"
+  else
+    # Clear any inherited addressing-style vars so the default-path
+    # assertion is deterministic regardless of the runner's environment.
+    env -u MINIO_ADDRESSING_STYLE -u STORAGE_ADDRESSING_STYLE \
+      PATH="$FAKE_BIN:$PATH" \
+      KUBECTL=kubectl \
+      KUBECTL_LOG="$KUBECTL_LOG" \
+      NAMESPACE=osmo \
+      OUTPUT_VALUES="$TEST_DIR/values.yaml" \
+      MINIO_ROOT_USER=minio \
+      MINIO_ROOT_PASSWORD=password \
+      bash "$MINIO_SCRIPT"
+  fi
+}
+
+assert_kubectl_log_contains() {
+  local expected="$1"
+  if ! grep -q -- "$expected" "$KUBECTL_LOG"; then
+    echo "Expected kubectl log to contain: $expected" >&2
+    cat "$KUBECTL_LOG" >&2
+    exit 1
+  fi
+}
+
+run_minio
+assert_kubectl_log_contains "--from-literal=addressing_style=path"
+
+run_minio virtual
+assert_kubectl_log_contains "--from-literal=addressing_style=virtual"
diff --git a/deployments/scripts/storage/tests/rustfs_addressing_style_test.sh b/deployments/scripts/storage/tests/rustfs_addressing_style_test.sh
new file mode 100644
index 000000000..4702a49d8
--- /dev/null
+++ b/deployments/scripts/storage/tests/rustfs_addressing_style_test.sh
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+set -euo pipefail
+
+if [[ -n "${RUNFILES_DIR:-}" && -f "$RUNFILES_DIR/_main/deployments/scripts/storage/rustfs.sh" ]]; then
+  RUSTFS_SCRIPT="$RUNFILES_DIR/_main/deployments/scripts/storage/rustfs.sh"
+else
+  SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+  RUSTFS_SCRIPT="$SCRIPT_DIR/rustfs.sh"
+fi
+TEST_DIR="${TEST_TMPDIR:-$(mktemp -d)}"
+KUBECTL_LOG="$TEST_DIR/kubectl.log"
+FAKE_BIN="$TEST_DIR/bin"
+mkdir -p "$FAKE_BIN"
+
+cat > "$FAKE_BIN/kubectl" <<'EOF'
+#!/usr/bin/env bash
+set -euo pipefail
+echo "$*" >> "$KUBECTL_LOG"
+case "$1 $2" in
+  "get svc")
+    echo 9000
+    ;;
+  "run rustfs-bucket-setup-"*)
+    ;;
+  "create secret")
+    printf 'apiVersion: v1\nkind: Secret\n'
+    ;;
+  "apply -f")
+    cat >/dev/null
+    ;;
+  *)
+    echo "unexpected kubectl args: $*" >&2
+    exit 1
+    ;;
+esac
+EOF
+chmod +x "$FAKE_BIN/kubectl"
+
+cat > "$FAKE_BIN/timeout" <<'EOF'
+#!/usr/bin/env bash
+set -euo pipefail
+shift
+"$@"
+EOF
+chmod +x "$FAKE_BIN/timeout"
+
+run_rustfs() {
+  local addressing_style="${1:-}"
+  : > "$KUBECTL_LOG"
+  if [[ -n "$addressing_style" ]]; then
+    env \
+      PATH="$FAKE_BIN:$PATH" \
+      KUBECTL=kubectl \
+      KUBECTL_LOG="$KUBECTL_LOG" \
+      NAMESPACE=osmo \
+      OUTPUT_VALUES="$TEST_DIR/values.yaml" \
+      RUSTFS_ACCESS_KEY=osmoadmin \
+      RUSTFS_SECRET_KEY=password \
+      RUSTFS_ADDRESSING_STYLE="$addressing_style" \
+      bash "$RUSTFS_SCRIPT"
+  else
+    # Clear any inherited addressing-style vars so the default-path
+    # assertion is deterministic regardless of the runner's environment.
+    env -u RUSTFS_ADDRESSING_STYLE -u STORAGE_ADDRESSING_STYLE \
+      PATH="$FAKE_BIN:$PATH" \
+      KUBECTL=kubectl \
+      KUBECTL_LOG="$KUBECTL_LOG" \
+      NAMESPACE=osmo \
+      OUTPUT_VALUES="$TEST_DIR/values.yaml" \
+      RUSTFS_ACCESS_KEY=osmoadmin \
+      RUSTFS_SECRET_KEY=password \
+      bash "$RUSTFS_SCRIPT"
+  fi
+}
+
+assert_kubectl_log_contains() {
+  local expected="$1"
+  if ! grep -q -- "$expected" "$KUBECTL_LOG"; then
+    echo "Expected kubectl log to contain: $expected" >&2
+    cat "$KUBECTL_LOG" >&2
+    exit 1
+  fi
+}
+
+run_rustfs
+assert_kubectl_log_contains "--from-literal=addressing_style=path"
+
+run_rustfs virtual
+assert_kubectl_log_contains "--from-literal=addressing_style=virtual"
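The MinIO/RustFS exclusivity in this change is spread across two scripts:
- `install_cluster_dependencies()` in `deploy-osmo-minimal.sh` picks the in-cluster installer.
- The `case` block in `microk8s/install.sh` gates the `minio` addon.

The sketch condenses both decisions into two pure functions so the combined rule can be checked in one place. It assumes the conditions copied from the diff are the whole decision. `in_cluster_store_for` and `minio_addon_for` are hypothetical helper names, not functions in these scripts.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: these helpers do not exist in the scripts above.
# Each one mirrors a condition from the diff so the combined rule is visible.
set -euo pipefail

# Mirrors install_cluster_dependencies(): which in-cluster object store
# installer runs for a given --storage-backend / --provider pair.
in_cluster_store_for() {
  local backend="$1" provider="$2"
  if [[ "$backend" == "rustfs" ]]; then
    echo rustfs
  elif [[ "$backend" == "minio" ]] || [[ "$backend" == "auto" && "$provider" == "microk8s" ]]; then
    echo minio
  else
    echo none
  fi
}

# Mirrors the case block in microk8s/install.sh: whether the MicroK8s
# `minio` addon is enabled for a given --storage-backend value.
minio_addon_for() {
  case "$1" in
    minio|auto) echo enable ;;
    *) echo skip ;;
  esac
}

# Print the decision for every backend value, with the provider fixed to microk8s.
for backend in auto minio rustfs s3 azure-blob byo none; do
  printf '%-10s store=%-6s addon=%s\n' \
    "$backend" "$(in_cluster_store_for "$backend" microk8s)" "$(minio_addon_for "$backend")"
done
```

Only `rustfs` maps to the RustFS installer, and it is also a backend for which the addon is skipped. The addon is enabled only for `minio` and `auto`, which on MicroK8s are exactly the values that select MinIO. No single backend value therefore brings up both stores through these two code paths.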