From 4a1cfef75327c68921e73706954debcdc0a95044 Mon Sep 17 00:00:00 2001
From: Gianpaolo Sanseverino
Date: Tue, 9 Jun 2026 17:35:36 +0200
Subject: [PATCH 1/3] =?UTF-8?q?Add=20project=20isolation=20testing=20?=
 =?UTF-8?q?=E2=80=94=20script=20+=20model=20doc?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two paired artifacts that document and exercise the four layers k8tre
relies on to keep one Project's resources out of reach of another
Project's users.

tests/test-project-isolation.sh
  Idempotent end-to-end check. Creates two Projects (alpha, bravo) +
  Groups + Users (alice, bob), then asserts:
  - Keycloak password-grant works and the JWT has aud=backend
  - /auth/validate returns 200 for own project, 403 for the other,
    symmetrically for both users (the real authz gate, layer 2)
  - default ServiceAccount in project-alpha cannot list/create/
    delete pods or secrets in project-bravo (layer 3, RBAC)
  - a pod in project-alpha can't TCP to a pod in project-bravo
    (layer 4, Cilium)
  - User CRs exist with the right group memberships
  On a healthy cluster: 14 PASS, 0 FAIL, 2 WEAK (documented).

docs/project-isolation.md
  Walks through the four enforcement layers (UX /projects filtering,
  /auth/validate gate, Kubernetes RBAC, Cilium NetworkPolicy), how to
  run the script, what it does and does not cover, the two known weak
  spots (logged-in users can still GET /projects//apps and
  /launch// — metadata leak, no data leak; one-line fix in
  get_apps()/launch_app() noted), and the three Keycloak +
  network-policy quirks the script's setup phase has to work around.
Co-Authored-By: Claude Opus 4.7
---
 docs/project-isolation.md       | 201 ++++++++++++++++++++
 tests/test-project-isolation.sh | 314 ++++++++++++++++++++++++++++++++
 2 files changed, 515 insertions(+)
 create mode 100644 docs/project-isolation.md
 create mode 100755 tests/test-project-isolation.sh

diff --git a/docs/project-isolation.md b/docs/project-isolation.md
new file mode 100644
index 00000000..333ff65c
--- /dev/null
+++ b/docs/project-isolation.md
@@ -0,0 +1,201 @@
+# k8tre — project isolation model and how to test it
+
+This doc describes the four layers that keep one k8tre Project's resources
+out of reach of another Project's users, and the
+[`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+script that exercises all four end-to-end.
+
+## The isolation model
+
+A *Project* in k8tre is more than a row in a database — it is materialized
+across four independent enforcement layers. Each layer can fail on its
+own without breaking the others, so the test below treats them
+separately.
+
+### Layer 1 — UX (`/projects` filtering)
+
+When a logged-in user opens
+`https://portal./projects`, the backend walks the user's
+`User CR (spec.groups[]) → Group CR (spec.projects[]) → Project CR` graph
+and renders only the projects reachable from that user's identity. This is
+a **convenience layer**, not a security boundary: it only controls what
+the menu shows. The endpoint
+`/projects//apps` is not protected by the authorization check
+(`_is_user_authorised_project()` is **not** called there), so a
+logged-in user who knows another project's name can still GET that
+page and see its app list. That's a metadata leak, not a data leak —
+see the *Weak spots* section below.
+
+### Layer 2 — Backend authorization gate (`/auth/validate`)
+
+This is the **real security boundary**.
+Every request the browser makes
+to a non-portal subdomain (`jupyter.`, `guacamole.`, …)
+is intercepted by an auth-proxy nginx that issues a subrequest to
+`https://portal./auth/validate`. That handler:
+
+1. Extracts the project from the `k8tre-project` cookie or the
+   `?project=` URL parameter,
+2. Extracts the token from the `k8tre-auth-token-` cookie or the
+   `?token=` parameter,
+3. Verifies the JWT signature and the `aud` claim (the `audience`
+   protocol mapper on the Keycloak client is what makes this work;
+   see [`keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md#add-the-aud-audience-protocol-mapper)),
+4. Calls **`_is_user_authorised_project(username, project)`**
+   (`ci/backend/main.py:301`), which re-walks the User/Group/Project
+   graph from the OIDC `preferred_username` claim,
+5. Returns 200 + auth headers (`X-Auth-User`, `X-Auth-Groups`, …) on
+   success, or **403** on authorization failure.
+
+The same check is repeated at `/vdi/sso///` for VDI
+shortcut authentication, and inside `/launch//` for the
+in-VDI lockout (a user inside a VDI for project A can't `/launch` an
+app for project B; `launch_app()` checks `vdi_context` + `vdi_project`).
+
+### Layer 3 — Kubernetes RBAC
+
+Each project runs in its own namespace, named
+`project-` by convention (see `get_proj_namespace()` in
+`ci/backend/main.py:216`). Pods spawned by JupyterHub spawners or
+VDIInstance CRs land in that namespace. The default Kubernetes RBAC
+policy gives a workload's ServiceAccount no implicit cross-namespace
+permissions, so a pod in `project-alpha` cannot list, read, create or
+delete resources in `project-bravo` without an explicit RoleBinding
+or ClusterRoleBinding granting it. The test asserts this with
+`kubectl auth can-i --as=system:serviceaccount:project-alpha:default`
+against `project-bravo`.
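+
+The script's Layer 3 check covers four verb/resource pairs, and the same
+idea extends to a wider matrix. What follows is a minimal sketch that
+assumes only `kubectl` with impersonation (`--as`) rights. The helper name
+`rbac_matrix` and the verb/resource lists are illustrative and are not part
+of the test script. Like the script, it treats any answer other than `no`
+as a failure, so an unreachable API server is not mistaken for a denial:
+
+```sh
+# Hypothetical helper (not in tests/test-project-isolation.sh): print every
+# verb/resource pair that ServiceAccount $1 is NOT denied in namespace $2.
+# Returns 0 when everything is denied, 1 otherwise.
+rbac_matrix() {
+  local sa=$1 ns=$2 verb res answer status=0
+  for verb in get list watch create update patch delete; do
+    for res in pods secrets configmaps persistentvolumeclaims serviceaccounts; do
+      # `kubectl auth can-i` prints yes/no and exits non-zero on "no".
+      answer=$(kubectl auth can-i "$verb" "$res" -n "$ns" --as="$sa" 2>/dev/null || true)
+      case "$answer" in
+        no*) ;;                                   # denied, as expected
+        *)   echo "NOT DENIED (${answer:-error}): $sa $verb $res -n $ns"
+             status=1 ;;
+      esac
+    done
+  done
+  return "$status"
+}
+```
+
+Usage: `rbac_matrix system:serviceaccount:project-alpha:default project-bravo`
+should print nothing and exit 0 on a healthy cluster.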
+
+### Layer 4 — Cilium network policy
+
+Once Cilium is the cluster CNI (see
+[`k8tre-install-guide.md`](troubleshooting/k8tre-install-guide.md)),
+the `CiliumClusterwideNetworkPolicy` and `CiliumNetworkPolicy`
+resources shipped by `apps/jupyterhub/base/network_policy.yaml` are
+**enforced** instead of inert. The pre-Cilium install (default k3s
+flannel) accepted the manifests, but no controller honored them. With
+Cilium, pods in project namespaces can't reach pods in other project
+namespaces. Importantly, they can't reach the Kubernetes API
+server (`10.43.0.1:443`) either, because Cilium classifies it as the
+`host` entity rather than `cluster`. This is why the RBAC test in
+layer 3 uses `kubectl auth can-i --as=...` from outside the pod rather
+than running `kubectl` from inside it.
+
+## The test script
+
+[`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+sets up two parallel projects (`alpha`, `bravo`) plus two users
+(`alice` in `alpha-team`, `bob` in `bravo-team`) and walks each
+isolation layer in turn. The setup phase is idempotent: re-running
+the script just verifies state.
+
+### Run
+
+```sh
+# Default: domain 188.34.94.28.nip.io, projects alpha/bravo, users alice/bob
+./tests/test-project-isolation.sh
+
+# Override any of these:
+DOMAIN=foo.nip.io PROJECT_A=cardio PROJECT_B=onco \
+  USER_A=anna USER_B=bruno \
+  ./tests/test-project-isolation.sh
+```
+
+Requires `kubectl` pointed at the cluster (the script runs the heavy
+checks against the apiserver from the host), plus `curl`, `python3` and
+`jq` locally.
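+
+When one of the `/auth/validate` cases returns an unexpected code, it can
+help to replay the User → Group → Project walk from the host. The sketch
+below is minimal and rests on two assumptions. First, the `user` and
+`group` short names resolve to the `identity.k8tre.io` CRDs, as
+`kubectl get user` does in Test 5. Second, the CRs live in the backend
+namespace (`keycloak` by default). The helper `user_projects` is
+hypothetical and only approximates the graph walk in
+`_is_user_authorised_project()`:
+
+```sh
+# Hypothetical debugging helper: list the projects reachable from user $1 by
+# following User.spec.groups[] -> Group.spec.projects[] in namespace $2.
+user_projects() {
+  local user=$1 ns=${2:-${K8TRE_NAMESPACE:-keycloak}} group
+  for group in $(kubectl get user -n "$ns" "$user" -o jsonpath='{.spec.groups[*]}'); do
+    kubectl get group -n "$ns" "$group" \
+      -o jsonpath='{range .spec.projects[*]}{@}{"\n"}{end}'
+  done | sort -u
+}
+```
+
+With the default CRs, `user_projects alice` should print only `alpha`.
+An extra line or an empty result points at the CRs rather than the backend.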
+
+### What it asserts
+
+| Test | Layer | Expected on a healthy cluster |
+|---|---|---|
+| Token issuance + `aud` claim | Authn / Keycloak | both users get a JWT, `aud` contains `backend` |
+| `alice → alpha` `/auth/validate` | Layer 2 | **200** |
+| `alice → bravo` `/auth/validate` | Layer 2 | **403** |
+| `bob → bravo` `/auth/validate` | Layer 2 | **200** |
+| `bob → alpha` `/auth/validate` | Layer 2 | **403** |
+| Anonymous `/projects//apps` | Layer 1 (negative) | 401 / 302 |
+| Logged-in `/projects//apps` (manual) | Layer 1 (**weak**) | reachable, no authz — flagged as WEAK |
+| `kubectl auth can-i` for `list pods`, `list secrets`, `create pods`, `delete pods` in `project-bravo` `--as=…project-alpha:default` | Layer 3 | every answer is `no` |
+| Pod in `project-alpha` curls a pod IP in `project-bravo` on port 8080 | Layer 4 | code 000 (connection denied) |
+| User CR `spec.groups[]` exists for both users | Layer 1 source-of-truth | non-empty |
+
+Output uses three states: `PASS` (the assertion held), `FAIL` (it
+didn't), `WEAK` (enforcement is incomplete by design — documented but
+not yet fixed upstream). On a clean cluster you should see **14 PASS,
+0 FAIL, 2 WEAK** today.
+
+### What it does NOT cover
+
+- **Token replay across projects.** The cookie name encodes the
+  project (`k8tre-auth-token-`), but the JWT itself is the same
+  for every project a user accesses. An attacker holding the JWT can
+  mint a cookie for any project the user is authorized for. Short
+  token TTLs limit the window.
+- **JupyterHub spawner & VDIInstance RBAC.** These components run
+  under their own ServiceAccounts (`hub`, `user-scheduler`,
+  `vdi-spawner`), which DO have RoleBindings that cross namespace
+  boundaries. The test only checks the *default* SA, so audit the
+  spawner SAs separately.
+- **Cilium policies for inter-pod traffic *within* a single project
+  namespace.** Today everything in `project-alpha` can talk to
+  everything else in `project-alpha`. Tighten with per-pod
+  `endpointSelector` policies if needed.
+- **The control plane / Keycloak realm itself.** A misconfigured
+  Keycloak protocol mapper (missing `groups` claim, missing `aud`)
+  can break the whole stack; see
+  [`keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md)
+  and *Setup bugs the script surfaced* below.
+
+## Weak spots (the two `WEAK` results today)
+
+### 1. Project enumeration via `/projects//apps`
+
+A logged-in user can GET
+`https://portal./projects//apps` and the
+backend will render the list of apps for that project regardless of
+whether the user is authorized. The data fetched on subsequent
+clicks is gated by `/auth/validate`, so no payload leaks, but the
+existence of arbitrary project names is exposed. The fix is one line in
+`get_apps()` in `ci/backend/main.py`: add the same
+`_is_user_authorised_project(username, project)` check that the
+`/auth/validate` handler uses, and return 403 if it fails.
+
+### 2. `/launch//` sets cookies it shouldn't
+
+Same shape, same fix: `launch_app()` mints a project-scoped token
+and writes the `k8tre-auth-token-` cookie before checking
+authorization. The next request to the subdomain is then rejected by
+`/auth/validate`, so a real attack only ever gets a dead cookie.
+However, the cookie still pollutes the user's cookie jar and consumes
+a Keycloak token refresh.
+
+## Setup bugs the script surfaced
+
+The first runs failed for reasons that are themselves worth
+documenting; the script now handles them in its setup phase:
+
+1. **`kcadm.sh set-password` defaults to `--temporary=true`**, which
+   marks the password as needing a change on next login. The OIDC
+   *Resource Owner Password Credentials* (password grant) flow then
+   rejects the login with `Account is not fully set up`. The script
+   always passes `--temporary=false`.
+2. **Users created without `firstName`/`lastName`** trigger the same
+   `Account is not fully set up` error even when the password is
+   permanent. The script always sets both on create and runs an
+   `update users/` on existing users to backfill them.
+3. **In-pod `kubectl get …` against another namespace fails before
+   it can reach the API server**, because the project-namespace
+   network policy prevents the pod from connecting to
+   `10.43.0.1:443`. The script uses
+   `kubectl auth can-i --as=…` from the host instead. That is a
+   cleaner assertion and is unaffected by network-policy changes.
+
+## Files
+
+- [`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+  — the script.
+- [`ci/backend/main.py`](../ci/backend/main.py) — `get_apps()`,
+  `launch_app()`, `_is_user_authorised_project()`,
+  `get_proj_namespace()`.
+- [`apps/jupyterhub/base/network_policy.yaml`](../apps/jupyterhub/base/network_policy.yaml)
+  — the Cilium policies enforced under layer 4.
+- [`docs/troubleshooting/keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md)
+  — Keycloak client and audience mapper setup that the authz check depends on.
diff --git a/tests/test-project-isolation.sh b/tests/test-project-isolation.sh
new file mode 100755
index 00000000..1c05617d
--- /dev/null
+++ b/tests/test-project-isolation.sh
@@ -0,0 +1,314 @@
+#!/usr/bin/env bash
+# test-project-isolation.sh — exercise the 4 isolation layers of k8tre:
+#   1. /projects visibility (UX)
+#   2. /auth/validate authorization gate (backend → subdomain access)
+#   3. Cross-namespace RBAC (kube-apiserver)
+#   4. Cross-namespace network policy (Cilium)
+#
+# Idempotent setup.
+# Run from your laptop with kubectl context pointing at
+# the k8tre cluster:
+#
+#   ./tests/test-project-isolation.sh                  # uses defaults below
+#   DOMAIN=foo.nip.io ./tests/test-project-isolation.sh
+#
+set -u
+SCRIPT_NAME=$(basename "$0")
+
+# ---- configuration ----------------------------------------------------------
+DOMAIN="${DOMAIN:-188.34.94.28.nip.io}"
+KEYCLOAK_REALM="${KEYCLOAK_REALM:-k8tre-app}"
+K8TRE_NAMESPACE="${K8TRE_NAMESPACE:-keycloak}"   # the backend's NAMESPACE
+PROJECT_A="${PROJECT_A:-alpha}"
+PROJECT_B="${PROJECT_B:-bravo}"
+USER_A="${USER_A:-alice}"
+USER_B="${USER_B:-bob}"
+PORTAL_URL="https://portal.${DOMAIN}"
+KC_URL="https://keycloak.${DOMAIN}"
+
+# ---- output helpers ---------------------------------------------------------
+PASS=0; FAIL=0; WEAK=0
+GREEN=$'\e[32m'; RED=$'\e[31m'; YELLOW=$'\e[33m'; BOLD=$'\e[1m'; RESET=$'\e[0m'
+pass()    { PASS=$((PASS+1)); printf " ${GREEN}PASS${RESET} %s\n" "$*"; }
+fail()    { FAIL=$((FAIL+1)); printf " ${RED}FAIL${RESET} %s\n" "$*"; }
+weak()    { WEAK=$((WEAK+1)); printf " ${YELLOW}WEAK${RESET} %s\n" "$*"; }
+section() { printf "\n${BOLD}== %s ==${RESET}\n" "$*"; }
+
+need() { command -v "$1" >/dev/null || { echo "missing tool: $1"; exit 2; }; }
+for t in kubectl curl python3 jq; do need "$t"; done
+
+# ---- setup ------------------------------------------------------------------
+setup() {
+  section "Setup — projects, groups, users (idempotent)"
+
+  # Apply the Project/Group/User CRs from the heredoc below.
+  cat <<EOF | kubectl apply -f - >/dev/null
+apiVersion: research.k8tre.io/v1alpha1
+kind: Project
+metadata: {name: ${PROJECT_A}, namespace: ${K8TRE_NAMESPACE}}
+spec:
+  description: "${PROJECT_A} test project"
+  apps:
+    - {name: jupyterhub, type: jupyterhub, url: "https://jupyter.${DOMAIN}/hub"}
+---
+apiVersion: research.k8tre.io/v1alpha1
+kind: Project
+metadata: {name: ${PROJECT_B}, namespace: ${K8TRE_NAMESPACE}}
+spec:
+  description: "${PROJECT_B} test project"
+  apps:
+    - {name: jupyterhub, type: jupyterhub, url: "https://jupyter.${DOMAIN}/hub"}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: Group
+metadata: {name: ${PROJECT_A}-team, namespace: ${K8TRE_NAMESPACE}}
+spec: {description: "${PROJECT_A} members", projects: ["${PROJECT_A}"]}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: Group
+metadata: {name: ${PROJECT_B}-team, namespace: ${K8TRE_NAMESPACE}}
+spec: {description: "${PROJECT_B} members", projects: ["${PROJECT_B}"]}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: User
+metadata: {name: ${USER_A}, namespace: ${K8TRE_NAMESPACE}}
+spec: {username: ${USER_A}, email: ${USER_A}@example.com, enabled: true, groups: ["${PROJECT_A}-team"]}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: User
+metadata: {name: ${USER_B}, namespace: ${K8TRE_NAMESPACE}}
+spec: {username: ${USER_B}, email: ${USER_B}@example.com, enabled: true, groups: ["${PROJECT_B}-team"]}
+EOF
+
+  # Project namespaces — Tests 3 and 4 need them
+  kubectl create ns "project-${PROJECT_A}" --dry-run=client -o yaml | kubectl apply -f - >/dev/null
+  kubectl create ns "project-${PROJECT_B}" --dry-run=client -o yaml | kubectl apply -f - >/dev/null
+
+  # Keycloak users
+  local pod=keycloak-keycloakx-0
+  local admin_user admin_pwd
+  admin_user=$(kubectl get secret -n keycloak keycloak-admin-credentials -o jsonpath='{.data.username}' | base64 -d)
+  admin_pwd=$( kubectl get secret -n keycloak keycloak-admin-credentials -o jsonpath='{.data.admin-password}' | base64 -d)
+  kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh config credentials \
+    --server http://localhost:8080 --realm master --user "$admin_user" --password "$admin_pwd" >/dev/null 2>&1
+
+  for u in "$USER_A" "$USER_B"; do
+    local uid first last
+    first="$(printf '%s' "${u:0:1}" | tr '[:lower:]' '[:upper:]')${u:1}"
+    last="Test"
+    uid=$(kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh get users -r "$KEYCLOAK_REALM" \
+      -q "username=$u" --fields id --format csv 2>/dev/null | head -1 | tr -d '"')
+    if [ -z "$uid" ]; then
+      uid=$(kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh create users \
+        -r "$KEYCLOAK_REALM" \
+        -s "username=$u" -s enabled=true -s "email=$u@example.com" \
+        -s "firstName=$first" -s "lastName=$last" -i 2>/dev/null)
+      echo " created Keycloak user $u"
+    else
+      # Ensure firstName/lastName are set — Keycloak refuses password-grant with
+      # "Account is not fully set up" when these are null.
+      kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh update users/"$uid" -r "$KEYCLOAK_REALM" \
+        -s "firstName=$first" -s "lastName=$last" -s emailVerified=true \
+        -s 'requiredActions=[]' >/dev/null 2>&1
+      echo " Keycloak user $u exists ($uid) — profile ensured"
+    fi
+    # Always (re)set password as permanent — kcadm.sh defaults to temporary=true.
+    kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh set-password -r "$KEYCLOAK_REALM" \
+      --userid "$uid" --new-password "$u" --temporary=false >/dev/null 2>&1 \
+      && echo " password set for $u (permanent)" \
+      || echo " WARN: could not set password for $u"
+  done
+
+  # Backend OIDC client secret (used for password grant)
+  CLIENT_SECRET=$(kubectl get secret -n backend backend-oidc-credentials \
+    -o jsonpath='{.data.client-secret}' | base64 -d)
+}
+
+get_token() {
+  local u="$1"
+  curl -ks --max-time 10 -X POST "${KC_URL}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/token" \
+    -d "grant_type=password" -d "client_id=backend" -d "client_secret=${CLIENT_SECRET}" \
+    -d "username=${u}" -d "password=${u}" -d "scope=openid profile email" \
+    | python3 -c 'import sys,json
+t=json.load(sys.stdin); print(t.get("access_token",""))'
+}
+
+# ---- Test 1 — /auth/validate authorization gate -----------------------------
+test_authvalidate() {
+  section "Test 1 — /auth/validate (the real authorization gate)"
+
+  local t_a t_b
+  t_a=$(get_token "$USER_A"); t_b=$(get_token "$USER_B")
+  [ -n "$t_a" ] || { fail "could not get token for $USER_A"; return; }
+  [ -n "$t_b" ] || { fail "could not get token for $USER_B"; return; }
+  pass "tokens obtained for $USER_A and $USER_B"
+
+  # Check that each token
+  # carries `aud` claim (otherwise verify_token rejects all of them
+  # and test cannot distinguish authz from token-validation failure)
+  local aud
+  aud=$(python3 -c "import json,base64,sys; print(json.loads(base64.urlsafe_b64decode(sys.argv[1].split('.')[1]+'==')).get('aud',''))" "$t_a")
+  if echo "$aud" | grep -q backend; then
+    pass "JWT contains 'aud: backend' (audience mapper is in place)"
+  else
+    fail "JWT 'aud' claim missing — see keycloak-post-install-setup.md §audience mapper"
+    return
+  fi
+
+  local code
+  declare -A cases=(
+    ["$USER_A → $PROJECT_A (own)"]="$t_a $PROJECT_A 200"
+    ["$USER_A → $PROJECT_B (other)"]="$t_a $PROJECT_B 403"
+    ["$USER_B → $PROJECT_B (own)"]="$t_b $PROJECT_B 200"
+    ["$USER_B → $PROJECT_A (other)"]="$t_b $PROJECT_A 403"
+  )
+  for desc in "${!cases[@]}"; do
+    read -r tok proj want <<<"${cases[$desc]}"
+    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 \
+      -H "Cookie: k8tre-project=${proj}; k8tre-auth-token-${proj}=${tok}" \
+      "${PORTAL_URL}/auth/validate?orig=http://jupyter.${DOMAIN}/hub/")
+    if [ "$code" = "$want" ]; then
+      pass "$desc → $code"
+    else
+      fail "$desc → got $code, expected $want"
+    fi
+  done
+}
+
+# ---- Test 2 — UX enumeration weakness ---------------------------------------
+test_enumeration() {
+  section "Test 2 — Pre-launch URL enumeration (known UX weakness)"
+
+  local t_a; t_a=$(get_token "$USER_A")
+  # The real UX leak only affects a LOGGED-IN user, which needs a Portal
+  # session cookie. Getting one means driving the browser OIDC flow; the
+  # password-grant token can't stand in for it, because require_user reads
+  # only request.session and ignores an Authorization header. So we test
+  # what we can: anonymous GET /projects//apps must be rejected (401/302).
+  local code
+  code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 \
+    "${PORTAL_URL}/projects/${PROJECT_B}/apps")
+  if [ "$code" = "401" ] || [ "$code" = "302" ]; then
+    pass "anonymous /projects/$PROJECT_B/apps → $code (require_user blocks)"
+  else
+    fail "anonymous /projects/$PROJECT_B/apps → $code (expected 401/302)"
+  fi
+  weak "logged-in users can still GET /projects//apps and /launch//"
+  weak " → enumerates project names; data is gated by /auth/validate only"
+}
+
+# ---- Test 3 — Cross-namespace RBAC -----------------------------------------
+test_rbac() {
+  section "Test 3 — Cross-namespace RBAC (default ServiceAccount)"
+  # Use `kubectl auth can-i --as=...` so we test the RBAC policy directly,
+  # bypassing the pod-network restrictions that block in-pod kubectl access
+  # to the apiserver in project-* namespaces.
+
+  local sa="system:serviceaccount:project-${PROJECT_A}:default"
+  local verdicts=(
+    "list pods -n project-${PROJECT_B}"
+    "list secrets -n project-${PROJECT_B}"
+    "create pods -n project-${PROJECT_B}"
+    "delete pods -n project-${PROJECT_B}"
+  )
+  for v in "${verdicts[@]}"; do
+    local can
+    can=$(kubectl auth can-i $v --as="$sa" 2>&1)
+    if [ "$can" = "no" ]; then
+      pass "$sa cannot $v"
+    else
+      fail "$sa CAN $v (got: $can)"
+    fi
+  done
+
+  # Sanity check: the SA CAN access its own namespace? (we don't grant any
+  # extra roles, so default SA can only access its own /tokenrequest etc.)
+  local can_own
+  can_own=$(kubectl auth can-i get serviceaccounts/default -n "project-${PROJECT_A}" --as="$sa" 2>&1)
+  echo " (sanity: SA in own ns get sa/default → $can_own)"
+}
+
+# ---- Test 4 — Cross-namespace network policy --------------------------------
+test_netpol() {
+  section "Test 4 — Cross-namespace network reachability (Cilium)"
+
+  # Spin up an HTTP server in project-bravo and a client in project-alpha,
+  # then check whether alpha can reach bravo.
+  kubectl delete pod -n "project-${PROJECT_B}" net-target --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pod -n "project-${PROJECT_A}" net-client --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+
+  # Target: a tiny HTTP responder
+  kubectl run net-target -n "project-${PROJECT_B}" --image=alpine:3.20 \
+    --restart=Never --command -- sh -c \
+    'while true; do printf "HTTP/1.1 200 OK\r\nContent-Length:3\r\n\r\nOK\n" | nc -lp 8080 -w 1; done' \
+    >/dev/null
+  # Client
+  kubectl run net-client -n "project-${PROJECT_A}" --image=curlimages/curl:8.10.1 \
+    --restart=Never --command -- sleep 60 >/dev/null
+
+  for i in $(seq 40); do
+    [ "$(kubectl get pod -n "project-${PROJECT_A}" net-client -o jsonpath='{.status.containerStatuses[0].ready}' 2>/dev/null)" = "true" ] && \
+    [ "$(kubectl get pod -n "project-${PROJECT_B}" net-target -o jsonpath='{.status.containerStatuses[0].ready}' 2>/dev/null)" = "true" ] && break
+    sleep 2
+  done
+
+  local target_ip
+  target_ip=$(kubectl get pod -n "project-${PROJECT_B}" net-target -o jsonpath='{.status.podIP}' 2>/dev/null)
+  if [ -z "$target_ip" ]; then
+    fail "could not bring up net-target pod"
+    kubectl delete pod -n "project-${PROJECT_B}" net-target --force --grace-period=0 >/dev/null 2>&1
+    kubectl delete pod -n "project-${PROJECT_A}" net-client --force --grace-period=0 >/dev/null 2>&1
+    return
+  fi
+
+  # Curl always prints %{http_code} via -w (even on failure: 000). We rely on
+  # that single output and never use ||, which would double the value.
+  local code
+  code=$(kubectl exec -n "project-${PROJECT_A}" net-client -- \
+    curl -s -o /dev/null --max-time 5 --connect-timeout 4 -w '%{http_code}' \
+    "http://${target_ip}:8080/" 2>/dev/null)
+  case "$code" in
+    000) pass "project-${PROJECT_A} → project-${PROJECT_B} pod blocked (code 000, connection denied)" ;;
+    "")  fail "no response captured from curl (pod exec failed?)" ;;
+    200) weak "project-${PROJECT_A} → project-${PROJECT_B} pod reachable (200) — project namespaces have no default-deny NetworkPolicy applied" ;;
+    *)   fail "unexpected code: $code" ;;
+  esac
+
+  kubectl delete pod -n "project-${PROJECT_B}" net-target --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pod -n "project-${PROJECT_A}" net-client --force --grace-period=0 >/dev/null 2>&1
+}
+
+# ---- Test 5 — User CR validation -------------------------------------------
+test_user_cr() {
+  section "Test 5 — User → Group → Project graph integrity"
+
+  for u in "$USER_A:$PROJECT_A:$PROJECT_B" "$USER_B:$PROJECT_B:$PROJECT_A"; do
+    IFS=':' read -r user own_project other_project <<< "$u"
+    local groups
+    groups=$(kubectl get user -n "$K8TRE_NAMESPACE" "$user" -o jsonpath='{.spec.groups[*]}' 2>/dev/null)
+    [ -n "$groups" ] && pass "user $user has groups: $groups" || fail "user $user not found or has no groups"
+  done
+}
+
+# ---- main -------------------------------------------------------------------
+main() {
+  echo "Domain: $DOMAIN"
+  echo "Portal: $PORTAL_URL"
+  echo "Keycloak: $KC_URL"
+  echo "Projects: $PROJECT_A, $PROJECT_B"
+  echo "Users: $USER_A (in $PROJECT_A-team), $USER_B (in $PROJECT_B-team)"
+
+  setup
+  test_authvalidate
+  test_enumeration
+  test_rbac
+  test_netpol
+  test_user_cr
+
+  printf "\n${BOLD}== Summary ==${RESET}\n"
+  printf " ${GREEN}%d passed${RESET}, ${RED}%d failed${RESET}, ${YELLOW}%d weak${RESET}\n" "$PASS" "$FAIL" "$WEAK"
+  echo
+  echo "WEAK = enforcement is incomplete by design — see comments above."
+  [ "$FAIL" -eq 0 ] || exit 1
+}
+
+main "$@"

From beb16592ef87dbec070f6523a8cd0555ccdbeca3 Mon Sep 17 00:00:00 2001
From: Gianpaolo Sanseverino
Date: Wed, 10 Jun 2026 10:18:57 +0200
Subject: [PATCH 2/3] volume isolation

---
 docs/troubleshooting/volume-isolation.md | 232 +++++++++++++++++++++++
 1 file changed, 232 insertions(+)
 create mode 100644 docs/troubleshooting/volume-isolation.md

diff --git a/docs/troubleshooting/volume-isolation.md b/docs/troubleshooting/volume-isolation.md
new file mode 100644
index 00000000..7e898fa5
--- /dev/null
+++ b/docs/troubleshooting/volume-isolation.md
@@ -0,0 +1,232 @@
+# Volume isolation between projects
+
+Operational write-up of the storage-isolation posture on a single-node
+k8tre cluster (Longhorn + k3s) as observed on the StackIT dev cluster.
+Pairs with [`docs/project-isolation.md`](../project-isolation.md), which
+covers the identity / network / RBAC layers; this doc covers the volume
+layer specifically and the one gap that came out of the probes.
+
+## TL;DR
+
+The defaults give you five solid isolation guarantees and one critical
+gap:
+
+| Probe | Result | Note |
+| --- | --- | --- |
+| PVC RBAC cross-namespace | ✅ blocked | default SA cannot `get/list/create/delete` PVCs in another `project-*` namespace |
+| Longhorn volume CRs (`volumes.longhorn.io`) and cluster-scoped `persistentvolumes` | ✅ blocked | same SA has no access |
+| Pod creation under the default SA | ✅ blocked | apiserver returns `Forbidden`, so a notebook user cannot spawn its own pod |
+| `claimName` reference into another namespace | ✅ blocked | scheduler resolves `claimName` in the pod's own namespace and the pod stays `Pending` with `persistentvolumeclaim "X" not found` |
+| Reachability of `longhorn-frontend` / `longhorn-manager` from a project pod | ✅ blocked | Cilium NetworkPolicy drops the traffic; curl reports 000 (no connection) |
+| `hostPath: /var/lib/longhorn` mount in a `project-*` namespace | ❌ **allowed** | no Pod Security Admission enforcement → cluster-admin (or anything able to create pods directly) can read every project's raw volume blocks |
+
+The last row is the one that matters — see *The hostPath gap* below.
+
+## How to reproduce — the six probes
+
+The whole block can be pasted into a shell with the cluster's kubectl
+context active.
+
+```sh
+SA_A=system:serviceaccount:project-alpha:default
+SA_B=system:serviceaccount:project-bravo:default
+
+# 1) PVC RBAC cross-namespace
+for verb in get list create delete; do
+  echo " $verb pvc -n project-bravo : $(kubectl auth can-i $verb pvc -n project-bravo --as=$SA_A)"
+done
+
+# 2) Longhorn volume CRs (namespaced, in storage-system) + cluster-scoped PVs
+for verb in get list create delete patch; do
+  echo " $verb volumes.longhorn.io : $(kubectl auth can-i $verb volumes.longhorn.io -n storage-system --as=$SA_A)"
+done
+for verb in get list create patch; do
+  echo " $verb persistentvolumes : $(kubectl auth can-i $verb persistentvolumes --as=$SA_A)"
+done
+
+# 3) Pod creation under the default SA — must fail Forbidden
+kubectl apply --as=$SA_A -f - <<'POD'
+apiVersion: v1
+kind: Pod
+metadata: {name: vol-steal, namespace: project-alpha}
+spec:
+  containers:
+    - {name: t, image: alpine:3.20, command: ["sleep","30"]}
+POD
+
+# 4) Cross-ns claimName — admin creates the pod, scheduler must refuse
+kubectl apply -f - <<'POD'
+apiVersion: v1
+kind: Pod
+metadata: {name: vol-steal-admin, namespace: project-alpha}
+spec:
+  containers:
+    - {name: t, image: alpine:3.20, command: ["sleep","30"],
+       volumeMounts: [{name: v, mountPath: /s}]}
+  volumes:
+    - {name: v, persistentVolumeClaim: {claimName: notebook-bob-bravo}}
+POD
+sleep 4
+kubectl get pod -n project-alpha vol-steal-admin \
+  -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}{"\n"}'
+
+# 5) hostPath mount — the gap
+kubectl apply -f - <<'POD'
+apiVersion: v1
+kind: Pod
+metadata: {name: host-escape, namespace: project-alpha}
+spec:
+  containers:
+    - {name: t, image: alpine:3.20, command: ["sleep","30"],
+       volumeMounts: [{name: host, mountPath: /host}]}
+  volumes:
+    - {name: host, hostPath: {path: /var/lib/longhorn}}
+POD
+sleep 3
+kubectl get pod -n project-alpha host-escape -o jsonpath='{.status.phase}{"\n"}'
+# expected on a hardened cluster: the apply itself fails with
+#   pods "host-escape" is forbidden: violates PodSecurity "baseline:v1.32": ...
+#   ("restricted:v1.32" once enforce is promoted to restricted)
+# observed today: phase=Running
+
+# 6) Longhorn UI/API reachability from a project pod
+LH_UI_IP=$(kubectl get svc -n storage-system longhorn-frontend \
+  -o jsonpath='{.spec.clusterIP}')
+kubectl run lh-probe -n project-alpha --image=curlimages/curl:8.10.1 \
+  --restart=Never --command -- sleep 30 >/dev/null
+kubectl wait --for=condition=Ready pod/lh-probe -n project-alpha --timeout=60s >/dev/null
+kubectl exec -n project-alpha lh-probe -- curl -s -o /dev/null \
+  --max-time 5 -w '%{http_code}' "http://$LH_UI_IP/"
+
+# cleanup
+kubectl delete pod -n project-alpha vol-steal-admin host-escape lh-probe \
+  --ignore-not-found --force --grace-period=0
+```
+
+## The `hostPath` gap
+
+Anything that can create pods in a `project-*` namespace — directly or
+through a spawner — can mount the node's `/var/lib/longhorn` and read
+the raw replica files for every project's Longhorn volumes. The
+mount-namespace separation is irrelevant: Longhorn keeps replicas as
+ordinary files (`volume-head-NNN.img`, `volume-snap-X.img`) under that
+directory, and once they're visible to the attacker's pod they are
+copyable and parseable.
+
+A user inside a JupyterHub notebook cannot create such a pod today:
+the ServiceAccount token available in the notebook pod lacks
+`create pods` in its namespace (see Probe 3). The realistic attack
+surface is:
+
+- **anything running with the JupyterHub or VDI spawner ServiceAccount**
+  (those *do* have `create pods` in the project namespace, otherwise
+  they couldn't spawn user environments);
+- **anything with cluster-admin** kubeconfig access (people, CI jobs,
+  ArgoCD app SAs with cluster-scoped permissions);
+- **future apps wired into the Project model** that may need a different
+  ServiceAccount with broader pod-creation rights.
+
+The defense is to *make `hostPath` (and other escapes) inadmissible at
+the namespace level*, rather than relying on no one having the verb. That's
+what Pod Security Admission (PSA) does.
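+
+Until PSA is enforced, it is worth checking whether any pod already mounts
+a `hostPath` volume in a project namespace. The following is a minimal
+sketch that uses only `kubectl` JSONPath and `awk`. The helper name
+`hostpath_pods` is hypothetical, and it relies on the `project-` namespace
+prefix used throughout these docs:
+
+```sh
+# Hypothetical audit helper: print "namespace/pod: path ..." for every pod in a
+# project-* namespace that declares at least one hostPath volume.
+hostpath_pods() {
+  kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.hostPath.path}{" "}{end}{"\n"}{end}' |
+    awk -F'\t' '$1 ~ /^project-/ && $3 ~ /[^ ]/ {
+      gsub(/ +/, " ", $3); sub(/^ /, "", $3); sub(/ $/, "", $3)  # tidy separators
+      print $1 "/" $2 ": " $3
+    }'
+}
+```
+
+If it runs right after Probe 5 and before the cleanup step, it should list
+`project-alpha/host-escape: /var/lib/longhorn` on today's cluster.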
+
+## Mitigation — Pod Security Admission on `project-*` namespaces
+
+Kubernetes' built-in PSA admission controller enforces a profile per
+namespace, so there is nothing extra to install. There are three profiles:
+`privileged` (default; allows everything), `baseline` (no obviously
+dangerous fields) and `restricted` (strictly locked down). Both `baseline`
+and `restricted` reject `hostPath` volumes, `privileged: true`,
+`hostNetwork: true`, and adding capabilities beyond a small default set.
+
+Label every project namespace:
+
+```sh
+for ns in project-alpha project-bravo project-demo-project; do
+  kubectl label ns "$ns" \
+    pod-security.kubernetes.io/enforce=baseline \
+    pod-security.kubernetes.io/enforce-version=v1.32 \
+    pod-security.kubernetes.io/audit=restricted \
+    pod-security.kubernetes.io/warn=restricted \
+    --overwrite
+done
+```
+
+Start with `enforce=baseline`. It blocks `hostPath` and the most dangerous
+settings but is friendly to most controller-spawned pods. Run the workloads
+for a few days and watch the `audit`/`warn` reports for what *would* fail
+under `restricted`. Once those are fixed, promote `enforce` to `restricted`.
+
+Verify the gap closes:
+
+```sh
+kubectl apply -f - <<'POD' 2>&1 || echo "(blocked, as expected)"
+apiVersion: v1
+kind: Pod
+metadata: {name: host-escape, namespace: project-alpha}
+spec:
+  containers:
+    - {name: t, image: alpine:3.20, command: ["sleep","30"]}
+  volumes:
+    - {name: host, hostPath: {path: /var/lib/longhorn}}
+POD
+# expected:
+#   pods "host-escape" is forbidden: violates PodSecurity "baseline:v1.32":
+#   hostPath volumes (volume "host")
+```
+
+### Caveats before promoting to `restricted`
+
+The `restricted` profile demands `runAsNonRoot: true`, requires every
+container to drop `ALL` capabilities (only `NET_BIND_SERVICE` may be
+added back), requires `allowPrivilegeEscalation: false`, and pins
+`seccompProfile.type` to `RuntimeDefault` or `Localhost`.
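+
+For reference, here is roughly what a pod that satisfies those four
+constraints looks like. This is a sketch, not the JupyterHub or VDI pod
+spec: the helper `restricted_pod`, the pod name and UID `65534` are
+illustrative assumptions. Piping the manifest through
+`kubectl apply --dry-run=server` runs admission (including PSA) without
+creating anything, which makes it a cheap way to test a namespace's labels:
+
+```sh
+# Hypothetical helper: print a Pod manifest intended to pass the PSA
+# "restricted" profile ($1 = namespace, $2 = pod name).
+restricted_pod() {
+  cat <<EOF
+apiVersion: v1
+kind: Pod
+metadata: {name: $2, namespace: $1}
+spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 65534
+    seccompProfile: {type: RuntimeDefault}
+  containers:
+    - name: t
+      image: alpine:3.20
+      command: ["sleep", "30"]
+      securityContext:
+        allowPrivilegeEscalation: false
+        capabilities: {drop: ["ALL"]}
+EOF
+}
+```
+
+Usage: `restricted_pod project-alpha psa-check | kubectl apply --dry-run=server -f -`
+should be admitted even with `enforce=restricted`.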
+Things to verify before flipping the switch:
+
+- **JupyterHub user pod spec** (`apps/jupyterhub/.../values.yaml` —
+  `singleuser.cloudMetadata.blockWithIptables`, init containers, image
+  user) must satisfy all four constraints.
+- **VDI spawner pod spec** (`apps/guacamole/...` or whichever
+  controller materialises `VDIInstance`s) — desktop sessions often
+  need additional capabilities (e.g. `SYS_ADMIN` for FUSE) and won't
+  pass `restricted` without explicit `securityContext` tuning. Adding
+  `SYS_ADMIN` is already rejected by `baseline`, so such pods would
+  also fail the first enforcement step.
+- **Longhorn's own pods** (`engine-image-ei-*`) live under
+  `storage-system`, not the project namespaces, so they're not
+  affected. Any sidecar that the spawner injects into the user pod,
+  however, is affected.
+
+The simplest path is to bake the PSA labels into the Project
+provisioning logic, so that every new Project namespace starts out with
+`enforce=baseline` already set. Today there is no such controller, so
+the labels have to be added by hand each time a Project namespace is
+created.
+
+## What's still not tested
+
+- **Spawner ServiceAccount surface.** The JupyterHub `hub`
+  ServiceAccount has `create pods` on the project namespace. If an
+  attacker compromises the hub, they can spawn arbitrary pods,
+  including a hostPath escape, until PSA is on. A separate test should
+  impersonate the spawner SA and verify that PSA blocks the escape
+  there too.
+- **Longhorn snapshots & `BackingImage` cross-project leak.** A
+  `BackingImage` is cluster-scoped and stores blob data under
+  `/var/lib/longhorn/backing-images/`. If one project's data is ever
+  promoted to a BackingImage by mistake, every project can mount it as
+  a read-only source. Such a leak would most likely be accidental, but
+  it is worth checking once a backup/restore workflow is wired up.
+- **`subPath` traversal inside a single namespace.** Crafted paths or
+  symlinks inside PVC content, combined with `subPath` mounts, could
+  let one pod's mount expose another pod's files inside the same
+  namespace. This is out of scope for cross-project isolation but is a
+  useful adversarial test inside JupyterHub.
+- **`VolumeSnapshot` / `VolumeSnapshotContent` RBAC.** Same shape as
+  PV vs PVC: snapshot contents are cluster-scoped. Today no SA in
+  `project-*` has access, but it's worth re-checking after any future
+  Velero / volume-restore integration.
+
+## Where this fits
+
+The findings on this page extend
+[`docs/project-isolation.md`](../project-isolation.md) with a 5th
+enforcement layer (volume / node-storage) that the current
+`tests/test-project-isolation.sh` does not yet cover. The next
+iteration of the test script should add probes 1–6 above and a final
+assertion that the `hostPath` pod is denied by PodSecurity. Until PSA
+is applied, that assertion is *expected to fail* — which is exactly the
+signal we want from the test.
From 1fc5952c90912efe8adb76d6a854b05421424a7e Mon Sep 17 00:00:00 2001
From: Gianpaolo Sanseverino
Date: Thu, 11 Jun 2026 12:36:35 +0200
Subject: [PATCH 3/3] flow charts

---
 docs/diagrams.md                | 184 ++++++++++++++++++++++++++++++++
 tests/test-project-isolation.sh | 166 +++++++++++++++++++++++++++-
 2 files changed, 348 insertions(+), 2 deletions(-)
 create mode 100644 docs/diagrams.md

diff --git a/docs/diagrams.md b/docs/diagrams.md
new file mode 100644
index 00000000..8374ad92
--- /dev/null
+++ b/docs/diagrams.md
@@ -0,0 +1,184 @@
+# k8tre — architecture & isolation diagrams
+
+Two diagrams that describe the cluster as deployed on the StackIT environment (single-node k3s + Cilium + Longhorn).
+
+1. Focuses on how project tenants are kept apart from each other and from the infrastructure.
+1. Zooms out to the end-to-end request flow from a researcher's browser down to a per-project notebook pod.
+
+## 1. Namespace separation & tenant isolation
+
+```mermaid
+flowchart TB
+    %% ─────────── Infrastructure namespaces ───────────
+    subgraph INFRA["🔧 Infrastructure namespaces — managed by platform admin"]
+        direction LR
+        ks["kube-system
Cilium · CoreDNS · metrics-server"] + ss["storage-system
Longhorn manager · CSI plugin"] + cnpg["cnpg-system
CloudNativePG operator"] + es["external-secrets · cert-manager · argocd · metallb-system"] + end + + %% ─────────── Platform namespaces (shared TRE services) ─────────── + subgraph PLAT["🧱 Platform namespaces — shared TRE services"] + direction LR + kc["keycloak
(realm k8tre-app)"] + bk["backend
(portal)"] + gw["gateway
(Cilium Gateway API)"] + nx["ingress-nginx"] + gt["gitea"] + jh["jupyterhub
(hub + jhub-auth-proxy
+ guacamole pods)"] + os["object-storage
(SeaweedFS)"] + end + + %% ─────────── Tenant (per-project) namespaces ─────────── + subgraph TA["🟦 project-alpha — tenant A"] + direction LR + nbA["JupyterHub user-pod
+ PVC notebook-alice-alpha"] + vdA["VDI pod
+ PVC"] + end + + subgraph TB_["🟩 project-bravo — tenant B"] + direction LR + nbB["JupyterHub user-pod
+ PVC notebook-bob-bravo"] + vdB["VDI pod
+ PVC"] + end + + %% Allowed traffic (solid arrows) + bk -- "spawn pods
(spawner SA)" --> TA + bk -- "spawn pods
(spawner SA)" --> TB_ + nbA -- "intra-ns OK" --> vdA + nbB -- "intra-ns OK" --> vdB + + %% Denied traffic (dashed arrows) — the isolation barriers we tested + TA -. "❌ RBAC: default SA cannot list/get/create
❌ Cilium NetworkPolicy drops cross-tenant TCP
❌ PVC claimName resolved in own ns only" .- TB_ + TA -. "❌ Cilium: apiserver = host entity, blocked
❌ Tenant SA has no ClusterRole" .- ks
+    TB_ -. "❌" .- ks
+
+    %% Style cues
+    classDef tenant fill:#dde7f3,stroke:#1f4e79,stroke-width:1px,color:#000;
+    classDef tenant2 fill:#dff0d8,stroke:#3c763d,stroke-width:1px,color:#000;
+    classDef infra fill:#f5f0e0,stroke:#8a7032,stroke-width:1px,color:#000;
+    classDef plat fill:#f0e6f3,stroke:#5a2e7a,stroke-width:1px,color:#000;
+    class TA tenant;
+    class TB_ tenant2;
+    class INFRA infra;
+    class PLAT plat;
+```
+
+**Reading the diagram**
+
+- **Vertical separation**: the top row of infrastructure namespaces
+  hosts the cluster operators (Cilium, Longhorn, CNPG, cert-manager,
+  ArgoCD) and is written only by platform admins. Below it, the
+  platform namespaces host the TRE services every tenant shares
+  (Keycloak, portal, gateway, hub, Gitea, object-storage). At the
+  bottom, each project lives in its own `project-` namespace.
+- **Solid arrows** mark traffic that flows in production: the backend's
+  spawner ServiceAccount creates user pods inside the project
+  namespaces; pods within the same project talk freely.
+- **Dashed lines** mark the four barriers that
+  [`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+  checks. Cross-tenant RBAC checks answer `no`, and cross-tenant network
+  traffic is dropped. A cross-namespace `claimName` is not resolved. Project
+  pods cannot reach the apiserver, because Cilium treats `10.43.0.1` as
+  the `host` entity and the `allow-pod-to-pod-via-gateway` policy only
+  opens `cluster` entities.
+
+## 2. End-to-end request flow
+
+```mermaid
+flowchart TB
+    user(["👩‍🔬 Researcher's browser"])
+
+    subgraph CLOUD["☁️ Cloud network (StackIT)"]
+        fip["188.34.94.28 · floating IP"]
+    end
+
+    subgraph VM["🖥️ Single-node VM — k3s + Cilium"]
+        direction TB
+        socat["socat 0.0.0.0:80,443 → 127.0.0.1:14722"]
+        envoy["cilium-envoy · Gateway API listener · 127.0.0.1:14722"]
+        gw["Gateway internal-gateway
HTTPRoutes for portal · keycloak · jupyter · guacamole · gitea · cr8tor"] + + subgraph TRE["TRE platform"] + direction TB + portal["portal (backend)"] + keycloak[("Keycloak · realm k8tre-app")] + cnpg[("CloudNativePG · postgres")] + apiserver["kube-apiserver"] + hub["JupyterHub + jhub-auth-proxy"] + guac["Guacamole + guacamole-auth-proxy"] + gitea["Gitea"] + longhorn[("Longhorn · volumes & replicas")] + end + + subgraph TENANTS["Per-project workload namespaces"] + direction TB + pa["project-alpha · user-notebook · VDI · PVC"] + pb["project-bravo · user-notebook · VDI · PVC"] + end + end + + user -->|"HTTPS
*.<domain>.nip.io"| fip + fip -->|"DNAT to VM"| socat + socat --> envoy + envoy --> gw + + gw -->|"portal.
<domain>"| portal + gw -->|"keycloak.
<domain>"| keycloak + gw -->|"jupyter.
<domain>"| hub + gw -->|"guacamole.
<domain>"| guac + gw -->|"gitea.
<domain>"| gitea + + portal -->|"OIDC code flow
+ JWKS"| keycloak + portal -->|"reads User /
Group / Project CRs"| apiserver + portal -->|"creates VDIInstance
updates JupyterHub
profile via API"| apiserver + keycloak --> cnpg + + hub -->|"spawn user-notebook"| pa + hub -->|"spawn user-notebook"| pb + guac -->|"open VDI"| pa + guac -->|"open VDI"| pb + + pa -->|"PVC binds
project-alpha only"| longhorn + pb -->|"PVC binds
project-bravo only"| longhorn + + %% Auth subrequest loop (every subdomain hit) + hub <-->|"/auth/validate
(subrequest)"| portal
+    guac <-->|"/auth/validate"| portal
+
+    classDef ext fill:#cfe2f3,stroke:#0b5394,color:#000;
+    classDef net fill:#e6f4ea,stroke:#137333,color:#000;
+    classDef plat fill:#fff2cc,stroke:#bf9000,color:#000;
+    classDef tenant fill:#f4cccc,stroke:#990000,color:#000;
+    class user,fip ext;
+    class socat,envoy,gw net;
+    class portal,keycloak,cnpg,apiserver,hub,guac,gitea,longhorn plat;
+    class pa,pb tenant;
+```
+
+**Reading the diagram**
+
+- **Ingress chain (top to bottom)**: browser → floating IP → cloud NAT →
+  VM's `enp3s0:443` → `socat` systemd unit → Cilium-Envoy loopback
+  listener → `Gateway internal-gateway` (Cilium Gateway API). This is
+  the path documented in
+  [`docs/troubleshooting/k8tre-install-guide.md`](troubleshooting/k8tre-install-guide.md#5-expose-the-gateway-on-the-host-ip),
+  and it is why `socat` runs on the host: every HTTPS request traverses
+  it.
+- **HTTPRoutes** fan out to the platform services. Three (Keycloak,
+  portal, Gitea) are served directly. Two (JupyterHub, Guacamole) sit
+  behind their auth-proxy nginx, which calls back to
+  `portal:/auth/validate` on every request. That call turns the user's
+  Keycloak session into a per-project authorization decision.
+- **portal ↔ Keycloak** is OIDC over the public hostname, resolved
+  inside the cluster. See the CoreDNS hosts override in
+  [`docs/troubleshooting/keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md#make-the-backend-reach-keycloak-from-inside-the-cluster)
+  for why a pod hitting `keycloak.` is rewritten to the VM's
+  primary private IP rather than hairpinning out through the cloud NAT.
+- **portal ↔ apiserver** is how the User → Group → Project CR graph is
+  read on every `/projects` and `/auth/validate` call, and how the
+  backend creates a `VDIInstance` when a researcher clicks *Launch*.
+- **JupyterHub & Guacamole spawn pods inside the tenant namespace** — + per-project PVCs are created/bound here, never across; the dotted + isolation boundaries of Diagram 1 apply. diff --git a/tests/test-project-isolation.sh b/tests/test-project-isolation.sh index 1c05617d..fed82190 100755 --- a/tests/test-project-isolation.sh +++ b/tests/test-project-isolation.sh @@ -277,9 +277,170 @@ test_netpol() { kubectl delete pod -n "project-${PROJECT_A}" net-client --force --grace-period=0 >/dev/null 2>&1 } -# ---- Test 5 — User CR validation ------------------------------------------- +# ---- Test 5 — Cross-project volume access ---------------------------------- +test_volume() { + section "Test 5 — Volume isolation: PVC in project ${PROJECT_A} unreachable from project ${PROJECT_B}" + + local pvc=secret-${PROJECT_A}-data + local sentinel="SECRET_${PROJECT_A^^}: only-for-${PROJECT_A}-team" + local sa_b="system:serviceaccount:project-${PROJECT_B}:default" + + # Cleanup any previous run + kubectl delete pod ${PROJECT_A}-writer ${PROJECT_A}-reader -n "project-${PROJECT_A}" \ + --ignore-not-found --force --grace-period=0 >/dev/null 2>&1 + kubectl delete pod ${PROJECT_B}-thief-name -n "project-${PROJECT_B}" \ + --ignore-not-found --force --grace-period=0 >/dev/null 2>&1 + kubectl delete pvc "$pvc" -n "project-${PROJECT_A}" --ignore-not-found >/dev/null 2>&1 + kubectl delete pvc "stolen-via-volume-name" -n "project-${PROJECT_B}" --ignore-not-found >/dev/null 2>&1 + + # --- 5a) create PVC in PROJECT_A and write sentinel --- + kubectl apply -f - >/dev/null < /data/secret.txt; sync; sleep 3"] + volumeMounts: [{name: d, mountPath: /data}] + volumes: + - {name: d, persistentVolumeClaim: {claimName: ${pvc}}} +YAML + + # Wait for writer to finish + local phase="" + for i in $(seq 60); do + phase=$(kubectl get pod -n "project-${PROJECT_A}" "${PROJECT_A}-writer" -o jsonpath='{.status.phase}' 2>/dev/null) + [ "$phase" = "Succeeded" ] && break + [ "$phase" = "Failed" ] && break + sleep 2 
+ done + [ "$phase" = "Succeeded" ] && pass "sentinel written to ${pvc} (writer phase=$phase)" \ + || { fail "writer phase=$phase — abort volume test"; return; } + + # --- 5b) PROJECT_A's own pod can read the sentinel back --- + kubectl run "${PROJECT_A}-reader" -n "project-${PROJECT_A}" --image=alpine:3.20 --restart=Never \ + --overrides="{\"spec\":{\"containers\":[{\"name\":\"r\",\"image\":\"alpine:3.20\",\"command\":[\"cat\",\"/data/secret.txt\"],\"volumeMounts\":[{\"name\":\"d\",\"mountPath\":\"/data\"}]}],\"volumes\":[{\"name\":\"d\",\"persistentVolumeClaim\":{\"claimName\":\"${pvc}\"}}]}}" \ + --command -- cat /data/secret.txt >/dev/null 2>&1 + + for i in $(seq 60); do + phase=$(kubectl get pod -n "project-${PROJECT_A}" "${PROJECT_A}-reader" -o jsonpath='{.status.phase}' 2>/dev/null) + [ "$phase" = "Succeeded" ] || [ "$phase" = "Failed" ] && break + sleep 2 + done + local content + content=$(kubectl logs -n "project-${PROJECT_A}" "${PROJECT_A}-reader" 2>/dev/null) + if echo "$content" | grep -qF "$sentinel"; then + pass "${PROJECT_A} reader sees the sentinel from its own PVC" + else + fail "${PROJECT_A} reader did not see the sentinel: '$content'" + fi + + # --- 5c) RBAC: PROJECT_B's default SA cannot touch PROJECT_A's PVC --- + for verb in get list create delete patch; do + local can + can=$(kubectl auth can-i $verb pvc/$pvc -n "project-${PROJECT_A}" --as=$sa_b 2>&1) + [ "$can" = "no" ] && pass "$sa_b cannot $verb pvc/$pvc -n project-${PROJECT_A}" \ + || fail "$sa_b CAN $verb pvc/$pvc -n project-${PROJECT_A} (got: $can)" + done + + # --- 5d) Pod in PROJECT_B referencing claimName=$pvc must stay Pending --- + kubectl apply -f - >/dev/null </dev/null 2>&1 </dev/null) + theft_event=$(kubectl get events -n "project-${PROJECT_B}" --field-selector involvedObject.name=stolen-via-volume-name -o jsonpath='{.items[-1:].message}' 2>/dev/null) + if [ "$theft_status" = "Pending" ] && echo "$theft_event" | grep -qi 'already bound'; then + pass "PV refuses cross-ns 
claimRef hijack — status=$theft_status: $theft_event" + elif [ "$theft_status" = "Bound" ]; then + fail "PV bound to a project-${PROJECT_B} PVC — cluster-scoped PV ISOLATION BROKEN" + else + pass "stolen PVC did not bind (status=$theft_status, event='$theft_event')" + fi + + # --- 5f) hostPath escape — currently allowed (no PodSecurity enforce) --- + # Documented in docs/troubleshooting/volume-isolation.md: PSA is not on, so + # anyone who can create a pod in project-${PROJECT_B} can mount the node's + # /var/lib/longhorn and read every project's raw blocks. We report this as + # WEAK rather than FAIL so the suite stays useful on un-hardened clusters. + kubectl delete pod -n "project-${PROJECT_B}" host-escape --ignore-not-found --force --grace-period=0 >/dev/null 2>&1 + kubectl apply -f - >/dev/null 2>&1 </dev/null) + if [ "$phase" = "Running" ] || [ "$phase" = "Pending" ]; then + weak "hostPath /var/lib/longhorn admitted in project-${PROJECT_B} (phase=$phase)" + weak " → apply pod-security.kubernetes.io/enforce=baseline on project namespaces" + weak " → see docs/troubleshooting/volume-isolation.md" + else + pass "hostPath /var/lib/longhorn denied by admission (phase=$phase)" + fi + + # --- cleanup --- + kubectl delete pod -n "project-${PROJECT_A}" "${PROJECT_A}-writer" "${PROJECT_A}-reader" --ignore-not-found --force --grace-period=0 >/dev/null 2>&1 + kubectl delete pod -n "project-${PROJECT_B}" "${PROJECT_B}-thief-name" host-escape --ignore-not-found --force --grace-period=0 >/dev/null 2>&1 + kubectl delete pvc -n "project-${PROJECT_B}" stolen-via-volume-name --ignore-not-found >/dev/null 2>&1 + # leave the alpha PVC behind — idempotent re-runs reuse it +} + +# ---- Test 6 — User CR validation ------------------------------------------- test_user_cr() { - section "Test 5 — User → Group → Project graph integrity" + section "Test 6 — User → Group → Project graph integrity" for u in "$USER_A:$PROJECT_A:$PROJECT_B" "$USER_B:$PROJECT_B:$PROJECT_A"; do IFS=':' read -r 
user own_project other_project <<< "$u" @@ -302,6 +463,7 @@ main() { test_enumeration test_rbac test_netpol + test_volume test_user_cr printf "\n${BOLD}== Summary ==${RESET}\n"
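Test 5f currently reports `WEAK` whenever the `hostPath` pod reaches `Running` or `Pending`. The volume-isolation notes ask for a final assertion that PodSecurity denies that pod. One possible variant of the probe is sketched below; it is not part of the script. It sends the same manifest through admission with `kubectl apply --dry-run=server`. The API server runs admission on dry-run requests but persists nothing, so no cleanup is needed. The helper then looks for the `violates PodSecurity` text in PSA's rejection message. `hostpath_admission` is a hypothetical helper name:

```sh
# hostpath_admission NAMESPACE   (hypothetical helper)
# Prints PASS/WEAK and returns 0 only if PodSecurity admission rejects a
# pod that mounts the node's /var/lib/longhorn via hostPath.
hostpath_admission() {
  local ns=$1 out
  # Server-side dry run: the request goes through admission, nothing is
  # stored. A rejection makes kubectl exit non-zero, hence `|| true`.
  out=$(kubectl apply --dry-run=server -f - 2>&1 <<YAML || true
apiVersion: v1
kind: Pod
metadata: {name: host-escape, namespace: ${ns}}
spec:
  containers:
  - {name: t, image: alpine:3.20, command: ["sleep", "30"]}
  volumes:
  - {name: host, hostPath: {path: /var/lib/longhorn}}
YAML
)
  if printf '%s\n' "$out" | grep -q 'violates PodSecurity'; then
    echo "PASS: hostPath denied by PodSecurity in ${ns}"
    return 0
  fi
  echo "WEAK: hostPath admitted in ${ns}"
  return 1
}
```

Inside `test_volume`, the return code of `hostpath_admission "project-${PROJECT_B}"` could choose between `pass` and `weak` directly. This avoids creating a real pod and polling its phase.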