From 4a1cfef75327c68921e73706954debcdc0a95044 Mon Sep 17 00:00:00 2001
From: Gianpaolo Sanseverino <gia.sanseverino@gmail.com>
Date: Tue, 9 Jun 2026 17:35:36 +0200
Subject: [PATCH 1/3] =?UTF-8?q?Add=20project=20isolation=20testing=20?=
 =?UTF-8?q?=E2=80=94=20script=20+=20model=20doc?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two paired artifacts that document and exercise the four layers k8tre
relies on to keep one Project's resources out of reach of another
Project's users.

  tests/test-project-isolation.sh
    Idempotent end-to-end check. Creates two Projects (alpha, bravo)
    + Groups + Users (alice, bob), then asserts:
      - Keycloak password-grant works and the JWT has aud=backend
      - /auth/validate returns 200 for own project, 403 for the other,
        symmetrically for both users (the real authz gate, layer 2)
      - default ServiceAccount in project-alpha cannot list/create/
        delete pods or secrets in project-bravo (layer 3, RBAC)
      - a pod in project-alpha can't TCP to a pod in project-bravo
        (layer 4, Cilium)
      - User CRs exist with the right group memberships
    On a healthy cluster: 14 PASS, 0 FAIL, 2 WEAK (documented).

  docs/project-isolation.md
    Walks through the four enforcement layers (UX /projects filtering,
    /auth/validate gate, Kubernetes RBAC, Cilium NetworkPolicy), how
    to run the script, what it does and does not cover, the two known
    weak spots (logged-in users can still GET /projects/<other>/apps
    and /launch/<other>/<app> — metadata leak, no data leak; one-line
    fix in get_apps()/launch_app() noted), and the three Keycloak +
    network-policy quirks the script's setup phase has to work around.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
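Note for reviewers (placed below the `---`, so it is not part of the commit
message): when the script's `aud` check fails, it can help to inspect the
token payload by hand. The sketch below is an illustration only and is not
code from this patch. The function name `jwt_payload` is hypothetical, and it
assumes a `base64` that accepts `-d` and a token made of the usual three
dot-separated segments.

```sh
# Hypothetical helper (not in the patch): print the decoded payload of a JWT.
jwt_payload() {
  local seg=${1#*.}          # drop the header segment
  seg=${seg%%.*}             # drop the signature segment
  # base64url -> base64: map the URL-safe characters back
  seg=$(printf '%s' "$seg" | tr '_-' '/+')
  # restore the '=' padding that JWT encoding strips
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}
```

With a token in `$TOKEN`, `jwt_payload "$TOKEN" | jq -r '.aud'` should print
the audience the backend verifies, assuming the payload is valid JSON.
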
 docs/project-isolation.md       | 201 ++++++++++++++++++++
 tests/test-project-isolation.sh | 314 ++++++++++++++++++++++++++++++++
 2 files changed, 515 insertions(+)
 create mode 100644 docs/project-isolation.md
 create mode 100755 tests/test-project-isolation.sh
diff --git a/docs/project-isolation.md b/docs/project-isolation.md
new file mode 100644
index 00000000..333ff65c
--- /dev/null
+++ b/docs/project-isolation.md
@@ -0,0 +1,201 @@
+# k8tre — project isolation model and how to test it
+
+This doc describes the four layers that keep one k8tre Project's resources
+out of reach of another Project's users, and the
+[`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+script that exercises all four end-to-end.
+
+## The isolation model
+
+A *Project* in k8tre is more than a row in a database — it is materialized
+across four independent enforcement layers. Each layer can fail on its
+own without breaking the others, so the test below treats them
+separately.
+
+### Layer 1 — UX (`/projects` filtering)
+
+When a logged-in user opens
+`https://portal.<domain>/projects`, the backend walks the user's
+`User CR (spec.groups[]) → Group CR (spec.projects[]) → Project CR` graph
+and renders only the projects reachable from that user's identity. This is
+a **convenience layer**, not a security boundary: it only controls what
+the menu shows. The endpoint
+`/projects/<other>/apps` is not protected by the authorization check
+(`_is_user_authorised_project()` is **not** called there), so a
+logged-in user who knows another project's name can still GET that
+page and see its app list. That's a metadata leak, not a data leak —
+see the *Weak spots* section below.
+
+### Layer 2 — Backend authorization gate (`/auth/validate`)
+
+This is the **real security boundary**. Every request the browser makes
+to a non-portal subdomain (`jupyter.<domain>`, `guacamole.<domain>`, …)
+is intercepted by an auth-proxy nginx that issues a subrequest to
+`https://portal.<domain>/auth/validate`. That handler:
+
+1. Extracts the project from the `k8tre-project` cookie or
+   `?project=` URL parameter,
+2. Extracts the token from `k8tre-auth-token-<project>` cookie or
+   `?token=` parameter,
+3. Verifies the JWT signature and the `aud` claim (the `audience`
+   protocol mapper on the Keycloak client is what makes this work —
+   see [`keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md#add-the-aud-audience-protocol-mapper)),
+4. Calls **`_is_user_authorised_project(username, project)`**
+   (`ci/backend/main.py:301`) which re-walks the User/Group/Project
+   graph from the OIDC `preferred_username` claim,
+5. Returns 200 + auth headers (`X-Auth-User`, `X-Auth-Groups`, …) on
+   success, **403** on authorization failure.
+
+The same check is repeated at `/vdi/sso/<token>/<project>/<app>` for VDI
+shortcut authentication, and inside `/launch/<project>/<app>` for the
+in-VDI lockout: a user inside a VDI for project A can't `/launch` an app
+for project B, because `launch_app()` checks `vdi_context` and `vdi_project`.
+
+### Layer 3 — Kubernetes RBAC
+
+Each project runs in its own namespace, named
+`project-<project>` by convention (see `get_proj_namespace()` in
+`ci/backend/main.py:216`). Pods spawned by JupyterHub spawners or
+VDIInstance CRs land in that namespace. The default Kubernetes RBAC
+policy gives a workload's ServiceAccount no implicit cross-namespace
+permissions, so a pod in `project-alpha` cannot list, read, create or
+delete resources in `project-bravo` without an explicit RoleBinding
+or ClusterRoleBinding granting it. The test asserts this with
+`kubectl auth can-i --as=system:serviceaccount:project-alpha:default`
+against `project-bravo`.
+
+### Layer 4 — Cilium network policy
+
+Once Cilium is the cluster CNI (see
+[`k8tre-install-guide.md`](troubleshooting/k8tre-install-guide.md)),
+the `CiliumClusterwideNetworkPolicy` and `CiliumNetworkPolicy`
+resources shipped by `apps/jupyterhub/base/network_policy.yaml` are
+**enforced** instead of inert. The pre-Cilium install (default k3s
+flannel) accepted the manifests but no controller honored them. With
+Cilium, pods in project namespaces can't reach pods in other project
+namespaces — and, importantly, they can't reach the Kubernetes API
+server (`10.43.0.1:443`) either, because Cilium treats it as `host`
+entity, not `cluster`. This is the reason the RBAC test in layer 3
+uses `kubectl auth can-i --as=...` from outside the pod rather than
+running `kubectl` from inside it.
+
+## The test script
+
+[`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+sets up two parallel projects (`alpha`, `bravo`) plus two users
+(`alice` in `alpha-team`, `bob` in `bravo-team`) and walks each
+isolation layer in turn. The setup phase is idempotent — re-running
+the script just verifies state.
+
+### Run
+
+```sh
+# Default: domain 188.34.94.28.nip.io, projects alpha/bravo, users alice/bob
+./tests/test-project-isolation.sh
+
+# Override any of these:
+DOMAIN=foo.nip.io PROJECT_A=cardio PROJECT_B=onco \
+  USER_A=anna USER_B=bruno \
+  ./tests/test-project-isolation.sh
+```
+
+Requires `kubectl` pointed at the cluster (the script runs the heavy
+checks against the apiserver from the host) plus `curl`, `python3` and
+`jq` locally.
+
+### What it asserts
+
+| Test | Layer | Expected on a healthy cluster |
+|---|---|---|
+| Token issuance + `aud` claim | Authn / Keycloak | both users get a JWT, `aud` contains `backend` |
+| `alice → alpha` `/auth/validate` | Layer 2 | **200** |
+| `alice → bravo` `/auth/validate` | Layer 2 | **403** |
+| `bob → bravo` `/auth/validate` | Layer 2 | **200** |
+| `bob → alpha` `/auth/validate` | Layer 2 | **403** |
+| Anonymous `/projects/<other>/apps` | Layer 1 (negative) | 401 / 302 |
+| Logged-in `/projects/<other>/apps` (manual) | Layer 1 (**weak**) | reachable, no authz — flagged as WEAK |
+| `kubectl auth can-i list/create/delete pods+secrets -n project-bravo --as=…project-alpha:default` | Layer 3 | every answer is `no` |
+| Pod in `project-alpha` curl to pod IP in `project-bravo:8080` | Layer 4 | code 000 (no HTTP response: connection blocked) |
+| User CR `spec.groups[]` exists for both users | Layer 1 source-of-truth | non-empty |
+
+Output uses three states: `PASS` (the assertion held), `FAIL` (it
+didn't), `WEAK` (enforcement is incomplete by design — documented but
+not yet fixed upstream). On a clean cluster you should see **14 PASS,
+0 FAIL, 2 WEAK** today.
+
+### What it does NOT cover
+
+- **Token replay across projects.** The cookie name encodes the
+  project (`k8tre-auth-token-<project>`), but the JWT itself is
+  identical per user. An attacker with the JWT can mint a cookie for
+  any project the user is authorized for. Limit token TTLs to
+  mitigate.
+- **JupyterHub spawner & VDIInstance RBAC.** These components run
+  under their own ServiceAccounts (`hub`, `user-scheduler`,
+  `vdi-spawner`), which DO have RoleBindings that cross namespace
+  boundaries. The test only checks the *default* SA; audit those
+  spawner SAs separately.
+- **Cilium policies for inter-pod traffic *within* a single project
+  namespace.** Today everything in `project-alpha` can talk to
+  everything else in `project-alpha`. Tighten with per-pod
+  `endpointSelector` policies if needed.
+- **The control plane / Keycloak realm itself.** A misconfigured
+  Keycloak protocol mapper (missing `groups` claim, missing `aud`)
+  can defeat the whole stack — see *Setup bugs the script surfaced*
+  below.
+
+## Weak spots (the two `WEAK` results today)
+
+### 1. Project enumeration via `/projects/<other>/apps`
+
+A logged-in user can GET
+`https://portal.<domain>/projects/<any-project-name>/apps` and the
+backend will render the list of apps for that project regardless of
+whether the user is authorized. The data fetched on subsequent
+clicks is gated by `/auth/validate`, so no payload leaks — but the
+existence of arbitrary project names is exposed. The fix is one line in
+`get_apps()` in `ci/backend/main.py`: add the same
+`_is_user_authorised_project(username, project)` check the
+`/auth/validate` handler uses, and return 403 if it fails.
+
+### 2. `/launch/<other>/<app>` sets cookies it shouldn't
+
+Same shape, same fix: `launch_app()` mints a project-scoped token
+and writes the `k8tre-auth-token-<other>` cookie before checking
+authorization. The next request to the subdomain is then rejected by
+`/auth/validate`, so a real attack only ever gets a dead cookie —
+but it pollutes the user's cookie jar and consumes a Keycloak token
+refresh.
+
+## Setup bugs the script surfaced
+
+The first runs failed for reasons that are themselves worth
+documenting; the script now handles them in its setup phase:
+
+1. **`kcadm.sh set-password` defaults to `--temporary=true`**, which
+   marks the password as needing a change on next login. The OIDC
+   *Resource Owner Password Credentials* (password grant) flow then
+   rejects the login with `Account is not fully set up`. The script
+   always passes `--temporary=false`.
+2. **Users created without `firstName`/`lastName`** trigger the same
+   `Account is not fully set up` error even when the password is
+   permanent. The script always sets both on create and runs an
+   `update users/<id>` on existing users to backfill them.
+3. **In-pod `kubectl get …` against another namespace fails before
+   it can reach the API server** because the project-namespace
+   network policy prevents the pod from connecting to
+   `10.43.0.1:443`. The script uses
+   `kubectl auth can-i --as=…` from the host instead, which is a cleaner
+   assertion and is resilient to network-policy changes.
+
+## Files
+
+- [`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+  — the script.
+- [`ci/backend/main.py`](../ci/backend/main.py) — `get_apps()`,
+  `launch_app()`, `_is_user_authorised_project()`,
+  `get_proj_namespace()`.
+- [`apps/jupyterhub/base/network_policy.yaml`](../apps/jupyterhub/base/network_policy.yaml)
+  — the Cilium policies enforced under layer 4.
+- [`docs/troubleshooting/keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md)
+  — Keycloak client and audience mapper setup that the authz check depends on.
diff --git a/tests/test-project-isolation.sh b/tests/test-project-isolation.sh
new file mode 100755
index 00000000..1c05617d
--- /dev/null
+++ b/tests/test-project-isolation.sh
@@ -0,0 +1,314 @@
+#!/usr/bin/env bash
+# test-project-isolation.sh — exercise the 4 isolation layers of k8tre:
+#   1. /projects visibility (UX)
+#   2. /auth/validate authorization gate (backend → subdomain access)
+#   3. Cross-namespace RBAC (kube-apiserver)
+#   4. Cross-namespace network policy (Cilium)
+#
+# Idempotent setup. Run from your laptop with kubectl context pointing at
+# the k8tre cluster:
+#
+#     ./tests/test-project-isolation.sh                # uses defaults below
+#     DOMAIN=foo.nip.io ./tests/test-project-isolation.sh
+#
+set -u
+SCRIPT_NAME=$(basename "$0")
+
+# ---- configuration ----------------------------------------------------------
+DOMAIN="${DOMAIN:-188.34.94.28.nip.io}"
+KEYCLOAK_REALM="${KEYCLOAK_REALM:-k8tre-app}"
+K8TRE_NAMESPACE="${K8TRE_NAMESPACE:-keycloak}"   # the backend's NAMESPACE
+PROJECT_A="${PROJECT_A:-alpha}"
+PROJECT_B="${PROJECT_B:-bravo}"
+USER_A="${USER_A:-alice}"
+USER_B="${USER_B:-bob}"
+PORTAL_URL="https://portal.${DOMAIN}"
+KC_URL="https://keycloak.${DOMAIN}"
+
+# ---- output helpers ---------------------------------------------------------
+PASS=0; FAIL=0; WEAK=0
+GREEN=$'\e[32m'; RED=$'\e[31m'; YELLOW=$'\e[33m'; BOLD=$'\e[1m'; RESET=$'\e[0m'
+pass()  { PASS=$((PASS+1)); printf "  ${GREEN}PASS${RESET}  %s\n" "$*"; }
+fail()  { FAIL=$((FAIL+1)); printf "  ${RED}FAIL${RESET}  %s\n" "$*"; }
+weak()  { WEAK=$((WEAK+1)); printf "  ${YELLOW}WEAK${RESET}  %s\n" "$*"; }
+section() { printf "\n${BOLD}== %s ==${RESET}\n" "$*"; }
+
+need() { command -v "$1" >/dev/null || { echo "missing tool: $1"; exit 2; }; }
+for t in kubectl curl python3 jq; do need "$t"; done
+
+# ---- setup ------------------------------------------------------------------
+setup() {
+  section "Setup — projects, groups, users (idempotent)"
+
+  cat <<EOF | kubectl apply -f - >/dev/null
+apiVersion: research.k8tre.io/v1alpha1
+kind: Project
+metadata: {name: ${PROJECT_A}, namespace: ${K8TRE_NAMESPACE}}
+spec:
+  description: "${PROJECT_A} test project"
+  apps:
+    - {name: jupyterhub, type: jupyterhub, url: "https://jupyter.${DOMAIN}/hub"}
+---
+apiVersion: research.k8tre.io/v1alpha1
+kind: Project
+metadata: {name: ${PROJECT_B}, namespace: ${K8TRE_NAMESPACE}}
+spec:
+  description: "${PROJECT_B} test project"
+  apps:
+    - {name: jupyterhub, type: jupyterhub, url: "https://jupyter.${DOMAIN}/hub"}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: Group
+metadata: {name: ${PROJECT_A}-team, namespace: ${K8TRE_NAMESPACE}}
+spec: {description: "${PROJECT_A} members", projects: ["${PROJECT_A}"]}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: Group
+metadata: {name: ${PROJECT_B}-team, namespace: ${K8TRE_NAMESPACE}}
+spec: {description: "${PROJECT_B} members", projects: ["${PROJECT_B}"]}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: User
+metadata: {name: ${USER_A}, namespace: ${K8TRE_NAMESPACE}}
+spec: {username: ${USER_A}, email: ${USER_A}@example.com, enabled: true, groups: ["${PROJECT_A}-team"]}
+---
+apiVersion: identity.k8tre.io/v1alpha1
+kind: User
+metadata: {name: ${USER_B}, namespace: ${K8TRE_NAMESPACE}}
+spec: {username: ${USER_B}, email: ${USER_B}@example.com, enabled: true, groups: ["${PROJECT_B}-team"]}
+EOF
+
+  # Project namespaces — Tests 3 and 4 need them
+  kubectl create ns "project-${PROJECT_A}" --dry-run=client -o yaml | kubectl apply -f - >/dev/null
+  kubectl create ns "project-${PROJECT_B}" --dry-run=client -o yaml | kubectl apply -f - >/dev/null
+
+  # Keycloak users
+  local pod=keycloak-keycloakx-0
+  local admin_user admin_pwd
+  admin_user=$(kubectl get secret -n keycloak keycloak-admin-credentials -o jsonpath='{.data.username}' | base64 -d)
+  admin_pwd=$( kubectl get secret -n keycloak keycloak-admin-credentials -o jsonpath='{.data.admin-password}' | base64 -d)
+  kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh config credentials \
+    --server http://localhost:8080 --realm master --user "$admin_user" --password "$admin_pwd" >/dev/null 2>&1
+
+  for u in "$USER_A" "$USER_B"; do
+    local uid first last
+    first="$(printf '%s' "${u:0:1}" | tr '[:lower:]' '[:upper:]')${u:1}"
+    last="Test"
+    uid=$(kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh get users -r "$KEYCLOAK_REALM" \
+            -q "username=$u" --fields id --format csv 2>/dev/null | head -1 | tr -d '"')
+    if [ -z "$uid" ]; then
+      uid=$(kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh create users -r "$KEYCLOAK_REALM" \
+              -s "username=$u" -s enabled=true -s "email=$u@example.com" \
+              -s "firstName=$first" -s "lastName=$last" -i 2>/dev/null)
+      echo "  created Keycloak user $u"
+    else
+      # Ensure firstName/lastName are set — Keycloak refuses password-grant with
+      # "Account is not fully set up" when these are null.
+      kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh update users/"$uid" -r "$KEYCLOAK_REALM" \
+              -s "firstName=$first" -s "lastName=$last" -s emailVerified=true \
+              -s 'requiredActions=[]' >/dev/null 2>&1
+      echo "  Keycloak user $u exists ($uid) — profile ensured"
+    fi
+    # Always (re)set password as permanent — kcadm.sh defaults to temporary=true.
+    kubectl exec -n keycloak $pod -- /opt/keycloak/bin/kcadm.sh set-password -r "$KEYCLOAK_REALM" \
+            --userid "$uid" --new-password "$u" --temporary=false >/dev/null 2>&1 \
+      && echo "  password set for $u (permanent)" \
+      || echo "  WARN: could not set password for $u"
+  done
+
+  # Backend OIDC client secret (used for password grant)
+  CLIENT_SECRET=$(kubectl get secret -n backend backend-oidc-credentials \
+    -o jsonpath='{.data.client-secret}' | base64 -d)
+}
+
+get_token() {
+  local u="$1"
+  curl -ks --max-time 10 -X POST "${KC_URL}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/token" \
+    -d "grant_type=password" -d "client_id=backend" -d "client_secret=${CLIENT_SECRET}" \
+    -d "username=${u}" -d "password=${u}" -d "scope=openid profile email" \
+    | python3 -c 'import sys,json
+t=json.load(sys.stdin); print(t.get("access_token",""))'
+}
+
+# ---- Test 1 — /auth/validate authorization gate -----------------------------
+test_authvalidate() {
+  section "Test 1 — /auth/validate (the real authorization gate)"
+
+  local t_a t_b
+  t_a=$(get_token "$USER_A"); t_b=$(get_token "$USER_B")
+  [ -n "$t_a" ] || { fail "could not get token for $USER_A"; return; }
+  [ -n "$t_b" ] || { fail "could not get token for $USER_B"; return; }
+  pass "tokens obtained for $USER_A and $USER_B"
+
+  # Check that each token carries `aud` claim (otherwise verify_token rejects all of them
+  # and test cannot distinguish authz from token-validation failure)
+  local aud
+  aud=$(python3 -c "import json,base64,sys; print(json.loads(base64.urlsafe_b64decode(sys.argv[1].split('.')[1]+'==')).get('aud',''))" "$t_a")
+  if echo "$aud" | grep -q backend; then
+    pass "JWT contains 'aud: backend' (audience mapper is in place)"
+  else
+    fail "JWT 'aud' claim missing — see keycloak-post-install-setup.md §audience mapper"
+    return
+  fi
+
+  local code
+  declare -A cases=(
+    ["$USER_A → $PROJECT_A (own)"]="$t_a $PROJECT_A 200"
+    ["$USER_A → $PROJECT_B (other)"]="$t_a $PROJECT_B 403"
+    ["$USER_B → $PROJECT_B (own)"]="$t_b $PROJECT_B 200"
+    ["$USER_B → $PROJECT_A (other)"]="$t_b $PROJECT_A 403"
+  )
+  for desc in "${!cases[@]}"; do
+    read -r tok proj want <<<"${cases[$desc]}"
+    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 \
+      -H "Cookie: k8tre-project=${proj}; k8tre-auth-token-${proj}=${tok}" \
+      "${PORTAL_URL}/auth/validate?orig=http://jupyter.${DOMAIN}/hub/")
+    if [ "$code" = "$want" ]; then
+      pass "$desc → $code"
+    else
+      fail "$desc → got $code, expected $want"
+    fi
+  done
+}
+
+# ---- Test 2 — UX enumeration weakness ---------------------------------------
+test_enumeration() {
+  section "Test 2 — Pre-launch URL enumeration (known UX weakness)"
+
+  local t_a; t_a=$(get_token "$USER_A")
+  # The real leak affects LOGGED-IN users. Reproducing it needs a Portal
+  # session cookie, which normally comes from the browser OIDC flow: the
+  # require_user dependency reads request.session and does NOT accept the
+  # password-grant token in an Authorization header, so the token fetched
+  # above cannot stand in for a session. This test therefore asserts only
+  # what it can: anonymous access to /projects/<other>/apps is rejected
+  # (401 or 302). The logged-in case is reported as WEAK below without
+  # being exercised.
+  local code
+  code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 \
+    "${PORTAL_URL}/projects/${PROJECT_B}/apps")
+  if [ "$code" = "401" ] || [ "$code" = "302" ]; then
+    pass "anonymous /projects/$PROJECT_B/apps → $code (require_user blocks)"
+  else
+    fail "anonymous /projects/$PROJECT_B/apps → $code (expected 401/302)"
+  fi
+  weak "logged-in users can still GET /projects/<other>/apps and /launch/<other>/<app>"
+  weak "  → enumerates project names; data is gated by /auth/validate only"
+}
+
+# ---- Test 3 — Cross-namespace RBAC -----------------------------------------
+test_rbac() {
+  section "Test 3 — Cross-namespace RBAC (default ServiceAccount)"
+  # Use `kubectl auth can-i --as=...` so we test the RBAC policy directly,
+  # bypassing the pod-network restrictions that block in-pod kubectl access
+  # to the apiserver in project-* namespaces.
+
+  local sa="system:serviceaccount:project-${PROJECT_A}:default"
+  local verdicts=(
+    "list pods       -n project-${PROJECT_B}"
+    "list secrets    -n project-${PROJECT_B}"
+    "create pods     -n project-${PROJECT_B}"
+    "delete pods     -n project-${PROJECT_B}"
+  )
+  for v in "${verdicts[@]}"; do
+    local can
+    can=$(kubectl auth can-i $v --as="$sa" 2>&1)
+    if [ "$can" = "no" ]; then
+      pass "$sa cannot $v"
+    else
+      fail "$sa CAN $v (got: $can)"
+    fi
+  done
+
+  # Sanity check (informational, not counted as PASS/FAIL): query the SA's
+  # rights in its own namespace. No extra roles are granted to it here.
+  local can_own
+  can_own=$(kubectl auth can-i get serviceaccounts/default -n "project-${PROJECT_A}" --as="$sa" 2>&1)
+  echo "  (sanity: SA in own ns get sa/default → $can_own)"
+}
+
+# ---- Test 4 — Cross-namespace network policy --------------------------------
+test_netpol() {
+  section "Test 4 — Cross-namespace network reachability (Cilium)"
+
+  # Spin up an HTTP server in project-bravo and a client in project-alpha,
+  # then check whether alpha can reach bravo.
+  kubectl delete pod -n "project-${PROJECT_B}" net-target  --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pod -n "project-${PROJECT_A}" net-client  --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+
+  # Target: a tiny HTTP responder
+  kubectl run net-target -n "project-${PROJECT_B}" --image=alpine:3.20 \
+    --restart=Never --command -- sh -c \
+    'while true; do printf "HTTP/1.1 200 OK\r\nContent-Length:3\r\n\r\nOK\n" | nc -lp 8080 -w 1; done' \
+    >/dev/null
+  # Client: must outlive the readiness wait below (up to 80 s)
+  kubectl run net-client -n "project-${PROJECT_A}" --image=curlimages/curl:8.10.1 \
+    --restart=Never --command -- sleep 300 >/dev/null
+
+  for i in $(seq 40); do
+    [ "$(kubectl get pod -n "project-${PROJECT_A}" net-client -o jsonpath='{.status.containerStatuses[0].ready}' 2>/dev/null)" = "true" ] && \
+      [ "$(kubectl get pod -n "project-${PROJECT_B}" net-target -o jsonpath='{.status.containerStatuses[0].ready}' 2>/dev/null)" = "true" ] && break
+    sleep 2
+  done
+
+  local target_ip
+  target_ip=$(kubectl get pod -n "project-${PROJECT_B}" net-target -o jsonpath='{.status.podIP}' 2>/dev/null)
+  if [ -z "$target_ip" ]; then
+    fail "could not bring up net-target pod"
+    kubectl delete pod -n "project-${PROJECT_B}" net-target --force --grace-period=0 >/dev/null 2>&1
+    kubectl delete pod -n "project-${PROJECT_A}" net-client --force --grace-period=0 >/dev/null 2>&1
+    return
+  fi
+
+  # Curl always prints %{http_code} via -w (even on failure: 000). We rely on
+  # that single output and never use ||, which would double the value.
+  local code
+  code=$(kubectl exec -n "project-${PROJECT_A}" net-client -- \
+    curl -s -o /dev/null --max-time 5 --connect-timeout 4 -w '%{http_code}' \
+    "http://${target_ip}:8080/" 2>/dev/null)
+  case "$code" in
+    000)    pass "project-${PROJECT_A} → project-${PROJECT_B} pod blocked (code 000, no HTTP response)" ;;
+    "")     fail "no response captured from curl (pod exec failed?)" ;;
+    200)    weak "project-${PROJECT_A} → project-${PROJECT_B} pod reachable (200) — project namespaces have no default-deny NetworkPolicy applied" ;;
+    *)      fail "unexpected code: $code" ;;
+  esac
+
+  kubectl delete pod -n "project-${PROJECT_B}" net-target --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pod -n "project-${PROJECT_A}" net-client --force --grace-period=0 >/dev/null 2>&1
+}
+
+# ---- Test 5 — User CR validation -------------------------------------------
+test_user_cr() {
+  section "Test 5 — User → Group → Project graph integrity"
+
+  for u in "$USER_A:$PROJECT_A:$PROJECT_B" "$USER_B:$PROJECT_B:$PROJECT_A"; do
+    IFS=':' read -r user own_project other_project <<< "$u"
+    local groups
+    groups=$(kubectl get user -n "$K8TRE_NAMESPACE" "$user" -o jsonpath='{.spec.groups[*]}' 2>/dev/null)
+    [ -n "$groups" ] && pass "user $user has groups: $groups" || fail "user $user not found or has no groups"
+  done
+}
+
+# ---- main -------------------------------------------------------------------
+main() {
+  echo "Domain:    $DOMAIN"
+  echo "Portal:    $PORTAL_URL"
+  echo "Keycloak:  $KC_URL"
+  echo "Projects:  $PROJECT_A, $PROJECT_B"
+  echo "Users:     $USER_A (in $PROJECT_A-team), $USER_B (in $PROJECT_B-team)"
+
+  setup
+  test_authvalidate
+  test_enumeration
+  test_rbac
+  test_netpol
+  test_user_cr
+
+  printf "\n${BOLD}== Summary ==${RESET}\n"
+  printf "  ${GREEN}%d passed${RESET},  ${RED}%d failed${RESET},  ${YELLOW}%d weak${RESET}\n" "$PASS" "$FAIL" "$WEAK"
+  echo
+  echo "WEAK = enforcement is incomplete by design — see comments above."
+  [ "$FAIL" -eq 0 ] || exit 1
+}
+
+main "$@"

From beb16592ef87dbec070f6523a8cd0555ccdbeca3 Mon Sep 17 00:00:00 2001
From: Gianpaolo Sanseverino <gia.sanseverino@gmail.com>
Date: Wed, 10 Jun 2026 10:18:57 +0200
Subject: [PATCH 2/3] Add volume isolation troubleshooting doc

---
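Note for reviewers (below the `---`, so not part of the commit message): the
Pod Security Admission labels that the doc applies with `kubectl label` could
also be declared on the Namespace object itself, which may suit a GitOps flow.
The sketch below is an illustration under that assumption and is not part of
this patch. The function name `psa_namespace_manifest` and its arguments are
hypothetical; the label keys and values are the ones the doc uses.

```sh
# Hypothetical helper (not in the patch): emit a Namespace manifest carrying
# the same PSA labels the doc sets with `kubectl label`.
psa_namespace_manifest() {
  local ns=$1
  local enforce=${2:-baseline}   # enforced profile; the doc starts with baseline
  local version=${3:-v1.32}      # pinned policy version, as in the doc
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
  labels:
    pod-security.kubernetes.io/enforce: ${enforce}
    pod-security.kubernetes.io/enforce-version: ${version}
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
EOF
}
```

For example, `psa_namespace_manifest project-alpha | kubectl apply --dry-run=server -f -`
should let the API server validate the manifest without persisting it.
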
 docs/troubleshooting/volume-isolation.md | 232 +++++++++++++++++++++++
 1 file changed, 232 insertions(+)
 create mode 100644 docs/troubleshooting/volume-isolation.md

diff --git a/docs/troubleshooting/volume-isolation.md b/docs/troubleshooting/volume-isolation.md
new file mode 100644
index 00000000..7e898fa5
--- /dev/null
+++ b/docs/troubleshooting/volume-isolation.md
@@ -0,0 +1,232 @@
+# Volume isolation between projects
+
+Operational write-up of the storage-isolation posture on a single-node
+k8tre cluster (Longhorn + k3s) as observed on the StackIT dev cluster.
+Pairs with [`docs/project-isolation.md`](../project-isolation.md), which
+covers the identity / network / RBAC layers; this doc covers the volume
+layer specifically and the one gap that came out of the probes.
+
+## TL;DR
+
+The defaults give you five solid isolation guarantees and one critical
+gap:
+
+| Probe | Result | Note |
+| --- | --- | --- |
+| PVC RBAC cross-namespace | ✅ blocked | default SA cannot `get/list/create/delete` PVCs in another `project-*` namespace |
+| Cluster-scoped Longhorn (`volumes.longhorn.io`, `persistentvolumes`) | ✅ blocked | same SA has no access |
+| Pod creation under the default SA | ✅ blocked | apiserver returns `Forbidden` — a notebook user cannot spawn its own pod |
+| `claimName` reference into another namespace | ✅ blocked | scheduler resolves `claimName` in the pod's own namespace and the pod stays `Pending` with `persistentvolumeclaim "X" not found` |
+| Reachability of `longhorn-frontend` / `longhorn-manager` from a project pod | ✅ blocked | Cilium NetworkPolicy drops the traffic; `curl` reports 000 (no connection) |
+| `hostPath: /var/lib/longhorn` mount in a `project-*` namespace | ❌ **allowed** | no Pod Security Admission enforcement → cluster-admin (or anything able to create pods directly) can read every project's raw volume blocks |
+
+The last row is the one that matters — see *The hostPath gap* below.
+
+## How to reproduce — the six probes
+
+The whole block can be pasted into a shell with the cluster's kubectl
+context active.
+
+```sh
+SA_A=system:serviceaccount:project-alpha:default
+SA_B=system:serviceaccount:project-bravo:default
+
+# 1) PVC RBAC cross-namespace
+for verb in get list create delete; do
+  echo "  $verb pvc -n project-bravo : $(kubectl auth can-i $verb pvc -n project-bravo --as=$SA_A)"
+done
+
+# 2) Cluster-scoped Longhorn + PV
+for verb in get list create delete patch; do
+  echo "  $verb volumes.longhorn.io : $(kubectl auth can-i $verb volumes.longhorn.io --as=$SA_A)"
+done
+for verb in get list create patch; do
+  echo "  $verb persistentvolumes : $(kubectl auth can-i $verb persistentvolumes --as=$SA_A)"
+done
+
+# 3) Pod creation under the default SA — must fail Forbidden
+kubectl apply --as=$SA_A -f - <<'POD'
+apiVersion: v1
+kind: Pod
+metadata: {name: vol-steal, namespace: project-alpha}
+spec:
+  containers:
+  - {name: t, image: alpine:3.20, command: ["sleep","30"]}
+POD
+
+# 4) Cross-ns claimName — admin creates the pod, scheduler must refuse
+kubectl apply -f - <<'POD'
+apiVersion: v1
+kind: Pod
+metadata: {name: vol-steal-admin, namespace: project-alpha}
+spec:
+  containers:
+  - {name: t, image: alpine:3.20, command: ["sleep","30"],
+     volumeMounts: [{name: v, mountPath: /s}]}
+  volumes:
+  - {name: v, persistentVolumeClaim: {claimName: notebook-bob-bravo}}
+POD
+sleep 4
+kubectl get pod -n project-alpha vol-steal-admin \
+  -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}{"\n"}'
+
+# 5) hostPath mount — the gap
+kubectl apply -f - <<'POD'
+apiVersion: v1
+kind: Pod
+metadata: {name: host-escape, namespace: project-alpha}
+spec:
+  containers:
+  - {name: t, image: alpine:3.20, command: ["sleep","30"],
+     volumeMounts: [{name: host, mountPath: /host}]}
+  volumes:
+  - {name: host, hostPath: {path: /var/lib/longhorn}}
+POD
+sleep 3
+kubectl get pod -n project-alpha host-escape -o jsonpath='{.status.phase}{"\n"}'
+# expected on a hardened cluster:  the apply itself fails with
+#   pods "host-escape" is forbidden: violates PodSecurity "restricted:v1.32"
+# observed today: phase=Running
+
+# 6) Longhorn UI/API reachability from a project pod
+LH_UI_IP=$(kubectl get svc -n storage-system longhorn-frontend \
+  -o jsonpath='{.spec.clusterIP}')
+kubectl run lh-probe -n project-alpha --image=curlimages/curl:8.10.1 \
+  --restart=Never --command -- sleep 30 >/dev/null
+kubectl wait --for=condition=Ready pod/lh-probe -n project-alpha --timeout=60s
+kubectl exec -n project-alpha lh-probe -- curl -s -o /dev/null \
+  --max-time 5 -w '%{http_code}' "http://$LH_UI_IP/"
+
+# cleanup
+kubectl delete pod -n project-alpha vol-steal-admin host-escape lh-probe \
+  --force --grace-period=0
+```
+
+## The `hostPath` gap
+
+Anything that can create pods in a `project-*` namespace — directly or
+through a spawner — can mount the node's `/var/lib/longhorn` and read
+the raw replica files for every project's Longhorn volumes. The
+mount-namespace separation is irrelevant: Longhorn keeps replicas as
+ordinary files (`volume-head-NNN.img`, `volume-snap-X.img`) under that
+directory, and once they're visible to the attacker's pod they are
+copyable and parseable.
+
+A user inside a JupyterHub notebook cannot create such a pod today:
+the pod's bearer token belongs to its own ServiceAccount, which lacks
+`create pods` in its namespace (see Probe 3). The realistic attack surface is:
+
+- **anything running with the JupyterHub or VDI spawner ServiceAccount**
+  (those *do* have `create pods` in the project namespace, otherwise
+  they couldn't spawn user environments);
+- **anything with cluster-admin** kubeconfig access (people, CI jobs,
+  ArgoCD app SAs with cluster-scoped permissions);
+- **future apps wired into the Project model** that may need a different
+  ServiceAccount with broader pod-creation rights.
+
+The defense is to *make `hostPath` (and other escapes) inadmissible at
+the namespace level*, not to rely on no one having the verb. That's
+what Pod Security Admission (PSA) does.
+
+## Mitigation — Pod Security Admission on `project-*` namespaces
+
+Kubernetes' built-in PSA enforces a profile at namespace level, with no
+extra controller to install. There are three profiles: `privileged`
+(unrestricted; the default when a namespace carries no PSA labels),
+`baseline` (blocks known privilege escalations), and `restricted`
+(hardened). `hostPath` volumes, `privileged: true`, `hostNetwork: true`,
+and capabilities beyond a small default set are rejected by both `baseline` and `restricted`.
+
+Label every project namespace:
+
+```sh
+for ns in project-alpha project-bravo project-demo-project; do
+  kubectl label ns "$ns" \
+    pod-security.kubernetes.io/enforce=baseline \
+    pod-security.kubernetes.io/enforce-version=v1.32 \
+    pod-security.kubernetes.io/audit=restricted \
+    pod-security.kubernetes.io/warn=restricted \
+    --overwrite
+done
+```
+
+Start with `enforce=baseline`: it blocks `hostPath` and the other most
+dangerous pod fields, yet most controller-spawned pods still pass. Run
+the workloads for a few days, watch the `audit`/`warn` reports for what
+*would* fail under `restricted`, then promote `enforce` to `restricted`
+once those are fixed.
+
+Verify the gap closes:
+
+```sh
+kubectl apply -f - <<'POD' 2>&1 || echo "(blocked, as expected)"
+apiVersion: v1
+kind: Pod
+metadata: {name: host-escape, namespace: project-alpha}
+spec:
+  containers:
+  - {name: t, image: alpine:3.20, command: ["sleep","30"]}
+  volumes:
+  - {name: host, hostPath: {path: /var/lib/longhorn}}
+POD
+# expected:
+#   pods "host-escape" is forbidden: violates PodSecurity "baseline:v1.32":
+#   hostPath volumes (volume "host")
+```
+
+### Caveats before promoting to `restricted`
+
+The `restricted` profile demands `runAsNonRoot: true`, drops every
+capability except `NET_BIND_SERVICE`, requires `allowPrivilegeEscalation:
+false`, and pins `seccompProfile.type` to `RuntimeDefault` or
+`Localhost`. Things to verify before flipping the switch:
+
+- **JupyterHub user pod spec** (`apps/jupyterhub/.../values.yaml` —
+  `singleuser.cloudMetadata.blockWithIptables`, init containers, image
+  user) must satisfy all four constraints.
+- **VDI spawner pod spec** (`apps/guacamole/...` or whichever
+  controller materialises `VDIInstance`s) — desktop sessions often
+  need additional capabilities (e.g. `SYS_ADMIN` for FUSE) and won't
+  pass `restricted` without explicit `securityContext` tuning.
+- **Longhorn pods and injected sidecars.** Longhorn's own pods (e.g.
+  `engine-image-ei-*`) run in `storage-system`, not in the project
+  namespaces, so they're not affected; any init container or sidecar
+  that the spawner injects into the user pod is.
+
+Easiest path: bake the PSA labels into the Project provisioning logic,
+so a new Project boots with `enforce=baseline` from the start. Today
+there is no controller, so the labels have to be added manually each
+time a Project namespace is created.
+
+## What's still not tested
+
+- **Spawner ServiceAccount surface.** The JupyterHub `hub`
+  ServiceAccount has `create pods` on the project namespace. If an
+  attacker pops the hub, they can spawn arbitrary pods including a
+  hostPath escape — until PSA is on. A separate test should impersonate
+  the spawner SA and verify that PSA blocks the escape there too.
+- **Longhorn snapshots & `BackingImage` cross-project leak.** A
+  `BackingImage` is not scoped to any project namespace and stores blob
+  data under `/var/lib/longhorn/backing-images/`. If one project's data
+  is ever promoted to a BackingImage by mistake, every project can use
+  it as a read-only source. Likely only by accident, but worth checking
+  once a backup/restore workflow is wired up.
+- **`subPath` traversal inside a single namespace.** PVC content
+  containing `..` segments could let one pod's mount expose another
+  pod's files inside the same namespace. Out of scope for cross-project
+  isolation but a useful adversarial test inside JupyterHub.
+- **`VolumeSnapshot` / `VolumeSnapshotContent` RBAC.** Same shape as
+  PV vs PVC: snapshot contents are cluster-scoped. Today no SA in
+  `project-*` has access, but it's worth re-checking after any future
+  Velero / volume-restore integration.
+
+## Where this fits
+
+The findings on this page extend
+[`docs/project-isolation.md`](../project-isolation.md) with a 5th
+enforcement layer (volume / node-storage) that the current
+`tests/test-project-isolation.sh` does not yet cover. Next iteration
+of the test script should add probes 1–6 above and a final assertion
+that the `hostPath` pod is denied by PodSecurity. Until PSA is
+applied, that assertion is *expected to fail* — which is exactly the
+signal we want from the test.

From 1fc5952c90912efe8adb76d6a854b05421424a7e Mon Sep 17 00:00:00 2001
From: Gianpaolo Sanseverino <gia.sanseverino@gmail.com>
Date: Thu, 11 Jun 2026 12:36:35 +0200
Subject: [PATCH 3/3] flow charts

---
 docs/diagrams.md                | 184 ++++++++++++++++++++++++++++++++
 tests/test-project-isolation.sh | 166 +++++++++++++++++++++++++++-
 2 files changed, 348 insertions(+), 2 deletions(-)
 create mode 100644 docs/diagrams.md

diff --git a/docs/diagrams.md b/docs/diagrams.md
new file mode 100644
index 00000000..8374ad92
--- /dev/null
+++ b/docs/diagrams.md
@@ -0,0 +1,184 @@
+# k8tre — architecture & isolation diagrams
+
+Two diagrams that describe the cluster as deployed on the StackIT environment (single-node k3s + Cilium + Longhorn).
+
+1. The first focuses on how project tenants are kept apart from each other and from the infrastructure.
+1. The second zooms out to the end-to-end request flow from a researcher's browser down to a per-project notebook pod.
+
+## 1. Namespace separation & tenant isolation
+
+```mermaid
+flowchart TB
+    %% ─────────── Infrastructure namespaces ───────────
+    subgraph INFRA["🔧  Infrastructure namespaces — managed by platform admin"]
+        direction LR
+        ks["kube-system<br/>Cilium · CoreDNS · metrics-server"]
+        ss["storage-system<br/>Longhorn manager · CSI plugin"]
+        cnpg["cnpg-system<br/>CloudNativePG operator"]
+        es["external-secrets · cert-manager · argocd · metallb-system"]
+    end
+
+    %% ─────────── Platform namespaces (shared TRE services) ───────────
+    subgraph PLAT["🧱  Platform namespaces — shared TRE services"]
+        direction LR
+        kc["keycloak<br/>(realm k8tre-app)"]
+        bk["backend<br/>(portal)"]
+        gw["gateway<br/>(Cilium Gateway API)"]
+        nx["ingress-nginx"]
+        gt["gitea"]
+        jh["jupyterhub<br/>(hub + jhub-auth-proxy<br/>+ guacamole pods)"]
+        os["object-storage<br/>(SeaweedFS)"]
+    end
+
+    %% ─────────── Tenant (per-project) namespaces ───────────
+    subgraph TA["🟦  project-alpha — tenant A"]
+        direction LR
+        nbA["JupyterHub user-pod<br/>+ PVC notebook-alice-alpha"]
+        vdA["VDI pod<br/>+ PVC"]
+    end
+
+    subgraph TB_["🟩  project-bravo — tenant B"]
+        direction LR
+        nbB["JupyterHub user-pod<br/>+ PVC notebook-bob-bravo"]
+        vdB["VDI pod<br/>+ PVC"]
+    end
+
+    %% Allowed traffic (solid arrows)
+    bk -- "spawn pods<br/>(spawner SA)" --> TA
+    bk -- "spawn pods<br/>(spawner SA)" --> TB_
+    nbA -- "intra-ns OK" --> vdA
+    nbB -- "intra-ns OK" --> vdB
+
+    %% Denied traffic (dashed arrows) — the isolation barriers we tested
+    TA -. "❌ RBAC: default SA cannot list/get/create<br/>❌ Cilium NetworkPolicy drops cross-tenant TCP<br/>❌ PVC claimName resolved in own ns only" .- TB_
+    TA -. "❌ Cilium: apiserver = host entity, blocked<br/>❌ Tenant SA has no ClusterRole" .- ks
+    TB_ -. "❌" .- ks
+
+    %% Style cues
+    classDef tenant fill:#dde7f3,stroke:#1f4e79,stroke-width:1px,color:#000;
+    classDef tenant2 fill:#dff0d8,stroke:#3c763d,stroke-width:1px,color:#000;
+    classDef infra fill:#f5f0e0,stroke:#8a7032,stroke-width:1px,color:#000;
+    classDef plat fill:#f0e6f3,stroke:#5a2e7a,stroke-width:1px,color:#000;
+    class TA tenant;
+    class TB_ tenant2;
+    class INFRA infra;
+    class PLAT plat;
+```
+
+**Reading the diagram**
+
+- **Vertical separation**: cluster-scoped resources sit at the top
+  (only platform admins write here); the row of infrastructure
+  namespaces hosts the cluster operators (Cilium, Longhorn, CNPG,
+  cert-manager, ArgoCD); below them the platform namespaces host the
+  TRE services every tenant shares (Keycloak, portal, gateway, hub,
+  Gitea, object-storage); at the bottom each project lives in its own
+  `project-<name>` namespace.
+- **Solid arrows** mark traffic that flows in production: the backend's
+  spawner ServiceAccount creates user pods inside the project
+  namespaces; pods within the same project talk freely.
+- **Dashed lines** mark the four assertions that
+  [`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
+  enforces: cross-tenant RBAC `no`, cross-tenant network drops,
+  cross-namespace `claimName` not resolved, and project pods cannot
+  reach the apiserver (Cilium treats `10.43.0.1` as `host`, while the
+  `allow-pod-to-pod-via-gateway` policy only opens `cluster` entities).
+
+## 2. End-to-end request flow
+
+```mermaid
+flowchart TB
+    user(["👩‍🔬 Researcher's browser"])
+
+    subgraph CLOUD["☁️  Cloud network (StackIT)"]
+        fip["188.34.94.28 · floating IP"]
+    end
+
+    subgraph VM["🖥️  Single-node VM — k3s + Cilium"]
+        direction TB
+        socat["socat 0.0.0.0:80,443 → 127.0.0.1:14722"]
+        envoy["cilium-envoy · Gateway API listener · 127.0.0.1:14722"]
+        gw["Gateway internal-gateway<br/>HTTPRoutes for portal · keycloak · jupyter · guacamole · gitea · cr8tor"]
+
+        subgraph TRE["TRE platform"]
+            direction TB
+            portal["portal (backend)"]
+            keycloak[("Keycloak · realm k8tre-app")]
+            cnpg[("CloudNativePG · postgres")]
+            apiserver["kube-apiserver"]
+            hub["JupyterHub + jhub-auth-proxy"]
+            guac["Guacamole + guacamole-auth-proxy"]
+            gitea["Gitea"]
+            longhorn[("Longhorn · volumes & replicas")]
+        end
+
+        subgraph TENANTS["Per-project workload namespaces"]
+            direction TB
+            pa["project-alpha · user-notebook · VDI · PVC"]
+            pb["project-bravo · user-notebook · VDI · PVC"]
+        end
+    end
+
+    user -->|"HTTPS<br/>*.&lt;domain&gt;.nip.io"| fip
+    fip -->|"DNAT to VM"| socat
+    socat --> envoy
+    envoy --> gw
+
+    gw -->|"portal.<br/>&lt;domain&gt;"| portal
+    gw -->|"keycloak.<br/>&lt;domain&gt;"| keycloak
+    gw -->|"jupyter.<br/>&lt;domain&gt;"| hub
+    gw -->|"guacamole.<br/>&lt;domain&gt;"| guac
+    gw -->|"gitea.<br/>&lt;domain&gt;"| gitea
+
+    portal -->|"OIDC code flow<br/>+ JWKS"| keycloak
+    portal -->|"reads User /<br/>Group / Project CRs"| apiserver
+    portal -->|"creates VDIInstance<br/>updates JupyterHub<br/>profile via API"| apiserver
+    keycloak --> cnpg
+
+    hub -->|"spawn user-notebook"| pa
+    hub -->|"spawn user-notebook"| pb
+    guac -->|"open VDI"| pa
+    guac -->|"open VDI"| pb
+
+    pa -->|"PVC binds<br/>project-alpha only"| longhorn
+    pb -->|"PVC binds<br/>project-bravo only"| longhorn
+
+    %% Auth subrequest loop (every subdomain hit)
+    hub <-->|"/auth/validate<br/>(subrequest)"| portal
+    guac <-->|"/auth/validate"| portal
+
+    classDef ext fill:#cfe2f3,stroke:#0b5394,color:#000;
+    classDef net fill:#e6f4ea,stroke:#137333,color:#000;
+    classDef plat fill:#fff2cc,stroke:#bf9000,color:#000;
+    classDef tenant fill:#f4cccc,stroke:#990000,color:#000;
+    class user,fip ext;
+    class socat,envoy,gw net;
+    class portal,keycloak,cnpg,apiserver,hub,guac,gitea,longhorn plat;
+    class pa,pb tenant;
+```
+
+**Reading the diagram**
+
+- **Ingress chain (top of the diagram, reading downward)**: browser → floating IP →
+  cloud NAT → VM's `enp3s0:443` → `socat` systemd unit → Cilium-Envoy
+  loopback listener → `Gateway internal-gateway` (Cilium Gateway API).
+  This is the path documented in
+  [`docs/troubleshooting/k8tre-install-guide.md`](troubleshooting/k8tre-install-guide.md#5-expose-the-gateway-on-the-host-ip)
+  and the reason `socat` exists on the host: every HTTPS hit traverses
+  it.
+- **HTTPRoutes** fan out to the platform services. Three (Keycloak,
+  portal, Gitea) serve directly; two (JupyterHub, Guacamole) are
+  fronted by their auth-proxy nginx which calls back to
+  `portal:/auth/validate` for every request to translate the user's
+  Keycloak session into the per-project authorization decision.
+- **portal ↔ Keycloak** is OIDC over the **internal-resolved** public
+  hostname — see the CoreDNS hosts override in
+  [`docs/troubleshooting/keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md#make-the-backend-reach-keycloak-from-inside-the-cluster)
+  for why a pod hitting `keycloak.<domain>` is rewritten to the VM's
+  primary private IP rather than hairpinning out the cloud NAT.
+- **portal ↔ apiserver** is how the User → Group → Project CR graph is
+  read at every `/projects` and `/auth/validate` call, and how the
+  backend mints a `VDIInstance` when a researcher clicks *Launch*.
+- **JupyterHub & Guacamole spawn pods inside the tenant namespace**:
+  per-project PVCs are created and bound here, never across namespaces;
+  the dashed isolation boundaries of Diagram 1 apply.
diff --git a/tests/test-project-isolation.sh b/tests/test-project-isolation.sh
index 1c05617d..fed82190 100755
--- a/tests/test-project-isolation.sh
+++ b/tests/test-project-isolation.sh
@@ -277,9 +277,170 @@ test_netpol() {
   kubectl delete pod -n "project-${PROJECT_A}" net-client --force --grace-period=0 >/dev/null 2>&1
 }
 
-# ---- Test 5 — User CR validation -------------------------------------------
+# ---- Test 5 — Cross-project volume access ----------------------------------
+test_volume() {
+  section "Test 5 — Volume isolation: PVC in project ${PROJECT_A} unreachable from project ${PROJECT_B}"
+
+  local pvc=secret-${PROJECT_A}-data
+  local sentinel="SECRET_${PROJECT_A^^}: only-for-${PROJECT_A}-team"
+  local sa_b="system:serviceaccount:project-${PROJECT_B}:default"
+
+  # Cleanup any previous run
+  kubectl delete pod ${PROJECT_A}-writer ${PROJECT_A}-reader -n "project-${PROJECT_A}" \
+    --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pod ${PROJECT_B}-thief-name -n "project-${PROJECT_B}" \
+    --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pvc "$pvc" -n "project-${PROJECT_A}" --ignore-not-found >/dev/null 2>&1
+  kubectl delete pvc "stolen-via-volume-name" -n "project-${PROJECT_B}" --ignore-not-found >/dev/null 2>&1
+
+  # --- 5a) create PVC in PROJECT_A and write sentinel ---
+  kubectl apply -f - >/dev/null <<YAML
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata: {name: ${pvc}, namespace: project-${PROJECT_A}}
+spec:
+  accessModes: [ReadWriteOnce]
+  storageClassName: rwo-default
+  resources: {requests: {storage: 256Mi}}
+---
+apiVersion: v1
+kind: Pod
+metadata: {name: ${PROJECT_A}-writer, namespace: project-${PROJECT_A}}
+spec:
+  restartPolicy: OnFailure
+  containers:
+  - name: w
+    image: alpine:3.20
+    command: ["sh","-c","echo '${sentinel}' > /data/secret.txt; sync; sleep 3"]
+    volumeMounts: [{name: d, mountPath: /data}]
+  volumes:
+  - {name: d, persistentVolumeClaim: {claimName: ${pvc}}}
+YAML
+
+  # Wait for writer to finish
+  local phase=""
+  for i in $(seq 60); do
+    phase=$(kubectl get pod -n "project-${PROJECT_A}" "${PROJECT_A}-writer" -o jsonpath='{.status.phase}' 2>/dev/null)
+    [ "$phase" = "Succeeded" ] && break
+    [ "$phase" = "Failed" ] && break
+    sleep 2
+  done
+  if [ "$phase" = "Succeeded" ]; then pass "sentinel written to ${pvc} (writer phase=$phase)"
+  else fail "writer phase=$phase, aborting volume test"; return; fi
+
+  # --- 5b) PROJECT_A's own pod can read the sentinel back ---
+  kubectl run "${PROJECT_A}-reader" -n "project-${PROJECT_A}" --image=alpine:3.20 --restart=Never \
+    --overrides="{\"spec\":{\"containers\":[{\"name\":\"r\",\"image\":\"alpine:3.20\",\"command\":[\"cat\",\"/data/secret.txt\"],\"volumeMounts\":[{\"name\":\"d\",\"mountPath\":\"/data\"}]}],\"volumes\":[{\"name\":\"d\",\"persistentVolumeClaim\":{\"claimName\":\"${pvc}\"}}]}}" \
+    --command -- cat /data/secret.txt >/dev/null 2>&1
+
+  for i in $(seq 60); do
+    phase=$(kubectl get pod -n "project-${PROJECT_A}" "${PROJECT_A}-reader" -o jsonpath='{.status.phase}' 2>/dev/null)
+    [ "$phase" = "Succeeded" ] || [ "$phase" = "Failed" ] && break
+    sleep 2
+  done
+  local content
+  content=$(kubectl logs -n "project-${PROJECT_A}" "${PROJECT_A}-reader" 2>/dev/null)
+  if echo "$content" | grep -qF "$sentinel"; then
+    pass "${PROJECT_A} reader sees the sentinel from its own PVC"
+  else
+    fail "${PROJECT_A} reader did not see the sentinel: '$content'"
+  fi
+
+  # --- 5c) RBAC: PROJECT_B's default SA cannot touch PROJECT_A's PVC ---
+  for verb in get list create delete patch; do
+    local can
+    can=$(kubectl auth can-i $verb pvc/$pvc -n "project-${PROJECT_A}" --as=$sa_b 2>&1)
+    if [ "$can" = "no" ]; then pass "$sa_b cannot $verb pvc/$pvc -n project-${PROJECT_A}"
+    else fail "$sa_b CAN $verb pvc/$pvc -n project-${PROJECT_A} (got: $can)"; fi
+  done
+
+  # --- 5d) Pod in PROJECT_B referencing claimName=$pvc must stay Pending ---
+  kubectl apply -f - >/dev/null <<YAML
+apiVersion: v1
+kind: Pod
+metadata: {name: ${PROJECT_B}-thief-name, namespace: project-${PROJECT_B}}
+spec:
+  restartPolicy: Never
+  containers:
+  - {name: t, image: alpine:3.20, command: ["sh","-c","cat /steal/secret.txt || echo NO-DATA"], volumeMounts: [{name: s, mountPath: /steal}]}
+  volumes:
+  - {name: s, persistentVolumeClaim: {claimName: ${pvc}}}
+YAML
+  sleep 6
+  phase=$(kubectl get pod -n "project-${PROJECT_B}" "${PROJECT_B}-thief-name" -o jsonpath='{.status.phase}')
+  local msg
+  msg=$(kubectl get pod -n "project-${PROJECT_B}" "${PROJECT_B}-thief-name" -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}')
+  if [ "$phase" = "Pending" ] && echo "$msg" | grep -q 'not found'; then
+    pass "pod in project-${PROJECT_B} stays Pending — k8s does NOT cross-resolve claimName"
+  else
+    fail "pod in project-${PROJECT_B} reached phase=$phase (msg: $msg)"
+  fi
+
+  # --- 5e) Cluster-scoped PV — try direct claimRef hijack with volumeName ---
+  # An admin tries to bind a brand-new PVC in PROJECT_B to the existing PV
+  # already bound to PROJECT_A's PVC. The PV controller refuses because the
+  # PV's claimRef already references that PVC (namespace, name and UID).
+  local pv
+  pv=$(kubectl get pvc -n "project-${PROJECT_A}" "$pvc" -o jsonpath='{.spec.volumeName}')
+  kubectl apply -f - >/dev/null 2>&1 <<YAML
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata: {name: stolen-via-volume-name, namespace: project-${PROJECT_B}}
+spec:
+  accessModes: [ReadWriteOnce]
+  storageClassName: ""
+  resources: {requests: {storage: 256Mi}}
+  volumeName: ${pv}
+YAML
+  sleep 8
+  local theft_status theft_event
+  theft_status=$(kubectl get pvc -n "project-${PROJECT_B}" stolen-via-volume-name -o jsonpath='{.status.phase}' 2>/dev/null)
+  theft_event=$(kubectl get events -n "project-${PROJECT_B}" --field-selector involvedObject.name=stolen-via-volume-name -o jsonpath='{.items[-1:].message}' 2>/dev/null)
+  if [ "$theft_status" = "Pending" ] && echo "$theft_event" | grep -qi 'already bound'; then
+    pass "PV refuses cross-ns claimRef hijack — status=$theft_status: $theft_event"
+  elif [ "$theft_status" = "Bound" ]; then
+    fail "PV bound to a project-${PROJECT_B} PVC — cluster-scoped PV ISOLATION BROKEN"
+  else
+    pass "stolen PVC did not bind (status=$theft_status, event='$theft_event')"
+  fi
+
+  # --- 5f) hostPath escape — currently allowed (no PodSecurity enforce) ---
+  # Documented in docs/troubleshooting/volume-isolation.md: PSA is not on, so
+  # anyone who can create a pod in project-${PROJECT_B} can mount the node's
+  # /var/lib/longhorn and read every project's raw blocks. We report this as
+  # WEAK rather than FAIL so the suite stays useful on un-hardened clusters.
+  kubectl delete pod -n "project-${PROJECT_B}" host-escape --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+  kubectl apply -f - >/dev/null 2>&1 <<YAML
+apiVersion: v1
+kind: Pod
+metadata: {name: host-escape, namespace: project-${PROJECT_B}}
+spec:
+  restartPolicy: Never
+  containers:
+  - {name: t, image: alpine:3.20, command: ["sleep","30"], volumeMounts: [{name: h, mountPath: /host}]}
+  volumes:
+  - {name: h, hostPath: {path: /var/lib/longhorn}}
+YAML
+  sleep 5
+  phase=$(kubectl get pod -n "project-${PROJECT_B}" host-escape -o jsonpath='{.status.phase}' 2>/dev/null)
+  if [ "$phase" = "Running" ] || [ "$phase" = "Pending" ]; then
+    weak "hostPath /var/lib/longhorn admitted in project-${PROJECT_B} (phase=$phase)"
+    weak "  → apply pod-security.kubernetes.io/enforce=baseline on project namespaces"
+    weak "  → see docs/troubleshooting/volume-isolation.md"
+  else
+    pass "hostPath /var/lib/longhorn denied by admission (phase=$phase)"
+  fi
+
+  # --- cleanup ---
+  kubectl delete pod -n "project-${PROJECT_A}" "${PROJECT_A}-writer" "${PROJECT_A}-reader" --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pod -n "project-${PROJECT_B}" "${PROJECT_B}-thief-name" host-escape --ignore-not-found --force --grace-period=0 >/dev/null 2>&1
+  kubectl delete pvc -n "project-${PROJECT_B}" stolen-via-volume-name --ignore-not-found >/dev/null 2>&1
+  # leave the ${PROJECT_A} PVC behind; the next run's initial cleanup deletes and recreates it
+}
+
+# ---- Test 6 — User CR validation -------------------------------------------
 test_user_cr() {
-  section "Test 5 — User → Group → Project graph integrity"
+  section "Test 6 — User → Group → Project graph integrity"
 
   for u in "$USER_A:$PROJECT_A:$PROJECT_B" "$USER_B:$PROJECT_B:$PROJECT_A"; do
     IFS=':' read -r user own_project other_project <<< "$u"
@@ -302,6 +463,7 @@ main() {
   test_enumeration
   test_rbac
   test_netpol
+  test_volume
   test_user_cr
 
   printf "\n${BOLD}== Summary ==${RESET}\n"