184 changes: 184 additions & 0 deletions docs/diagrams.md
@@ -0,0 +1,184 @@
# k8tre — architecture & isolation diagrams

Two diagrams describe the cluster as deployed on the StackIT environment (single-node k3s + Cilium + Longhorn):

1. The first focuses on how project tenants are kept apart from each other and from the infrastructure.
1. The second zooms out to the end-to-end request flow from a researcher's browser down to a per-project notebook pod.

## 1. Namespace separation & tenant isolation

```mermaid
flowchart TB
%% ─────────── Infrastructure namespaces ───────────
subgraph INFRA["🔧 Infrastructure namespaces — managed by platform admin"]
direction LR
ks["kube-system<br/>Cilium · CoreDNS · metrics-server"]
ss["storage-system<br/>Longhorn manager · CSI plugin"]
cnpg["cnpg-system<br/>CloudNativePG operator"]
es["external-secrets · cert-manager · argocd · metallb-system"]
end

%% ─────────── Platform namespaces (shared TRE services) ───────────
subgraph PLAT["🧱 Platform namespaces — shared TRE services"]
direction LR
kc["keycloak<br/>(realm k8tre-app)"]
bk["backend<br/>(portal)"]
gw["gateway<br/>(Cilium Gateway API)"]
nx["ingress-nginx"]
gt["gitea"]
jh["jupyterhub<br/>(hub + jhub-auth-proxy<br/>+ guacamole pods)"]
os["object-storage<br/>(SeaweedFS)"]
end

%% ─────────── Tenant (per-project) namespaces ───────────
subgraph TA["🟦 project-alpha — tenant A"]
direction LR
nbA["JupyterHub user-pod<br/>+ PVC notebook-alice-alpha"]
vdA["VDI pod<br/>+ PVC"]
end

subgraph TB_["🟩 project-bravo — tenant B"]
direction LR
nbB["JupyterHub user-pod<br/>+ PVC notebook-bob-bravo"]
vdB["VDI pod<br/>+ PVC"]
end

%% Allowed traffic (solid arrows)
bk -- "spawn pods<br/>(spawner SA)" --> TA
bk -- "spawn pods<br/>(spawner SA)" --> TB_
nbA -- "intra-ns OK" --> vdA
nbB -- "intra-ns OK" --> vdB

%% Denied traffic (dashed arrows) — the isolation barriers we tested
TA -. "❌ RBAC: default SA cannot list/get/create<br/>❌ Cilium NetworkPolicy drops cross-tenant TCP<br/>❌ PVC claimName resolved in own ns only" .- TB_
TA -. "❌ Cilium: apiserver = host entity, blocked<br/>❌ Tenant SA has no ClusterRole" .- ks
TB_ -. "❌" .- ks

%% Style cues
classDef tenant fill:#dde7f3,stroke:#1f4e79,stroke-width:1px,color:#000;
classDef tenant2 fill:#dff0d8,stroke:#3c763d,stroke-width:1px,color:#000;
classDef infra fill:#f5f0e0,stroke:#8a7032,stroke-width:1px,color:#000;
classDef plat fill:#f0e6f3,stroke:#5a2e7a,stroke-width:1px,color:#000;
class TA tenant;
class TB_ tenant2;
class INFRA infra;
class PLAT plat;
```

**Reading the diagram**

- **Vertical separation**: cluster-scoped resources sit at the top
(only platform admins write here); the row of infrastructure
namespaces hosts the cluster operators (Cilium, Longhorn, CNPG,
cert-manager, ArgoCD); below them the platform namespaces host the
TRE services every tenant shares (Keycloak, portal, gateway, hub,
Gitea, object-storage); at the bottom each project lives in its own
`project-<name>` namespace.
- **Solid arrows** mark traffic that flows in production: the backend's
spawner ServiceAccount creates user pods inside the project
namespaces; pods within the same project talk freely.
- **Dashed lines** mark the four assertions
[`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
enforces: cross-tenant RBAC answers `no`, cross-tenant network traffic
is dropped, a cross-namespace `claimName` is not resolved, and project
pods cannot reach the apiserver (Cilium treats `10.43.0.1` as `host`, and
the `allow-pod-to-pod-via-gateway` policy only opens `cluster` entities).

## 2. End-to-end request flow

```mermaid
flowchart TB
user(["👩‍🔬 Researcher's browser"])

subgraph CLOUD["☁️ Cloud network (StackIT)"]
fip["188.34.94.28 · floating IP"]
end

subgraph VM["🖥️ Single-node VM — k3s + Cilium"]
direction TB
socat["socat 0.0.0.0:80,443 → 127.0.0.1:14722"]
envoy["cilium-envoy · Gateway API listener · 127.0.0.1:14722"]
gw["Gateway internal-gateway<br/>HTTPRoutes for portal · keycloak · jupyter · guacamole · gitea · cr8tor"]

subgraph TRE["TRE platform"]
direction TB
portal["portal (backend)"]
keycloak[("Keycloak · realm k8tre-app")]
cnpg[("CloudNativePG · postgres")]
apiserver["kube-apiserver"]
hub["JupyterHub + jhub-auth-proxy"]
guac["Guacamole + guacamole-auth-proxy"]
gitea["Gitea"]
longhorn[("Longhorn · volumes & replicas")]
end

subgraph TENANTS["Per-project workload namespaces"]
direction TB
pa["project-alpha · user-notebook · VDI · PVC"]
pb["project-bravo · user-notebook · VDI · PVC"]
end
end

user -->|"HTTPS<br/>*.&lt;domain&gt;.nip.io"| fip
fip -->|"DNAT to VM"| socat
socat --> envoy
envoy --> gw

gw -->|"portal.<br/>&lt;domain&gt;"| portal
gw -->|"keycloak.<br/>&lt;domain&gt;"| keycloak
gw -->|"jupyter.<br/>&lt;domain&gt;"| hub
gw -->|"guacamole.<br/>&lt;domain&gt;"| guac
gw -->|"gitea.<br/>&lt;domain&gt;"| gitea

portal -->|"OIDC code flow<br/>+ JWKS"| keycloak
portal -->|"reads User /<br/>Group / Project CRs"| apiserver
portal -->|"creates VDIInstance<br/>updates JupyterHub<br/>profile via API"| apiserver
keycloak --> cnpg

hub -->|"spawn user-notebook"| pa
hub -->|"spawn user-notebook"| pb
guac -->|"open VDI"| pa
guac -->|"open VDI"| pb

pa -->|"PVC binds<br/>project-alpha only"| longhorn
pb -->|"PVC binds<br/>project-bravo only"| longhorn

%% Auth subrequest loop (every subdomain hit)
hub <-->|"/auth/validate<br/>(subrequest)"| portal
guac <-->|"/auth/validate"| portal

classDef ext fill:#cfe2f3,stroke:#0b5394,color:#000;
classDef net fill:#e6f4ea,stroke:#137333,color:#000;
classDef plat fill:#fff2cc,stroke:#bf9000,color:#000;
classDef tenant fill:#f4cccc,stroke:#990000,color:#000;
class user,fip ext;
class socat,envoy,gw net;
class portal,keycloak,cnpg,apiserver,hub,guac,gitea,longhorn plat;
class pa,pb tenant;
```

**Reading the diagram**

- **Ingress chain (top to bottom)**: browser → floating IP →
cloud NAT → VM's `enp3s0:443` → `socat` systemd unit → Cilium-Envoy
loopback listener → `Gateway internal-gateway` (Cilium Gateway API).
This is the path documented in
[`docs/troubleshooting/k8tre-install-guide.md`](troubleshooting/k8tre-install-guide.md#5-expose-the-gateway-on-the-host-ip)
and the reason `socat` exists on the host: every HTTPS hit traverses
it.
- **HTTPRoutes** fan out to the platform services. Three (Keycloak,
portal, Gitea) serve directly; two (JupyterHub, Guacamole) are
fronted by their auth-proxy nginx which calls back to
`portal:/auth/validate` for every request to translate the user's
Keycloak session into the per-project authorization decision.
- **portal ↔ Keycloak** is OIDC over the **internally resolved** public
hostname. See the CoreDNS hosts override in
[`docs/troubleshooting/keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md#make-the-backend-reach-keycloak-from-inside-the-cluster)
for why `keycloak.<domain>`, when looked up from inside a pod, resolves to
the VM's primary private IP rather than hairpinning out through the cloud NAT.
- **portal ↔ apiserver** is how the User → Group → Project CR graph is
read at every `/projects` and `/auth/validate` call, and how the
backend mints a `VDIInstance` when a researcher clicks *Launch*.
- **JupyterHub & Guacamole spawn pods inside the tenant namespace**:
per-project PVCs are created and bound there, never across namespaces;
the dashed isolation boundaries of Diagram 1 apply.
201 changes: 201 additions & 0 deletions docs/project-isolation.md
@@ -0,0 +1,201 @@
# k8tre — project isolation model and how to test it

This doc describes the four layers that keep one k8tre Project's resources
out of reach of another Project's users, and the
[`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
script that exercises all four end-to-end.

## The isolation model

A *Project* in k8tre is more than a row in a database — it is materialized
across four independent enforcement layers. Each layer can fail on its
own without breaking the others, so the test below treats them
separately.

### Layer 1 — UX (`/projects` filtering)

When a logged-in user opens
`https://portal.<domain>/projects`, the backend walks the user's
`User CR (spec.groups[]) → Group CR (spec.projects[]) → Project CR` graph
and renders only the projects reachable from that user's identity. This is
a **convenience layer**, not a security boundary: it only controls what
the menu shows. The endpoint
`/projects/<other>/apps` is not protected by the authorization check
(`_is_user_authorised_project()` is **not** called there), so a
logged-in user who knows another project's name can still GET that
page and see its app list. That's a metadata leak, not a data leak —
see the *Weak spots* section below.
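
The graph walk can be sketched in a few lines. This is a minimal illustration, not the backend's code: the `USERS` and `GROUPS` dictionaries are hypothetical in-memory stand-ins for the User and Group CRs (the real backend reads them from the apiserver), and the function names are invented for the example.

```python
from typing import Dict, Set

# Hypothetical stand-ins for the User and Group custom resources.
USERS: Dict[str, dict] = {
    "alice": {"spec": {"groups": ["alpha-team"]}},
    "bob": {"spec": {"groups": ["bravo-team"]}},
}
GROUPS: Dict[str, dict] = {
    "alpha-team": {"spec": {"projects": ["alpha"]}},
    "bravo-team": {"spec": {"projects": ["bravo"]}},
}


def reachable_projects(username: str) -> Set[str]:
    """Walk User (spec.groups[]) -> Group (spec.projects[]) and collect projects."""
    user = USERS.get(username)
    if user is None:
        return set()
    projects: Set[str] = set()
    for group_name in user["spec"].get("groups", []):
        group = GROUPS.get(group_name)
        if group is None:
            continue  # a dangling group reference grants nothing
        projects.update(group["spec"].get("projects", []))
    return projects


def is_authorised(username: str, project: str) -> bool:
    """True only if the project is reachable from the user's groups."""
    return project in reachable_projects(username)
```

The same membership test serves both purposes described in this document: filtering the menu (Layer 1) and deciding access (Layer 2). Only the second use is a security boundary.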

### Layer 2 — Backend authorization gate (`/auth/validate`)

This is the **real security boundary**. Every request the browser makes
to a non-portal subdomain (`jupyter.<domain>`, `guacamole.<domain>`, …)
is intercepted by an auth-proxy nginx that issues a subrequest to
`https://portal.<domain>/auth/validate`. That handler:

1. Extracts the project from the `k8tre-project` cookie or
`?project=` URL parameter,
2. Extracts the token from `k8tre-auth-token-<project>` cookie or
`?token=` parameter,
3. Verifies the JWT signature and the `aud` claim (the `audience`
protocol mapper on the Keycloak client is what makes this work —
see [`keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md#add-the-aud-audience-protocol-mapper)),
4. Calls **`_is_user_authorised_project(username, project)`**
(`ci/backend/main.py:301`) which re-walks the User/Group/Project
graph from the OIDC `preferred_username` claim,
5. Returns 200 + auth headers (`X-Auth-User`, `X-Auth-Groups`, …) on
success, **403** on authorization failure.
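
The five steps above can be sketched as a single function. This is an illustration under stated assumptions, not the handler in `ci/backend/main.py`: `verify_jwt` and `is_authorised` are hypothetical callables injected so the example stays self-contained, and the 401 status for a missing or invalid token is an assumption (the text above only specifies 200 and 403).

```python
from typing import Callable, Dict, Mapping, Optional, Tuple

Response = Tuple[int, Dict[str, str]]


def validate(
    cookies: Mapping[str, str],
    params: Mapping[str, str],
    verify_jwt: Callable[[str], Optional[dict]],
    is_authorised: Callable[[str, str], bool],
) -> Response:
    # Steps 1-2: project and token come from cookies, else from URL parameters.
    project = cookies.get("k8tre-project") or params.get("project")
    if not project:
        return 401, {}  # assumed status; not specified in the source
    token = cookies.get(f"k8tre-auth-token-{project}") or params.get("token")
    if not token:
        return 401, {}  # assumed status
    # Step 3: signature and `aud` verification is delegated to verify_jwt,
    # a stand-in that returns the claims dict or None.
    claims = verify_jwt(token)
    if claims is None:
        return 401, {}  # assumed status
    # Step 4: re-check project membership from the username claim.
    username = claims.get("preferred_username", "")
    if not username or not is_authorised(username, project):
        return 403, {}
    # Step 5: success, with an identity header for the upstream service.
    return 200, {"X-Auth-User": username}
```

Because the decision depends on both the token and the requested project, a valid token for `alice` still yields 403 when the request names a project she is not a member of.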

The same check is repeated at `/vdi/sso/<token>/<project>/<app>` for VDI
shortcut authentication, and inside `/launch/<project>/<app>` for the
in-VDI lockout (a user inside a VDI for project A can't `/launch` an
app for project B, because `launch_app()` checks `vdi_context` + `vdi_project`).

### Layer 3 — Kubernetes RBAC

Each project runs in its own namespace, named
`project-<project>` by convention (see `get_proj_namespace()` in
`ci/backend/main.py:216`). Pods spawned by JupyterHub spawners or
VDIInstance CRs land in that namespace. The default Kubernetes RBAC
policy gives a workload's ServiceAccount no implicit cross-namespace
permissions, so a pod in `project-alpha` cannot list, read, create or
delete resources in `project-bravo` without an explicit RoleBinding
or ClusterRoleBinding granting it. The test asserts this with
`kubectl auth can-i --as=system:serviceaccount:project-alpha:default`
against `project-bravo`.
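
As a rough sketch of how such checks can be generated, the snippet below builds the `kubectl auth can-i` command lines for the cross-tenant case. The helper names are hypothetical, and the verb and resource lists are taken from the assertion table later in this document rather than from the script itself; nothing here is executed against a cluster.

```python
from itertools import product
from typing import List


def project_namespace(project: str) -> str:
    """Namespace naming convention described above: project-<project>."""
    return f"project-{project}"


def can_i_argv(verb: str, resource: str, from_project: str,
               to_project: str, service_account: str = "default") -> List[str]:
    """Build one impersonated `kubectl auth can-i` invocation."""
    subject = (
        f"system:serviceaccount:{project_namespace(from_project)}:{service_account}"
    )
    return [
        "kubectl", "auth", "can-i", verb, resource,
        "-n", project_namespace(to_project),
        f"--as={subject}",
    ]


def cross_tenant_checks(from_project: str, to_project: str) -> List[List[str]]:
    """Every verb/resource pair that should answer `no` across tenants."""
    verbs = ("list", "create", "delete")
    resources = ("pods", "secrets")
    return [can_i_argv(v, r, from_project, to_project)
            for v, r in product(verbs, resources)]
```

Each resulting list can be passed to `subprocess.run` from the host; the isolation assertion holds when every command prints `no`.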

### Layer 4 — Cilium network policy

Once Cilium is the cluster CNI (see
[`k8tre-install-guide.md`](troubleshooting/k8tre-install-guide.md)),
the `CiliumClusterwideNetworkPolicy` and `CiliumNetworkPolicy`
resources shipped by `apps/jupyterhub/base/network_policy.yaml` are
**enforced** instead of inert. The pre-Cilium install (default k3s
flannel) accepted the manifests but no controller honored them. With
Cilium, pods in project namespaces can't reach pods in other project
namespaces — and, importantly, they can't reach the Kubernetes API
server (`10.43.0.1:443`) either, because Cilium treats it as `host`
entity, not `cluster`. This is the reason the RBAC test in layer 3
uses `kubectl auth can-i --as=...` from outside the pod rather than
running `kubectl` from inside it.
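
A network probe for this layer might look like the sketch below. It is an assumption about shape, not a copy of the test script: the pod name and target IP are placeholders, the source pod is assumed to have `curl` installed, and port 8080 is taken from the assertion table later in this document.

```python
from typing import List


def probe_argv(namespace: str, pod: str, target_ip: str,
               port: int = 8080, timeout_s: int = 5) -> List[str]:
    """Build a `kubectl exec ... curl` command that prints only the HTTP status."""
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "curl", "-s", "-o", "/dev/null",
        "-w", "%{http_code}",
        "--max-time", str(timeout_s),
        f"http://{target_ip}:{port}/",
    ]


def is_blocked(http_code: str) -> bool:
    """curl reports 000 when it received no HTTP response at all."""
    return http_code.strip() == "000"
```

A short `--max-time` matters here: a policy that silently drops packets makes the connection hang rather than fail fast, so without a timeout the probe would stall.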

## The test script

[`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
sets up two parallel projects (`alpha`, `bravo`) plus two users
(`alice` in `alpha-team`, `bob` in `bravo-team`) and walks each
isolation layer in turn. The setup phase is idempotent — re-running
the script just verifies state.

### Run

```sh
# Default: domain 188.34.94.28.nip.io, projects alpha/bravo, users alice/bob
./tests/test-project-isolation.sh

# Override any of these:
DOMAIN=foo.nip.io PROJECT_A=cardio PROJECT_B=onco \
USER_A=anna USER_B=bruno \
./tests/test-project-isolation.sh
```

Requires `kubectl` pointed at the cluster (the script runs the heavy
checks against the apiserver from the host) plus `curl`, `python3` and
`jq` locally.

### What it asserts

| Test | Layer | Expected on a healthy cluster |
|---|---|---|
| Token issuance + `aud` claim | Authn / Keycloak | both users get a JWT, `aud` contains `backend` |
| `alice → alpha` `/auth/validate` | Layer 2 | **200** |
| `alice → bravo` `/auth/validate` | Layer 2 | **403** |
| `bob → bravo` `/auth/validate` | Layer 2 | **200** |
| `bob → alpha` `/auth/validate` | Layer 2 | **403** |
| Anonymous `/projects/<other>/apps` | Layer 1 (negative) | 401 / 302 |
| Logged-in `/projects/<other>/apps` (manual) | Layer 1 (**weak**) | reachable, no authz — flagged as WEAK |
| `kubectl auth can-i list/create/delete pods+secrets -n project-bravo --as=…project-alpha:default` | Layer 3 | every answer is `no` |
| `curl` from a pod in `project-alpha` to a pod IP in `project-bravo` on port 8080 | Layer 4 | code 000 (no HTTP response: connection blocked) |
| User CR `spec.groups[]` exists for both users | Layer 1 source-of-truth | non-empty |

Output uses three states: `PASS` (the assertion held), `FAIL` (it
didn't), `WEAK` (enforcement is incomplete by design — documented but
not yet fixed upstream). On a clean cluster you should see **14 PASS,
0 FAIL, 2 WEAK** today.
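
One way to think about the three states is as a tally in which only `FAIL` is fatal. The sketch below is hypothetical: the function name and the exit-code policy (non-zero only when something fails) are assumptions, not behaviour confirmed from the script.

```python
from collections import Counter
from typing import Iterable, Tuple

VALID_STATES = {"PASS", "FAIL", "WEAK"}


def summarise(results: Iterable[Tuple[str, str]]) -> Tuple[str, int]:
    """Tally (test name, state) pairs into a summary line and an exit code."""
    counts = Counter(state for _, state in results)
    unknown = set(counts) - VALID_STATES
    if unknown:
        raise ValueError(f"unknown result states: {sorted(unknown)}")
    summary = (
        f"{counts['PASS']} PASS, {counts['FAIL']} FAIL, {counts['WEAK']} WEAK"
    )
    # Assumed policy: WEAK is reported but does not fail the run.
    exit_code = 1 if counts["FAIL"] else 0
    return summary, exit_code
```

Keeping `WEAK` separate from `PASS` means known gaps stay visible in every run without blocking it.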

### What it does NOT cover

- **Token replay across projects.** The cookie name encodes the
project (`k8tre-auth-token-<project>`), but the JWT itself is
identical per user. An attacker with the JWT can mint a cookie for
any project the user is authorized for. Limit token TTLs to
mitigate.
- **JupyterHub spawner & VDIInstance RBAC.** The components that spawn
these pods run under their own ServiceAccounts (`hub`,
`user-scheduler`, `vdi-spawner`), which DO have RoleBindings that
cross namespace boundaries. The test only checks the *default* SA;
audit those spawner SAs separately.
- **Cilium policies for inter-pod traffic *within* a single project
namespace.** Today everything in `project-alpha` can talk to
everything else in `project-alpha`. Tighten with per-pod
`endpointSelector` policies if needed.
- **The control plane / Keycloak realm itself.** A misconfigured
Keycloak protocol mapper (missing `groups` claim, missing `aud`)
can defeat the whole stack — see *Setup bugs the script surfaced*
below.

## Weak spots (the two `WEAK` results today)

### 1. Project enumeration via `/projects/<other>/apps`

A logged-in user can GET
`https://portal.<domain>/projects/<any-project-name>/apps` and the
backend will render the list of apps for that project regardless of
whether the user is authorized. The data fetched on subsequent
clicks is gated by `/auth/validate`, so no payload leaks — but the
existence of arbitrary project names is exposed. The fix is one line in
`get_apps()` in `ci/backend/main.py`: add the same
`_is_user_authorised_project(username, project)` check the
`/auth/validate` handler uses, and return 403 if the user is not authorized.
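
The shape of that fix can be sketched as follows. This is a hypothetical illustration, not the real `get_apps()`: `APPS_BY_PROJECT`, `get_apps_guarded` and the injected `is_authorised` callable are all invented for the example, and the real handler renders a page rather than returning a tuple.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical app catalogue keyed by project name.
APPS_BY_PROJECT: Dict[str, List[str]] = {
    "alpha": ["app-one"],
    "bravo": ["app-two"],
}


def get_apps_guarded(
    username: str,
    project: str,
    is_authorised: Callable[[str, str], bool],
) -> Tuple[int, List[str]]:
    """Return (status, apps); the authorization check runs before any lookup."""
    if not is_authorised(username, project):
        # Same answer for "exists but forbidden" and "does not exist",
        # so the response no longer reveals which project names are real.
        return 403, []
    return 200, list(APPS_BY_PROJECT.get(project, []))
```

Checking authorization before looking the project up is what closes the enumeration gap: an unauthorized caller gets 403 for every name, existing or not.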

### 2. `/launch/<other>/<app>` sets cookies it shouldn't

Same shape, same fix: `launch_app()` mints a project-scoped token
and writes the `k8tre-auth-token-<other>` cookie before checking
authorization. The next request to the subdomain is then rejected by
`/auth/validate`, so a real attack only ever gets a dead cookie —
but it pollutes the user's cookie jar and consumes a Keycloak token
refresh.

## Setup bugs the script surfaced

The first runs failed for reasons that are themselves worth
documenting; the script now handles them in its setup phase:

1. **`kcadm.sh set-password` defaults to `--temporary=true`**, which
marks the password as needing a change on next login. The OIDC
*Resource Owner Password Credentials* (password grant) flow then
rejects the login with `Account is not fully set up`. The script
always passes `--temporary=false`.
2. **Users created without `firstName`/`lastName`** trigger the same
`Account is not fully set up` error even when the password is
permanent. The script always sets both on create and runs an
`update users/<id>` on existing users to backfill them.
3. **In-pod `kubectl get …` against another namespace fails before
it can reach the API server** because the project-namespace
network policy prevents the pod from connecting to
`10.43.0.1:443`. The script uses
`kubectl auth can-i --as=…` from the host instead — cleaner
assertion and resilient to network-policy changes.
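
The first two bugs come down to what a user record must contain before a password-grant login succeeds. The sketch below expresses that as a Keycloak Admin REST `UserRepresentation` payload plus a pre-flight check. It is an illustration only: the script itself drives `kcadm.sh`, and the helper names here are hypothetical.

```python
from typing import Any, Dict, List


def user_representation(username: str, password: str,
                        first_name: str, last_name: str) -> Dict[str, Any]:
    """Build a user payload that avoids both 'not fully set up' causes."""
    return {
        "username": username,
        "enabled": True,
        "firstName": first_name,
        "lastName": last_name,
        "credentials": [
            # temporary=False: no forced password change on next login.
            {"type": "password", "value": password, "temporary": False},
        ],
    }


def setup_problems(user: Dict[str, Any]) -> List[str]:
    """List the conditions from bugs 1 and 2 that this payload would trigger."""
    problems: List[str] = []
    for field in ("firstName", "lastName"):
        if not user.get(field):
            problems.append(f"missing {field}")
    for cred in user.get("credentials", []):
        if cred.get("type") == "password" and cred.get("temporary", False):
            problems.append("temporary password")
    return problems
```

Running a check like `setup_problems` before requesting a token turns an opaque `Account is not fully set up` response into a specific, fixable message.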

## Files

- [`tests/test-project-isolation.sh`](../tests/test-project-isolation.sh)
— the script.
- [`ci/backend/main.py`](../ci/backend/main.py) — `get_apps()`,
`launch_app()`, `_is_user_authorised_project()`,
`get_proj_namespace()`.
- [`apps/jupyterhub/base/network_policy.yaml`](../apps/jupyterhub/base/network_policy.yaml)
— the Cilium policies enforced under layer 4.
- [`docs/troubleshooting/keycloak-post-install-setup.md`](troubleshooting/keycloak-post-install-setup.md)
— Keycloak client and audience mapper setup that the authz check depends on.