
Support operator-independent and non-sidecar (separate-pod egress gateway) deployment of AuthBridge #531

Description

@huang195

Summary

Make AuthBridge deployable without the kagenti-operator and in a non-sidecar topology — specifically a separate-pod egress gateway (authbridge-proxy as its own Deployment/StatefulSet + Service, with NetworkPolicy-enforced egress capture). Today AuthBridge can only be deployed as an operator-injected sidecar; the data plane is already capable of more, but the packaging and a documented operator-independent contract don't exist.

Motivation

  1. Decouple from the operator → broaden reuse. The operator's admission webhook is currently the only way to wire AuthBridge up (sidecar injection, iptables init, cert-manager CA mount, per-agent ConfigMap, HTTP_PROXY/trust env). That hard-couples AuthBridge to Kagenti's control plane. Other platforms that have their own controller, e.g. dam-agents/dam (internally kagenti/platform; see #529, "Investigate swapping AuthBridge in as the egress data plane for dam-agents/dam (kagenti/platform)"), can't adopt authbridge-proxy as a drop-in egress data plane without re-implementing all of that wiring. A documented, operator-independent deployment contract turns the operator into an optional convenience rather than a hard dependency.

  2. Enable the higher-assurance separate-pod topology. A separate-pod gateway is meaningfully more defensible than the sidecar for egress containment and secret isolation:

    • The enforcement point and the CA key / credentials live outside the agent's pod and network namespace — stealing them requires a node-level escape, not just agent-container root.
    • It unlocks a CNI-enforced NetworkPolicy layer (default-deny egress, allow-only-gateway) that the agent cannot manipulate even with in-pod NET_ADMIN. The sidecar model is structurally incapable of this: agent and sidecar share one pod IP/netns, so NetworkPolicy can't discriminate the agent→proxy hop, leaving in-pod iptables as the single (agent-co-resident) layer.
    • This directly addresses threat-model gaps C1 (default-allow egress) and C2 (SVID/secret theft).

  3. Standalone / PoC / third-party use. Running the proxy as a plain Deployment via Helm makes it usable in demos, tests, and non-Kagenti environments without standing up the operator + webhook.

Current state (what exists today)

  • Modes (authlib/config/config.go:451-456): envoy-sidecar, waypoint, proxy-sidecar. Mode is fixed at build time by which cmd/ binary you run. envoy-sidecar and proxy-sidecar are both sidecar shapes. waypoint exists only as a preset (presets.go:12-14) — no binary/manifest wires it up.
  • authbridge-lite is NOT a separate topology — it's the size-optimized proxy-sidecar build (fewer plugins), same listener layout (cmd/authbridge-lite/main.go:135-140).
  • No standalone deployment artifacts exist — no Helm chart, kustomize, or raw Deployment+Service for the proxy anywhere. Every AuthBridge manifest is a demo agent relying on the operator webhook (demos/echo/k8s/agent.yaml, demos/mtls/k8s/caller.yaml). The only standalone Deployment+Service in the repo is sparc-service (a plugin backend, explicitly kagenti.io/inject: disabled), not the proxy.

So operator-independent + separate-pod is genuinely unbuilt.

The good news: the data plane is already capable

authbridge-proxy/lite need very little to run anywhere (low data-plane risk; the work is mostly packaging + contract + docs):

  • One flag: --config <path> (cmd/authbridge-proxy/main.go:139).
  • All listeners bind 0.0.0.0, not localhost (presets.go:16-22) — reachable over a Service IP.
  • The forward proxy is a real HTTP/CONNECT proxy that dials the client-supplied target (forwardproxy/server.go:202-203,772,844) — no same-pod assumption. The demo's HTTP_PROXY=http://localhost:8081 is config, not a code constraint.
  • mtls/spiffe/SPIRE are optional and only dialed when consumed (main.go:99-114); tls_bridge is decoupled from SPIRE (#523, "Fix: Run the TLS bridge without SPIRE (need-driven SPIFFE provider; drop in-process trust self-check)").
  • The only same-pod-coupled knob is reverse_proxy_backend (config.go:364, no default) — the inbound URL the reverse proxy forwards to, conventionally http://localhost:<agent-port> because the sidecar shares the agent netns. In a separate pod it points at the agent's Service.

Proposed scope

  • Phase 1 — packaging + contract: a Helm chart (and/or kustomize) that deploys authbridge-proxy as a Deployment/StatefulSet + Service. Document the operator-independent deployment contract the proxy needs: the config.yaml, the CA dir mount (pre-provisioned Secret), listener ports, and how the agent workload is wired (its HTTP_PROXY/trust env + the NetworkPolicy). Goal: anyone (Helm, another operator, a human, DAM's controller) can satisfy the contract without the kagenti-operator.
  • Phase 2 — separate-pod egress-gateway topology: NetworkPolicy templates (default-deny egress + allow-only-gateway), the reverse_proxy_backend→Service change, and a pre-provisioned cert-manager CA whose ca.crt is mounted into the agent's trust store (the operator does this today; provide a non-operator path — a templated cert-manager Certificate + mounts in the chart).
  • Phase 3 — docs + a named topology: document "egress gateway (separate pod)" as a first-class supported topology alongside sidecar, including the security trade-offs (CNI dependency, extra pod + hop) and when to choose which.

Design notes / gotchas

  • Egress-only vs full proxy. A separate egress-gateway naturally covers only the forward-proxy (outbound) function. The reverse-proxy (inbound) jwt-validation function is co-located-with-agent by nature and may stay a sidecar or become a distinct inbound gateway. Scope the egress gateway first; don't try to relocate the inbound path in the same step.
  • Ephemeral CA won't work cross-pod. NewEphemeralSource (tlsbridge/ca.go:36) generates a CA at boot — the agent can't pre-trust it across a pod boundary. Separate-pod TLS bridge requires a pre-provisioned CA (cert-manager FileSource) so ca.crt can be mounted into the agent ahead of egress.
  • Decrypted-traffic-stays-in-gateway is a bonus property: with the proxy in its own pod, decrypted bodies and the session API (forceLocalhost, config.go:516-528) never leave the gateway pod.
  • NetworkPolicy is CNI-dependent. The agent-inaccessible outer layer only exists if the CNI enforces egress NetworkPolicy (Cilium and Calico do; on CNIs that don't, the topology degrades to parity with the sidecar's iptables-only capture). Document this.
  • Capture options in separate-pod: cooperative HTTP_PROXY (works today) + NetworkPolicy pinning the only reachable destination to the gateway. Optionally an iptables init in the agent pod for defense-in-depth (DAM does both).
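On the agent side, the pre-provisioned CA and the cooperative HTTP_PROXY come down to a transport that routes through the gateway and trusts the mounted ca.crt. Below is a minimal Go sketch, assuming the Secret's ca.crt is mounted at the hypothetical path /etc/authbridge/ca/ca.crt and the gateway is reachable at a hypothetical Service URL; agents in other languages would typically get the same effect from HTTP_PROXY/HTTPS_PROXY plus their runtime's CA-bundle setting.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// caPath is a hypothetical mount path for ca.crt from the pre-provisioned
// cert-manager Secret; the chart would choose the real path.
const caPath = "/etc/authbridge/ca/ca.crt"

// caPool returns a trust pool with the system roots plus the gateway CA.
// System roots are kept on the assumption that some destinations may be
// tunneled through CONNECT rather than re-signed by the TLS bridge.
func caPool(caPEM []byte) (*x509.CertPool, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool()
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in PEM input")
	}
	return pool, nil
}

// agentTransport routes every request through the gateway's forward proxy
// and verifies TLS against the pool built from the mounted CA.
func agentTransport(proxyURL string, caPEM []byte) (*http.Transport, error) {
	u, err := url.Parse(proxyURL)
	if err != nil {
		return nil, err
	}
	pool, err := caPool(caPEM)
	if err != nil {
		return nil, err
	}
	return &http.Transport{
		Proxy:           http.ProxyURL(u),
		TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
	}, nil
}

func main() {
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		// Without the pre-provisioned CA there is nothing to trust; do not
		// fall back to an unverified client.
		fmt.Fprintln(os.Stderr, "gateway CA not mounted:", err)
		return
	}
	tr, err := agentTransport("http://authbridge-gateway.agents.svc.cluster.local:8081", caPEM)
	if err != nil {
		fmt.Fprintln(os.Stderr, "building transport:", err)
		return
	}
	client := &http.Client{Transport: tr}
	_ = client // use this client for all outbound calls
}
```

Reading the CA at startup reflects the constraint above: unlike an ephemeral CA generated by the gateway at boot, this one must already exist when the agent makes its first egress request.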

Open questions

  • New mode: should the gateway get its own mode enum value, or is separate-pod purely a deployment topology of proxy-sidecar (since the data path is identical)? Leaning toward topology-only plus a forward-proxy-focused preset.
  • Helm vs kustomize vs both, and where it lives (this repo's deploy/ — which has none today).
  • Per-agent gateway (1:1, DAM-style, strong isolation, heavy) vs shared namespace gateway (cheaper, weaker isolation) — support one or both?
  • How much of the operator's cert-manager Certificate/Issuer provisioning should be templated into the chart vs left to the user?


Assisted-By: Claude Code

Metadata

Labels: Identity (Issues related to agent identity attestation, SPIFFE, and Authorization bearer tokens), enhancement (New feature or request)
Type: No type
Projects: Status New/ToDo
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests