Feasibility/design: SSH bridge for AuthBridge (proxy-sidecar mode) — guardrails, authz, audit for agent SSH egress

## Summary

Investigate / design an **SSH bridge** for AuthBridge — outbound SSH interception in **proxy-sidecar mode only** — as a sibling to the recently-merged TLS bridge (#522, #523). The goal is to bring SSH egress under the same guardrail / authz / audit model AuthBridge already applies to HTTP(S), and to close the SSH-shaped hole in the egress story.

## Motivation

Agentic workloads emit a meaningful amount of SSH that AuthBridge currently treats as an opaque tunnel:

| Agentic workload | SSH it emits | Guardrail / authz / audit value |
|---|---|---|
| **SRE / infra-ops / "fix-my-server" agent** | interactive `shell` + `exec` into VMs/bastions | Strongest case: SSH shell = arbitrary RCE with no other choke point. Command allow/deny, per-host authz (staging vs prod), full command/keystroke audit + replay. PAM/bastion-for-agents. |
| **Config-management agent (Ansible/Salt)** | heavy `exec` + file copy (Ansible transport *is* SSH) | Restrict hosts/playbooks, block destructive modules, audit every module run. |
| **CI/CD & deploy agents** | `ssh deploy@host '…'`, rsync/scp | Authz on deploy targets, audit what shipped where. |
| **Data-movement / backup agents** | SFTP/SCP `subsystem` | DLP — block exfil of sensitive paths, size caps, file/direction audit. |
| **Network-automation agents** | `exec` to routers/switches/firewalls | Command guardrails on high-blast-radius config; per-device authz. |
| **Any agent using SSH tunneling** (incl. compromised) | `direct-tcpip` / SOCKS / reverse shell | Egress-completeness (see below). |
| **Git agents** | `git-upload-pack`/`receive-pack` over `exec` | Repo allowlist, block force-push, secret-scan on push. |

Two arguments are platform-level, not workload-specific:

1. **Egress-completeness / closing the bypass.** AuthBridge's egress controls (TLS bridge, HTTP_PROXY, iptables L7 filtering) have a hole: a sophisticated or compromised agent opens **one** SSH connection and tunnels everything over `direct-tcpip`, defeating the HTTP-layer controls. Today port 22 is blind-tunneled, so the hole is open. (Relates to threat-model gap C5 — blind-tunnel.)
2. **Credential substitution = the SSH analog of OBO.** SSH authenticates with keys at connection setup, not per-request headers, so there is no `Authorization` header to inject. The value-add instead is: the bridge mints a **short-lived SSH cert scoped to allowed principals/hosts per session**, so the agent never holds a standing SSH key — the same "agents never hold downstream creds" property OBO gives HTTP.

**Caveat (don't oversize the surface):** much "remote access" is actually HTTPS, not SSH, and already falls under the TLS bridge — `kubectl exec` (SPDY/websocket over HTTPS), EC2 Instance Connect / GCP OS Login (API-mediated), cloud-shell. The genuinely-SSH surface is VM/bastion shell, Ansible, rsync/SFTP, network gear, and tunneling.

## Feasibility (vs the TLS bridge)

Honest assessment: an SSH bridge is **meaningfully more work** than the TLS bridge, because the two things that made the TLS bridge cheap (HTTP pipeline reuse and CA-based trust injection) do **not** transfer. SSH also multiplexes channels over one connection, and each channel type can carry a different guardrail:

- `session` + `exec` → one-shot remote command (git, ansible, deploy)
- `session` + `shell`/`pty-req` → interactive shell
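- `session` + `subsystem` → SFTP (and `scp` from OpenSSH 9.0 on, which uses the SFTP protocol by default; legacy `scp` and rsync run over `exec`)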
- `direct-tcpip` / `forwarded-tcpip` → port forwarding / tunneling
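
To make the channel-to-guardrail mapping concrete, here is a minimal sketch of a per-destination channel policy. The `ChannelPolicy` type, its fields, and `Evaluate` are hypothetical names for illustration, not existing AuthBridge code; the channel and request type strings are the ones defined in RFC 4254. Unrecognized channel or request types (including `env` and `window-change`) are denied here for brevity, which a real engine would need to refine.

```go
package main

import "fmt"

// Decision is the outcome of evaluating one SSH channel event.
type Decision string

const (
	Allow Decision = "allow"
	Deny  Decision = "deny"
)

// ChannelPolicy is a hypothetical per-destination policy. Field names are
// illustrative and do not exist in AuthBridge today.
type ChannelPolicy struct {
	AllowExec       bool
	AllowShell      bool
	AllowForwarding bool
	AllowSubsystems map[string]bool // e.g. {"sftp": true}
}

func decide(ok bool) Decision {
	if ok {
		return Allow
	}
	return Deny
}

// Evaluate maps an RFC 4254 channel type and, for "session" channels, the
// channel request type to a decision. Anything unrecognized is denied.
func (p ChannelPolicy) Evaluate(channelType, requestType, subsystem string) Decision {
	switch channelType {
	case "direct-tcpip", "forwarded-tcpip":
		return decide(p.AllowForwarding)
	case "session":
		switch requestType {
		case "exec":
			return decide(p.AllowExec)
		case "shell", "pty-req":
			return decide(p.AllowShell)
		case "subsystem":
			// Lookup on a nil map yields false, so the zero policy denies.
			return decide(p.AllowSubsystems[subsystem])
		}
	}
	return Deny
}

func main() {
	p := ChannelPolicy{AllowExec: true, AllowSubsystems: map[string]bool{"sftp": true}}
	fmt.Println("exec:", p.Evaluate("session", "exec", ""))
	fmt.Println("shell:", p.Evaluate("session", "shell", ""))
	fmt.Println("sftp:", p.Evaluate("session", "subsystem", "sftp"))
	fmt.Println("direct-tcpip:", p.Evaluate("direct-tcpip", "", ""))
}
```

Defaulting to deny keeps the zero-value policy safe: a destination with no explicit policy gets no channels at all.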

**Reusable (cheap):**
- **Connection capture** — in `enforce-redirect` mode the iptables catch-all (`proxy-init/init-iptables.sh:336`) already redirects all TCP incl. port 22 into `HandleTransparentConn`; SSH is captured today and just blind-tunneled. (The legacy `redirect`/envoy chain explicitly `RETURN`s :22 — another reason to scope this to proxy-sidecar mode.)
- **Config/cmd plumbing** — mirror the TLS-bridge pattern: an `SSHBridge *SSHBridgeConfig` pointer + `Validate()` + ~40 lines of cmd glue + one `fpSrv.X = engine` assignment, feature-flagged off by default.
- **Decision/skip machinery** — port-classify + runtime skip-set + fall-open is transport-agnostic; swap `looksLikeTLSRecord` for an `SSH-2.0-` banner sniff.
- **Operator CRD mode field** — cheap, reuses the existing `AuthBridgeMode`/`MTLSMode` slot.
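
The banner sniff mentioned above could look roughly like the following. This is a sketch under assumptions: `looksLikeSSHBanner` and `sniffString` are hypothetical names, and it assumes the captured client sends its RFC 4253 identification string (`SSH-2.0-...`) as the first bytes on the connection. On a live `net.Conn` the `Peek` call blocks until enough bytes arrive, so a real implementation would set a read deadline and fall open on timeout.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// sshBannerPrefix is the start of an SSH protocol 2.0 identification string
// (RFC 4253, section 4.2).
const sshBannerPrefix = "SSH-2.0-"

// looksLikeSSHBanner peeks at the first bytes without consuming them, so the
// same reader can still be handed to a blind tunnel or an SSH engine.
func looksLikeSSHBanner(br *bufio.Reader) bool {
	head, err := br.Peek(len(sshBannerPrefix))
	if err != nil {
		return false // short read or I/O error: treat as not SSH
	}
	return bytes.Equal(head, []byte(sshBannerPrefix))
}

// sniffString is a small helper for trying the sniff on literal input.
func sniffString(s string) bool {
	return looksLikeSSHBanner(bufio.NewReader(strings.NewReader(s)))
}

func main() {
	fmt.Println(sniffString("SSH-2.0-OpenSSH_9.6\r\n"))
	fmt.Println(sniffString("\x16\x03\x01\x02\x00\x01\x00\x01\xfc")) // TLS handshake record
	fmt.Println(sniffString("GET"))                                 // too short to decide
}
```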

**Does NOT carry over (the real cost):**
1. **Zero pipeline reuse** — the TLS bridge's biggest free lunch. After TLS termination the stream is HTTP, so the existing outbound pipeline runs unchanged (OBO/`Authorization` injection, redaction, reject). SSH is not HTTP: `terminator.go` + `serve.go` + the HTTP handler get replaced by a from-scratch SSH channel-relay engine (~200–400 LOC, protocol-nuanced).
2. **Trust model mismatch** — TLS injects a forged **CA** into the agent trust store (`NODE_EXTRA_CA_CERTS`/`SSL_CERT_FILE`). SSH pins **per-host keys** in `known_hosts` (TOFU). Interception requires an OpenSSH-CA `@cert-authority` entry (needs SSH-cert host keys, not the x509 `minter.go` output) or disabling host-key checking (defeats the purpose). `tlsbridge/ca.go` does not map.
3. **Auth-substitution design** — agent→bridge vs bridge→upstream are different credentials; new design with no TLS-side analog, plus a new per-agent credential reconciler + secret shape on the operator side.
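
To illustrate the trust-injection difference, the sketch below renders the kind of `known_hosts` line an OpenSSH client needs in order to trust host certificates signed by a bridge-owned CA. It uses only the standard library and encodes the Ed25519 public key in SSH wire format by hand; `certAuthorityLine`, the host pattern, and the choice of Ed25519 are assumptions for illustration, not a description of how AuthBridge would provision this.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// sshString encodes b as an SSH wire-format string (RFC 4251):
// a big-endian uint32 length followed by the bytes.
func sshString(b []byte) []byte {
	out := make([]byte, 4+len(b))
	binary.BigEndian.PutUint32(out, uint32(len(b)))
	copy(out[4:], b)
	return out
}

// certAuthorityLine renders a known_hosts entry telling OpenSSH to accept
// host certificates signed by caPub for hosts matching hostPattern.
func certAuthorityLine(hostPattern string, caPub ed25519.PublicKey) string {
	blob := append(sshString([]byte("ssh-ed25519")), sshString(caPub)...)
	return fmt.Sprintf("@cert-authority %s ssh-ed25519 %s",
		hostPattern, base64.StdEncoding.EncodeToString(blob))
}

func main() {
	// A throwaway CA key; a real bridge would load a managed key instead.
	pub, _, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	// "*.internal.example" is a placeholder host pattern.
	fmt.Println(certAuthorityLine("*.internal.example", pub))
}
```

Note that this only covers the client-side trust entry. The bridge would still have to present an SSH host certificate signed by that CA for each upstream host it impersonates, which is the part the x509 minter does not provide.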

**Sizing:** same ~60 LOC config/cmd and free capture, but a from-scratch engine + harder trust-injection + new auth-substitution design ⇒ roughly **2–3× the authbridge-side work** of the TLS bridge, and a harder operator/trust story. Crypto primitives exist (`golang.org/x/crypto/ssh`: `NewServerConn`/`Dial`), so the work is feasible, but it is not "similar effort."

## Proposed Phase 1 scope

Anchor on the highest-value, lowest-ambiguity slice:

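1. **SSH channel-type policy**: on captured :22 connections, allow `exec`/`shell` to approved hosts and **hard-block `direct-tcpip` port-forwarding**, which closes the egress bypass cheaply. Channel-open and channel requests travel inside the encrypted SSH transport (RFC 4253/4254), so enforcing this at channel granularity needs the bridge to terminate the session; without termination, only per-host allow/deny on the captured connection is available.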
2. **`exec`-command policy + audit** for the ops/Ansible/deploy agents (command allow/deny + structured audit log).
3. **Short-lived SSH-cert credential substitution** (the OBO analog) — agent never holds a standing key.

Git push/pull guardrails are real but the weakest standalone justification — defer.
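
As a rough illustration of item 2, the sketch below pairs a host and command allowlist with a structured audit record. Everything here is hypothetical: `ExecPolicy`, `AuditRecord`, the JSON field names, and the rule set are assumptions rather than a proposed schema. The metacharacter check is deliberately blunt, because matching only the first word of an `exec` string would let `allowed-cmd; anything-else` through.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"strings"
	"time"
)

// ExecPolicy is a hypothetical allowlist for SSH exec requests.
type ExecPolicy struct {
	AllowedHosts    []string // glob patterns, matched with path.Match
	AllowedCommands []string // permitted first words of the command line
}

// AuditRecord is a hypothetical structured audit entry.
type AuditRecord struct {
	Time    time.Time `json:"time"`
	Agent   string    `json:"agent"`
	Host    string    `json:"host"`
	Command string    `json:"command"`
	Allowed bool      `json:"allowed"`
	Reason  string    `json:"reason"`
}

// Check returns whether command may run on host, plus a reason for the log.
func (p ExecPolicy) Check(host, command string) (bool, string) {
	hostOK := false
	for _, pat := range p.AllowedHosts {
		if ok, err := path.Match(pat, host); err == nil && ok {
			hostOK = true
			break
		}
	}
	if !hostOK {
		return false, "host not in allowlist"
	}
	// Reject anything the remote shell could use to chain or substitute commands.
	if strings.ContainsAny(command, ";|&`$()<>\n") {
		return false, "shell metacharacters not permitted"
	}
	argv := strings.Fields(command)
	if len(argv) == 0 {
		return false, "empty command"
	}
	for _, c := range p.AllowedCommands {
		if argv[0] == c {
			return true, "command allowed"
		}
	}
	return false, "command not in allowlist"
}

func main() {
	p := ExecPolicy{
		AllowedHosts:    []string{"*.staging.example"},
		AllowedCommands: []string{"systemctl", "uptime"},
	}
	for _, cmd := range []string{"systemctl restart nginx", "uptime; curl http://x"} {
		ok, reason := p.Check("web1.staging.example", cmd)
		rec := AuditRecord{Time: time.Now().UTC(), Agent: "sre-agent",
			Host: "web1.staging.example", Command: cmd, Allowed: ok, Reason: reason}
		line, err := json.Marshal(rec)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(line))
	}
}
```

Denied requests are audited as well as allowed ones, since the deny trail is what shows an agent probing the boundary.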

## Open questions

- Termination vs metadata-only: how much can be done by inspecting channel-open/`exec` requests without fully terminating + re-originating?
- Host-key trust injection: OpenSSH-CA `@cert-authority` into agent `known_hosts` — which agent runtimes/images can we support, and how does the operator provision it?
- Bridge→upstream credential store + per-agent reconciler shape.
- Interaction with SPIRE/SPIFFE identity for minting the agent's SSH cert.

## References

- TLS bridge: #522 (Phase 1, engine wired into forward proxy), #523 (SPIRE decouple)
- Capture path: `authlib/listener/forwardproxy/{server.go,transparent.go}`, `proxy-init/init-iptables.sh`
- TLS-bridge engine (the template): `authlib/tlsbridge/`

---
*Assisted-By: Claude Code*


