Add OAuth client for outbound remote MCP sources #68
mgoldsborough wants to merge 4 commits into main
Closes the gap where workspace bundles with `{ url }` refs against
OAuth-gated remote MCP servers (Reboot under `rbt dev`, ChatGPT Apps,
hosted Claude.ai MCP endpoints) returned `401 invalid_token` and
failed to start. The MCP TS SDK already supports the OAuth client
flow via `OAuthClientProvider`; we just weren't passing one.
- WorkspaceOAuthProvider: file-backed `OAuthClientProvider` scoped per
`(workspace, serverName)`. Persists DCR client info, tokens, and
PKCE verifier under `.nimblebrain/workspaces/<wsId>/credentials/
mcp-oauth/<serverName>/` (mode 0o700). Follows the authorize-redirect
chain up to 10 hops and detects when the chain lands at our own
callback — Reboot's `Anonymous` dev flow completes headlessly this
way. Non-self-target flows throw `InteractiveOAuthNotSupportedError`,
staging real browser OAuth for a follow-up.
- oauth-flow-registry: process-local `Map<state, PendingFlow>` bridging
the provider to the HTTP callback route. Extension point for the
interactive flow.
- /v1/mcp-auth/callback route: `GET` handler that resolves pending
flows by state. Unauthenticated by design (state param guards
against unsolicited codes).
- createRemoteTransport: optional `authProvider` parameter; static
`transport.auth` takes precedence.
- McpSource.start(): retry exactly once on `UnauthorizedError` after
awaiting the provider's pending flow and calling
`transport.finishAuth(code)`. Extracted helpers (connectWithTimeout,
cleanupOnStartFailure) keep the retry readable.
- startBundleSource: instantiate WorkspaceOAuthProvider for url-ref
bundles without static auth; threads `opts.wsId` + `opts.workDir`
with sensible defaults.
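
For readers who want the shape of the headless detection described in the `WorkspaceOAuthProvider` bullet, here is a minimal sketch, not the PR's actual code. `followAuthorizeChain`, `ChainResult`, `Probe`, and `fetchProbe` are hypothetical names, and throwing when the hop limit is exceeded is an assumption; the PR only states that the chain is followed for up to 10 hops.

```typescript
// Hypothetical result shape; the provider's real types are not shown in this PR.
type ChainResult =
  | { kind: "self"; code: string; state: string | null }
  | { kind: "interactive"; url: string };

// Returns the Location header of a redirect response, or null if there is none.
type Probe = (url: URL) => Promise<string | null>;

async function followAuthorizeChain(
  start: string,
  callbackUrl: string,
  probe: Probe,
  validate: (url: URL) => void, // expected to throw on a blocked target
  maxHops = 10,
): Promise<ChainResult> {
  const callback = new URL(callbackUrl);
  let current = new URL(start);
  for (let hop = 0; hop < maxHops; hop++) {
    validate(current); // every URL is checked before it is requested
    const location = await probe(current);
    // No redirect: a real page wants a browser, so this is an interactive flow.
    if (location === null) return { kind: "interactive", url: current.href };
    const next = new URL(location, current); // resolves relative redirects
    const code = next.searchParams.get("code");
    const landsOnCallback =
      next.origin === callback.origin && next.pathname === callback.pathname;
    if (landsOnCallback && code !== null) {
      // The chain ended at our own callback: finish without a browser.
      return { kind: "self", code, state: next.searchParams.get("state") };
    }
    current = next;
  }
  throw new Error(`authorize chain exceeded ${maxHops} hops`);
}

// One possible probe for server runtimes (Node, Bun), where a manual-redirect
// fetch exposes the 3xx status and headers.
const fetchProbe: Probe = async (url) => {
  const res = await fetch(url, { redirect: "manual" });
  return res.status >= 300 && res.status < 400 ? res.headers.get("location") : null;
};
```

Passing the probe in as a parameter keeps the loop testable without a network; the sketch compares origin and pathname directly and leaves normalization aside.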
Tests: 10 unit tests for the provider (persistence roundtrips,
headless self-target detection via multi-hop redirect, interactive
rejection) plus 4 for the flow registry.
Critical fixes:
- Cross-tenant OAuth credential leak closed. `lifecycle.ts::installRemote`
now threads `wsId` + `workDir` into `startBundleSource` (previously
dropped), and `startup.ts`'s URL-branch hard-errors on missing `wsId`
when no static auth is configured. The old `?? "ws_default"` fallback
would have silently pooled OAuth tokens across every workspace under
the default id; named bundles already threw, URL bundles silently
diverged. Matches the named-bundle precedent at the credential
boundary.
- SSRF vector on the OAuth authorize chain closed. `validateBundleUrl`
previously ran only on `ref.url`; a compromised remote MCP server
could hand us an authorize URL pointing at AWS IMDS / RFC1918 / our
own loopback, and we'd probe it directly. Every hop through
`redirectToAuthorization`'s redirect loop now passes through
`validateBundleUrl`; blocks surface as explicit
`[workspace-oauth-provider] SSRF block …` errors so operators see the
real cause instead of the generic "interactive not supported"
fallthrough. `allowInsecureRemotes` is propagated into the provider
so local-dev (Reboot on localhost:9991) still works when the flag
is explicitly set.
- Callback self-match normalized. Comparing `next.origin + next.pathname`
as a raw string was flaky: configured `callbackUrl` with a trailing
slash, or a server echoing an explicit default port / uppercase
hostname, would fail the equality check and silently fall through to
InteractiveOAuthNotSupportedError. Provider now precomputes a
canonical form (lowercased origin, pathname stripped of trailing `/`)
at constructor time and compares against that.
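
The callback normalization above can be sketched roughly as follows. `canonicalCallback` and `isSelfTarget` are hypothetical names used for illustration; the sketch relies on the standard `URL` parser, which already lowercases the host and drops default ports for `http` and `https`.

```typescript
// Hypothetical helper: reduce a URL to a comparable "origin + path" string.
function canonicalCallback(raw: string): string {
  const u = new URL(raw);
  // `origin` excludes query and fragment; the parser has already lowercased
  // the host and removed a default port such as ":443" on https.
  const path = u.pathname.replace(/\/+$/, ""); // strip trailing slashes
  return u.origin.toLowerCase() + path;
}

// Compare a redirect target against the canonical form computed once up front.
function isSelfTarget(callbackCanonical: string, next: string): boolean {
  return canonicalCallback(next) === callbackCanonical;
}

// Computed once, e.g. in a constructor; the URL here is only an example value.
const canonical = canonicalCallback("http://localhost:27247/v1/mcp-auth/callback/");
```

With this form, a configured callback with a trailing slash and a server echoing an uppercase hostname both compare equal to the same canonical string.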
Also: discovered and fixed a latent retry-path bug while writing the
integration test QA requested (W5). `StreamableHTTPClientTransport`
rejects a second `start()` on the same instance, so my original
"connect → catch 401 → finishAuth → connect on same transport" would
have failed whenever the retry actually fired. The fix follows the SDK's
own `simpleOAuthClient` pattern of new-transport-per-attempt: the retry
now tears down the first transport+client, rebuilds via
`createRemoteTransport`, and reconnects on the fresh instance. The
real Reboot flow happened to work because tokens landed on disk via a
prior partial run and the first connect succeeded with cached tokens;
the test surfaced the gap.
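
A minimal sketch of the new-transport-per-attempt retry, under stated assumptions: `SketchTransport`, `startWithOneRetry`, and the local `UnauthorizedError` class are stand-ins rather than the SDK's types, and calling `finishAuth` on the first transport before discarding it is an assumption about ordering that this PR does not spell out.

```typescript
// Stand-in for the SDK's error class, so the sketch is self-contained.
class UnauthorizedError extends Error {}

// Hypothetical minimal surface; real transports expose more than this.
interface SketchTransport {
  connect(): Promise<void>;
  finishAuth(code: string): Promise<void>;
  close(): Promise<void>;
}

async function startWithOneRetry(
  makeTransport: () => SketchTransport,
  waitForCode: () => Promise<string>, // resolves when the pending flow completes
): Promise<SketchTransport> {
  const first = makeTransport();
  try {
    await first.connect();
    return first; // cached tokens were valid, no retry needed
  } catch (err) {
    try {
      if (!(err instanceof UnauthorizedError)) throw err;
      // Exchange the authorization code for tokens via the first transport.
      await first.finishAuth(await waitForCode());
    } finally {
      await first.close(); // the first instance is never reused
    }
  }
  // Exactly one retry, on a fresh instance that can be started cleanly.
  const second = makeTransport();
  try {
    await second.connect();
    return second;
  } catch (err) {
    await second.close();
    throw err;
  }
}
```

Taking a factory instead of a transport instance makes it impossible to call `connect()` twice on the same object, which is the failure mode described above.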
Warnings applied:
- Bun.serve-based cases moved from `test/unit/workspace-oauth-provider`
to `test/integration/workspace-oauth-provider`, per CLAUDE.md's
classification rule.
- `test/integration/mcp-source-oauth-retry.test.ts` added with an
end-to-end Bun.serve mock of the OAuth + MCP stack (discovery, DCR,
authorize, token, /mcp 401-then-200). Directly exercises the retry
path the PR introduces.
- `connectWithTimeout` timer leak fixed. The `setTimeout` handle is
now captured and cleared in a finally; previously every successful
connect leaked a 15–30s timer, doubled under the retry path.
- `rename` moved to the top static import in `workspace-oauth-provider`
— dynamic `await import("node:fs/promises")` on every token write
was out of step with the rest of the file.
- `mcp-auth.ts` callback route adds `Cache-Control: no-store` +
`Pragma: no-cache`. Defense-in-depth against intermediate caches
storing the success/failure HTML with `?code=` in the URL.
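
The timer-leak fix for `connectWithTimeout` follows a common pattern, sketched here with a hypothetical generic helper `withTimeout`; the real function's signature and timeout values are not shown in this PR.

```typescript
// Race a promise against a timeout and always clear the timer afterwards.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Runs on success, failure, and timeout, so no timer outlives the call.
    clearTimeout(timer);
  }
}
```

Without the `finally`, a successful connect would leave the timer pending until it fired, which is the leak described in the bullet above.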
Warning #8 (token_endpoint_auth_method): acknowledged, deferred.
Switching between "none" and "client_secret_basic" based on DCR
response is not a one-liner — it requires implementing
`OAuthClientProvider.addClientAuthentication` to inject the credential
on token requests post-hoc. Noted as a TODO in a follow-up ticket;
shipping "none" unconditionally is correct for Reboot Anonymous and
works for any AS that issues refresh tokens to public clients (the
ext-apps examples do).
Suggestions applied:
- oauth-flow-registry gains a 15-minute TTL per registration (also the
boundary documented in INTERACTIVE_OAUTH_UI.md). Timer handles are
cleared on resolve/reject/_clearAll so late fires can't double-settle
a promise. `timeout.unref?.()` so a stuck flow's timer doesn't keep
CLI invocations alive.
- `mcp-source` cast `this.transport as StreamableHTTPClientTransport`
replaced with a narrower `Transport & { finishAuth?: (...) }` —
SSE also has finishAuth, the old cast was a narrowing lie.
- Redundant `this.dead = false` in retry path dropped — happy path
leaves it unset, matching the default, one code path for "started."
- `NB_API_URL` startup warning: if unset when a URL-ref bundle is
being wired, log a one-time warning that localhost:27247 is dev-only
and prod deployments behind a proxy need the env var set.
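
The registry TTL behaviour can be sketched as below. This is an illustration only: `registerFlow`, `resolveFlow`, and the `PendingFlow` fields are hypothetical, and rejecting duplicate states is an assumption. Only the `Map<state, PendingFlow>` shape, the 15-minute TTL, the timer clearing, and the optional `unref` call come from the description above.

```typescript
interface PendingFlow {
  resolve(code: string): void;
  reject(err: Error): void;
  timer: ReturnType<typeof setTimeout>;
}

const FLOW_TTL_MS = 15 * 60 * 1000; // 15 minutes
const pending = new Map<string, PendingFlow>();

// Register a flow and get a promise that settles with the authorization code.
function registerFlow(state: string, ttlMs = FLOW_TTL_MS): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (pending.has(state)) {
      reject(new Error(`OAuth flow ${state} already registered`));
      return;
    }
    const timer = setTimeout(() => {
      pending.delete(state);
      reject(new Error(`OAuth flow ${state} expired`));
    }, ttlMs);
    // Node and Bun timers expose unref(); the optional call is a no-op elsewhere.
    (timer as unknown as { unref?: () => void }).unref?.();
    pending.set(state, { resolve, reject, timer });
  });
}

// Called by the callback route; returns false for unknown or expired states.
function resolveFlow(state: string, code: string): boolean {
  const flow = pending.get(state);
  if (!flow) return false;
  clearTimeout(flow.timer); // a late timer fire can no longer settle the promise
  pending.delete(state);
  flow.resolve(code);
  return true;
}
```

Because the timer is unreferenced, a flow that never completes does not by itself keep a short-lived process running.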
Suggestion #11 (replace hand-rolled escape with Hono `html` template)
deliberately skipped — the escape helper is minimal, auditable, and
doesn't depend on Hono's tagging API; swap isn't worth the churn.
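
For context, a hand-rolled escape of the kind referred to here is typically only a few lines. The following is a generic sketch with a hypothetical name `escapeHtml`, not the helper from this PR.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace the five characters that are significant in HTML text and attributes.
function escapeHtml(input: string): string {
  return input.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}
```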
Full verify green: 1874 unit + 117 web + 378 integration + 16 smoke.
Thanks for the thorough review. Fix-up landed in db8dfbf. PR #69 rebased onto the updated base.
Adjudication
- Critical #1 (wsId drop): Confirmed and fixed.
- Critical #2 (SSRF): Confirmed and fixed.
- Critical #3 (callback match): Confirmed and fixed. Precomputed canonical form at constructor time (lowercased origin, pathname stripped of trailing …
- Bonus bug found writing W5's test: my original retry-path was broken …
- Warnings 4, 5, 6, 7, 9: all applied: …
- W8 (token_endpoint_auth_method): acknowledged, deferred with TODO. "One-line check after DCR" understates it: switching between …
- Suggestions 10, 12, 13, 14: all applied: …
- S11 (Hono html template): deliberately skipped. Hand-rolled escape is minimal and auditable; the swap adds Hono API surface coupling for cosmetic gain.
Verify
1874 unit + 117 web + 378 integration + 16 smoke. Clean across the stack.
Process note
Your review caught two landable bugs my PR didn't: cross-tenant leak ( …
Two-part follow-up to the QA fix-up's credential-boundary guard:
- Positive + negative integration tests for `startBundleSource`'s URL branch. (a) missing wsId + no static auth throws `requires opts.wsId`; (b) missing wsId WITH static auth starts cleanly, which confirms the wsId requirement is scoped exactly to the path that would otherwise construct an OAuth provider. A future refactor that weakens the check to a default now fails CI.
- `products/nimblebrain/code/CLAUDE.md`'s "Workspace Isolation" section gains a one-line rule: hard-error on missing `wsId` at credential boundaries, don't silently default. Sits next to the existing `requireWorkspaceId()` guidance (same rule family).
The rule is enforceable by the test; the doc is a cross-reference for humans reading the section. Not saved to session memory: it's a codebase convention, versioned with the code.
Companion to 8666a75 — that commit caught the test but not the docs update because `git add CLAUDE.md` resolved the symlink to AGENTS.md and I didn't notice the status-line divergence. One-liner in the existing "Workspace Isolation" section pointing at the rule.
Summary
- `{ url }` refs against OAuth-gated remote MCP servers returned `401 invalid_token` and failed to start
- `WorkspaceOAuthProvider` (file-backed per `(workspace, serverName)`), `/v1/mcp-auth/callback` route, and a one-shot retry in `McpSource.start()`: the MCP TS SDK's `OAuthClientProvider` path is now wired end-to-end
- … `Anonymous` dev provider, client-credentials-style auth) by probing the authorize chain up to 10 hops and resolving in-process when it lands at our callback. Interactive browser OAuth (Granola, Claude.ai hosted) fails fast with a clear `InteractiveOAuthNotSupportedError`, staged for a follow-up iteration so the extension point is visible, not hidden behind TODOs
See the commit message for the full per-file breakdown.
Test plan
- `bun run verify` passes on this branch alone (verified locally; 1881 unit + 123 web + 370 integration + 16 smoke, all green)
- `remote-integration.test.ts` and `remote-lifecycle.test.ts` still pass: covers bundles with static `transport.auth` (which now bypasses the provider) and url-only bundles
- … `{ url: "http://localhost:9991/mcp" }` with no static auth, restart NB: bundle registers, tokens persist under `workspaces/<ws>/credentials/mcp-oauth/reboot-hello/`, tools callable from the agent