Skip to content

Workspace lifecycle refactor: provision at identity, strict resolver, tighten placement API#67

Merged
mgoldsborough merged 6 commits intomainfrom
feat/workspace-lifecycle-refactor
Apr 24, 2026
Merged

Workspace lifecycle refactor: provision at identity, strict resolver, tighten placement API#67
mgoldsborough merged 6 commits intomainfrom
feat/workspace-lifecycle-refactor

Conversation

@mgoldsborough
Copy link
Copy Markdown
Contributor

Summary

Fixes the HQ-tenant bug Mario hit on first login (blank workspace selector + duplicate apps in sidebar) and removes the bug class that allowed it.

  • Phase 1 — Provision at identity boundary. ensureUserWorkspace called from each identity provider's first-login hook (WorkOS provisionUser, OIDC verifyRequest, Dev ensureDefaults). Idempotent and concurrent-safe. Invariant "authenticated user has ≥1 workspace" holds at the identity boundary — no more half-provisioned user state between auth and first data request.
  • Phase 2 — Strict resolver + permissive bootstrap. resolveWorkspace becomes pure selection: header or conversation-derived wsId, else 400. Both the silent single-workspace default and the data-path auto-provision are gone. /v1/bootstrap is exempt from requireWorkspace middleware — it's the discovery endpoint and does its own local resolution (honors X-Preferred-Workspace, defaults to first membership).
  • Phase 3 — Tighten PlacementRegistry API. Drop all() and forSlot(). Only forWorkspace(wsId) remains, returning ambient + scoped merged. The original placement-leak bug becomes unwritable, not merely caught. workspace-access.ts deleted (folded into registry).

Three phases, three commits for clean review/bisect. Net +714/−428 across 34 files; production-code surface shrinks by ~100 LOC after Phase 3's deletions.

Test plan

  • CI passes (static, unit, integration — 37 pre-existing lifecycle failures on main are unchanged; zero new failures from this branch)
  • Deploy to HQ tenant staging; verify Mario-equivalent first-login user sees populated workspace selector and only their workspace's apps
  • Verify external MCP clients with X-Workspace-Id header continue working
  • Verify external MCP clients without the header now 400 with a clear pointer message (straight cut, no grace period per product decision)
  • Spot-check a returning user across deploy: identity exists, workspace exists, bootstrap returns populated data
  • Verify no duplicate workspaces created under concurrent first-login race (React StrictMode double-mount, bootstrap-parallel-with-tool-call)

Move workspace provisioning from the data path (resolveWorkspace
middleware, fired on the first tool call) to the identity path
(provisionUser in each adapter, fired once on first login). Establishes
the invariant "authenticated user has ≥1 workspace" at the identity
boundary so request resolvers can treat it as a hard requirement.

Fixes a timing bug where GET /v1/bootstrap returns before any
workspace exists for a first-login user, leading to a blank workspace
selector and unfiltered placements leaking apps from other workspaces.

New src/workspace/provisioning.ts::ensureUserWorkspace is idempotent
and concurrent-safe: a deterministic slug + WorkspaceConflictError /
MemberConflictError reconciliation guarantees exactly one workspace
per user even under racing first-login requests. Replaces the old
timestamp-suffix fallback in resolveWorkspace which could create
duplicate workspaces under StrictMode or retries.

Wired into WorkOS (exchangeCode -> provisionUser), OIDC (first valid
JWT -> verifyRequest), and Dev (ensureDefaults). resolveWorkspace's
auto-provision branch stays for now as a safety net; removed in Phase 2.
resolveWorkspace (src/api/auth-middleware.ts) no longer picks defaults
or auto-provisions. Both behaviors were footguns:

- Silent single-workspace resolution: a client that "just worked" with
  one workspace would 400 the next day when the user was added to a
  second. Honest contract: the caller names the workspace.
- Auto-provisioning on zero: side effect on the data path. Phase 1
  moved this to the identity boundary; now the fallback is removed.

The resolver now requires either an X-Workspace-Id header or a
conversation-derived wsId (chat paths, tied to stored state). Missing
→ 400 workspace_required, with a message pointing at the bootstrap
endpoint or Settings → Profile → MCP Connection.

handleBootstrap (GET /v1/bootstrap) does NOT mount requireWorkspace
middleware — it's the one endpoint where a client legitimately cannot
yet know a wsId. The handler does local permissive resolution: honor
X-Preferred-Workspace hint when it matches a membership, else pick the
user's first workspace. Placement filter is now unconditional — if
userWorkspaces is empty at bootstrap time, the Phase 1 invariant has
broken and we 500 loudly instead of leaking placements from every
workspace on the tenant (which was the original Mario bug).

Test changes: integration tests that previously relied on silent
single-workspace resolution now send the X-Workspace-Id header
explicitly. Three blocks in api-integration.test.ts were using the
default work dir (shared state across parallel suites, masked by the
silent resolver) — added isolated work dirs. No behavior regressions
against the pre-existing lifecycle test failures on main.
Reduce the registry's read surface to a single method, forWorkspace(wsId),
that returns ambient (no wsId) plus scoped-to-this-workspace entries
merged and sorted. Remove all() and forSlot().

The removed methods were footguns: in a multi-tenant system no
legitimate caller wants placements unrelated to a workspace. all()
was the exact API that let the original Mario/HQ-tenant bug compile —
handlers could fetch every tenant's nav entries with one call. With
forWorkspace as the only accessor, the leak becomes unwritable.

The concept of "global" vs "workspace-scoped" placements was also
wrong in spirit. Home/Conversations/Files/Settings aren't global —
they operate inside a workspace; they just happen to be provided by
the platform rather than by an installed bundle. Renaming to "ambient"
(always present within a workspace) makes the semantics honest.

Collateral deletions:
- src/runtime/workspace-access.ts — filterPlacementsForWorkspace
  folded into PlacementRegistry.forWorkspace. Deleted.
- test/unit/runtime/workspace-access.test.ts — contract now covered
  by placement-registry.test.ts. Deleted.

handleBootstrap and handleShell become one-line readers. handleShell
is also tightened to require workspaceId (always set by the
requireWorkspace middleware that already gates the route); the silent
"no workspaceId → return everything" fallback is gone.

Zero new test failures against the pre-existing 37 lifecycle-test
failures on main.
Addresses PR review feedback. Phase 1 took workspaceStore as optional
(workspaceStore?: WorkspaceStore) to minimize test-site churn, but
that's a silent footgun: a future provider author could forget to
pass it, provisioning would no-op, and the user would hit a confusing
400 on their first data request rather than a clear error at wiring
time.

The API principle is the same one Phase 3 applied to PlacementRegistry
(remove all() and forSlot()): make wrong code unwritable. Required
parameter is the strongest form of the guarantee. Production wiring
(createIdentityProvider factory, server.ts dev-mode construction)
already passes workspaceStore — only tests were relying on the
optional form.

Also removes the now-dead `if (this.workspaceStore)` guards inside
each provider's provisioning call site, since the type now excludes
null.

userStore stays optional on the WorkOS provider — documented in-line
why: WorkOS itself is the source of truth for users (managedUsers:
true), and the local profile is a preferences cache. workspaceStore
does not have that asymmetry.
Addresses three warnings and two suggestions from code review.

W1+W2: self-healing workspace provisioning on every verifyRequest.
  Move ensureUserWorkspace out of first-login-only gates. OIDC, WorkOS
  (both AuthKit and regular paths), and dev now call it on every
  successful verifyRequest. Happy path is one filesystem read (same
  tier as existing per-request work) and no writes.

  Fixes:
  - Pre-existing users whose workspace was lost (admin deletion,
    partial failure, migration drop) would permanently 500 at
    /v1/bootstrap with no client-side recovery path. They now recover
    on next login.
  - The AuthKit/MCP-OAuth path never routes through exchangeCode, so
    users authenticated via that path had no workspace provisioned.
    They do now.

  Remove the redundant ensureUserWorkspace from WorkOS provisionUser
  (verifyRequest handles it). Rename dev's ensureDefaults →
  ensureUserProfile to reflect that it only seeds the profile.

W3: drop deriveSlug truncation.
  Old code sliced user IDs to 16 chars. For OIDC IDs (usr_oidc_<12
  hex>) that left ~7 hex of entropy and birthday-collided at ~16K
  users. On collision, reconcileConflict silently added both users as
  admins of the same workspace — a cross-tenant leak. WORKSPACE_ID_RE
  allows 64 chars; no reason to truncate. Drop it. Remove the unit
  test that codified the collision behavior as correct. Add a
  regression test asserting different OIDC IDs with shared hex
  prefixes now yield distinct workspaces.

S4+S5: sharpen comments.
  reconcileConflict's !existing branch comment narrowed to the real
  trigger (concurrent delete between create() throw and our re-read).
  ensureUserWorkspace doc tightened to accurately describe what
  WorkspaceStore.create atomically detects vs what it doesn't.

Header unification: delete X-Preferred-Workspace.
  Pre-existing dual-header design ("X-Preferred-Workspace" for
  bootstrap, "X-Workspace-Id" for data endpoints) violated the same
  "API shapes should not require context-specific rules" principle
  Phase 3 applied to PlacementRegistry. Unify on X-Workspace-Id:
  bootstrap reads it as a soft hint (only place the server defaults,
  per Phase 2); data endpoints keep authoritative semantics. Remove
  from CORS allowlist, web client, handler. One header, one name.

AuthKit: new test asserting verifyRequest provisions a workspace on
successful MCP-OAuth auth. Covers the path that has no exchangeCode.

OIDC integration: new self-heal test that deletes a user's workspace
and asserts the next verifyRequest re-establishes the invariant
instead of leaving the user stuck.

Zero unit-test regressions. Integration suite is stable against the
pre-existing 37 lifecycle failures on main. (One intermittent flake —
"10 concurrent authenticated chat requests" — passes in isolation but
can time out under parallel load; exposed by Phase 2 actually reaching
the stream path. Tracked separately, not a behavior regression.)
@mgoldsborough mgoldsborough added the qa-reviewed QA review completed with no critical issues label Apr 23, 2026
resolveWorkspace took a `conversationWorkspaceId?: string` second-fallback
parameter intended for chat paths — the idea being that a request with a
known conversationId could derive workspace from the conversation's
stored wsId instead of requiring X-Workspace-Id.

No caller passed it. requireWorkspace middleware and the /mcp route both
call the three-arg form. The fallback branch was unreachable for the
entire time it existed.

Two unit tests in workspace-context.test.ts exercised the dead parameter
directly ("resolves from conversation's workspaceId when no header",
"header takes precedence over conversation workspaceId"). They verified
behavior no production code ever relied on. Deleted.

Now that X-Workspace-Id is the one-and-only way to name a workspace on
data endpoints (bootstrap handles its own defaulting), this parameter
was just a footgun waiting to grow. Delete it — matches the same
"API shapes make wrong code unwritable" principle applied to
PlacementRegistry.all() and the header unification.
@mgoldsborough mgoldsborough merged commit 1e64adc into main Apr 24, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qa-reviewed QA review completed with no critical issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant