feat(auth): M1 multi-tenant auth foundation (#81) #86

Merged
viktor-shcherb merged 2 commits into main from
dev/81-multi-tenant-auth-foundation on May 7, 2026

@viktor-shcherb (Member)

Summary

Replaces the single shared MURMUR_TOKEN with a per-publisher namespace and
four-token model. Existing demo deploy continues to work via env-grandfathered
MURMUR_TOKEN.

  • Publisher namespace (publishers, publisher_tokens,
    publisher_secrets, publisher_audit_events); pipelines scoped by
    publisher_id FK with back-fill default to the demo seed.
  • Four-token model: admin, runner, webhook_signing,
    subcommand_bearer. admin/runner stored as SHA-256 hash; the two
    outgoing-use secrets stored plaintext (Murmur needs the cleartext to
    sign / inject).
  • Auth zoning: bootstrapAuth(MURMUR_BOOTSTRAP_TOKEN) on
    POST /publishers; publisherAuth(db) on /pipelines*, /runs*,
    /publishers/me*; bearerAuth(MURMUR_TOKEN) retained on /work*
    and /mcp* (agent plane unchanged in M1).
  • Cross-publisher isolation — UPSERT on pipelines scoped via
    ON CONFLICT WHERE publisher_id; runs reads JOIN through
    pipelines.publisher_id; cross-publisher reads return 404 (no
    information leak).
  • Per-publisher webhook bearer: deliverWebhook resolves the
    subcommand_bearer of the run's publisher (the demo publisher is
    seeded with a value equal to MURMUR_TOKEN; new publishers get a
    random value). MURMUR_TOKEN no longer leaks across publishers via
    webhook delivery.
  • Webhook HMAC — additive X-Murmur-Signature: t=<unix>,v1=<hmac>
    header; legacy bearer retained for backward compat (drop in M10).
  • task_tool dispatch — switches from shared MURMUR_TOKEN to
    per-publisher subcommand_bearer via LEFT JOIN on the claim lookup.
  • Admin API: POST /publishers, GET/PATCH /publishers/me,
    POST /publishers/me/tokens/{kind}/rotate,
    DELETE /publishers/me/tokens/{kind}/{id}, GET /publishers/me/audit.
  • Boot-seed — idempotent demo publisher seed; grandfathers
    MURMUR_TOKEN as kinds=['admin','runner']. MURMUR_TOKEN rotation
    between boots is detected — stale grandfather row revoked, fresh row
    inserted; subcommand_bearer rotated in lockstep so task_tool
    dispatch stays in sync with jobseek's shim.
  • Docs: docs/auth.md covers token model, lifecycle, HMAC
    verification (Node + Python samples), demo migration semantics, audit
    vocabulary, and known v1 limitations.
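
The storage split in the four-token model (hashed admin/runner tokens, cleartext outgoing secrets) can be illustrated with a minimal sketch. It assumes hex-encoded SHA-256 digests and a hypothetical in-memory `TokenRow` shape; the real publisher_tokens columns, the token format, and the helper names `hashToken`, `mintToken`, and `findToken` are not taken from this PR.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical row shape; the real publisher_tokens columns may differ.
interface TokenRow {
  publisherId: string;
  tokenHash: string; // hex SHA-256 of the token, never the cleartext
  kinds: string[]; // e.g. ["admin"] or ["admin", "runner"]
  revoked: boolean;
}

// Hash an incoming-use token (admin / runner) before storing or comparing.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Mint a random token: the cleartext is returned once, only the hash is kept.
function mintToken(
  publisherId: string,
  kinds: string[],
): { token: string; row: TokenRow } {
  const token = randomBytes(32).toString("hex");
  return {
    token,
    row: { publisherId, tokenHash: hashToken(token), kinds, revoked: false },
  };
}

// Resolve a presented bearer token to a live (non-revoked) row, if any.
function findToken(rows: TokenRow[], presented: string): TokenRow | undefined {
  const digest = hashToken(presented);
  return rows.find((r) => !r.revoked && r.tokenHash === digest);
}
```

Because only the digest is stored, a leaked table does not directly reveal usable admin or runner tokens. The webhook_signing and subcommand_bearer secrets cannot use this scheme, since Murmur must present or sign with the cleartext.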

Definition of Done coverage

  • Publisher record + four-token model in storage with migrations
  • All existing endpoints scoped via token claims
  • Token rotation API + audit log
  • Webhook HMAC signing + sample verification doc
  • Demo publisher (jobseek) migrated to new token model with no downtime
  • Documentation in docs/auth.md

Folds in #77 (per-publisher namespace context for run ownership;
extending created_by is M2 scope).

Test plan

  • pnpm typecheck — green
  • pnpm lint — green
  • pnpm grep:all — green
  • pnpm test:unit — 380 tests pass (28 new tests for tokens,
    bootstrap seed, publisher_auth middleware, kinds_json codec,
    admin API, webhook HMAC, cross-publisher isolation)
  • Manual smoke against deployed Murmur after merge: existing
    jobseek start-run.ts flow still triggers a run; webhook
    delivery still received with both Authorization: Bearer
    and X-Murmur-Signature headers.

Backward compat for the existing jobseek deploy

The demo publisher's MURMUR_TOKEN is grandfathered as
kinds_json=["admin","runner"], so:

  • POST /pipelines (CI) — accepts MURMUR_TOKEN as admin ✓
  • POST /pipelines/{id}/runs (form / start-run.ts) — accepts MURMUR_TOKEN
    as runner ✓
  • Webhook delivery — still sends Authorization: Bearer <MURMUR_TOKEN>
    via the seeded subcommand_bearer = MURMUR_TOKEN
  • task_tool dispatch — sends Authorization: Bearer <MURMUR_TOKEN> via
    the same seeded value ✓

MURMUR_TOKEN rotation between boots is supported: the grandfather
token row AND the subcommand_bearer are both rotated to match the
new env value, keeping all four paths above in sync.
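
One way to picture why a single grandfathered row satisfies both the admin and the runner routes is the sketch below. `parseKinds` and `hasKind` are hypothetical names, and the set of kinds stored in kinds_json is assumed to be limited to admin and runner; the real kinds_json codec and the requireKind middleware are not shown in this PR.

```typescript
type TokenKind = "admin" | "runner";

const KNOWN_KINDS: readonly TokenKind[] = ["admin", "runner"];

// Decode a kinds_json column value, rejecting anything that is not a
// non-empty array of known kinds.
function parseKinds(kindsJson: string): TokenKind[] {
  const parsed: unknown = JSON.parse(kindsJson);
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("kinds_json must be a non-empty array");
  }
  for (const k of parsed) {
    if (!KNOWN_KINDS.includes(k as TokenKind)) {
      throw new Error(`unknown token kind: ${String(k)}`);
    }
  }
  return parsed as TokenKind[];
}

// A route that requires `needed` accepts any token row whose kinds include it.
function hasKind(kindsJson: string, needed: TokenKind): boolean {
  return parseKinds(kindsJson).includes(needed);
}
```

With kinds_json=["admin","runner"], both `hasKind(row, "admin")` and `hasKind(row, "runner")` hold, which is what lets the existing CI and start-run.ts callers keep using the same MURMUR_TOKEN.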

Known follow-ups (filed inline as comments)

  • Per-publisher pipeline namespacing (composite (publisher_id, id)
    PK) — deferred until multiple non-demo publishers exist.
  • Encryption-at-rest for webhook_signing_secret and
    subcommand_bearer — M2 scope.
  • DNS rebinding defense in URL validation — v1 validates at
    registration only.
  • last_used_at telemetry — deferred to avoid writer-lock contention
    on the auth hot path.
  • Bootstrap rate limiting — MURMUR_BOOTSTRAP_TOKEN should be
    operator-only and rotated independently.

Closes #81
🤖 Generated with Claude Code

viktor-shcherb and others added 2 commits May 7, 2026 14:52
…model, HMAC webhooks

Replaces the single shared MURMUR_TOKEN with a per-publisher namespace and
four-token model (admin, runner, webhook_signing, subcommand_bearer).
Existing demo deploy continues to work via env-grandfathered MURMUR_TOKEN.

Schema (migration 0002):
- publishers, publisher_tokens, publisher_secrets, publisher_audit_events
- pipelines.publisher_id (NOT NULL DEFAULT pub_demo_seed); back-fills existing rows
- Demo seed inserted in-migration so the FK back-fill default is satisfied

Auth zoning (src/server.ts):
- POST /publishers gated by MURMUR_BOOTSTRAP_TOKEN
- /pipelines*, /runs*, /publishers/me* gated by publisherAuth(db) + per-route requireKind
- /work*, /mcp* keep legacy bearerAuth (agent plane unchanged in M1)

Cross-publisher isolation:
- pipelines UPSERT scoped via ON CONFLICT WHERE publisher_id; cross-publisher slug
  collision returns 409
- runs/runs-list JOIN through pipelines.publisher_id; cross-publisher reads → 404
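
The read-side isolation rule can be sketched in memory as follows. This is an illustration only: the PR implements the check as a SQL JOIN through pipelines.publisher_id, and the `getRun` function and the `Pipeline` / `Run` shapes here are hypothetical.

```typescript
interface Pipeline {
  id: string;
  publisherId: string;
}

interface Run {
  id: string;
  pipelineId: string;
}

// Resolve a run only through a pipeline the caller owns. A run owned by
// another publisher is reported exactly like a run that does not exist,
// so the response does not reveal whether the id is in use.
function getRun(
  pipelines: Pipeline[],
  runs: Run[],
  callerPublisherId: string,
  runId: string,
): { status: 200; run: Run } | { status: 404 } {
  const run = runs.find((r) => r.id === runId);
  if (!run) return { status: 404 };
  const owner = pipelines.find((p) => p.id === run.pipelineId);
  if (!owner || owner.publisherId !== callerPublisherId) {
    return { status: 404 };
  }
  return { status: 200, run };
}
```

Returning 404 rather than 403 for a foreign run is the design choice that avoids the information leak mentioned above.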

Per-publisher webhook bearer:
- Webhook delivery resolves the run's publisher's subcommand_bearer; demo seeded
  to MURMUR_TOKEN preserves jobseek's accept handler. Cross-publisher leak of
  MURMUR_TOKEN closed.
- Additive X-Murmur-Signature: t=<unix>,v1=<hmac> header; bearer retained for
  backward compat (drop in M10).
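
A receiver-side check for the `t=<unix>,v1=<hmac>` header might look like the sketch below. Several details are assumptions that this PR does not confirm: HMAC-SHA256, hex encoding, a signed string of the form `${t}.${rawBody}`, and a 300-second replay window. The function names are hypothetical as well; docs/auth.md is the authority on the real scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: HMAC-SHA256, hex-encoded, computed over `${t}.${rawBody}`.
function sign(secret: string, t: number, rawBody: string): string {
  return createHmac("sha256", secret)
    .update(`${t}.${rawBody}`, "utf8")
    .digest("hex");
}

// Build a header in the `t=<unix>,v1=<hmac>` shape.
function buildSignatureHeader(secret: string, t: number, rawBody: string): string {
  return `t=${t},v1=${sign(secret, t, rawBody)}`;
}

// Verify a header; rejects malformed input, stale timestamps, and bad MACs.
function verifySignatureHeader(
  header: string,
  secret: string,
  rawBody: string,
  nowUnix: number,
  toleranceSeconds = 300, // hypothetical replay window
): boolean {
  const fields = new Map<string, string>();
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    if (eq > 0) fields.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  const tRaw = fields.get("t");
  const v1 = fields.get("v1");
  if (!tRaw || !v1 || !/^\d+$/.test(tRaw) || !/^[0-9a-f]{64}$/.test(v1)) {
    return false;
  }
  const t = Number(tRaw);
  if (Math.abs(nowUnix - t) > toleranceSeconds) return false;
  const expected = Buffer.from(sign(secret, t, rawBody), "hex");
  const received = Buffer.from(v1, "hex");
  // Constant-time comparison; lengths are checked first because
  // timingSafeEqual throws on buffers of different length.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The verifier should be fed the raw request body bytes as received, before any JSON parsing, since re-serialization can change the bytes that were signed.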

task_tool dispatch:
- Resolves the run's publisher's subcommand_bearer (via JOIN on LOOKUP_CLAIM_SQL);
  per-tenant credential, MURMUR_TOKEN never leaks to subcommand endpoints of
  hostile publishers.

Boot-seed (src/db/bootstrap.ts):
- Idempotent demo publisher seed; grandfathers MURMUR_TOKEN as
  kinds_json=["admin","runner"]; subcommand_bearer rotated in lockstep with
  MURMUR_TOKEN; webhook_signing_secret generated random.
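
The idempotent seed with rotation detection can be sketched against in-memory state. Everything here is a stand-in: `DemoState`, `GrandfatherRow`, and `seedDemoPublisher` are hypothetical names, the token list holds only grandfather rows, and the real logic in src/db/bootstrap.ts runs against SQLite and may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory stand-ins for publisher_tokens / publisher_secrets.
interface GrandfatherRow {
  tokenHash: string;
  kinds: string[];
  revoked: boolean;
}

interface DemoState {
  tokens: GrandfatherRow[]; // grandfather rows only, in this sketch
  subcommandBearer: string | null;
}

const sha256Hex = (s: string): string =>
  createHash("sha256").update(s, "utf8").digest("hex");

// Running the seed twice with the same env token changes nothing; running it
// with a new env token revokes the stale row and re-syncs the bearer.
function seedDemoPublisher(
  state: DemoState,
  envToken: string,
): "unchanged" | "seeded" | "rotated" {
  const hash = sha256Hex(envToken);
  const live = state.tokens.filter((r) => !r.revoked);
  const inSync =
    live.some((r) => r.tokenHash === hash) && state.subcommandBearer === envToken;
  if (inSync) return "unchanged";

  const hadLive = live.length > 0;
  for (const row of live) row.revoked = true; // revoke stale grandfather rows
  state.tokens.push({ tokenHash: hash, kinds: ["admin", "runner"], revoked: false });
  state.subcommandBearer = envToken; // rotated in lockstep with the env token
  return hadLive ? "rotated" : "seeded";
}
```

Comparing the hash of the current env value against the stored row is what lets the seed notice a MURMUR_TOKEN change between boots without ever persisting the cleartext admin/runner token.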

Admin API (POST /publishers, GET/PATCH /publishers/me, tokens rotate/delete,
audit) with rotation atomicity and kind verification on revoke (DELETE
/tokens/runner/<admin-row-id> can no longer revoke an admin row).

Documentation in docs/auth.md with Node + Python verifier samples.

Closes #81
Folds in #77 (per-publisher namespace context)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

CI's coverage gate flagged `src/api/**/*.ts` branches at 73.35% (threshold
75%) — admin.ts at 60.97%. Adds tests for the M1 kind-verification fix
(DELETE /tokens/runner/<admin-id> → 404) and the rotation-independence
property (rotating runner does not revoke the multi-kind grandfather row).

Lifts admin.ts branch coverage from 60.97% → 77.22%, and the
src/api/publisher folder from 70.29% → 76.92%.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

viktor-shcherb merged commit e42e087 into main on May 7, 2026 (2 checks passed)

viktor-shcherb added a commit that referenced this pull request on May 7, 2026
…LL DEFAULT (#87)

The post-merge deploy of M1 (#86) failed at migration 0002 with
`Cannot add a REFERENCES column with non-NULL default value` and the
container went into a crash loop, taking https://murmur.colophon-group.org
down (502).

Root cause: SQLite forbids `ALTER TABLE ADD COLUMN ... NOT NULL DEFAULT
'<value>' REFERENCES ...` when foreign_keys=ON and the table has
existing rows. The local test suite ran the migration against a fresh
`:memory:` DB (no rows), so the ALTER passed and the gate stayed green;
the production `pipelines` table held the demo's existing row, so the
ALTER tripped the rule.

Fix: split the ALTER into two statements:

  ALTER TABLE pipelines ADD COLUMN publisher_id TEXT REFERENCES publishers(id);
  UPDATE pipelines SET publisher_id = 'pub_demo_seed' WHERE publisher_id IS NULL;

The schema-level NOT NULL guarantee is traded for an application-level
invariant — `mountPipelineRoutes` always supplies `publisher_id` from
`c.var.publisher_id`. Test fixtures that INSERT directly into
`pipelines` now set `publisher_id = 'pub_demo_seed'` explicitly
(the value was previously supplied by the column DEFAULT).

A future migration can rebuild the table to recover schema-level
NOT NULL once the migration runner supports a `PRAGMA foreign_keys=OFF`
toggle (the toggle can't go inside a single BEGIN IMMEDIATE / COMMIT).

Smoke-tested locally: identical `ALTER ... REFERENCES ... ; UPDATE`
sequence applied successfully against a SQLite DB with an existing
`pipelines` row (mirroring the failing prod state). All M1 tests + the
grandfather-token + subcommand_bearer seed paths still pass (391 tests
green).

Closes the M1 deploy outage; the migration runner's BEGIN IMMEDIATE /
COMMIT wrapping rolls the failed migration back atomically, so the
production DB schema is unchanged and this fix can re-apply cleanly.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Development

Successfully merging this pull request may close these issues.

M1: Multi-tenant auth foundation (machine plane)