feat(trial): zero-friction URL-to-workspace onboarding MVP #758

Merged
simple-agent-manager[bot] merged 35 commits into main from sam/trial-onboarding-mvp
Apr 21, 2026

Contributor

@simple-agent-manager simple-agent-manager bot commented Apr 18, 2026

Summary

Implements the zero-friction URL-to-workspace onboarding MVP from idea 01KPGJQ853C44JEREXWEZS1GQ8. Anonymous visitors paste a public GitHub repo URL, watch a live discovery agent analyze it, and get pre-generated suggestion chips that lead into a full SAM workspace after a 2-click login.

Built as a single orchestrated PR via 5 waves (foundation + 4 parallel tracks + integration) against the sam/trial-onboarding-mvp integration branch. Not to be merged to main — this is flagged for @raphaeltm manual review before merge and before production configuration is applied.

cc @raphaeltm — Configuration Checklist Before Merge

Staging (sammy.party) — zero manual steps required

The deploy pipeline provisions + flips everything automatically:

  • TRIAL_CLAIM_TOKEN_SECRET — auto-generated by Pulumi (infra/resources/secrets.ts), stored encrypted in the Pulumi R2-backed state, pushed as a Worker secret by configure-secrets.sh (commit 086f4ded)
  • trials:enabled=true in KV — written by the staging deploy workflow on every run (deploy-reusable.yml + commit b15ca27c removing an invalid --remote flag)
  • TRIAL_LLM_PROVIDER=workers-ai — already wired in wrangler.toml vars
  • TRIAL_MODEL=@cf/meta/llama-3.1-8b-instruct
  • TRIAL_MONTHLY_CAP=1500
  • sam_anonymous_trials sentinel user — seeded via migration 0043

→ Nothing to click on the staging environment. A fresh workflow_dispatch on deploy-staging.yml gives you a working trial surface.

Production (simple-agent-manager.org) — one manual step (the key)

  • Procure Anthropic API key budgeted for trials
  • wrangler secret put ANTHROPIC_API_KEY_TRIAL --env production (separate from platform key)
  • Set TRIAL_LLM_PROVIDER=anthropic, TRIAL_MODEL=claude-3-5-haiku-latest, TRIAL_AGENT_TYPE=claude-code in production vars
  • Set TRIAL_MONTHLY_CAP to your preferred prod cap (default 500)
  • Flip the kill switch when ready: pnpm --filter @simple-agent-manager/api exec wrangler kv key put "trials:enabled" "true" --binding KV --env production
  • Confirm sam_anonymous_trials sentinel user exists on prod D1
  • Confirm trial_counter KV namespace + TrialCounter DO bindings exist in prod wrangler env

Cookies

  • HMAC key for trial fingerprint cookies reuses TRIAL_CLAIM_TOKEN_SECRET (auto-provisioned on staging, manual on prod if desired).

Kill Switches

  • Set KV trials:enabled=false to instantly pause trial creation. /try cleanly falls back to "Trials are paused" — verified on staging.
  • TRIAL_MONTHLY_CAP=0 is also a hard stop.
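
A minimal sketch of how the two stops above could combine into one gate, assuming the kill switch is read as a raw KV string and that anything other than the exact value "true" counts as disabled (fail-closed). The names `TrialGateInput` and `evaluateTrialGate` are hypothetical and are not taken from the codebase; the reason codes reuse the `trials_disabled` and `cap_exceeded` error branches named elsewhere in this PR.

```typescript
// Hypothetical shapes for illustration only.
type TrialGateInput = {
  killSwitchValue: string | null; // raw KV value of "trials:enabled"; null if missing or unreadable
  monthlyCap: number; // TRIAL_MONTHLY_CAP
  usedThisMonth: number; // slots already consumed this month
};

type TrialGateResult =
  | { allowed: true; remaining: number }
  | { allowed: false; reason: "trials_disabled" | "cap_exceeded" };

function evaluateTrialGate(input: TrialGateInput): TrialGateResult {
  // Fail closed: a missing, unreadable, or unexpected value disables trials.
  if (input.killSwitchValue !== "true") {
    return { allowed: false, reason: "trials_disabled" };
  }
  // A cap of 0 leaves no slots, so it acts as a hard stop as well.
  const remaining = Math.max(0, input.monthlyCap - input.usedThisMonth);
  if (remaining === 0) {
    return { allowed: false, reason: "cap_exceeded" };
  }
  return { allowed: true, remaining };
}
```

Keeping the decision in a pure function like this makes the fail-closed behavior easy to unit-test without a KV binding.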

What Shipped

Wave 0 — Foundation (e253c08e)

  • Shared Valibot schemas (packages/shared/src/trial.ts) for requests, responses, SSE events, idea shape
  • D1 migration 0043: trial_projects, trial_waitlist, sam_anonymous_trials sentinel user
  • Durable Objects: TrialCounter (monthly cap), TrialEventBus (SSE fan-out)
  • HMAC-signed cookie helpers (apps/api/src/services/trial/cookies.ts) for fingerprint (7d) and claim (48h) tokens
  • Kill-switch + cap helpers, discovery prompt template, route stubs
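
For reviewers unfamiliar with the cookie scheme, here is a rough sketch of HMAC signing with a constant-time signature check. It is not the code in cookies.ts: the token layout (base64url payload, a dot, base64url signature), the payload fields, and the function names are all assumptions, and it uses `node:crypto` for brevity where a Worker might use Web Crypto instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical payload; the real token fields may differ.
type TrialTokenPayload = { trialId: string; expiresAt: number };

function signTrialToken(payload: TrialTokenPayload, secret: string): string {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const sig = createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${sig}`;
}

function verifyTrialToken(token: string, secret: string, nowMs: number): TrialTokenPayload | null {
  const dot = token.lastIndexOf(".");
  if (dot < 0) return null;
  const body = token.slice(0, dot);
  const actual = Buffer.from(token.slice(dot + 1), "base64url");
  const expected = createHmac("sha256", secret).update(body).digest();
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  try {
    const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as TrialTokenPayload;
    if (typeof payload.trialId !== "string" || typeof payload.expiresAt !== "number") return null;
    // Reject expired tokens (e.g. 7d for fingerprint, 48h for claim).
    return payload.expiresAt > nowMs ? payload : null;
  } catch {
    return null;
  }
}
```

The signature is checked before the payload is parsed, so a tampered body is never deserialized.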

Wave 1 Track A — Backend Lifecycle (4ca29ea6)

  • POST /api/trial/create — validates repo URL, checks kill switch + cap, creates project under sentinel user, starts discovery session
  • GET /api/trial/status — enabled + remaining slots + reset date (public, no auth)
  • POST /api/trial/waitlist — cap-exceeded email capture
  • Cron: month-rollover counter reset + 30d waitlist purge
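
To illustrate the monthly cap semantics, the following is an in-memory stand-in for the TrialCounter Durable Object. It is only a sketch: the class name, the UTC month-key format, and the method signatures are assumptions, and the real DO persists its counts in storage rather than a `Map`.

```typescript
// Hypothetical in-memory model of a per-month slot counter.
class MonthlyTrialCounter {
  private counts = new Map<string, number>();

  // Month key such as "2026-04", derived in UTC.
  static monthKey(d: Date): string {
    return `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
  }

  // Claims a slot if the month is under the cap; never exceeds the cap.
  tryIncrement(now: Date, cap: number): { ok: boolean; count: number } {
    const key = MonthlyTrialCounter.monthKey(now);
    const current = this.counts.get(key) ?? 0;
    if (current >= cap) return { ok: false, count: current };
    this.counts.set(key, current + 1);
    return { ok: true, count: current + 1 };
  }

  // Releases a slot, e.g. when a later write fails and the trial is rolled back.
  decrement(now: Date): void {
    const key = MonthlyTrialCounter.monthKey(now);
    const current = this.counts.get(key) ?? 0;
    this.counts.set(key, Math.max(0, current - 1));
  }

  // Drops counts for every month except the current one (month rollover).
  prune(now: Date): void {
    const keep = MonthlyTrialCounter.monthKey(now);
    for (const key of [...this.counts.keys()]) {
      if (key !== keep) this.counts.delete(key);
    }
  }
}
```

Because each month has its own key, a new month starts at zero even before the rollover cron prunes old entries.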

Wave 1 Track B — Backend Claim + SSE (6ba2e101)

  • GET /api/trial/:trialId/events — SSE stream multiplexed from TrialEventBus DO
  • POST /api/trial/:trialId/claim — post-OAuth handler that transfers the anonymous project from sentinel user to the newly-signed-in user, validates claim cookie
  • OAuth callback integration (claim=<trialId> query param round-trip)
  • Agent wiring: discovery session uses TRIAL_LLM_PROVIDER + TRIAL_MODEL

Wave 1 Track C — Frontend Discovery (e8088705)

  • /try landing page (mobile-first, repo URL input, kill-switch + cap-exceeded fallbacks)
  • /try/:trialId discovery feed consuming the SSE event stream
  • /try/cap-exceeded + /try/waitlist/thanks pages
  • React Router entries wired into App.tsx

Wave 1 Track D — Frontend Chat Gate (1114c8fc)

  • ChatGate component: suggestion chip carousel + textarea + send button
  • LoginSheet modal triggering GitHub OAuth with claim cookie preserved
  • useTrialDraft hook: localStorage persistence of the draft across the OAuth round-trip
  • useTrialClaim hook: post-login auto-submit of the stashed draft to the claimed project's chat

Wave 2 — Integration, Automation, and Live Fix

  1. Merged all 4 Wave 1 tracks into sam/trial-onboarding-mvp. Two conflicts resolved:
    • apps/api/src/env.ts — kept both Track A + Track B TRIAL_* env vars.
    • apps/web/src/components/trial/ChatGate.tsx — kept Track D's real implementation; adapted Track C's TryDiscovery to Track D's TrialIdea contract + onAuthenticatedSubmit callback.
  2. Automated the staging trial secret (commit 086f4ded): added infra/resources/secrets.ts entry that auto-generates TRIAL_CLAIM_TOKEN_SECRET via @pulumi/random, and wired configure-secrets.sh to push it as a Worker secret. No manual wrangler secret put on staging ever.
  3. Automated the staging kill-switch (commits 086f4ded + b15ca27c): added a conditional step to .github/workflows/deploy-reusable.yml that writes trials:enabled=true to KV on every staging deploy (and only staging). Initial attempt used --remote, which is not a valid flag for wrangler kv key put — removed in b15ca27c.
  4. Discovered and fixed a Wave 1 integration bug (commit db1d6332): Track A was persisting new trials to D1 only, while Track B readers (events.ts, claim.ts, trial-runner.ts) look up trials in KV via readTrial(). Every SSE connection 404'd with "Trial not found". Fix mirrors the trial to KV in POST /api/trial/create after the D1 insert, before issuing cookies, with rollback on KV failure (D1 row deleted, TrialCounter slot released). writeTrial() also hardened to skip the trial-by-project: index when projectId is empty (would otherwise collide all pending trials on a single key). Added regression test asserting KV.put("trial:<id>", ...) is invoked on the happy path.
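
The writeTrial() hardening in item 4 can be pictured with the sketch below, which computes the KV keys to write for a trial record. The `trial:` and `trial-by-project:` prefixes appear in this PR; the `trial-by-fingerprint:` prefix, the record shape, and the function name are assumptions made for illustration.

```typescript
// Hypothetical record shape; the real one carries more fields.
type TrialRecordSketch = { trialId: string; projectId: string; fingerprint: string };

function trialKvKeys(rec: TrialRecordSketch): string[] {
  const keys = [`trial:${rec.trialId}`, `trial-by-fingerprint:${rec.fingerprint}`];
  // A pending trial has projectId === ''. Writing "trial-by-project:" for it
  // would make every pending trial overwrite the same key, so skip the index.
  if (rec.projectId !== "") {
    keys.push(`trial-by-project:${rec.projectId}`);
  }
  return keys;
}
```

Once the project row exists and the record is rewritten with a real projectId, the same function yields all three keys.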

Non-negotiable Constraints Verified

  • Mobile-first (375×667 authoritative) — all four trial screens rendered and screenshot-verified at mobile width
  • Public GitHub repos only — GITHUB_REPO_URL_REGEX in shared schemas
  • Locked initial prompt — discovery prompt template owned by the backend; user cannot write the first message
  • Login gate on chat interactions — ChatGate triggers LoginSheet on any send attempt by an anonymous visitor
  • Monthly cap + kill switch — TrialCounter DO + KV trials:enabled flag
  • Staging uses opencode + Workers AI; production will use claude-code + Anthropic
  • Valibot for runtime validation — every request schema in packages/shared/src/trial.ts
  • System user pattern — no schema change to projects.userId; anonymous projects owned by sam_anonymous_trials until claimed
  • HMAC-signed claim cookie — uses auto-provisioned TRIAL_CLAIM_TOKEN_SECRET
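
To make the "public GitHub repos only" constraint concrete, here is a sketch of URL-shape validation. The pattern is a hypothetical stand-in, not the actual GITHUB_REPO_URL_REGEX, and the function name is invented for this example. A regex can only check the URL's shape; whether the repo is really public needs a separate probe against GitHub.

```typescript
// Hypothetical pattern: https only, github.com only, exactly owner/repo,
// with an optional ".git" suffix or trailing slash.
const REPO_URL_SKETCH =
  /^https:\/\/github\.com\/([A-Za-z0-9][A-Za-z0-9-]{0,38})\/([A-Za-z0-9._-]{1,100}?)(?:\.git)?\/?$/;

function parseGithubRepoUrl(input: string): { owner: string; repo: string } | null {
  const m = REPO_URL_SKETCH.exec(input.trim());
  if (!m) return null;
  return { owner: m[1], repo: m[2] };
}
```

Deep links such as issue or tree URLs are rejected rather than trimmed, which keeps the accepted input unambiguous.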

Local Quality Gates

  • pnpm typecheck — clean across all packages
  • pnpm lint — 0 errors
  • API unit tests — 3773 / 3773 passing (includes new writeTrial regression test)
  • Web unit tests — 1863 / 1863 passing

Staging Deployments

| Run | Commit | Result |
| --- | --- | --- |
| 24614206706 | c2780059 | ✅ initial merge deploy |
| 24614985380 | pre-b15ca27c | Unknown argument: remote (fixed by removing the --remote flag) |
| 24615223155 | post-db1d6332 | ✅ final green with kill-switch KV put + all fixes |

Staging Verification (Playwright + curl, live app)

With trials enabled on staging (KV trials:enabled=true), the end-to-end happy path was exercised:

| Check | Result |
| --- | --- |
| GET /api/trial/status | {"enabled":true,"remaining":1500,"resetsAt":"2026-05-01"} |
| POST /api/trial/create with https://github.com/sindresorhus/is | 201 with Set-Cookie: sam_trial_fingerprint=… + sam_trial_claim=… |
| GET /api/trial/:trialId/events via real cookies | HTTP/2 200, content-type: text/event-stream, : connected heartbeat ✅ |
| /try landing form submission on mobile 375×667 | navigates to /try/:trialId, ChatGate renders "Live" status, feed waits for events, zero console errors ✅ |
| Same on desktop 1280×800 | |

Screenshots: trial-sse-live-mobile.png, trial-sse-live-desktop.png (in .codex/tmp/playwright-screenshots/).

Regression spot-check

  • Authenticated via smoke-test token login → /dashboard renders, project list loads, 0 console errors
  • Navigation sidebar, command palette, notifications panel all intact
  • /health → 200 healthy

What was NOT verified end-to-end

The OAuth claim + post-login auto-submit leg (chat gate → login sheet → GitHub OAuth → /api/trial/:trialId/claim → stashed draft replay) requires a real GitHub OAuth round-trip with a human. All individual components have unit + integration coverage; the OAuth leg is gated behind a real sign-in and deferred to Raphaël's manual review.

Review Status

Full specialist review was not dispatched because this PR is flagged for manual review by @raphaeltm before merge. The needs-human-review label is applied. Raphaël will decide whether to dispatch additional reviewers, flip production config, and proceed to merge.

Do NOT Merge Yet

  • ❌ Do NOT merge to main until Raphaël has reviewed the configuration checklist.
  • ❌ Do NOT deploy to production until the Anthropic key is procured and the OAuth claim leg has been exercised at least once.

🤖 Generated with Claude Code

raphaeltm and others added 9 commits April 18, 2026 20:17
Lays groundwork for /try — shared types (Valibot), DB migration 0043
(system user sentinel + trial_waitlist table), wrangler TRIAL_COUNTER DO
binding (v7 migration) + trial env vars, trial services (HMAC-signed
cookies with constant-time compare, KV kill-switch with 30s cache +
fail-closed, discovery prompt), 501 route stubs under /api/trial/*,
TrialCounter DO with atomic transactionSync increment/decrement, frontend
Try/TryDiscovery stubs mounted at /try + /try/:trialId, operator docs
at docs/guides/trial-configuration.md, and 43 unit tests covering
cookie round-trip/tamper/expiry, kill-switch cache/TTL/fail-closed, and
TrialCounter cap enforcement.

Trials remain disabled by default (kill-switch fails closed) so this is
safe to deploy without setting TRIAL_CLAIM_TOKEN_SECRET. Wave 1 will wire
the live create/events/claim/waitlist handlers.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Implements backend lifecycle for zero-friction trial onboarding (Wave 1
Track A):

- trials table + sentinel-installation workaround (migration 0044)
- TrialCounter DO: fetch surface + tryIncrement/prune RPC methods
- POST /api/trial/create with Valibot validation, kill-switch gate,
  GitHub repo probe (size/privacy), DO slot allocation, and
  counter-decrement rollback on D1 failure
- GET /api/trial/status with fail-closed fallback when DO throws
- POST /api/trial/waitlist with lowercase-email dedupe via
  onConflictDoNothing(email, resetDate)
- Three scheduled modules wired into cron dispatch:
    - trial-expire: 5-min sweep marks expired trials
    - trial-rollover: monthly DO pruning (0 3 1 * *)
    - trial-waitlist-cleanup: daily notified-row purge (0 4 * * *)
- All configurable via DEFAULT_* constants + env overrides (Principle XI)
- 92 new behavioral tests covering resolution branches, DO RPC surface,
  fallback semantics, cookie issuance, and fail-closed error paths

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Builds the frontend components that gate the trial experience behind
GitHub auth — a chat input with suggestion chips for anonymous users,
and a login sheet that opens when they send their first message.
Integration into TryDiscovery (SSE streaming `trial.idea` events) lands
in wave-2 alongside the live /claim handler.

Components
- ChatGate: autogrowing textarea + horizontally-scrolling chip row;
  Cmd/Ctrl+Enter submits, Enter inserts newline; disabled state when
  empty/whitespace; surfaces submit errors without clearing the draft
- LoginSheet: responsive dialog (mobile bottom-sheet, desktop centered
  modal) with Escape/backdrop/close-button dismissal, focus trap
  between primary CTA + close, body scroll lock, return-to URL
  construction (trialId URL-encoded, ?claim=1 sentinel)
- SuggestionChip: 44px-tall touch target with title + optional summary,
  aria-label compose, disabled state

Hooks
- useTrialDraft: per-trialId localStorage draft with 400ms debounce
  (flush-on-unmount), synchronous writes when debounceMs=0, rehydrates
  on trialId change, no-ops with undefined trialId
- useTrialClaim: idle → claiming → submitting → done/error state
  machine; injectable claim/submit fns for testing; StrictMode-safe
  (single claim per mount); clears draft only on successful submit;
  preserves projectId when submit fails so UI can retry
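
The useTrialClaim state machine above could be modeled as a pure reducer along the following lines. This is a sketch under assumptions: the state and action names are hypothetical and the real hook may track more fields, but the transitions follow the idle → claiming → submitting → done/error flow and keep projectId when the submit step fails.

```typescript
type ClaimState =
  | { phase: "idle" }
  | { phase: "claiming" }
  | { phase: "submitting"; projectId: string }
  | { phase: "done"; projectId: string }
  | { phase: "error"; message: string; projectId?: string };

type ClaimAction =
  | { type: "claim_started" }
  | { type: "claim_succeeded"; projectId: string }
  | { type: "submit_succeeded" }
  | { type: "failed"; message: string };

function claimReducer(state: ClaimState, action: ClaimAction): ClaimState {
  switch (action.type) {
    case "claim_started":
      // Only one claim per mount: ignore repeats once we have left idle.
      return state.phase === "idle" ? { phase: "claiming" } : state;
    case "claim_succeeded":
      return state.phase === "claiming"
        ? { phase: "submitting", projectId: action.projectId }
        : state;
    case "submit_succeeded":
      return state.phase === "submitting"
        ? { phase: "done", projectId: state.projectId }
        : state;
    case "failed":
      if (state.phase === "claiming") return { phase: "error", message: action.message };
      if (state.phase === "submitting") {
        // Keep projectId so the UI can retry the submit without re-claiming.
        return { phase: "error", message: action.message, projectId: state.projectId };
      }
      return state;
  }
}
```

Ignoring out-of-order actions instead of throwing is one way to stay safe under React StrictMode's double-invoked effects.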

Harness + tests
- TrialChatGateHarness at /__test/trial-chat-gate (public, not linked
  from nav) renders ChatGate + LoginSheet with query-param-driven mock
  data (ideas=0..20, long=1, auth=1, loginOpen=1) so Playwright can
  capture screenshots without hitting the real claim flow
- 43 new unit tests across components + hooks covering rendering,
  interactions, persistence, error states, focus management
- 13 Playwright visual scenarios at 375x667 + 1280x800: empty state,
  1/5/20 chips (page-level overflow asserted false — chip row owns
  its horizontal scroll), long-text wrapping, anonymous send opening
  LoginSheet, bottom-sheet vs centered-modal layouts, 44px touch
  targets on send button + suggestion chips

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Wire trial onboarding backend so the post-OAuth claim flow and the
per-trial event stream work end-to-end.

- TrialEventBus DO: in-memory ring buffer (MAX_BUFFERED_EVENTS=500)
  with long-poll /poll, /append with terminal-event auto-close, /close,
  waiter-wake semantics. Configurable via TRIAL_EVENT_BUS_DEFAULT_POLL_TIMEOUT_MS.
- trial-store service: KV-backed writeTrial/readTrial/markTrialClaimed
  with 3-key indexing (by trialId, by projectId, by fingerprint).
- trial-runner: mode-aware config resolution (staging=opencode+workers-ai,
  production=claude-code+anthropic); production requires ANTHROPIC_API_KEY_TRIAL.
  startDiscoveryAgent creates chat + ACP session with discovery prompt.
  emitTrialEvent/emitTrialEventForProject append to TrialEventBus best-effort.
- GET /api/trial/:trialId/events: fingerprint-cookie-authenticated SSE.
  Verifies trial record + HMAC signature + UUID match (fails closed on
  any mismatch). Heartbeat every TRIAL_SSE_HEARTBEAT_MS (default 15s);
  long-poll DO every TRIAL_SSE_POLL_TIMEOUT_MS; max duration
  TRIAL_SSE_MAX_DURATION_MS. Closes on terminal event.
- POST /api/trial/claim: auth-required; verifies HMAC claim cookie;
  atomic D1 UPDATE with WHERE userId=TRIAL_SENTINEL_USER_ID precondition;
  clears claim cookie; returns {projectId, claimedAt}. Returns 409 on
  UPDATE-changes=0 race.
- OAuth callback hook (maybeAttachTrialClaimCookie): on 2xx/3xx response
  from /callback/github, if a valid fingerprint cookie maps to an unclaimed
  non-expired trial, sign a claim token, set sam_trial_claim cookie, and
  rewrite Location to https://app.${BASE_DOMAIN}/try/:trialId?claim=1.
- Env + wrangler binding for TRIAL_EVENT_BUS Durable Object.
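
For context on the TrialEventBus buffering described above, here is a minimal bounded-buffer sketch. It is an illustration, not the DO's code: the class name, the cursor scheme, and the `since()` method are assumptions, and the long-poll waiter logic is left out.

```typescript
// Hypothetical bounded event buffer with monotonically increasing cursors.
class EventRingBuffer<T> {
  private items: { cursor: number; event: T }[] = [];
  private nextCursor = 0;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity; // e.g. MAX_BUFFERED_EVENTS = 500
  }

  // Appends an event and evicts the oldest one once over capacity.
  append(event: T): number {
    const cursor = this.nextCursor++;
    this.items.push({ cursor, event });
    if (this.items.length > this.capacity) this.items.shift();
    return cursor;
  }

  // Events with a cursor strictly greater than `after` (use -1 for "from the start").
  since(after: number): { cursor: number; event: T }[] {
    return this.items.filter((item) => item.cursor > after);
  }
}
```

A reconnecting SSE client can pass its last-seen cursor to `since()` and replay only what it missed, as long as those events are still in the buffer.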

70 new unit tests (6 files) cover DO long-poll/waiter-wake/terminal-close,
SSE auth-failure matrix + happy path, claim route 400/404/409/200 branches,
oauth-hook bail-out matrix + rewrite happy path, trial-runner config
resolution + error paths, and trial-store round-trips.
Replaces Wave 0 stubs with full trial discovery flow:

- Try landing page with GitHub URL validation + error branches
  (invalid_url, repo_private, trials_disabled, cap_exceeded, existing_trial)
- TryDiscovery streams SSE events (started, progress, knowledge, idea,
  ready) with exponential backoff reconnect (max 5 retries) and renders
  repo header, progress, knowledge graph, ideas, and workspace-ready CTA
- TryCapExceeded page with waitlist email capture + inline validation
- TryWaitlistThanks confirmation page
- trial-api client: createTrial, joinWaitlist, openTrialEventStream
- ChatGate stub placeholder for Track D integration
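
The reconnect behavior in TryDiscovery might look roughly like the delay schedule below. Only the limit of 5 retries comes from this commit; the base delay, the ceiling, and the function name are assumptions chosen for illustration.

```typescript
const MAX_RECONNECT_RETRIES = 5; // "max 5 retries" from the description above

// Returns the delay before reconnect attempt `attempt` (0-based),
// or null once the retry budget is spent.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number | null {
  if (attempt >= MAX_RECONNECT_RETRIES) return null;
  // Double the delay on each attempt, capped at maxMs.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

With these assumed defaults the schedule is 1s, 2s, 4s, 8s, 16s, after which the caller should surface an error state instead of retrying.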

Tests:
- Vitest component tests for Try + TryCapExceeded (11 cases: URL
  validation, success nav, existing-trial resume, each error branch,
  email validation, waitlist submit, API error)
- Playwright visual audit at 375x667 and 1280x800 covering landing,
  discovery (streaming/ready/empty), cap-exceeded, waitlist-thanks, and
  all inline error states — overflow asserted on every test

Mobile-first with design tokens; 56px primary CTA, 44px secondary
touch targets; env(safe-area-inset-*) padding.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
… integration

Resolves conflict in ChatGate.tsx by keeping Track D's real implementation;
adapts TryDiscovery to Track D's ChatGate contract (TrialIdea shape,
onAuthenticatedSubmit handler that navigates to the claimed project chat
with the message staged in sessionStorage).
@simple-agent-manager simple-agent-manager bot added the needs-human-review label ("Agent could not complete all review gates; human must approve before merge") Apr 18, 2026
… kill-switch

Previously, self-hosters had to manually run `wrangler secret put
TRIAL_CLAIM_TOKEN_SECRET` and `wrangler kv key put trials:enabled true`
before the /try flow would work on staging. Wire both into the standard
deployment pipeline so staging trials are live out of the box.

Changes:
- infra/resources/secrets.ts: add `trial-claim-token-secret` RandomId
  resource (32 bytes base64) + export `trialClaimTokenSecret` Pulumi
  output, same persistence pattern as encryptionKey / jwtPrivateKey.
- infra/index.ts: re-export the new output.
- scripts/deploy/configure-secrets.sh: read trialClaimTokenSecret from
  Pulumi state and set it as a required Worker secret on every deploy.
- .github/workflows/deploy-reusable.yml: add a staging-only step that
  sets KV `trials:enabled=true` via wrangler after the worker deploys.
  Production stays opt-in per spec (operator flips the flag manually
  when ready to accept live trial traffic).
- docs/guides/trial-configuration.md: document the automation — no more
  manual secret-put or kv-put steps for staging.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
`wrangler kv key put` writes to remote by default; --remote is not a
valid flag for that subcommand and caused the staging deploy's trial
kill-switch step to fail.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…olve it

Track A (create.ts) inserted trial records into D1 only; Track B readers
(events.ts, claim.ts, trial-runner.ts) all look trials up via
trial-store.readTrial() which reads from KV. The result: every SSE
connection 404'd with "Trial not found or expired" seconds after the
trial was created.

Integration fix:
- create.ts calls writeTrial() after the D1 insert, with projectId=''
  (Track B's orchestrator rewrites the KV record once the project row
  exists). On KV failure, roll back the D1 row and release the
  TrialCounter slot so we don't burn a cap entry.
- writeTrial() skips the trial-by-project index when projectId is
  empty, preventing all pending trials from colliding on
  `trial-by-project:`.
- events.ts: use errors.notFound('Trial') — previous argument produced
  doubled "Trial not found or expired not found".

Added a regression test asserting writeTrial is invoked from the happy
path (captures the exact KV put) so this bug cannot silently recur.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@simple-agent-manager
Contributor Author

Staging verification update — trials automation + integration fix

Two follow-up commits landed after the initial PR:

1. Deploy automation (commits 086f4ded + b15ca27c)

  • TRIAL_CLAIM_TOKEN_SECRET is now auto-provisioned by Pulumi (infra/resources/secrets.ts) and pushed by configure-secrets.sh on every deploy — no manual wrangler secret put
  • trials:enabled KV flag is set automatically by deploy-reusable.yml on staging deploys — no manual wrangler kv key put
  • Production remains opt-in (operator flips the flag when ready)

2. Wave 1 integration bug fix (commit db1d6332)

  • Track A persisted trials to D1; Track B read from KV via trial-store.readTrial(). Nothing wrote KV → every SSE /events call returned 404 "Trial not found".
  • Fix: create.ts calls writeTrial() after the D1 insert with projectId='' (Track B's orchestrator rewrites the record once the project row exists). On KV failure, D1 row is rolled back and the TrialCounter slot released.
  • Hardened writeTrial() to skip the by-project index when projectId is empty, preventing pending-trial collisions.
  • Added regression test asserting writeTrial is invoked — this bug cannot silently recur.

Staging verification evidence (run 24615223155, 2026-04-18 22:22Z):

  • /api/trial/status → {"enabled":true,"remaining":1498,"resetsAt":"2026-05-01"}
  • POST /api/trial/create with public repo URL → 201 with set-cookie fingerprint + claim cookies, returns trialId
  • GET /api/trial/:trialId/events with fingerprint cookie → HTTP/2 200 text/event-stream, : connected heartbeat received
  • /try/:trialId page renders ChatGate in "Live" state (green), zero console errors, on mobile 375×667 and desktop 1280×800

Updated configuration checklist for @raphaeltm:

  • TRIAL_CLAIM_TOKEN_SECRET — auto-provisioned by Pulumi, no action needed
  • Staging kill-switch — auto-set by deploy workflow, no action needed
  • Production kill-switch — flip trials:enabled=true manually when ready: pnpm --filter @simple-agent-manager/api exec wrangler kv key put "trials:enabled" "true" --binding KV --env production
  • Production Anthropic key — set ANTHROPIC_API_KEY_TRIAL via wrangler secret put ... --env production once procured (required for production trials — staging uses Workers AI, no key needed)
  • Optional tunables in apps/api/wrangler.toml: TRIAL_MONTHLY_CAP (default 1500), TRIAL_WORKSPACE_TTL_MS (default 20 min), TRIAL_DATA_RETENTION_HOURS (default 168)

Production deploy and merge remain deferred per your instructions.

simple-agent-manager bot and others added 2 commits April 19, 2026 10:41
…760)

* task: move trial-orchestrator-wire-up to active

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* feat(shared): add trial orchestrator timing/retry constants

Introduce DEFAULT_TRIAL_ORCHESTRATOR_* and DEFAULT_TRIAL_KNOWLEDGE_*
constants used by the alarm-driven TrialOrchestrator DO and the fast-path
GitHub knowledge probes fired from POST /api/trial/create. Every value is
env-var overridable (Constitution Principle XI).

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* feat(trial): add TrialOrchestrator DO binding, env vars, sentinel installation

- Declare TRIAL_ORCHESTRATOR DO binding + v9 migration in wrangler.toml
- Extend Env interface with TrialOrchestrator/Knowledge tuning knobs and
  TRIAL_ANONYMOUS_INSTALLATION_ID override
- Migration 0045 seeds the system_anonymous_trials_installation sentinel
  row so anonymous trial projects can satisfy the NOT NULL + FK constraint
  on projects.installation_id without owning a real GitHub App install

The DO class itself is added in the next commit.

* feat(trial): add TrialOrchestrator DO state machine

Adds the alarm-driven TrialOrchestrator Durable Object (one per trialId)
that replaces the fire-and-forget `waitUntil(provisionTrial())` pattern
with a resumable state machine.

Module layout mirrors TaskRunner:
  - types.ts     — TrialOrchestratorStep union + persisted state shape
  - helpers.ts   — re-exports TaskRunner helpers; adds sentinel-user /
                   sentinel-installation resolvers + safeEmitTrialEvent.
  - steps.ts     — per-step handlers (project_creation, node_selection,
                   node_provisioning, node_agent_ready, workspace_creation,
                   workspace_ready, discovery_agent_start, running).
  - index.ts     — DO class: start(), alarm() dispatch, backoff retry,
                   overall-timeout guard, trial.error emission on failure.

Each step emits `trial.progress` at entry so the SSE stream reflects
where the orchestrator is. Terminal `running` step is idle — the ACP
bridge (wired separately) is responsible for emitting `trial.ready`
after the discovery agent produces its first assistant turn.

All timing/retry knobs read from env vars with DEFAULT_* fallbacks
(Constitution Principle XI). Adds two new optional env fields:
TRIAL_VM_SIZE and TRIAL_VM_LOCATION for trial-specific VM overrides.

Exports the class from apps/api/src/index.ts so the Workers runtime
can instantiate it via the TRIAL_ORCHESTRATOR binding (already declared
in wrangler.toml v9 migration).

Task: tasks/active/2026-04-19-trial-orchestrator-wire-up.md

* feat(trial): bridge ACP/MCP events into trial SSE stream

Adds a dedicated `services/trial/bridge.ts` module with three helpers that
hook into existing hot paths and fan qualifying events out as `trial.*` SSE
events:

  - bridgeAcpSessionTransition: `running` → trial.ready (with workspaceUrl
    derived from BASE_DOMAIN + workspaceId), `failed` → trial.error.
  - bridgeKnowledgeAdded:       fires trial.knowledge when the discovery
    agent adds a knowledge observation via MCP.
  - bridgeIdeaCreated:          fires trial.idea with a summary-clipped
    excerpt when the discovery agent creates an idea via MCP.

All three helpers short-circuit on non-trial projects after a single
`readTrialByProject(env, projectId)` KV lookup, so normal (non-trial)
project traffic only pays that one extra KV read on qualifying events.

Hook sites:
  - ProjectData DO `transitionAcpSession` — dynamic-imports the bridge
    and dispatches after the transition succeeds, guarded by `if (projectId)`
    and wrapped in try/catch so bridge errors never block the transition.
    Casts `this.env` through unknown to the worker-scope Env because the
    DO's local Env type is intentionally narrow.
  - `handleAddKnowledge` MCP handler — dispatches after addKnowledgeObservation.
  - `handleCreateIdea`   MCP handler — dispatches after the DB insert.

Every dispatch is fire-and-forget; bridge errors are already caught
inside each helper but the call sites add a second try/catch for defense.

Task: tasks/active/2026-04-19-trial-orchestrator-wire-up.md

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* feat(trial): wire TrialOrchestrator + GitHub knowledge into POST /api/trial/create

Adds two fire-and-forget dispatches after the trial record is written and
before the HTTP response returns, via c.executionCtx.waitUntil:

1. TrialOrchestrator DO `start()` — kicks off the alarm-driven state machine
   that provisions a project, workspace, and discovery agent session. The
   DO is idempotent on `start()`, so accidental re-invocations no-op.

2. emitGithubKnowledgeEvents() — hits unauthenticated GitHub REST endpoints
   (`/repos/:o/:n`, `/repos/:o/:n/languages`, `/repos/:o/:n/readme`) in
   parallel and emits up to `TRIAL_KNOWLEDGE_MAX_EVENTS` `trial.knowledge`
   events within ~`TRIAL_KNOWLEDGE_GITHUB_TIMEOUT_MS` each. Surfaces
   description, primary language, stars, topics, license, language breakdown,
   and README first paragraph so the SSE stream shows activity within ~3s
   while the VM provisions in the background.

Both helpers fully swallow errors — an orchestrator dispatch failure or
GitHub rate-limit hit never blocks the response or crashes the Worker.

All knobs are env-configurable per Constitution Principle XI:
- TRIAL_KNOWLEDGE_GITHUB_TIMEOUT_MS (default 5000)
- TRIAL_KNOWLEDGE_MAX_EVENTS (default 10)

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* test(trial): cover orchestrator dispatch, bridge, and GitHub knowledge probe

Adds four categories of behavioral tests for the trial onboarding wiring:

1. trial-create.ts.test.ts (+2 cases)
   - Asserts TrialOrchestrator.start() is dispatched via waitUntil with
     trialId, repoOwner, repoName, and canonical repoUrl.
   - Asserts a rejecting start() does NOT propagate — the HTTP response
     still returns 201 (fire-and-forget contract).
   - Updates makeEnv() to stub TRIAL_ORCHESTRATOR + TRIAL_EVENT_BUS
     bindings and introduces makeExecutionCtx() helper.
   - Also adds a graceful-fallback in create.ts so routes that run without
     a Worker executionCtx (unit tests) still complete instead of 500-ing
     on Hono's "This context has no ExecutionContext" throw.

2. trial-github-knowledge.test.ts (new, 5 cases)
   - Happy path: verifies description, primary language, stars, topics,
     license, language breakdown, and README paragraph are all emitted.
   - TRIAL_KNOWLEDGE_MAX_EVENTS cap is enforced.
   - Total network failure → 0 events, no throw.
   - Non-2xx repo metadata response → 0 events, no throw.
   - emitTrialEvent rejection → no throw (last line of defense).

3. trial-orchestrator.test.ts (new, 4 cases)
   - start() persists initial state with currentStep='project_creation'
     and schedules an alarm.
   - start() is idempotent — second call with same input is a no-op and
     does not re-schedule the alarm.
   - alarm() on a completed state is a terminal no-op.
   - alarm() emits trial.error and marks completed when the overall
     timeout budget is exceeded.

4. trial-bridge.test.ts (new, 9 cases)
   - bridgeAcpSessionTransition: no-ops on non-trial projects, emits
     trial.ready on 'running' with ws-{id}.{BASE_DOMAIN} URL, emits
     trial.error on 'failed', no-ops on other transitions, swallows
     emitter errors.
   - bridgeKnowledgeAdded / bridgeIdeaCreated: no-op on non-trial,
     emit correct event shape when trial exists, swallow errors.

All 3,793 tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* docs(trial): document TrialOrchestrator + GitHub knowledge fast-path

Adds an "Orchestrator and Fast-Path Knowledge" section to the trial
configuration guide covering the two fire-and-forget background tasks
dispatched from POST /api/trial/create (TrialOrchestrator DO and the
GitHub REST knowledge probe) plus the ACP/MCP event bridge, with
tunables tables for both.

Also records the change in CLAUDE.md "Recent Changes" and marks the
corresponding checklist items in the task file.

* style(trial): sort imports per eslint rules

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(trial): emit trial.started event from orchestrator start()

The SSE stream's first real event must be `trial.started` so the
frontend can transition out of the "Warming up..." empty state.
Without it, viewers sat on the placeholder until `trial.progress` or
`trial.knowledge` arrived — which could be 3-5s later.

Added unit test asserting `emitTrialEvent` is called exactly once with
type='trial.started' and the expected shape.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* test(trial): capability test chaining start() + alarm() through event bus

Addresses task-completion-validator HIGH finding #2: no capability
test exercised the full orchestrator state machine through the event
bus seam. Existing per-method tests covered each transition in
isolation but did not chain them.

New test drives:
  start() → persist + setAlarm + emit trial.started
    → (simulate expired budget)
    → alarm() → mark failed + emit trial.error

The `emitTrialEvent` mock is the event-bus seam; its downstream is
already covered by tests/unit/routes/trial-events.test.ts which
verifies the bus → SSE stream path.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* chore(trial): archive orchestrator wire-up task

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* test(trial): cover alarm() retry/backoff + step handler invariants

Addresses test-engineer review HIGH findings #1 and #2 (partial).

Finding #1 — alarm() retry/backoff:
Added 4 tests driving the step-error catch branches via a `./steps`
vi.mock. Covers transient-error + retries-remaining (increments counter
and schedules backoff, no failTrial), permanent-error (immediate
failTrial regardless of budget), transient-error with retries exhausted
(promotes to failTrial), and the null-state guard (alarm fires before
start()).
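
The four branches these tests cover can be pictured as one pure decision function. This is a minimal sketch rather than the DO's actual code: `decideAfterStepError`, the `StepError` shape, and the retry and delay constants are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical types and tunables; only the four branches come from the commit text.
type StepError = { kind: "transient" | "permanent"; message: string };

interface OrchestratorState {
  retries: number;
}

type AlarmOutcome =
  | { action: "noop" } // alarm fired before start(): no state persisted yet
  | { action: "retry"; retries: number; delayMs: number }
  | { action: "fail"; reason: string };

const MAX_RETRIES = 5; // assumed budget
const BASE_DELAY_MS = 1_000; // assumed first backoff delay
const MAX_DELAY_MS = 60_000; // assumed backoff ceiling

function decideAfterStepError(
  state: OrchestratorState | null,
  error: StepError,
): AlarmOutcome {
  // Null-state guard: nothing to retry or fail.
  if (state === null) return { action: "noop" };

  // Permanent errors fail immediately, regardless of remaining budget.
  if (error.kind === "permanent") {
    return { action: "fail", reason: error.message };
  }

  // Transient error with the budget exhausted is promoted to a failure.
  if (state.retries >= MAX_RETRIES) {
    return { action: "fail", reason: `retries exhausted: ${error.message}` };
  }

  // Transient error with retries remaining: bump the counter and back off.
  const retries = state.retries + 1;
  const delayMs = Math.min(BASE_DELAY_MS * 2 ** (retries - 1), MAX_DELAY_MS);
  return { action: "retry", retries, delayMs };
}
```

Keeping the decision separate from `setAlarm` and `failTrial` side effects is one way to make each branch testable without a `./steps` mock.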

Finding #2 — step handlers:
New `trial-orchestrator-steps.test.ts` covers the highest-value
invariants that don't need D1/DO plumbing mocks:
  - handleRunning marks state.completed = true
  - handleDiscoveryAgentStart throws permanent on missing IDs
  - handleDiscoveryAgentStart is idempotent when session already linked

Broader per-handler coverage (project_creation / node_selection /
node_provisioning / node_agent_ready / workspace_creation /
workspace_ready) tracked in
tasks/backlog/2026-04-19-trial-orchestrator-step-handler-coverage.md —
those paths require mocks for drizzle + node-agent + project-data
services and are out of scope for this PR.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(trial): remove hardcoded BASE_DOMAIN fallback + extract heartbeat skew constant

Addresses constitution-validator findings:

HIGH — bridge.ts:41 had `env.BASE_DOMAIN || 'workspaces.example.com'` fallback.
BASE_DOMAIN is a non-optional binding; a misconfiguration that let it be empty
would silently generate workspace URLs pointing at workspaces.example.com
instead of failing loudly. Removed the fallback.

MEDIUM — steps.ts had a hardcoded `30_000` heartbeat-skew window. Extracted to
DEFAULT_TRIAL_ORCHESTRATOR_HEARTBEAT_SKEW_MS (shared), TRIAL_ORCHESTRATOR_HEARTBEAT_SKEW_MS
env override, getHeartbeatSkewMs() getter on the DO, threaded through
TrialOrchestratorContext.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(trial): per-IP rate limit on POST /api/trial/create + SSE injection guard

Addresses security-auditor HIGH findings:

1. Rate limit on POST /api/trial/create (was missing)
   - New rateLimitTrialCreate() factory (useIp=true, keyPrefix=trial-create)
   - Default 10 req/hr, configurable via RATE_LIMIT_TRIAL_CREATE env var
   - Tighter than the general anonymous bucket because each trial create
     allocates a Durable Object, fires ~4 GitHub API calls, and consumes
     a monthly trial slot
   - Mounted per-route in create.ts so the limiter sees request env
   - Regression test exercises 429 path with IP-scoped KV window

2. SSE event-name sanitization in formatSse()
   - Strips CR/LF to prevent SSE-frame injection if a future caller ever
     bypasses the TrialEvent discriminated union via `as never` casts or
     dynamic event names
   - Function now exported for direct testing
   - New trial-events-format.test.ts covers: happy path stable shape,
     CR/LF strip on hostile event name (single event frame survives),
     and JSON data escaping for embedded newlines
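
The sanitization in item 2 might look roughly like the sketch below. It reflects the named-frame shape described in this commit; `formatSseSketch` is a hypothetical stand-in, not the exported `formatSse()` itself.

```typescript
// Hypothetical stand-in for formatSse(); illustrates the CR/LF strip only.
function formatSseSketch(eventName: string, data: unknown): string {
  // Removing CR and LF keeps a hostile name from terminating the
  // "event:" line early and smuggling extra fields or frames.
  const safeName = eventName.replace(/[\r\n]/g, "");

  // JSON.stringify escapes newlines inside strings as the two characters
  // "\" and "n", so the payload always stays on a single "data:" line.
  const payload = JSON.stringify(data);

  return `event: ${safeName}\ndata: ${payload}\n\n`;
}

const hostileName = 'trial.ready\r\ndata: {"evil":true}\n\nevent: other';
const frame = formatSseSketch(hostileName, { message: "line one\nline two" });

// Exactly one blank-line terminator survives, so the client sees one frame.
console.log(frame.split("\n\n").length - 1);
```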

* fix(trial): switch TrialOrchestrator to new_sqlite_classes + drop premature status gate

Addresses cloudflare-specialist HIGH findings:

1. wrangler.toml v9 migration: new_classes -> new_sqlite_classes
   Cloudflare recommends SQLite-backed storage for new DO classes; the
   KV-style ctx.storage.put() API works identically on both backends but
   SQLite is the future-forward choice. TrialOrchestrator has not yet been
   deployed to any environment (introduced in this PR chain), so flipping
   the migration type is safe.

2. handleNodeProvisioning: remove synchronous status='running' gate
   After provisionNode() returns, async-IP providers (Scaleway, GCP) leave
   the node in 'creating' status; the IP assignment and the flip to
   status='running' both happen on the first heartbeat. Synchronously
   requiring status='running' here forced every async-IP trial through
   the retry/backoff cycle until the heartbeat landed, wasting retry
   budget and risking permanent failure on slow VM boots. The next step
   (node_agent_ready) polls heartbeat freshness with its own timeout,
   which correctly handles both sync (Hetzner) and async (Scaleway/GCP)
   provisioning paths.

Regression test: handleNodeProvisioning advances to node_agent_ready even
when provisionNode() leaves the node in 'creating' status.

* fix(trial): HMAC-verify fingerprint cookie before reusing UUID

Security-auditor HIGH: the old code extracted the fingerprint UUID from the
`sam_trial_fingerprint` cookie by splitting on the last `.` without verifying
the HMAC signature. An attacker who learned a victim's fingerprint UUID
(from logs, a captured cookie, or a prior trial row) could forge
`<victimUuid>.anything` to overwrite the `trial-by-fingerprint:<victimUuid>`
KV index to point at their own trial. The victim's subsequent OAuth hook
lookup would then redirect them to the attacker's trial project.

Fix: call verifyFingerprint(existingFp, secret) and only trust the returned
UUID. Fall back to crypto.randomUUID() on invalid / missing signature. The
secret is already resolved earlier in the same handler (lines 195-203).
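
A minimal sketch of the verify-then-trust flow follows. It assumes the cookie is laid out as `<uuid>.<hex HMAC-SHA256 of uuid>` and uses Node's `node:crypto` for brevity; the real handler's encoding and crypto API may differ, and `verifyFingerprintSketch`, `signFingerprint`, and `resolveFingerprintUuid` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed cookie layout: "<uuid>.<hex HMAC-SHA256 of uuid>".
function signFingerprint(uuid: string, secret: string): string {
  const sig = createHmac("sha256", secret).update(uuid).digest("hex");
  return `${uuid}.${sig}`;
}

// Returns the UUID only when the signature matches; otherwise null.
function verifyFingerprintSketch(cookie: string, secret: string): string | null {
  const dot = cookie.lastIndexOf(".");
  if (dot <= 0) return null;

  const uuid = cookie.slice(0, dot);
  const given = Buffer.from(cookie.slice(dot + 1), "utf8");
  const expected = Buffer.from(
    createHmac("sha256", secret).update(uuid).digest("hex"),
    "utf8",
  );

  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (given.length !== expected.length) return null;
  return timingSafeEqual(given, expected) ? uuid : null;
}

// Trust the cookie's UUID only after verification; otherwise mint a new one.
// mintUuid is injected here so the behaviour is easy to test.
function resolveFingerprintUuid(
  cookie: string | undefined,
  secret: string,
  mintUuid: () => string,
): string {
  const verified = cookie ? verifyFingerprintSketch(cookie, secret) : null;
  return verified ?? mintUuid();
}
```

With this shape, a forged `<victimUuid>.anything` value never reaches the KV index write because it resolves to a freshly minted UUID.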

Added regression test in trial-create.ts.test.ts — a forged cookie MUST NOT
reuse the victim's UUID; a fresh UUID is minted instead. Updated the
"reuses existing fingerprint" test to use a validly-signed cookie.

---------

Co-authored-by: Raphaël Titsworth-Morin <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
* task: move trial-onboarding-ux-polish to active

* feat(trial): polish discovery feed with skeleton timeline + knowledge grouping

- Extract all timing/threshold constants to trial-ui-config.ts (Constitution XI)
- Add STAGE_LABELS map + friendlyStageLabel() for orchestrator stage strings
- TryDiscovery: render StageSkeleton timeline before first SSE event arrives
- TryDiscovery: group rapid trial.knowledge events into a single card
- TryDiscovery: surface "taking longer than usual" hint when SSE silent for 20s
- TryDiscovery: retry-aware terminal error panel
- ChatGate: spinner + aria-busy on send, snap-x chip scroll, anonymous hint copy
- Try: friendlier validation copy, testid hooks for landing audit

* test(trial): cover stage-label mapping + skeleton/error/knowledge-burst Playwright cases

* task: archive trial-onboarding-ux-polish

* fix(trial): SSE replay dedup, accessible badges, larger touch targets

Addresses Phase 5 review findings on the trial onboarding UX polish PR:

CRITICAL — SSE event replay duplication
  EventSource silently re-opens after a transport error and the server may
  replay any buffered events the client missed. Without dedup, the feed
  duplicated every replayed event. Add a composite (`type:at`) dedup set
  in TryDiscovery that resets on trialId change.
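
As a rough illustration of the composite-key dedup, the sketch below assumes each event carries a `type` string and a numeric `at` timestamp. `eventDedupKey` is named in the tests listed further down, but its body and the `createEventDeduper` wrapper here are assumptions.

```typescript
// Assumed minimal event shape; the real TrialEvent union has more fields.
interface TrialEventLike {
  type: string;
  at: number;
}

function eventDedupKey(event: TrialEventLike): string {
  return `${event.type}:${event.at}`;
}

// Hypothetical wrapper: accept() is true the first time a key is seen.
function createEventDeduper() {
  const seen = new Set<string>();
  return {
    accept(event: TrialEventLike): boolean {
      const key = eventDedupKey(event);
      if (seen.has(key)) return false; // replayed after reconnect
      seen.add(key);
      return true;
    },
    // Call when the trialId changes so a new trial starts clean.
    reset(): void {
      seen.clear();
    },
  };
}

const deduper = createEventDeduper();
const replayed = { type: "trial.knowledge", at: 1_000 };
console.log(deduper.accept(replayed)); // first delivery
console.log(deduper.accept(replayed)); // replay of the same event
```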

HIGH — color-only ConnectionBadge (WCAG 1.4.1)
  Status was conveyed by background color alone. Prepend a Unicode shape
  indicator (●/✕/↺/○) so the meaning is also conveyed in monochrome.

HIGH — knowledge toggle hit area (WCAG 2.5.5)
  The "+N more" toggle on grouped knowledge cards was 24px tall — below
  the 44px touch-target minimum. Promote to min-h-11 with vertical hit
  padding.

MEDIUM — semantic header role + truncation hint
  The sticky discovery header used role="banner" (reserved for the
  page-wide masthead) and the truncated repo title had no full-text
  hover affordance. Switch to role="region" + aria-label and move the
  title attribute to the truncating wrapper.

LOW — error CTA touch targets
  The "Try again" / "Join the waitlist" Links were below 44px. Promote
  to inline-flex min-h-[44px].

Tests
  - try-discovery-dedup.test.ts: behavioural coverage of eventDedupKey
    and the dedup branch in onEvent (3 scenarios: identical replay,
    chronological non-collision, type-vs-timestamp collision).
  - try-discovery-build-feed.test.ts: boundary coverage of buildFeed
    (within-window merge, exact-boundary `<=` merge, +1ms split,
    interleaved non-knowledge break, error-event exclusion).
  - ChatGate.test.tsx: spinner visible/hidden behavioural test using a
    deferred promise (idle → sending → resolved transitions).
  - trial-ui-audit.spec.ts: knowledge-burst test now asserts exactly one
    grouped card (was: presence only) and exercises the expand toggle.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(trial): keep StageSkeleton visible after lone trial.started; forward Alert testid

Two narrow fixes uncovered by Playwright visual audit:

1. **StageSkeleton hides too eagerly.** `showSkeleton = events.length === 0`
   meant a lone `trial.started` event (which is just an acknowledgement,
   not visible progress) caused the "Setting things up" roadmap to vanish
   while the user was still staring at a blank screen. Tighten to "no
   substantive events yet" — keep showing the roadmap until a real
   progress / knowledge / idea / ready / error event arrives.

2. **`Alert` drops `data-testid`.** The shared design-system `Alert`
   component didn't declare or forward `data-testid`, so
   `<Alert variant="error" data-testid="trial-error-panel">` silently
   discarded the prop and the terminal-error Playwright assertion
   couldn't find the panel. Add the prop to `AlertProps` and forward it
   to the rendered `<div role="alert">`.
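
The tightened skeleton condition from item 1 could be expressed along these lines. This is a sketch: `shouldShowSkeleton` and the `SUBSTANTIVE_TYPES` set are hypothetical names, with the set assembled from the event types the description lists.

```typescript
// Event types treated as visible progress, per the description above.
const SUBSTANTIVE_TYPES = new Set<string>([
  "trial.progress",
  "trial.knowledge",
  "trial.idea",
  "trial.ready",
  "trial.error",
]);

// Old rule: events.length === 0. New rule: no substantive event yet,
// so a lone trial.started acknowledgement keeps the roadmap visible.
function shouldShowSkeleton(events: ReadonlyArray<{ type: string }>): boolean {
  return !events.some((event) => SUBSTANTIVE_TYPES.has(event.type));
}

console.log(shouldShowSkeleton([{ type: "trial.started" }]));
```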

All 45 Playwright trial-ui-audit tests now pass across iPhone SE,
iPhone 14, and Desktop projects.

---------

Co-authored-by: Raphaël Titsworth-Morin <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
simple-agent-manager bot and others added 5 commits April 19, 2026 15:58
)

* task: move trial-events-debug to active

* task: instrument trial event bus path for staging triage

Add high-signal log.info points at every boundary in the trial event
flow so `wrangler tail` can show exactly where the pipeline drops:

- create.ts: log dispatch_begin, orchestrator_task.{enter,stub_ready,
  start_returned}, knowledge_task.{enter,done}, waitUntil_registered
- trial-runner.ts:emitTrialEvent — log emit_begin / emit_ok
- trial-orchestrator: start.enter, state_put, alarm_set,
  trial_started_emitted; alarm.enter
- trial-event-bus: handleAppend.enter / stored / rejected_closed

Pure instrumentation — no behavior change. Will be pared back or
removed once the failure mode is identified on staging.

* fix(trial): emit unnamed SSE frames so EventSource.onmessage fires

Root cause of the zero-events-on-staging incident (2026-04-19):
formatSse() wrote named SSE frames ('event: trial.knowledge\ndata: {...}')
but the frontend subscribes via source.onmessage, which only fires for the
default (unnamed) event. Bytes arrived on the wire — curl saw them — but no
frontend-visible event was ever dispatched.

Change the SSE serializer to emit unnamed frames ('data: {...}'). The
TrialEvent payload itself carries a 'type' discriminator so no information
is lost. Update the unit test to lock in the new contract (no 'event:' line)
and point at the post-mortem.

Also fix a latent eventsUrl contract mismatch: POST /api/trial/create
returned '/api/trial/events?trialId=X' while the real route is
'/api/trial/:trialId/events'. The frontend builds its own URL so end-users
weren't affected, but the response-field contract was wrong. The previous
unit test used toContain() on a substring, masking the drift.

See docs/notes/2026-04-19-trial-sse-named-events-postmortem.md.
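
To make the named-versus-unnamed distinction concrete, here is a simplified model of how a browser chooses the event type for one SSE frame. `formatUnnamedSse` and `dispatchedEventType` are hypothetical helpers for illustration; the parser handles only the `event:` field and is not a full implementation of the EventSource parsing rules.

```typescript
// Unnamed frame: the payload's own "type" field is the discriminator.
function formatUnnamedSse(event: { type: string }): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Simplified model: the dispatched type defaults to "message" and is
// overridden only by a non-empty "event:" field in the frame.
function dispatchedEventType(frame: string): string {
  let type = "message";
  for (const line of frame.split("\n")) {
    if (line.startsWith("event:")) {
      const value = line.slice("event:".length).replace(/^ /, "");
      if (value !== "") type = value;
    }
  }
  return type;
}

const namedFrame = 'event: trial.knowledge\ndata: {"type":"trial.knowledge"}\n\n';
const unnamedFrame = formatUnnamedSse({ type: "trial.knowledge" });

// onmessage only receives events whose dispatched type is "message".
console.log(dispatchedEventType(namedFrame)); // not "message": onmessage is skipped
console.log(dispatchedEventType(unnamedFrame)); // "message": onmessage fires
```

The alternative fix would have been to register `addEventListener` handlers for every named type on the frontend; emitting unnamed frames keeps a single `onmessage` subscriber.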

* test(trial): add TrialEventBus → SSE capability test

Regression guard for the 2026-04-19 incident. Seeds a trial in KV, appends
events directly on the TrialEventBus DO (identical to emitTrialEvent()),
opens the SSE stream via SELF.fetch with a valid fingerprint cookie, reads
the raw stream bytes, and asserts:

  - HTTP 200 + correct content-type
  - At least one 'data: {...}' frame
  - No 'event:' line anywhere (the regression guard)
  - The parsed JSON payload round-trips through the bus intact

Also add TRIAL_EVENT_BUS DO binding and TRIAL_* env bindings to the workers
vitest config so this test (and future trial-related worker tests) can
construct stubs.

Note: the existing workers test pool is currently broken on this branch and
base (miniflare WebSocket exits unexpectedly on all 6 pre-existing worker
tests too — not caused by this change). Once the pool is unblocked this
test runs as-is.

* docs(trial): post-mortem + rule 13 ban curl-only SSE verification

Post-mortem covers what broke, the two-layer contract mismatch (named SSE
events + wrong eventsUrl shape), timeline, why it wasn't caught (no E2E
capability test, curl used instead of a real browser, frontend test path
not exercised), the class of bug, and the process fixes landing in this PR.

Update rule 13 (staging verification) to explicitly ban curl-only
verification for browser-consumed SSE/WebSocket streams — curl confirms the
byte stream, only a real browser confirms dispatch to onmessage.

* task: record root cause + fixes on trial SSE events task

* test(trial): update trial-events.test SSE assertion for unnamed frames

The integration test for GET /api/trial/:trialId/events was asserting the
old named-event contract ('event: trial.ready'). With the formatSse() fix
the frame is unnamed; update the assertion to lock in the new contract
(data: line present, no event: line).

* task: archive trial SSE events debugging task

* chore(trial): address review findings on SSE events fix

- Add TRIAL_ORCHESTRATOR + TRIAL_COUNTER DO bindings to
  apps/api/vitest.workers.config.ts (cloudflare-specialist MEDIUM)
- CLAUDE.md: prepend 'trial-sse-events-fix' entry to Recent Changes
  (doc-sync-validator MEDIUM)
- Fix broken link in postmortem (tasks/active -> docs/notes) and tick
  the completed rule-13 follow-up checkbox (doc-sync-validator LOW)
- Add cross-reference from .claude/rules/02-quality-gates.md to the
  rule-13 curl-only SSE-verification ban (doc-sync-validator LOW)
- File pre-existing HIGH (AbortController not propagated into
  busStub.fetch) and MEDIUM (nextCursor persistence) as backlog tasks
  so they're tracked but don't block this fix PR

---------

Co-authored-by: Raphaël Titsworth-Morin <[email protected]>
…764)

* task: move trial orchestrator agent-boot task to active

* feat(trial): boot discovery agent on VM + detect real default branch

Two bugs blocked the trial demo from working end-to-end:

1. handleDiscoveryAgentStart only created chat + ACP session records but
   never called createAgentSessionOnNode / startAgentSessionOnNode. The
   ACP session sat in `pending` forever, never transitioning to `running`,
   so `trial.ready` never fired.
2. Project defaultBranch + workspace branch were hardcoded to 'main', so
   trials on master-default repos (e.g. octocat/Hello-World) failed the
   VM-side `git clone --branch main`.

Fix (mirrors TaskRunner's agent-session-step pattern):

- Add `defaultBranch`, `mcpToken`, `agentSessionCreatedOnVm`,
  `agentStartedOnVm`, `acpAssignedOnVm`, `acpRunningOnVm` fields to
  TrialOrchestratorState for crash-safe idempotency.
- `fetchDefaultBranch()` probes GitHub's public API with a 5s
  AbortController timeout (TRIAL_GITHUB_TIMEOUT_MS override), falls
  back to 'main' on any failure. Threaded through both
  `projects.default_branch` and the workspace-side `git clone --branch`.
- `handleDiscoveryAgentStart` now runs a 5-step idempotent flow:
    1. startDiscoveryAgent (existing) -> chat + ACP session records.
    2. createAgentSessionOnNode (new) -> D1 agent_sessions row + VM
       agent registers the session.
    3. generateMcpToken + storeMcpToken (new) -> KV token so the agent
       can call add_knowledge / create_idea.
    4. startAgentSessionOnNode (new) -> VM agent boots the agent
       subprocess with the discovery prompt + MCP server URL.
    5. transitionAcpSession pending -> assigned -> running -> the trial
       bridge emits `trial.ready` with workspaceUrl.
- Trial's synthetic taskId = state.trialId (trials have no tasks row),
  so MCP rate-limiting keys per-trial. Drop get_instructions from the
  initial prompt since it'd 404 against the tasks table.
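
The default-branch probe with its timeout and `'main'` fallback might be sketched as below. The function name, the injected `fetchImpl` parameter, and the request headers are assumptions made so the example is testable offline; the `default_branch` field is what GitHub's repository endpoint returns.

```typescript
// Narrow fetch-like type so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal; headers: Record<string, string> },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function fetchDefaultBranchSketch(
  owner: string,
  repo: string,
  fetchImpl: FetchLike,
  timeoutMs = 5_000, // mirrors the 5s default described above
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`https://api.github.com/repos/${owner}/${repo}`, {
      signal: controller.signal,
      headers: { Accept: "application/vnd.github+json", "User-Agent": "trial-sketch" },
    });
    if (!res.ok) return "main"; // 404, rate limit, and so on

    const body = (await res.json()) as { default_branch?: unknown };
    return typeof body.default_branch === "string" && body.default_branch !== ""
      ? body.default_branch
      : "main";
  } catch {
    // Network error, abort on timeout, or malformed JSON.
    return "main";
  } finally {
    clearTimeout(timer);
  }
}
```

In a Worker the global `fetch` could be passed as `fetchImpl`; persisting the result in orchestrator state before use keeps retries from resolving a different branch.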

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* test(trial): capability coverage for orchestrator VM agent boot

Adds trial-orchestrator-agent-boot.test.ts asserting the 3-step VM boot
pattern + ACP pending→assigned→running transitions + idempotency across
crash/retry. Updates trial-orchestrator-steps.test.ts for the new nodeId
requirement and adds mocks for node-agent/mcp-token/project-data services.

Also adds fetchDefaultBranch coverage (master, 404 fallback, network error
fallback, idempotent re-entry).

Post-mortem at docs/notes/2026-04-19-trial-orchestrator-agent-boot-postmortem.md.
Process fix: adds port-of-pattern coverage bullet to
.claude/rules/10-e2e-verification.md so a port of TaskRunner's agent-session
pattern into a new consumer must assert every step fired.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* task: archive trial orchestrator agent-boot task

* docs(trial): add CLAUDE.md Recent Changes + TRIAL_GITHUB_TIMEOUT_MS row

* fix(trial): persist defaultBranch before D1 insert + redact mcpToken in getStatus

Cloudflare-specialist review (HIGH): two fixes
1. handleProjectCreation now persists state.defaultBranch before the D1
   projects insert. Previously a crash between the D1 write and the DO
   state persist could cause a retry to re-probe GitHub and resolve a
   different branch than what had already landed in the projects row.
2. getStatus() now redacts the live mcpToken bearer credential before
   returning state to any debug/admin caller. The stale comment claiming
   the DO doesn't store secrets is corrected.

* fix(trial): revoke MCP token on failure + redaction test + review doc sync

Addresses Phase 5 reviewer findings from the trial-agent-boot PR:

security-auditor HIGH:
- Revoke state.mcpToken in failTrial() before emitting trial.error. Mirrors
  TaskRunner's state-machine.ts:265-275 pattern; closes the 4-hour TTL
  window where a leaked/botched-trial bearer token stays usable.
- Document the intentional non-revocation in handleRunning() — orchestrator
  terminates but the discovery agent still needs the token for MCP calls
  during the 20-min workspace TTL.
- Document the sentinel userId scoping limitation on resolveAnonymousUserId
  so future trial code remembers that per-user queries do NOT isolate
  trials from each other; projectId/trialId scoping is mandatory.

task-completion-validator MEDIUM:
- New test coverage for getStatus() mcpToken redaction (both populated and
  uninitialized state branches).
- New test coverage for failTrial revocation (happy path + KV-error tolerance).

doc-sync-validator HIGH:
- Add Trial Onboarding section to .claude/skills/env-reference/SKILL.md
  cross-referencing docs/guides/trial-configuration.md for the full table.

* fix(trial): allow multiple trials per repo (partial unique index)

The `(user_id, installation_id, repository)` unique index on `projects`
prevented more than one anonymous trial per public repo — every trial
after the first on the same repo hit a UNIQUE constraint failure during
the projects insert in TrialOrchestrator.handleProjectCreation. The DO
retried 6 times on alarm backoff then emitted a terminal `trial.error`
("step_failed"), so the user saw the 10% progress event repeat and then
fail.

Why it slipped through earlier reviews: the capability tests mock D1, so
no test exercised the real constraint. Staging verification only tested
a single trial per repo. This surfaced the moment a second trial on
`octocat/Hello-World` landed during Phase 6 verification.

Fix:
- Migration 0046 drops + recreates the index as a partial unique index
  that excludes the trial-sentinel user `system_anonymous_trials`. Real
  users still can't register duplicate project rows; sentinel-owned
  trial rows are isolated by `projectId` (per helpers.ts sentinel scope
  note).
- Drizzle schema updated with matching `.where()` clause so codegen and
  migration stay in sync.

Verified locally: trial-orchestrator tests pass (28/28); typecheck clean;
lint clean (no new warnings).

Co-Authored-By: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Raphaël Titsworth-Morin <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
trial.ready is a provisioning milestone (workspace is up), not a signal
that discovery is complete. The discovery agent continues producing
trial.knowledge and trial.idea events after the workspace is provisioned.

Changes:
- Event bus: only auto-close on trial.error, not trial.ready
- Frontend: keep EventSource open after trial.ready with a 3-minute
  grace timer (TRIAL_DISCOVERY_STREAM_TIMEOUT_MS) for late-arriving
  discovery events
- Header shows "Discovering <repo>…" while stream is still open
  after trial.ready, then "Ready: <repo>" after stream closes

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…icons

- Add TrialAgentActivityEvent type and bridgeAgentActivity() to pipe
  agent messages/tool calls into the trial SSE stream
- Hook message persistence path to emit trial.agent_activity events
- Render agent activity cards in the feed (grouped, showing tool names)
- Replace misleading "Workspace ready — chat below" with informative
  message about agent analyzing repository
- Replace emoji icons (📎, ★) with lucide-react icons (BookOpen, Lightbulb,
  Brain, Wrench, Terminal) matching platform design
- Add auto-scroll to bottom on new events (scrollIntoView smooth)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Deduplicate consecutive progress events with the same stage in the
  feed — the orchestrator re-emits keepalive progress while waiting
  for the agent, creating visual spam (3x "Starting the agent" at 70%)
- Clean up agent activity text: strip XML tags, collapse JSON blobs,
  add line-clamp-2 for overflow
- Change "AGENT WORKING..." from uppercase to normal case
- Add cleanActivityText() helper for readable tool output summaries
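
The consecutive-progress dedup could be sketched as a single pass over the feed. The `FeedEvent` shape and function name are hypothetical, and keeping the first of each run (rather than the latest) is an assumption.

```typescript
// Assumed minimal shape: progress events carry a stage string.
interface FeedEvent {
  type: string;
  stage?: string;
}

// Drops a progress event when the previous kept event is also progress
// with the same stage. Non-adjacent repeats are kept on purpose.
function dedupeConsecutiveProgress<T extends FeedEvent>(events: readonly T[]): T[] {
  const out: T[] = [];
  for (const event of events) {
    const prev = out[out.length - 1];
    const isRepeat =
      prev !== undefined &&
      event.type === "trial.progress" &&
      prev.type === "trial.progress" &&
      prev.stage === event.stage;
    if (!isRepeat) out.push(event);
  }
  return out;
}

const feed = dedupeConsecutiveProgress([
  { type: "trial.progress", stage: "agent_start" },
  { type: "trial.progress", stage: "agent_start" },
  { type: "trial.progress", stage: "agent_start" },
]);
console.log(feed.length);
```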

Co-Authored-By: Claude Opus 4.6 <[email protected]>
raphaeltm and others added 3 commits April 20, 2026 20:52
…orker secrets

The Anthropic API key for the AI proxy should come from admin-managed
platform credentials (stored encrypted in D1 via /admin/platform-credentials),
not from a Worker secret. This aligns with the existing credential architecture
where admins configure shared keys through the UI.

The proxy now resolves the key by looking up a 'claude-code' platform
credential at request time. No new Worker secrets or deployment steps needed.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add admin UI tab "AI Proxy" where admins select the default model for
the platform inference proxy. Config is stored in KV so the default
can be changed without redeploying.

Model resolution priority: KV admin override > env var > shared constant.
Out-of-the-box default is a free Workers AI model (Llama 4 Scout 17B).
Anthropic models (Claude Haiku) are only selectable when an admin has
added a Claude Code platform credential on the Credentials tab.
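
The resolution order can be written as a small pure function. This is a sketch with hypothetical names; treating blank strings as "not set" is an assumption, and the fallback value stands in for the shared constant.

```typescript
// Treat null, undefined, and blank strings as "not configured" (assumption).
function pickConfigured(value: string | null | undefined): string | undefined {
  if (value === null || value === undefined) return undefined;
  const trimmed = value.trim();
  return trimmed === "" ? undefined : trimmed;
}

// Priority: KV admin override > env var > shared constant.
function resolveDefaultModel(
  kvOverride: string | null | undefined,
  envValue: string | undefined,
  sharedConstant: string,
): string {
  return pickConfigured(kvOverride) ?? pickConfigured(envValue) ?? sharedConstant;
}

// Placeholder identifiers, not real model IDs.
console.log(resolveDefaultModel(null, "env-model", "constant-model"));
```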

- New API routes: GET/PUT/DELETE /api/admin/ai-proxy/config
- AI proxy route and runtime agent-key endpoint read default from KV
- Admin UI model picker with availability indicators
- Revert DEFAULT_AI_PROXY_MODEL to free Workers AI model
- File backlog idea for PLATFORM_TRIAL_ENABLED env var

Co-Authored-By: Claude Opus 4.6 <[email protected]>
raphaeltm and others added 2 commits April 21, 2026 01:17
Merge sam/trial-discovery-stream-fix into trial MVP branch, bringing:
- Auto-scroll to bottom on new events
- Agent activity cards grouped in feed with Lucide icons
- Progress card deduplication and text cleanup
- Stream stays open after trial.ready (agent continues producing events)
- Default model switched to Qwen 3 30B

Update trial-event-bus test to match new behavior: trial.ready no
longer closes the bus since the discovery agent continues producing
knowledge and idea events after workspace provisioning.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add AI usage section to the admin analytics dashboard, powered by
the AI Gateway Logs API. Shows token usage, estimated cost, trial
vs. authenticated breakdown, per-model metrics, and daily trends.

Backend:
- New admin endpoint GET /api/admin/analytics/ai-usage?period=7d
  queries AI Gateway logs with pagination and aggregates by model/day
- AI proxy now tags requests with projectId and trialId in
  cf-aig-metadata for trial usage attribution
- Configurable via AI_USAGE_PAGE_SIZE, AI_USAGE_MAX_PAGES env vars

Frontend:
- AIUsageChart component with KPI cards, stacked bar chart (tokens
  by model), daily usage area chart, and model breakdown table
- Integrated into admin analytics dashboard above DAU chart
- Graceful fallback if AI Gateway is not configured (catch + null)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…stics

The CF AI Gateway Logs API uses `order_by_direction` (not `direction`) for
sort order, and error responses now include the upstream body for easier
debugging.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The Cloudflare AI Gateway Logs API enforces a maximum per_page of 50.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
simple-agent-manager bot and others added 5 commits April 21, 2026 04:25
* fix(trial): address review findings from trial onboarding subagents

Security and correctness fixes from 7 specialist reviewers:

CRITICAL:
- Fix cookie domain mismatch: claim.ts clearClaimCookie and oauth-hook.ts
  buildClaimCookie now pass domain from BASE_DOMAIN (matching create.ts)

HIGH:
- TrialEventBus DO: persist `closed` flag to storage so it survives eviction
- AI proxy: sanitize error bodies — log raw errors server-side, return generic
  messages to clients (prevents internal URL/config leakage)
- Admin AI usage: sanitize CF API error responses the same way
- SSE events endpoint: add per-IP rate limiting (30 req/5min via KV)
- Deploy pipeline: forward ANTHROPIC_API_KEY_TRIAL as optional Worker secret
- sync-wrangler-config: inject ENVIRONMENT var into generated env sections
- Remove hardcoded DEFAULT_GATEWAY_ID; require AI_GATEWAY_ID from env

MEDIUM:
- Cron collision: move trial counter rollover from 03:00 to 05:00 UTC
  (avoids collision with daily analytics forward job at 03:00)
- Replace magic number in create.ts with DEFAULT_TRIAL_CLAIM_TTL_MS constant
- Add trial secrets to secrets-taxonomy.md and trial-configuration.md
- Add comprehensive trial + AI proxy env vars to .env.example
- Fix test mocks: add ctx.storage to TrialEventBus tests, add KV to SSE tests
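
The per-IP SSE limit listed under HIGH (30 requests per 5 minutes) can be pictured as a fixed-window counter. In this sketch a `Map` stands in for the KV namespace, and the key prefix, function name, and return shape are hypothetical.

```typescript
interface WindowEntry {
  count: number;
  windowStart: number;
}

interface RateDecision {
  allowed: boolean;
  remaining: number;
}

// The Map stands in for KV; against real KV the get/put pair below is a
// non-atomic read-modify-write, so concurrent requests can undercount.
function createFixedWindowLimiter(limit: number, windowMs: number) {
  const store = new Map<string, WindowEntry>();

  return {
    check(ip: string, now: number): RateDecision {
      const key = `trial-sse:${ip}`; // hypothetical key prefix
      const entry = store.get(key);

      // No entry, or the window has elapsed: start a new window.
      if (entry === undefined || now - entry.windowStart >= windowMs) {
        store.set(key, { count: 1, windowStart: now });
        return { allowed: true, remaining: limit - 1 };
      }

      // Budget spent for this window: the caller would respond with 429.
      if (entry.count >= limit) {
        return { allowed: false, remaining: 0 };
      }

      const count = entry.count + 1;
      store.set(key, { count, windowStart: entry.windowStart });
      return { allowed: true, remaining: limit - count };
    },
  };
}

// 30 requests per 5 minutes, as in the finding above.
const sseLimiter = createFixedWindowLimiter(30, 5 * 60 * 1000);
console.log(sseLimiter.check("203.0.113.7", Date.now()).allowed);
```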

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(trial): address CTO review — 6 quality improvements

1. Reject unknown IP: SSE rate limit now returns 400 when no client IP
   header is present, instead of sharing a single "unknown" bucket across
   all headerless clients. CF-Connecting-IP is always present on Workers.

2. Document KV rate limit trade-off: added inline comment explaining why
   KV's non-atomic read-modify-write is acceptable here (storm prevention,
   not exact enforcement) vs DO-based counters for credential rotation.

3. Clean up formatSse: removed unused _eventName parameter that gave the
   false impression the event name was being used. Updated all call sites
   and tests.

4. Cookie domain consistency test: new regression test suite asserting
   that buildClaimCookie, clearClaimCookie, and buildFingerprintCookie
   produce matching Domain= attributes. Explicitly demonstrates the bug
   where clearing without a domain fails to delete a domain-scoped cookie.

5. AI_GATEWAY_ID self-hoster safe: returns an empty summary (zero counts)
   when AI_GATEWAY_ID is not configured, instead of throwing. Self-hosters
   who don't use AI Gateway get a clean "no data" admin dashboard.

6. Fix .env.example cron default: TRIAL_CRON_ROLLOVER_CRON now shows
   "0 5 1 * *" matching the actual default after the collision fix.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Raphaël Titsworth-Morin <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
Resolves package.json version conflict (take main's newer deps) and
fixes simple-import-sort/exports error in packages/shared/src/constants/index.ts.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Autofix export sort in apps/web/src/lib/api/index.ts
- Move useMemo before early return in AIUsageChart (rules-of-hooks)
- Prefix unused anthropicModels with _ in staging test
- Add FILE SIZE EXCEPTION comments for TryDiscovery.tsx and steps.ts

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@sonarqubecloud

Quality Gate failed

Failed conditions
6 Security Hotspots
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


@simple-agent-manager simple-agent-manager bot merged commit 1f92ecf into main Apr 21, 2026
16 of 19 checks passed

Labels

needs-human-review Agent could not complete all review gates — human must approve before merge
