diff --git a/.github/workflows/deploy-reusable.yml b/.github/workflows/deploy-reusable.yml index 7c7a2e30f..9b9951f0c 100644 --- a/.github/workflows/deploy-reusable.yml +++ b/.github/workflows/deploy-reusable.yml @@ -448,6 +448,7 @@ jobs: R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }} R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }} SMOKE_TEST_AUTH_ENABLED: ${{ secrets.SMOKE_TEST_AUTH_ENABLED }} + ANTHROPIC_API_KEY_TRIAL: ${{ secrets.ANTHROPIC_API_KEY_TRIAL }} SEGMENT_WRITE_KEY: ${{ secrets.SEGMENT_WRITE_KEY }} GA4_API_SECRET: ${{ secrets.GA4_API_SECRET }} GA4_MEASUREMENT_ID: ${{ secrets.GA4_MEASUREMENT_ID }} diff --git a/CLAUDE.md b/CLAUDE.md index cb0ac734e..38b252eac 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -221,6 +221,7 @@ Domains chain together: competitive research feeds marketing and business strate - Cloudflare D1 (credentials table with AES-GCM encrypted tokens) (028-provider-infrastructure) ## Recent Changes +- ai-proxy-gateway: AI inference proxy routes LLM requests through Cloudflare AI Gateway — `POST /ai/v1/chat/completions` accepts OpenAI-format requests, transparently routes to Workers AI (@cf/* models) or Anthropic (claude-* models) with format translation (`ai-anthropic-translate.ts`); per-user RPM rate limiting + daily token budget via KV; admin model picker at `/admin/ai-proxy`; AI usage analytics dashboard at `/admin/analytics/ai-usage` aggregates AI Gateway logs by model, day, cost; configurable via AI_PROXY_ENABLED, AI_PROXY_DEFAULT_MODEL, AI_GATEWAY_ID, AI_PROXY_ALLOWED_MODELS, AI_PROXY_RATE_LIMIT_RPM, AI_PROXY_RATE_LIMIT_WINDOW_SECONDS, AI_PROXY_MAX_INPUT_TOKENS_PER_REQUEST, AI_USAGE_PAGE_SIZE, AI_USAGE_MAX_PAGES - trial-agent-boot: TrialOrchestrator `discovery_agent_start` step now runs the full 5-step idempotent VM boot (registers agent session via `createAgentSessionOnNode`, mints MCP token with trialId as synthetic taskId, `startAgentSessionOnNode` with discovery prompt + MCP server URL, drives ACP session `pending → assigned → running`; idempotency flags `mcpToken`, `agentSessionCreatedOnVm`, `agentStartedOnVm`, `acpAssignedOnVm`, `acpRunningOnVm` on DO state let crash/retry resume without double-booking); new `fetchDefaultBranch()` probes GitHub `/repos/:owner/:repo` with AbortController-bounded fetch and threads the real default branch through `projects.defaultBranch` + workspace `git clone --branch` (master-default repos like `octocat/Hello-World` now work); configurable via TRIAL_GITHUB_TIMEOUT_MS (default: 5000); new capability test `apps/api/tests/unit/durable-objects/trial-orchestrator-agent-boot.test.ts` asserts every cross-boundary call fires with correct payload; rule 10 updated with port-of-pattern coverage requirement. See `docs/notes/2026-04-19-trial-orchestrator-agent-boot-postmortem.md`. - trial-sse-events-fix: Fixed "zero trial.* events on staging" — `formatSse()` in `apps/api/src/routes/trial/events.ts` previously emitted named SSE frames (`event: trial.knowledge\ndata: {...}`), but the frontend subscribes via `source.onmessage` which only fires for the default (unnamed) event; frames arrived on the wire (curl saw them) but browser EventSource silently dropped them. Now emits unnamed `data: {JSON}\n\n` frames; the `TrialEvent` payload's own `type` discriminator preserves dispatch info. Also fixed `eventsUrl` in `apps/api/src/routes/trial/create.ts` response shape mismatch (`/api/trial/events?trialId=X` → `/api/trial/:trialId/events`). New capability test `apps/api/tests/workers/trial-event-bus-sse.test.ts` asserts no `event:` line + JSON round-trip across the TrialEventBus DO → SSE endpoint boundary; unit tests updated to assert new unnamed-frame contract and exact `eventsUrl` shape (no substring matches on URL contracts). Rule 13 updated to ban curl-only verification of browser-consumed SSE/WebSocket streams — curl confirms bytes, browsers confirm dispatch. See `docs/notes/2026-04-19-trial-sse-named-events-postmortem.md`. - trial-orchestrator-wire-up: TrialOrchestrator Durable Object + GitHub-API knowledge fast-path — `POST /api/trial/create` now fire-and-forget dispatches two concurrent `c.executionCtx.waitUntil` tasks: (1) `env.TRIAL_ORCHESTRATOR.idFromName(trialId)` DO state machine (alarm-driven, steps: project_creation → node_provisioning → workspace_creation → workspace_ready → agent_session → completed; idempotent `start()`; terminal guard on completed/failed; overall-timeout emits `trial.error`); (2) `emitGithubKnowledgeEvents()` probe hits unauthenticated `/repos/:o/:n`, `/repos/:o/:n/languages`, `/repos/:o/:n/readme` in parallel with AbortController-bounded fetches, emits up to `TRIAL_KNOWLEDGE_MAX_EVENTS` `trial.knowledge` events (description, primary language, stars, topics, license, language breakdown by bytes, README first paragraph), swallows all errors; `apps/api/src/services/trial/bridge.ts` bridges ACP session transitions (`running` → `trial.ready`, `failed` → `trial.error`) and MCP tool calls (`add_knowledge` → `trial.knowledge`, `create_idea` → `trial.idea`) into the SSE stream via `readTrialByProject()` KV lookup (no-op on non-trial projects); new sentinel `TRIAL_ANONYMOUS_INSTALLATION_ID` row in `github_installations` so trial projects satisfy the FK; configurable via TRIAL_ORCHESTRATOR_OVERALL_TIMEOUT_MS (default: 300000), TRIAL_ORCHESTRATOR_STEP_MAX_RETRIES (default: 5), TRIAL_ORCHESTRATOR_RETRY_BASE_DELAY_MS (default: 1000), TRIAL_ORCHESTRATOR_RETRY_MAX_DELAY_MS (default: 60000), TRIAL_ORCHESTRATOR_NODE_READY_TIMEOUT_MS (default: 180000), TRIAL_ORCHESTRATOR_AGENT_READY_TIMEOUT_MS (default: 60000), TRIAL_ORCHESTRATOR_WORKSPACE_READY_TIMEOUT_MS (default: 180000), TRIAL_ORCHESTRATOR_WORKSPACE_READY_POLL_INTERVAL_MS (default: 5000), TRIAL_VM_SIZE (default: DEFAULT_VM_SIZE), TRIAL_VM_LOCATION (default: DEFAULT_VM_LOCATION), TRIAL_KNOWLEDGE_GITHUB_TIMEOUT_MS (default: 5000), TRIAL_KNOWLEDGE_MAX_EVENTS (default: 10) diff --git a/apps/api/.env.example b/apps/api/.env.example index 1e863b5bf..5e729bf64 100644 --- a/apps/api/.env.example +++ b/apps/api/.env.example @@ -46,6 +46,58 @@ BASE_DOMAIN=workspaces.example.com # MAX_SMOKE_TOKENS_PER_USER=10 # MAX_SMOKE_TOKEN_NAME_LENGTH=100 +# ======================================== +# Trial Onboarding +# ======================================== +# See docs/guides/trial-configuration.md for the full tunable list. +# TRIAL_MONTHLY_CAP=1500 +# TRIAL_WORKSPACE_TTL_MS=1200000 # 20 min +# TRIAL_DATA_RETENTION_HOURS=168 # 7 days +# TRIAL_ANONYMOUS_USER_ID=system_anonymous_trials +# TRIAL_ANONYMOUS_INSTALLATION_ID=1 +# TRIAL_AGENT_TYPE_STAGING=opencode +# TRIAL_AGENT_TYPE_PRODUCTION=claude-code +# TRIAL_DEFAULT_WORKSPACE_PROFILE=lightweight +# TRIAL_VM_SIZE=cax11 +# TRIAL_VM_LOCATION=fsn1 +# TRIALS_ENABLED_KV_KEY=trials:enabled +# TRIAL_REPO_MAX_KB=102400 +# TRIAL_COUNTER_KEEP_MONTHS=6 +# TRIAL_KILL_SWITCH_CACHE_MS=30000 +# TRIAL_GITHUB_TIMEOUT_MS=10000 +# TRIAL_WAITLIST_PURGE_DAYS=90 +# TRIAL_MODEL=@cf/qwen/qwen3-30b-a3b-fp8 # Model for trial conversations +# TRIAL_LLM_PROVIDER=workers-ai # "workers-ai" or "anthropic" +# ANTHROPIC_API_KEY_TRIAL= # Required only when TRIAL_LLM_PROVIDER=anthropic +# ENVIRONMENT=staging # "staging" or "production" — set by deploy pipeline +# TRIAL_SSE_HEARTBEAT_MS=15000 +# TRIAL_SSE_POLL_TIMEOUT_MS=15000 +# TRIAL_SSE_MAX_DURATION_MS=1800000 +# TRIAL_CRON_ROLLOVER_CRON="0 5 1 * *" +# TRIAL_CRON_WAITLIST_CLEANUP="0 4 * * *" +# TRIAL_KNOWLEDGE_GITHUB_TIMEOUT_MS=10000 +# TRIAL_KNOWLEDGE_MAX_EVENTS=50 +# TRIAL_ORCHESTRATOR_NODE_WAIT_MS=300000 +# TRIAL_ORCHESTRATOR_WORKSPACE_WAIT_MS=300000 +# TRIAL_ORCHESTRATOR_AGENT_BOOT_TIMEOUT_MS=120000 +# TRIAL_ORCHESTRATOR_STEP_RETRY_MAX=3 +# TRIAL_ORCHESTRATOR_STEP_RETRY_DELAY_MS=5000 +# RATE_LIMIT_TRIAL_CREATE=10 +# RATE_LIMIT_TRIAL_SSE=30 + +# ======================================== +# AI Inference Proxy (Cloudflare AI Gateway) +# ======================================== +# AI_PROXY_ENABLED=true +# AI_PROXY_DEFAULT_MODEL=@cf/qwen/qwen3-30b-a3b-fp8 +# AI_GATEWAY_ID=sam +# AI_PROXY_ALLOWED_MODELS= # Comma-separated; defaults in shared/constants +# AI_PROXY_RATE_LIMIT_RPM=60 +# AI_PROXY_RATE_LIMIT_WINDOW_SECONDS=60 +# AI_PROXY_MAX_INPUT_TOKENS_PER_REQUEST=4096 +# AI_USAGE_PAGE_SIZE=50 +# AI_USAGE_MAX_PAGES=20 + # Pages project names (for Worker proxy routing) # PAGES_PROJECT_NAME=sam-web-prod # WWW_PAGES_PROJECT_NAME=sam-www diff --git a/apps/api/src/durable-objects/trial-event-bus.ts b/apps/api/src/durable-objects/trial-event-bus.ts index 8f2bfc4d4..84b2fa226 100644 --- a/apps/api/src/durable-objects/trial-event-bus.ts +++ b/apps/api/src/durable-objects/trial-event-bus.ts @@ -46,12 +46,26 @@ export class TrialEventBus extends DurableObject { private buffer: BufferedEvent[] = []; private nextCursor = 1; private closed = false; + private closedLoaded = false; private waiters: Set = new Set(); constructor(ctx: DurableObjectState, env: Env) { super(ctx, env); } + /** + * Lazily load the `closed` flag from DO storage. In-memory state is lost on + * eviction; persisting this single flag ensures SSE consumers see `closed: true` + * even after the DO is re-instantiated (the event buffer is intentionally NOT + * persisted — short-lived trial streams don't need full replay). + */ + private async ensureClosedLoaded(): Promise { + if (this.closedLoaded) return; + const stored = await this.ctx.storage.get('closed'); + if (stored === true) this.closed = true; + this.closedLoaded = true; + } + async fetch(req: Request): Promise { const url = new URL(req.url); const path = url.pathname; @@ -73,6 +87,7 @@ export class TrialEventBus extends DurableObject { // ------------------------------------------------------------------------- private async handleAppend(req: Request): Promise { + await this.ensureClosedLoaded(); log.info('trial_event_bus.handleAppend.enter', { closed: this.closed, bufferLen: this.buffer.length, @@ -108,6 +123,7 @@ export class TrialEventBus extends DurableObject { // would reject those late-arriving events with 409. if (event.type === 'trial.error') { this.closed = true; + await this.ctx.storage.put('closed', true); } this.wakeWaiters(); @@ -119,6 +135,7 @@ export class TrialEventBus extends DurableObject { // ------------------------------------------------------------------------- private async handlePoll(url: URL): Promise { + await this.ensureClosedLoaded(); const cursor = Number.parseInt(url.searchParams.get('cursor') ?? '0', 10) || 0; const requestedTimeout = Number.parseInt( url.searchParams.get('timeoutMs') ?? String(DEFAULT_POLL_TIMEOUT_MS), @@ -164,6 +181,7 @@ export class TrialEventBus extends DurableObject { private async handleClose(): Promise { this.closed = true; + await this.ctx.storage.put('closed', true); this.wakeWaiters(); return Response.json({ closed: true }); } diff --git a/apps/api/src/env.ts b/apps/api/src/env.ts index 8851cda2c..32ce01390 100644 --- a/apps/api/src/env.ts +++ b/apps/api/src/env.ts @@ -463,7 +463,7 @@ export interface Env { TRIAL_GITHUB_TIMEOUT_MS?: string; // Timeout for GitHub repo metadata probe (default: 5000) TRIAL_COUNTER_KEEP_MONTHS?: string; // Months of counter rows to retain in DO (default: 3) TRIAL_WAITLIST_PURGE_DAYS?: string; // Days after reset_date before notified waitlist rows are purged (default: 30) - TRIAL_CRON_ROLLOVER_CRON?: string; // Cron expression used by the monthly rollover audit (default: 0 3 1 * *) + TRIAL_CRON_ROLLOVER_CRON?: string; // Cron expression used by the monthly rollover audit (default: 0 5 1 * *) TRIAL_CRON_WAITLIST_CLEANUP?: string; // Cron expression used by the daily waitlist cleanup (default: 0 4 * * *) TRIAL_SSE_HEARTBEAT_MS?: string; // SSE comment heartbeat cadence (default: 15000) TRIAL_SSE_POLL_TIMEOUT_MS?: string; // Long-poll timeout per DO fetch (default: 15000) diff --git a/apps/api/src/index.ts b/apps/api/src/index.ts index 59e343cf8..92343213b 100644 --- a/apps/api/src/index.ts +++ b/apps/api/src/index.ts @@ -452,7 +452,7 @@ export default { env: Env, ctx: ExecutionContext ): Promise { - const rolloverCron = env.TRIAL_CRON_ROLLOVER_CRON ?? '0 3 1 * *'; + const rolloverCron = env.TRIAL_CRON_ROLLOVER_CRON ?? '0 5 1 * *'; const waitlistCleanupCron = env.TRIAL_CRON_WAITLIST_CLEANUP ?? '0 4 * * *'; const isDailyForward = controller.cron === '0 3 * * *'; diff --git a/apps/api/src/middleware/rate-limit.ts b/apps/api/src/middleware/rate-limit.ts index 1ca16e0d5..34f6ce1de 100644 --- a/apps/api/src/middleware/rate-limit.ts +++ b/apps/api/src/middleware/rate-limit.ts @@ -40,6 +40,8 @@ export const DEFAULT_RATE_LIMITS = { // Tighter limit for anonymous trial creation: each call spawns a DO, // fires ~4 GitHub API calls, and consumes a monthly trial slot. TRIAL_CREATE: 10, + // SSE events endpoint — short window to prevent connection storms. + TRIAL_SSE: 30, } as const; /** Default time window (1 hour in seconds) */ diff --git a/apps/api/src/routes/admin-ai-usage.ts b/apps/api/src/routes/admin-ai-usage.ts index a7a871474..98fd52562 100644 --- a/apps/api/src/routes/admin-ai-usage.ts +++ b/apps/api/src/routes/admin-ai-usage.ts @@ -11,8 +11,6 @@ import { log } from '../lib/logger'; import { requireApproved, requireAuth, requireSuperadmin } from '../middleware/auth'; import { errors } from '../middleware/error'; -/** Default AI Gateway ID. Override via AI_GATEWAY_ID env var. */ -const DEFAULT_GATEWAY_ID = 'sam'; /** Default number of log entries to fetch per page. Override via AI_USAGE_PAGE_SIZE env var. CF max is 50. */ const DEFAULT_PAGE_SIZE = 50; /** Maximum pages to iterate when aggregating. Override via AI_USAGE_MAX_PAGES env var. */ @@ -97,7 +95,7 @@ async function fetchGatewayLogs( body: body.slice(0, 500), url: url.replace(/Bearer\s+\S+/, 'Bearer [REDACTED]'), }); - throw errors.internal(`AI Gateway API error: ${resp.status} — ${body.slice(0, 200)}`); + throw errors.internal(`AI Gateway API error (${resp.status}). Check admin logs for details.`); } return resp.json() as Promise; @@ -147,7 +145,24 @@ adminAiUsageRoutes.use('/*', requireAuth(), requireApproved(), requireSuperadmin */ adminAiUsageRoutes.get('/', async (c) => { const period = parsePeriod(c.req.query('period')); - const gatewayId = c.env.AI_GATEWAY_ID || DEFAULT_GATEWAY_ID; + const gatewayId = c.env.AI_GATEWAY_ID; + if (!gatewayId) { + // AI Gateway is optional — self-hosters who don't use it get an empty summary + // instead of a 500. The admin dashboard renders "no data" gracefully. + return c.json({ + totalRequests: 0, + totalInputTokens: 0, + totalOutputTokens: 0, + totalCostUsd: 0, + trialRequests: 0, + trialCostUsd: 0, + cachedRequests: 0, + errorRequests: 0, + byModel: [], + byDay: [], + period, + } satisfies AiUsageSummary); + } const pageSize = parseInt(c.env.AI_USAGE_PAGE_SIZE || '', 10) || DEFAULT_PAGE_SIZE; const maxPages = parseInt(c.env.AI_USAGE_MAX_PAGES || '', 10) || DEFAULT_MAX_PAGES; const startDate = periodToStartDate(period); diff --git a/apps/api/src/routes/ai-proxy.ts b/apps/api/src/routes/ai-proxy.ts index 051cb9b8d..27edb4c7e 100644 --- a/apps/api/src/routes/ai-proxy.ts +++ b/apps/api/src/routes/ai-proxy.ts @@ -159,9 +159,13 @@ async function forwardToWorkersAI( if (!response.ok) { const errorText = await response.text(); + log.error('ai_proxy.workers_ai_error', { + status: response.status, + body: errorText.slice(0, 500), + }); return new Response(JSON.stringify({ error: { - message: `AI Gateway error (${response.status}): ${errorText.slice(0, 200)}`, + message: `AI inference failed (${response.status}). Please try again.`, type: 'server_error', }, }), { status: response.status, headers: { 'Content-Type': 'application/json' } }); @@ -206,9 +210,13 @@ async function forwardToAnthropic( if (!response.ok) { const errorText = await response.text(); + log.error('ai_proxy.anthropic_error', { + status: response.status, + body: errorText.slice(0, 500), + }); return new Response(JSON.stringify({ error: { - message: `Anthropic API error (${response.status}): ${errorText.slice(0, 200)}`, + message: `AI inference failed (${response.status}). Please try again.`, type: 'server_error', }, }), { status: response.status, headers: { 'Content-Type': 'application/json' } }); diff --git a/apps/api/src/routes/trial/claim.ts b/apps/api/src/routes/trial/claim.ts index 56d32012d..b81a53152 100644 --- a/apps/api/src/routes/trial/claim.ts +++ b/apps/api/src/routes/trial/claim.ts @@ -112,11 +112,12 @@ claimRoutes.post('/claim', requireAuth(), async (c) => { const claimedAt = Date.now(); log.info('trial_claim.success', { trialId, projectId, userId, claimedAt }); - // Clear the claim cookie. Domain attribute mirrors what was set by the - // OAuth-callback issuer; we use host-only (no Domain) by default — safe for - // app.${BASE_DOMAIN}. + // Clear the claim cookie. Domain attribute MUST match what was set when the + // cookie was issued (`.BASE_DOMAIN`), otherwise the browser treats them as + // different cookies and the original is never deleted — enabling replay. + const cookieDomain = c.env.BASE_DOMAIN ? `.${c.env.BASE_DOMAIN}` : undefined; const response: TrialClaimResponse = { projectId, claimedAt }; - c.header('Set-Cookie', clearClaimCookie()); + c.header('Set-Cookie', clearClaimCookie({ domain: cookieDomain })); return c.json(response, 200); }); diff --git a/apps/api/src/routes/trial/create.ts b/apps/api/src/routes/trial/create.ts index 3840cacc7..249c69aeb 100644 --- a/apps/api/src/routes/trial/create.ts +++ b/apps/api/src/routes/trial/create.ts @@ -38,6 +38,7 @@ import { rateLimitTrialCreate } from '../../middleware/rate-limit'; import { buildClaimCookie, buildFingerprintCookie, + DEFAULT_TRIAL_CLAIM_TTL_MS, signClaimToken, signFingerprint, type TrialClaimPayload, @@ -381,7 +382,7 @@ createRoutes.post('/create', async (c) => { // bound to the trialId alone, which is sufficient for claim validation. projectId: '', issuedAt: now, - expiresAt: now + 1000 * 60 * 60 * 48, // 48h + expiresAt: now + DEFAULT_TRIAL_CLAIM_TTL_MS, }; const claimSigned = await signClaimToken(claimPayload, secret); diff --git a/apps/api/src/routes/trial/events.ts b/apps/api/src/routes/trial/events.ts index e01d6a316..b7dec6656 100644 --- a/apps/api/src/routes/trial/events.ts +++ b/apps/api/src/routes/trial/events.ts @@ -21,6 +21,12 @@ import { Hono } from 'hono'; import type { Env } from '../../env'; import { log } from '../../lib/logger'; import { errors } from '../../middleware/error'; +import { + checkRateLimit, + createRateLimitKey, + getCurrentWindowStart, + getRateLimit, +} from '../../middleware/rate-limit'; import { verifyFingerprint } from '../../services/trial/cookies'; import { readTrial } from '../../services/trial/trial-store'; @@ -29,11 +35,36 @@ const eventsRoutes = new Hono<{ Bindings: Env }>(); const DEFAULT_HEARTBEAT_MS = 15_000; const DEFAULT_POLL_TIMEOUT_MS = 15_000; const DEFAULT_MAX_DURATION_MS = 30 * 60 * 1000; // 30 min +const SSE_RATE_LIMIT_WINDOW_SECONDS = 300; // 5-minute window eventsRoutes.get('/:trialId/events', async (c) => { const trialId = c.req.param('trialId'); if (!trialId) throw errors.badRequest('trialId is required'); + // Rate limit: per-IP to prevent SSE connection storms from a single source. + // CF-Connecting-IP is always present on Cloudflare Workers for real traffic; + // X-Forwarded-For covers reverse-proxy setups. If neither is present, reject + // rather than sharing a single "unknown" bucket across all headerless clients. + const clientIp = c.req.header('CF-Connecting-IP') + ?? c.req.header('X-Forwarded-For')?.split(',')[0]?.trim() + ?? null; + if (!clientIp) { + log.warn('trial_events.missing_client_ip', { trialId }); + return c.json({ error: 'Unable to determine client IP for rate limiting.' }, 400); + } + // KV-based rate limiting is not atomic (no CAS primitive), so concurrent + // requests from the same IP can overshoot by ~1. This is acceptable for SSE + // storm prevention — exact enforcement is not required. For strict limits + // (e.g. credential rotation), the codebase uses DO-based counters instead + // (see CodexRefreshLock). + const sseLimit = getRateLimit(c.env, 'TRIAL_SSE'); + const windowStart = getCurrentWindowStart(SSE_RATE_LIMIT_WINDOW_SECONDS); + const rateLimitKey = createRateLimitKey('trial-sse', clientIp, windowStart); + const { allowed } = await checkRateLimit(c.env.KV, rateLimitKey, sseLimit, SSE_RATE_LIMIT_WINDOW_SECONDS); + if (!allowed) { + return c.json({ error: 'Too many SSE connections. Please try again later.' }, 429); + } + // Resolve trial record const record = await readTrial(c.env, trialId); if (!record) throw errors.notFound('Trial'); @@ -106,7 +137,7 @@ eventsRoutes.get('/:trialId/events', async (c) => { } if (!pollResp.ok) { - enqueue(encoder.encode(formatSse('error', { + enqueue(encoder.encode(formatSse({ type: 'trial.error', error: 'invalid_url', message: `Event bus poll failed: ${pollResp.status}`, @@ -122,7 +153,7 @@ eventsRoutes.get('/:trialId/events', async (c) => { }; for (const { cursor: c2, event } of data.events) { - enqueue(encoder.encode(formatSse(event.type, event))); + enqueue(encoder.encode(formatSse(event))); cursor = c2; } @@ -179,7 +210,7 @@ function parseIntSafe(value: string | undefined, fallback: number): number { return Number.isFinite(n) && n > 0 ? n : fallback; } -export function formatSse(_eventName: string, data: unknown): string { +export function formatSse(data: unknown): string { // Emit as the default ("message") SSE event so that EventSource consumers // receive the payload via `source.onmessage`. Using a named `event:` field // would require the client to register `addEventListener(, ...)` for diff --git a/apps/api/src/services/trial/oauth-hook.ts b/apps/api/src/services/trial/oauth-hook.ts index 2b610052c..7f2a06d18 100644 --- a/apps/api/src/services/trial/oauth-hook.ts +++ b/apps/api/src/services/trial/oauth-hook.ts @@ -68,7 +68,8 @@ export async function maybeAttachTrialClaimCookie( expiresAt: now + DEFAULT_TRIAL_CLAIM_TTL_MS, }; const token = await signClaimToken(payload, secret); - const cookie = buildClaimCookie(token); + const cookieDomain = env.BASE_DOMAIN ? `.${env.BASE_DOMAIN}` : undefined; + const cookie = buildClaimCookie(token, { domain: cookieDomain }); // Rewrite the redirect Location to the app's claim landing page. const baseDomain = env.BASE_DOMAIN; diff --git a/apps/api/tests/unit/durable-objects/trial-event-bus.test.ts b/apps/api/tests/unit/durable-objects/trial-event-bus.test.ts index e007fc3ed..177494f14 100644 --- a/apps/api/tests/unit/durable-objects/trial-event-bus.test.ts +++ b/apps/api/tests/unit/durable-objects/trial-event-bus.test.ts @@ -33,7 +33,13 @@ vi.mock('cloudflare:workers', () => ({ const { TrialEventBus } = await import('../../../src/durable-objects/trial-event-bus'); function makeDO() { - const ctx = {} as unknown; + const store = new Map(); + const ctx = { + storage: { + get: async (key: string): Promise => store.get(key) as T | undefined, + put: async (key: string, value: unknown): Promise => { store.set(key, value); }, + }, + } as unknown; const env = {} as unknown; // eslint-disable-next-line @typescript-eslint/no-explicit-any return new TrialEventBus(ctx as any, env as any); diff --git a/apps/api/tests/unit/routes/trial-events-format.test.ts b/apps/api/tests/unit/routes/trial-events-format.test.ts index 9fda4f6f8..9d3324129 100644 --- a/apps/api/tests/unit/routes/trial-events-format.test.ts +++ b/apps/api/tests/unit/routes/trial-events-format.test.ts @@ -22,7 +22,7 @@ import { formatSse } from '../../../src/routes/trial/events'; describe('formatSse()', () => { it('produces an unnamed SSE frame so EventSource.onmessage fires', () => { - const frame = formatSse('trial.ready', { type: 'trial.ready', at: 123 }); + const frame = formatSse({ type: 'trial.ready', at: 123 }); expect(frame).toBe('data: {"type":"trial.ready","at":123}\n\n'); // No `event:` line — otherwise the client has to register listeners // per event type and `onmessage` silently never fires. @@ -30,14 +30,14 @@ describe('formatSse()', () => { }); it('json-encodes the data payload (newlines inside strings are escaped)', () => { - const frame = formatSse('trial.log', { msg: 'line1\nline2' }); + const frame = formatSse({ msg: 'line1\nline2' }); // JSON.stringify escapes the embedded newline to \n (literal backslash n), // preventing the payload from terminating the SSE frame early. expect(frame).toBe('data: {"msg":"line1\\nline2"}\n\n'); }); it('frame shape is exactly one data line plus the blank terminator', () => { - const frame = formatSse('trial.knowledge', { + const frame = formatSse({ type: 'trial.knowledge', key: 'description', value: 'hello', diff --git a/apps/api/tests/unit/routes/trial-events.test.ts b/apps/api/tests/unit/routes/trial-events.test.ts index 11916c581..39ba4449e 100644 --- a/apps/api/tests/unit/routes/trial-events.test.ts +++ b/apps/api/tests/unit/routes/trial-events.test.ts @@ -63,6 +63,7 @@ function makeEnvWithDO( return new Response('not found', { status: 404 }); }), }; + const kvStore = new Map(); return { TRIAL_CLAIM_TOKEN_SECRET: SECRET, TRIAL_SSE_HEARTBEAT_MS: '60000', @@ -72,6 +73,10 @@ function makeEnvWithDO( idFromName: vi.fn(() => 'do-id'), get: vi.fn(() => stub), }, + KV: { + get: vi.fn(async (key: string) => kvStore.get(key) ?? null), + put: vi.fn(async (key: string, value: string) => { kvStore.set(key, value); }), + }, ...overrides, } as unknown as Env; } @@ -82,7 +87,9 @@ async function getEvents( cookie: string | null, env: Env ): Promise { - const headers: Record = {}; + const headers: Record = { + 'CF-Connecting-IP': '203.0.113.1', // test IP for rate limiting + }; if (cookie) headers['cookie'] = `sam_trial_fingerprint=${encodeURIComponent(cookie)}`; return app.request( `/api/trial/${trialId}/events`, diff --git a/apps/api/tests/unit/services/trial-cookies.test.ts b/apps/api/tests/unit/services/trial-cookies.test.ts index e204b5b17..3f6d65525 100644 --- a/apps/api/tests/unit/services/trial-cookies.test.ts +++ b/apps/api/tests/unit/services/trial-cookies.test.ts @@ -242,3 +242,73 @@ describe('trial cookies — cookie string builders', () => { expect(c).toContain('Secure'); }); }); + +// --------------------------------------------------------------------------- +// Cookie domain consistency (regression test for CRITICAL cookie domain bug) +// --------------------------------------------------------------------------- +// The browser treats cookies with different Domain attributes as distinct. +// If create.ts sets `Domain=.example.com` but claim.ts clears without a Domain, +// the original cookie is never deleted — enabling replay attacks. This test +// asserts the invariant that all three cookie call sites produce matching Domain +// attributes. +// See: clearClaimCookie domain mismatch fix in claim.ts / oauth-hook.ts. + +describe('trial cookies — domain consistency invariant', () => { + const domain = '.example.com'; + + it('buildClaimCookie includes Domain when provided', () => { + const c = buildClaimCookie('tok', { domain }); + expect(c).toContain(`Domain=${domain}`); + }); + + it('clearClaimCookie includes the same Domain when provided', () => { + const c = clearClaimCookie({ domain }); + expect(c).toContain(`Domain=${domain}`); + }); + + it('buildFingerprintCookie includes Domain when provided', () => { + const c = buildFingerprintCookie('uuid.sig', { domain }); + expect(c).toContain(`Domain=${domain}`); + }); + + it('clearClaimCookie WITHOUT domain does NOT match a domain-scoped cookie', () => { + // This is the exact bug: if you set with Domain=.example.com but clear + // without Domain, the browser sees them as different cookies. + const setCookie = buildClaimCookie('tok', { domain }); + const clearCookie = clearClaimCookie(); // no domain — the old buggy path + + // Extract Domain= from each + const setDomain = setCookie.match(/Domain=[^;]+/)?.[0]; + const clearDomain = clearCookie.match(/Domain=[^;]+/)?.[0]; + + expect(setDomain).toBe(`Domain=${domain}`); + expect(clearDomain).toBeUndefined(); // no Domain in the clear cookie + // Therefore these are DIFFERENT cookies from the browser's perspective. + // The fix: always pass the same domain to clearClaimCookie. + expect(setDomain).not.toBe(clearDomain); + }); + + it('clearClaimCookie WITH domain matches the set cookie domain', () => { + // This is the fixed path: both set and clear use the same domain. + const setCookie = buildClaimCookie('tok', { domain }); + const clearCookie = clearClaimCookie({ domain }); + + const setDomain = setCookie.match(/Domain=[^;]+/)?.[0]; + const clearDomain = clearCookie.match(/Domain=[^;]+/)?.[0]; + + expect(setDomain).toBe(`Domain=${domain}`); + expect(clearDomain).toBe(`Domain=${domain}`); + // Same Domain → browser treats them as the same cookie → clear works. + }); + + it('all three builders omit Domain when no domain is provided', () => { + // Without a domain, all cookies are host-only — consistent by omission. + const claim = buildClaimCookie('tok'); + const clear = clearClaimCookie(); + const fp = buildFingerprintCookie('uuid.sig'); + + expect(claim).not.toContain('Domain='); + expect(clear).not.toContain('Domain='); + expect(fp).not.toContain('Domain='); + }); +}); diff --git a/apps/api/wrangler.toml b/apps/api/wrangler.toml index ce5d6f58c..9b5ec67e8 100644 --- a/apps/api/wrangler.toml +++ b/apps/api/wrangler.toml @@ -180,13 +180,14 @@ binding = "AI" # - Daily at 03:00 UTC: analytics event forwarding to external platforms (Phase 4) # - Daily at 04:00 UTC: trial waitlist purge (rows with notified_at older than # TRIAL_WAITLIST_PURGE_DAYS). Configurable via TRIAL_CRON_WAITLIST_CLEANUP. -# - Monthly on the 1st at 03:00 UTC: trial counter rollover audit (prune DO rows beyond +# - Monthly on the 1st at 05:00 UTC: trial counter rollover audit (prune DO rows beyond # TRIAL_COUNTER_KEEP_MONTHS). Configurable via TRIAL_CRON_ROLLOVER_CRON. +# NOTE: Intentionally offset from the 03:00 daily job to avoid collision on the 1st. # NOTE: Cron strings are matched exactly in index.ts:scheduled() to dispatch between # the different jobs. If you change them here, update the comparisons in the scheduled # handler (or the corresponding TRIAL_CRON_* env vars). [triggers] -crons = ["*/5 * * * *", "0 3 * * *", "0 4 * * *", "0 3 1 * *"] +crons = ["*/5 * * * *", "0 3 * * *", "0 4 * * *", "0 5 1 * *"] # Secrets (set via wrangler secret put): # - GITHUB_CLIENT_ID diff --git a/docs/architecture/secrets-taxonomy.md b/docs/architecture/secrets-taxonomy.md index 0545857e5..266d60c83 100644 --- a/docs/architecture/secrets-taxonomy.md +++ b/docs/architecture/secrets-taxonomy.md @@ -65,6 +65,8 @@ These override `ENCRYPTION_KEY` for their respective domain, providing secret is | `GITHUB_CLIENT_SECRET` | OAuth authentication | For GitHub login | | `GITHUB_APP_ID` | GitHub App integration | For repo access | | `GITHUB_APP_PRIVATE_KEY` | GitHub App authentication | For repo access | +| `TRIAL_CLAIM_TOKEN_SECRET` | HMAC-SHA256 signing for `sam_trial_claim` / `sam_trial_fingerprint` cookies | When trials are enabled | +| `ANTHROPIC_API_KEY_TRIAL` | Anthropic API key for trial AI inference (isolated from main key) | When trials use Anthropic provider | ### Optional Runtime Configuration (Not Secrets) @@ -186,6 +188,8 @@ Set via `wrangler secret put` or deployment workflow: - `GITHUB_CLIENT_SECRET` (optional) - `GITHUB_APP_ID` (optional) - `GITHUB_APP_PRIVATE_KEY` (optional) +- `TRIAL_CLAIM_TOKEN_SECRET` (required when trials enabled — 32+ bytes base64) +- `ANTHROPIC_API_KEY_TRIAL` (optional — trials use Workers AI when unset) ## Security Rules diff --git a/docs/guides/trial-configuration.md b/docs/guides/trial-configuration.md index b203732a7..a5cadc362 100644 --- a/docs/guides/trial-configuration.md +++ b/docs/guides/trial-configuration.md @@ -33,6 +33,12 @@ These are declared in `apps/api/wrangler.toml` at the top level (no `[env.*]` se | `TRIAL_DEFAULT_WORKSPACE_PROFILE` | `lightweight` | Devcontainer profile (see `packages/cloud-init`) used for trial workspaces. | | `TRIALS_ENABLED_KV_KEY` | `trials:enabled` | KV key read by the kill-switch. | | `TRIAL_KILL_SWITCH_CACHE_MS` | `30000` (30 s) | In-memory cache TTL for the kill-switch lookup. Lower = faster propagation, higher = fewer KV reads. | +| `TRIAL_MODEL` | `@cf/qwen/qwen3-30b-a3b-fp8` | AI model used for trial conversations. Override to use a different Workers AI or Anthropic model. | +| `TRIAL_LLM_PROVIDER` | `workers-ai` | LLM provider for trial inference: `"workers-ai"` (zero-cost, built-in) or `"anthropic"` (requires `ANTHROPIC_API_KEY_TRIAL`). | +| `ANTHROPIC_API_KEY_TRIAL` | — | Anthropic API key scoped to trial runs. Required **only** when `TRIAL_LLM_PROVIDER=anthropic` and `ENVIRONMENT=production`. Set via `wrangler secret put` or add to GitHub Environment secrets. | +| `ENVIRONMENT` | `staging` | Deployment mode: `"staging"` or `"production"`. Controls which agent type and model are used for trials. Set automatically by the deploy pipeline via `sync-wrangler-config.ts`. | +| `RATE_LIMIT_TRIAL_CREATE` | `10` | Per-IP rate limit (requests/hour) for `POST /api/trial/create`. | +| `RATE_LIMIT_TRIAL_SSE` | `30` | Per-IP rate limit (connections/5 min) for SSE event streaming. | ## Orchestrator and Fast-Path Knowledge @@ -79,6 +85,14 @@ ACP session notifications and MCP knowledge/idea tool calls are bridged into the | `TRIAL_KNOWLEDGE_MAX_EVENTS` | `10` | Upper bound on `trial.knowledge` events emitted by the fast-path probe. | | `TRIAL_GITHUB_TIMEOUT_MS` | `5000` | Per-request timeout for the default-branch probe in the TrialOrchestrator (`fetchDefaultBranch`). On timeout, 404, or network error the probe falls back to `main`. | +### SSE tunables + +| Variable | Default | Meaning | +|---|---|---| +| `TRIAL_SSE_HEARTBEAT_MS` | `15000` (15 s) | Interval between heartbeat comment frames sent to keep the SSE connection alive. | +| `TRIAL_SSE_POLL_TIMEOUT_MS` | `15000` (15 s) | Long-poll timeout per cycle when querying the TrialEventBus DO. | +| `TRIAL_SSE_MAX_DURATION_MS` | `1800000` (30 min) | Hard cap on SSE connection duration. | + ## Kill Switch Trials are **disabled by default**. The kill switch is a KV flag (`trials:enabled`) that must equal the literal string `"true"` for trials to start. diff --git a/scripts/deploy/configure-secrets.sh b/scripts/deploy/configure-secrets.sh index 61f865433..2f5034941 100644 --- a/scripts/deploy/configure-secrets.sh +++ b/scripts/deploy/configure-secrets.sh @@ -183,6 +183,14 @@ else echo -e "${YELLOW}ℹ Skipping GA4 secrets (GA4_API_SECRET/GA4_MEASUREMENT_ID not set — GA4 analytics forwarding disabled)${NC}" fi +# Configure trial Anthropic API key (optional — only needed when trials use Anthropic provider) +ANTHROPIC_API_KEY_TRIAL="${ANTHROPIC_API_KEY_TRIAL:-}" +if [ -n "$ANTHROPIC_API_KEY_TRIAL" ]; then + set_worker_secret "ANTHROPIC_API_KEY_TRIAL" "$ANTHROPIC_API_KEY_TRIAL" "$ENVIRONMENT" "false" +else + echo -e "${YELLOW}ℹ Skipping ANTHROPIC_API_KEY_TRIAL (not set — trials will use Workers AI provider)${NC}" +fi + # Configure smoke test auth (optional — only needed for staging/test environments) SMOKE_TEST_AUTH_ENABLED="${SMOKE_TEST_AUTH_ENABLED:-}" if [ -n "$SMOKE_TEST_AUTH_ENABLED" ]; then diff --git a/scripts/deploy/sync-wrangler-config.ts b/scripts/deploy/sync-wrangler-config.ts index 06b4cb3c9..f6efcd25c 100644 --- a/scripts/deploy/sync-wrangler-config.ts +++ b/scripts/deploy/sync-wrangler-config.ts @@ -197,6 +197,8 @@ function generateApiWorkerEnv( ...(process.env.HETZNER_BASE_IMAGE ? { HETZNER_BASE_IMAGE: process.env.HETZNER_BASE_IMAGE } : {}), // AI Gateway ID matches the resource prefix (created by configure-ai-gateway.sh) AI_GATEWAY_ID: DEPLOYMENT_CONFIG.prefix, + // Deployment environment — used by trial runner to choose agent type + model + ENVIRONMENT: DEPLOYMENT_CONFIG.getEnvironmentFromStack(stack), }, // Dynamic bindings from Pulumi outputs