diff --git a/docs/02_architecture/C4_SYSTEM_DIAGRAM.md b/docs/02_architecture/C4_SYSTEM_DIAGRAM.md new file mode 100644 index 0000000..4ee47e8 --- /dev/null +++ b/docs/02_architecture/C4_SYSTEM_DIAGRAM.md @@ -0,0 +1,352 @@ +# C4 System Diagrams — Code Kit Ultra + +**Status:** Authoritative +**Version:** 1.2.0 +**Last reviewed:** 2026-04-04 +**See also:** `docs/02_architecture/SYSTEM_ARCHITECTURE.md`, `docs/02_architecture/AUTH_ARCHITECTURE.md` + +--- + +## Overview + +This document presents the Code Kit Ultra architecture at three levels of abstraction following the [C4 model](https://c4model.com/): + +- **Level 1 — System Context:** Code Kit Ultra in relation to users and external systems. +- **Level 2 — Container Diagram:** The deployable units (apps, packages) and their responsibilities. +- **Level 3 — Component Diagrams:** Internal component breakdown for the Orchestrator and Auth containers. + +All diagrams use [Mermaid](https://mermaid.js.org/) syntax and are renderable in GitHub, GitLab, and most modern documentation tooling. + +--- + +## Level 1 — System Context + +```mermaid +C4Context + title System Context — Code Kit Ultra + + Person(developer, "Developer / Operator", "Human user who submits ideas, reviews gates, and approves actions via CLI or Web UI.") + Person(operator, "Operator (Automated)", "CI/CD system or script that drives runs via the Control Service API.") + Person(svcAccount, "Service Account", "Machine identity issued by Code Kit Ultra for non-interactive automation flows.") + + System(cku, "Code Kit Ultra", "Orchestration, governance, execution, and learning plane for AI-assisted software engineering. Runs are submitted, planned, gated, executed, healed, and recorded here.") + + System_Ext(insforge, "InsForge Identity Platform", "Issues RS256-signed session JWTs via Supabase Auth. 
Provides JWKS endpoint, Supabase PostgreSQL, object storage, and Realtime SSE infrastructure.") + System_Ext(aiProviders, "AI Providers", "LLM inference endpoints: Anthropic Claude, Google Gemini, OpenAI GPT-4o, Cursor, Windsurf, AntiGravity. Receive adapter-routed prompts and return structured completions.") + System_Ext(github, "GitHub", "Source code host. The GitHub provider adapter reads repositories, creates branches, commits files, and opens pull requests.") + System_Ext(redis, "Redis", "JWT jti revocation blacklist and JWKS cache TTL store. Used for sub-second token invalidation without DB round-trips.") + + Rel(developer, cku, "Submits ideas, approves gates, views audit trail", "HTTPS / CLI stdin") + Rel(operator, cku, "Drives runs programmatically", "HTTPS REST API") + Rel(svcAccount, cku, "Executes automated runs", "HTTPS + HS256 JWT") + + Rel(cku, insforge, "Verifies session JWTs via JWKS; reads/writes run data to Supabase DB; streams events via Supabase Realtime", "HTTPS / WebSocket") + Rel(cku, aiProviders, "Routes inference requests through adapter layer", "HTTPS / vendor SDK") + Rel(cku, github, "Reads repos, creates branches, commits files, opens PRs", "HTTPS / GitHub REST API") + Rel(cku, redis, "Checks/writes jti blacklist; caches JWKS public keys", "TCP / Redis protocol") + + UpdateLayoutConfig($c4ShapeInRow="3", $c4BoundaryInRow="2") +``` + +### System Context Notes + +| Actor / System | Role | +|---|---| +| Developer / Operator | Primary human interface. Uses `apps/cli` or `apps/web-control-plane`. | +| Service Account | Machine-to-machine identity. JWT issued by `packages/auth/src/service-account.ts`. | +| InsForge | Identity plane. Code Kit Ultra does **not** own human identity. | +| AI Providers | Stateless inference backends. All routing is done by `packages/adapters`. | +| GitHub | Target environment for code-producing runs. | +| Redis | Fast revocation store. System falls back to in-memory cache if Redis is unavailable. 
| + +--- + +## Level 2 — Container Diagram + +```mermaid +C4Container + title Container Diagram — Code Kit Ultra + + Person(developer, "Developer / Operator") + + System_Boundary(cku, "Code Kit Ultra") { + + Container(cli, "CLI", "Node.js / TypeScript", "Interactive terminal interface. Issues commands (run, approve, rollback, validate). Located at apps/cli/.") + Container(webUI, "Web Control Plane", "React + Vite / TypeScript", "Browser-based dashboard. Displays run status, gate decisions, audit trail, and live SSE stream. Located at apps/web-control-plane/.") + Container(controlService, "Control Service", "Node.js / Express / TypeScript", "Single HTTP API server. Handles auth middleware, command routing, SSE event stream, and realtime bridging. Located at apps/control-service/.") + + Container(orchestrator, "Orchestrator", "TypeScript (internal package)", "Drives the full run lifecycle. Contains: phase-engine, execution-engine, intake, planner, gate-manager, action-runner, mode-controller, batch-queue, outcome-engine, healing-integration, resume-run, rollback-engine.") + Container(governance, "Governance", "TypeScript (internal package)", "Evaluates the 9 governance gates. Contains: gate-controller, governed-pipeline, confidence-engine, consensus-engine, validation-engine, constraint-engine, intent-engine, adaptive-consensus, kill-switch.") + Container(adapters, "Adapters", "TypeScript (internal package)", "AI routing adapters (claude, gemini, openai, cursor, windsurf, antigravity) and provider adapters (FileSystem, Terminal, GitHub). Translates internal action contracts to vendor APIs.") + Container(auth, "Auth", "TypeScript (internal package)", "Verifies InsForge session JWTs, resolves session context, issues per-run execution tokens, and manages service accounts.") + Container(healing, "Healing", "TypeScript (internal package)", "Post-failure recovery. 
Classifies failures, selects healing strategies, executes recovery actions, and revalidates outcomes.") + Container(learning, "Learning", "TypeScript (internal package)", "Post-run intelligence. Persists outcome records, updates reliability scores, tunes execution policies, and surfaces execution optimisations.") + Container(audit, "Audit", "TypeScript (internal package)", "Writes immutable AuditEvents to the DB with SHA256 hash chain for tamper detection.") + Container(events, "Events", "TypeScript (internal package)", "Publishes CanonicalEvents to the SSE stream and Supabase Realtime channel using domain.noun.verb naming convention.") + Container(observability, "Observability", "TypeScript (internal package)", "Structured trace engine, timeline builder, logger, report renderer, and score explainer for run introspection.") + Container(security, "Security", "TypeScript (internal package)", "Action policy enforcement, batch signing, and batch provenance tracking.") + Container(policy, "Policy", "TypeScript (internal package)", "RBAC permission resolution, role mapping, and permission constants.") + Container(skillEngine, "Skill Engine", "TypeScript (internal package)", "Selects skills for a run plan, resolves manifests, and validates skill schemas.") + Container(commandEngine, "Command Engine", "TypeScript (internal package)", "17 command handlers (execute, approve-batch, rollback, validate, etc.) 
that translate API routes into orchestrator calls.") + Container(memory, "Memory", "TypeScript (internal package)", "Run state persistence (run-store.ts) used as the authoritative in-process run record.") + Container(shared, "Shared", "TypeScript (internal package)", "Cross-package type definitions: types.ts, contracts.ts, governance-types.ts, observability-types.ts.") + } + + System_Ext(insforge, "InsForge Platform", "Supabase Auth JWKS + PostgreSQL + Realtime") + System_Ext(aiProviders, "AI Providers", "Claude / Gemini / OpenAI / Cursor / Windsurf / AntiGravity") + System_Ext(github, "GitHub") + System_Ext(redis, "Redis") + + Rel(developer, cli, "Runs commands", "stdin / stdout") + Rel(developer, webUI, "Views dashboard, approves gates", "HTTPS browser") + Rel(cli, controlService, "Issues API requests", "HTTPS REST + SSE") + Rel(webUI, controlService, "Issues API requests, consumes SSE", "HTTPS REST + SSE") + + Rel(controlService, auth, "Resolves session on every request", "in-process import") + Rel(controlService, commandEngine, "Dispatches to command handlers", "in-process import") + Rel(commandEngine, orchestrator, "Starts / resumes / rolls back runs", "in-process import") + Rel(orchestrator, governance, "Evaluates gates during gating phase", "in-process import") + Rel(orchestrator, adapters, "Executes adapter actions during building phase", "in-process import") + Rel(orchestrator, healing, "Triggers healing on step failure", "in-process import") + Rel(orchestrator, skillEngine, "Selects skills during skills phase", "in-process import") + Rel(orchestrator, audit, "Emits AuditEvents at each lifecycle boundary", "in-process import") + Rel(orchestrator, events, "Publishes CanonicalEvents for SSE", "in-process import") + Rel(orchestrator, learning, "Records outcomes post-run", "in-process import") + Rel(orchestrator, security, "Validates action policy and signs batches", "in-process import") + Rel(orchestrator, memory, "Reads/writes run state", "in-process 
import") + Rel(orchestrator, observability, "Traces phases and steps", "in-process import") + Rel(adapters, aiProviders, "Routes inference requests", "HTTPS / vendor SDK") + Rel(adapters, github, "Executes GitHub actions", "HTTPS / GitHub REST API") + Rel(auth, insforge, "Fetches JWKS, validates JWTs", "HTTPS") + Rel(auth, redis, "Checks jti revocation blacklist", "TCP") + Rel(events, insforge, "Publishes to Supabase Realtime channel", "WebSocket") + + UpdateLayoutConfig($c4ShapeInRow="4", $c4BoundaryInRow="2") +``` + +### Container Technology Summary + +| Container | Runtime | Key Technology | Notes | +|---|---|---|---| +| CLI (`apps/cli`) | Node.js | TypeScript, commander or similar | No business logic — translates commands to API calls | +| Web Control Plane (`apps/web-control-plane`) | Browser | React, Vite, TypeScript | Consumes SSE for live updates | +| Control Service (`apps/control-service`) | Node.js | Express, TypeScript | Sole HTTP ingress point | +| Orchestrator | Node.js (in-process) | TypeScript | Stateful phase/step runner | +| Governance | Node.js (in-process) | TypeScript | 9-gate evaluation pipeline | +| Adapters | Node.js (in-process) | TypeScript, vendor SDKs | 6 AI + 3 provider adapters | +| Auth | Node.js (in-process) | TypeScript, jose (JWKS/JWT) | Three-strategy auth chain | +| Healing | Node.js (in-process) | TypeScript | Strategy-registry pattern | +| Learning | Node.js (in-process) | TypeScript | Post-run outcome processing | +| Audit | Node.js (in-process) | TypeScript, crypto (SHA256) | Append-only hash chain | +| Events | Node.js (in-process) | TypeScript, SSE | domain.noun.verb naming | + +### Container Boundary Rules + +The following call directions are **permitted**: + +``` +Control Service → Auth, Command Engine +Command Engine → Orchestrator +Orchestrator → Governance, Adapters, Healing, Skill Engine, Audit, + Events, Learning, Security, Memory, Observability +Adapters → AI Providers (external), GitHub (external) +Auth → InsForge 
JWKS (external), Redis (external) +Events → InsForge Realtime (external) +``` + +The following calls are **prohibited** to maintain layering integrity: + +- `Adapters → Orchestrator` (adapters are leaves) +- `Governance → Orchestrator` (governance is a pure evaluator) +- `Audit → Orchestrator` (audit is append-only) +- `CLI / Web UI → Orchestrator` (must route through Control Service) + +--- + +## Level 3 — Orchestrator Components + +```mermaid +C4Component + title Component Diagram — Orchestrator Package + + Container_Boundary(orch, "packages/orchestrator/src") { + + Component(phaseEngine, "phase-engine.ts", "TypeScript module", "Top-level phase sequencer. Iterates the 8 phases (intake → deployment). Calls sub-engines per phase. Emits phase-level AuditEvents and CanonicalEvents.") + Component(executionEngine, "execution-engine.ts", "TypeScript module", "10-step pipeline runner for the building phase. Executes audit-start, policy-eval, adapter-lookup, simulation, approval-gate, validation, execution-with-retry, outcome-verify, healing-integration (step 10.5), and rollback.") + Component(intake, "intake.ts", "TypeScript module", "Phase handler for the intake phase. Calls normalizeIdeaText, inferSolutionCategory, and generateClarifyingQuestions.") + Component(planner, "planner.ts", "TypeScript module", "Phase handler for the planning phase. Builds a structured PlanTask[] from clarification answers using the active AI adapter.") + Component(gateManager, "gate-manager.ts", "TypeScript module", "Coordinates the 9 governance gates. Delegates to packages/governance. Pauses the run if any gate returns NEEDS_REVIEW.") + Component(actionRunner, "action-runner.ts", "TypeScript module", "Executes individual adapter actions with retry logic. 
Reports success/failure to the execution engine.") + Component(modeController, "mode-controller.ts", "TypeScript module", "Resolves the active execution Mode (turbo | builder | pro | expert | safe | balanced | god) and sets mode-specific constraints on gate thresholds and retry limits.") + Component(batchQueue, "batch-queue.ts", "TypeScript module", "Manages ordered execution batches. Handles sequential/parallel step dispatch to the action runner.") + Component(outcomeEngine, "outcome-engine.ts", "TypeScript module", "Post-run outcome aggregation. Computes quality score, records failures, writes OutcomeRecord, forwards to learning engine.") + Component(healingIntegration, "healing-integration.ts", "TypeScript module", "Bridge between execution-engine (step 10.5) and packages/healing. Invokes the failure classifier and healing strategy pipeline.") + Component(resumeRun, "resume-run.ts", "TypeScript module", "Resumes a paused run after a gate approval event. Re-enters the phase engine at the paused checkpoint.") + Component(rollbackEngine, "rollback-engine.ts", "TypeScript module", "Executes compensating actions when healing is exhausted or a rollback command is issued. 
Records rollback_actions rows.") + } + + Container(governance, "packages/governance", "", "Gate evaluation layer") + Container(adapters, "packages/adapters", "", "AI and provider adapters") + Container(healing, "packages/healing", "", "Healing strategy pipeline") + Container(learning, "packages/learning", "", "Outcome and learning recording") + Container(audit, "packages/audit", "", "Immutable audit event writer") + Container(events, "packages/events", "", "SSE CanonicalEvent publisher") + + Rel(phaseEngine, intake, "Calls for intake phase") + Rel(phaseEngine, planner, "Calls for planning phase") + Rel(phaseEngine, gateManager, "Calls for gating phase") + Rel(phaseEngine, executionEngine, "Calls executeRunBundle for building phase") + Rel(phaseEngine, outcomeEngine, "Calls post-run") + Rel(phaseEngine, audit, "Emits run.started, phase.completed events") + Rel(phaseEngine, events, "Publishes run.phase.changed CanonicalEvents") + + Rel(executionEngine, modeController, "Reads mode constraints") + Rel(executionEngine, batchQueue, "Dispatches action batches") + Rel(executionEngine, actionRunner, "Executes individual actions") + Rel(executionEngine, healingIntegration, "Invokes on step failure (step 10.5)") + Rel(executionEngine, rollbackEngine, "Invokes on healing exhaustion") + Rel(executionEngine, audit, "Emits action.executed, action.failed events") + + Rel(gateManager, governance, "Evaluates 9 governance gates") + Rel(gateManager, resumeRun, "Calls after approval received") + + Rel(actionRunner, adapters, "Routes to AI and provider adapters") + Rel(healingIntegration, healing, "Delegates to healing engine pipeline") + Rel(outcomeEngine, learning, "Sends OutcomeRecord for learning") + + UpdateLayoutConfig($c4ShapeInRow="4", $c4BoundaryInRow="1") +``` + +### Orchestrator Phase-to-Component Mapping + +| Phase | Primary Component | Secondary Components | +|---|---|---| +| `intake` | `intake.ts` | `phaseEngine`, `audit`, `events` | +| `planning` | `planner.ts` | 
`phaseEngine`, AI adapter | +| `skills` | `skill-engine selector` | `phaseEngine` | +| `gating` | `gate-manager.ts` | `governance` (all 9 gates) | +| `building` | `execution-engine.ts` | `batchQueue`, `actionRunner`, `adapters` | +| `testing` | `phaseEngine` (simulated) | `audit`, `events` | +| `reviewing` | `phaseEngine` (simulated) | `audit`, `events` | +| `deployment` | `phaseEngine` (simulated) | `audit`, `events` | +| Recovery | `healing-integration.ts` | `healing`, `rollback-engine.ts` | +| Post-run | `outcome-engine.ts` | `learning` | + +--- + +## Level 3 — Auth Components + +```mermaid +C4Component + title Component Diagram — Auth Package + + Container_Boundary(authPkg, "packages/auth/src") { + + Component(resolveSession, "resolve-session.ts", "TypeScript module", "Entry point for all auth resolution. Determines which strategy applies (session JWT, service account JWT, legacy API key) and delegates accordingly. Returns a unified ResolvedSession object.") + Component(verifyInsforgeToken, "verify-insforge-token.ts", "TypeScript module", "Verifies RS256-signed InsForge session JWTs. Fetches and caches the JWKS from INSFORGE_JWKS_URI (10-min TTL). Validates iss, exp, aud, and performs jti Redis lookup.") + Component(issueExecutionToken, "issue-execution-token.ts", "TypeScript module", "Issues short-lived (10-min) HS256 execution tokens scoped to a specific runId and orgId. Used by adapters to authenticate outgoing calls without exposing the primary session JWT.") + Component(serviceAccount, "service-account.ts", "TypeScript module", "Verifies HS256-signed service account JWTs issued by Code Kit Ultra itself. Resolves scopes, orgId, workspaceId, and projectId from claims. 
Also provides issueServiceAccountToken() for enrollment flows.") + } + + System_Ext(insforgeJwks, "InsForge JWKS Endpoint", "RS256 public key set") + System_Ext(redis, "Redis", "jti revocation blacklist") + Container(controlService, "Control Service", "", "authenticate.ts middleware calls resolve-session") + Container(orchestrator, "Orchestrator", "", "Receives resolved session; calls issue-execution-token per run") + Container(policy, "packages/policy", "", "Receives resolved session to run permission checks") + + Rel(controlService, resolveSession, "Calls on every authenticated request") + Rel(resolveSession, verifyInsforgeToken, "Delegates when Bearer token matches InsForge format") + Rel(resolveSession, serviceAccount, "Delegates when Bearer token is a service-account JWT") + Rel(resolveSession, resolveSession, "Falls through to legacy API key check if both fail") + + Rel(verifyInsforgeToken, insforgeJwks, "Fetches JWKS (cached 10 min)", "HTTPS") + Rel(verifyInsforgeToken, redis, "Checks jti blacklist", "TCP") + + Rel(serviceAccount, redis, "Checks jti blacklist for service account tokens", "TCP") + + Rel(orchestrator, issueExecutionToken, "Issues per-run scoped execution token") + Rel(resolveSession, policy, "Resolved session passed to permission resolver") + + UpdateLayoutConfig($c4ShapeInRow="3", $c4BoundaryInRow="1") +``` + +### Auth Strategy Resolution Order + +``` +Request arrives at Control Service + │ + ▼ + Extract Bearer token from Authorization header + │ + ├── token.iss === INSFORGE_ISSUER? + │ └── YES → verify-insforge-token.ts + │ ├── Fetch/cache JWKS + │ ├── Verify RS256 signature + │ ├── Validate iss / exp / aud + │ ├── Redis jti revocation check + │ └── Build ResolvedSession { authMode: 'session' } + │ + ├── token has `svc:` prefix in sub or known service-account issuer? 
+ │ └── YES → service-account.ts + │ ├── Verify HS256 with SERVICE_ACCOUNT_JWT_SECRET + │ ├── Validate exp, scopes + │ ├── Redis jti revocation check + │ └── Build ResolvedSession { authMode: 'service-account' } + │ + └── Legacy API key header (X-Api-Key)? + └── YES → legacy key lookup in DB + └── Build ResolvedSession { authMode: 'legacy-api-key' } + ⚠ DEPRECATED — planned for removal +``` + +### Execution Token Lifecycle + +``` +Orchestrator starts a new run + │ + ▼ +issue-execution-token.ts + sign({ sub: actorId, runId, orgId, scope: 'run:execute' }, HS256, exp: +10min) + │ + ▼ +Token stored in run context (not persisted to DB) + │ + ▼ +Adapters use token for outgoing calls to AI providers + │ + ▼ +Token expires automatically after 10 minutes +(No explicit revocation path — expiry is the revocation mechanism) +``` + +--- + +## Cross-Cutting Architecture Notes + +### Deployment Topology + +``` +┌─────────────────────────────────────────────────────┐ +│ Single Node.js Process │ +│ apps/control-service │ +│ ├── Express HTTP server (port configurable) │ +│ ├── SSE endpoint: GET /v1/events │ +│ ├── All packages imported in-process │ +│ └── No inter-service network calls (monolith) │ +└──────────────────────┬──────────────────────────────┘ + │ external calls only + ┌──────────┼──────────────┐ + ▼ ▼ ▼ + InsForge Redis AI Providers + (Supabase) (Claude, etc.) +``` + +All packages (`packages/*`) are compiled TypeScript imported directly into the control service process. There are no separate microservices. This is an intentional **modular monolith** design that minimises operational complexity while keeping internal boundaries enforced through module imports rather than network contracts. + +### Key Design Invariants + +1. **Identity plane separation:** Code Kit Ultra never issues or stores human passwords or primary identity. All human auth is delegated to InsForge. +2. **Governance immutability:** AuditEvents are never updated or deleted. 
Gate decisions are permanent records. +3. **Adapter isolation:** AI providers are never called directly from orchestrator, governance, or auth. All calls route through `packages/adapters`. +4. **CLI/UI are surfaces only:** `apps/cli` and `apps/web-control-plane` contain no business logic. All logic lives in packages imported by `apps/control-service`. +5. **Execution tokens are ephemeral:** Short-lived HS256 tokens (10 min) prevent long-lived credential leakage to adapters. diff --git a/docs/02_architecture/ERD.md b/docs/02_architecture/ERD.md new file mode 100644 index 0000000..5c0c76f --- /dev/null +++ b/docs/02_architecture/ERD.md @@ -0,0 +1,470 @@ +# Entity-Relationship Diagram — Code Kit Ultra + +**Status:** Authoritative +**Version:** 1.2.0 +**Last reviewed:** 2026-04-04 +**See also:** `docs/02_architecture/DATA_MODEL.md`, `/db/schema.sql`, `packages/shared/src/types.ts` + +--- + +## Overview + +The Code Kit Ultra database schema is hosted in **Supabase (PostgreSQL)** as part of the InsForge plane. The central entity is a **Run**; every other table exists to scope, govern, observe, or recover runs. + +The schema reflects three architectural concerns: + +1. **Multi-tenancy:** `organizations → workspaces → projects → runs` form a strict ownership hierarchy. Every row is scoped to at least an `organization_id`. +2. **Governance traceability:** `run_gates`, `run_events`, and `audit_logs` record every decision, event, and action taken during a run's lifecycle. `run_events` and `audit_logs` are append-only; a `run_gates` row is written at evaluation time and updated only when the gate is approved or rejected. +3. **Operational recovery:** `healing_actions` and `rollback_actions` provide full forensic traceability for automated recovery operations. + +> **Table naming note:** The canonical spec names used in this document map to the following repo table names: +> `gate_decisions` → `run_gates` (run_approvals in older migrations), +> `audit_events` → `audit_logs`, +> `canonical_events` → `run_events`. +> See `DATA_MODEL.md §Schema Alignment` for the rename migration reference. 
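
The naming note above can be captured as a small lookup so that code written against the canonical spec names still targets the repo tables. This is a minimal sketch, not the project's implementation: the constant `CANONICAL_TO_REPO_TABLE` and the helper `toRepoTableName` are hypothetical names introduced for illustration, and only the three mappings listed in the note are assumed.

```typescript
// Canonical spec name -> repo table name, copied from the naming note.
// (The note also mentions `run_approvals` as the older name for `run_gates`;
// that legacy alias is not modelled here.)
const CANONICAL_TO_REPO_TABLE = {
  gate_decisions: "run_gates",
  audit_events: "audit_logs",
  canonical_events: "run_events",
} as const;

type CanonicalTableName = keyof typeof CANONICAL_TO_REPO_TABLE;

// Hypothetical helper: returns the repo table name for a canonical spec name.
// Names without an alias (for example "runs") are passed through unchanged.
function toRepoTableName(name: string): string {
  // Own-property check so inherited keys such as "toString" are not treated as aliases.
  const isAlias = Object.prototype.hasOwnProperty.call(CANONICAL_TO_REPO_TABLE, name);
  return isAlias ? CANONICAL_TO_REPO_TABLE[name as CanonicalTableName] : name;
}
```

Keeping the mapping in one place means a later rename migration would touch a single constant rather than every query string.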
+ +--- + +## Entity-Relationship Diagram + +```mermaid +erDiagram + + %% ───────────────────────────────────────────── + %% TENANT HIERARCHY + %% ───────────────────────────────────────────── + + organizations { + uuid id PK + text name "NOT NULL" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + workspaces { + uuid id PK + uuid organization_id FK + text name "NOT NULL" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + projects { + uuid id PK + uuid workspace_id FK + text name "NOT NULL" + text slug "NOT NULL, UNIQUE per workspace" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + %% ───────────────────────────────────────────── + %% IDENTITY & MEMBERSHIP + %% ───────────────────────────────────────────── + + users { + text id PK "actorId — sourced from InsForge sub claim" + text email "NOT NULL" + text display_name + timestamptz created_at "NOT NULL DEFAULT now()" + } + + organization_memberships { + uuid id PK + uuid organization_id FK + text user_id FK + text role "owner | admin | member | viewer" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + project_memberships { + uuid id PK + uuid project_id FK + text user_id FK + text role "owner | admin | member | viewer" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + service_accounts { + text id PK "svc_" + text name "NOT NULL" + uuid org_id FK + uuid workspace_id FK "nullable — org-level if null" + uuid project_id FK "nullable — workspace-level if null" + jsonb scopes "NOT NULL DEFAULT '[]'" + text created_by "actorId of creator" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + %% ───────────────────────────────────────────── + %% RBAC + %% ───────────────────────────────────────────── + + permissions { + uuid id PK + text name "e.g. 
run:create, gate:approve, rollback:trigger" + text description + } + + role_permissions { + uuid id PK + text role "FK-equivalent to role column on memberships" + uuid permission_id FK + } + + %% ───────────────────────────────────────────── + %% RUN CORE + %% ───────────────────────────────────────────── + + runs { + uuid id PK "run_YYYYMMDD_NNNN format" + uuid organization_id FK + uuid workspace_id FK "nullable" + uuid project_id FK "nullable" + text actor_id "actorId — human or service account" + text actor_type "human | service-account | system" + text auth_mode "session | service-account | legacy-api-key" + text correlation_id "NOT NULL — ties audit trail together" + text idea "NOT NULL — raw operator intent" + text mode "turbo | builder | pro | expert | safe | balanced | god" + text status "planned | running | paused | completed | failed | cancelled" + text priority "speed | quality | cost" + text deliverable "app | api | script | report" + timestamptz created_at "NOT NULL DEFAULT now()" + timestamptz updated_at "NOT NULL DEFAULT now()" + } + + plan_tasks { + uuid id PK + uuid run_id FK + text phase "intake | planning | skills | gating | building | testing | reviewing | deployment" + text title "NOT NULL" + text description "NOT NULL" + text done_definition "NOT NULL — revalidation target" + text status "pending | running | success | failed | paused | skipped | rolled-back" + int position "NOT NULL — ordering within run" + } + + %% ───────────────────────────────────────────── + %% GOVERNANCE + %% ───────────────────────────────────────────── + + run_gates { + uuid id PK + uuid run_id FK + text gate_type "risk_threshold | policy_compliance | confidence_score | kill_switch | consensus | constraint | validation | intent_alignment | approval" + text status "pending | pass | needs-review | blocked | approved | rejected" + text reason "NOT NULL — human-readable evaluation result" + bool should_pause "NOT NULL DEFAULT false" + text decided_by "actorId of 
approver/rejecter (nullable)" + timestamptz decided_at "nullable — set on approval/rejection" + text decision_note "optional operator note on decision" + } + + %% ───────────────────────────────────────────── + %% EVENTS & AUDIT + %% ───────────────────────────────────────────── + + run_events { + uuid id PK + uuid run_id FK "nullable — org-level events have no run" + text event_name "NOT NULL — domain.noun.verb e.g. run.phase.completed" + jsonb payload "NOT NULL" + text actor_id + text actor_type + uuid org_id FK + uuid workspace_id FK "nullable" + uuid project_id FK "nullable" + text auth_mode + text correlation_id + timestamptz created_at "NOT NULL DEFAULT now()" + } + + audit_logs { + uuid id PK + uuid run_id FK "nullable" + uuid organization_id FK + uuid workspace_id FK "nullable" + uuid project_id FK "nullable" + text actor_id "NOT NULL" + text actor_type "NOT NULL — human | service-account | system" + text auth_mode "NOT NULL" + text correlation_id "NOT NULL" + text event_type "NOT NULL — e.g. 
run.created, gate.approved" + jsonb payload "NOT NULL" + text previous_hash "SHA256 of prior event content + prior hash" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + outcome_records { + uuid id PK + uuid run_id FK + bool success "NOT NULL" + jsonb failures "NOT NULL DEFAULT '[]'" + int retry_count "NOT NULL DEFAULT 0" + int duration_ms "NOT NULL" + numeric quality_score "0.0000 – 1.0000 (nullable)" + text user_feedback "nullable — operator-provided text" + int operator_rating "1–5 CHECK constraint (nullable)" + timestamptz created_at "NOT NULL DEFAULT now()" + } + + %% ───────────────────────────────────────────── + %% RECOVERY + %% ───────────────────────────────────────────── + + healing_actions { + uuid id PK + uuid run_id FK + text step_id "plan_tasks.id reference (text for flexibility)" + text strategy "retry-same | fallback-adapter | prompt-revision | partial-replan | add-context | escalate-mode" + int attempt "NOT NULL — attempt number within this healing episode" + text status "pending | running | success | failed | exhausted" + jsonb input "Action input passed to healing engine" + jsonb output "Action output / error from healing attempt" + timestamptz created_at "NOT NULL DEFAULT now()" + timestamptz updated_at "NOT NULL DEFAULT now()" + } + + rollback_actions { + uuid id PK + uuid run_id FK + text step_id "plan_tasks.id reference (text for flexibility)" + text action_type "compensating action type" + text status "pending | running | success | failed" + jsonb payload "Compensating action parameters" + text triggered_by "actorId or 'system' for automatic rollbacks" + timestamptz created_at "NOT NULL DEFAULT now()" + timestamptz updated_at "NOT NULL DEFAULT now()" + } + + %% ───────────────────────────────────────────── + %% RELATIONSHIPS + %% ───────────────────────────────────────────── + + organizations ||--o{ workspaces : "contains" + organizations ||--o{ organization_memberships : "has members" + organizations ||--o{ service_accounts : 
"owns" + organizations ||--o{ runs : "scopes" + organizations ||--o{ audit_logs : "scopes" + organizations ||--o{ run_events : "scopes" + + workspaces ||--o{ projects : "contains" + workspaces ||--o{ service_accounts : "scoped to (optional)" + workspaces ||--o{ runs : "scopes (optional)" + + projects ||--o{ project_memberships : "has members" + projects ||--o{ service_accounts : "scoped to (optional)" + projects ||--o{ runs : "scopes (optional)" + + users ||--o{ organization_memberships : "member of" + users ||--o{ project_memberships : "member of" + + role_permissions }o--|| permissions : "grants" + + runs ||--o{ plan_tasks : "contains" + runs ||--o{ run_gates : "evaluated by" + runs ||--o{ run_events : "generates" + runs ||--o{ audit_logs : "recorded in" + runs ||--o| outcome_records : "produces" + runs ||--o{ healing_actions : "attempts" + runs ||--o{ rollback_actions : "reverses with" + + plan_tasks ||--o{ healing_actions : "healed via" + plan_tasks ||--o{ rollback_actions : "rolled back via" +``` + +--- + +## Commentary + +### 1. Tenant Hierarchy and Run Scoping + +The schema enforces a strict four-level ownership hierarchy: + +``` +organizations + └── workspaces (organization_id FK → organizations.id) + └── projects (workspace_id FK → workspaces.id) + └── runs (organization_id FK required; workspace_id + project_id optional) +``` + +Every `run` row carries `organization_id` as a **required** foreign key, making org-level tenancy the mandatory scoping unit. `workspace_id` and `project_id` are optional — a run may be scoped as narrowly as a specific project or as broadly as an entire organization. 
+ +This design supports three run contexts: + +| Context | organization_id | workspace_id | project_id | +|---|---|---|---| +| Org-level run | Required | NULL | NULL | +| Workspace-level run | Required | Required | NULL | +| Project-level run | Required | Required | Required | + +The `runs` table's `correlation_id` column is set at request ingress (sourced from the InsForge JWT `jti` or generated) and is threaded through every subsequent `audit_logs` and `run_events` row for the run. This makes it possible to reconstruct the complete causal chain of a run from any event by filtering on `correlation_id`. + +--- + +### 2. RBAC Through Memberships and Role Permissions + +Access control starts from a membership's role and resolves it through two tables, `role_permissions` and `permissions`: + +``` +users → organization_memberships.role + │ + └── role_permissions.role + │ + └── permissions.name + (e.g. 'run:create', 'gate:approve', 'rollback:trigger') +``` + +`organization_memberships` and `project_memberships` each carry a `role` text column (values: `owner`, `admin`, `member`, `viewer`). The `role_permissions` table maps each role to a set of `permissions` rows. Permission resolution at runtime uses `packages/policy/src/resolve-permissions.ts`, which joins these tables and returns a `PermissionSet` object attached to `req.auth`. + +**Key design points:** + +- Project memberships override organization memberships: a user can have `viewer` at the org level but `admin` at a specific project. +- Service accounts carry an explicit `scopes` JSONB array (e.g., `["run:create", "gate:read"]`) that bypasses the membership tables entirely. Their permissions are evaluated directly from the JWT claims by `service-account.ts`. +- The `permissions` table is the canonical enumeration of every grantable capability in the system. Changes to what a role can do require a migration that updates `role_permissions` rows. + +--- + +### 3. 
Run Lineage — runs → run_gates → run_events + +The traceability chain for any run is: + +``` +runs (1) + ├── run_gates (0..*) — one per governance gate evaluated + ├── run_events (0..*) — one per CanonicalEvent emitted + ├── plan_tasks (0..*) — one per planned step + └── outcome_records (0..1) — at most one post-run summary +``` + +**`run_gates`** is the authoritative record of every governance decision. It captures: +- `gate_type` — which of the 9 gates was evaluated. +- `status` — the final status after any human decisions. +- `should_pause` — whether this evaluation caused a run pause. +- `decided_by` / `decided_at` / `decision_note` — human approval/rejection attribution. + +Because `should_pause` and `status` are set at evaluation time and then updated only on approval/rejection, the full decision history (initial evaluation + subsequent human action) is captured in a single row. This differs from an event-sourced model where two rows would be written. A full event log is still available via `audit_logs` for forensic reconstruction. + +**`run_events`** holds all `CanonicalEvents` (the SSE stream persisted to DB). These use the `domain.noun.verb` naming convention (e.g., `run.phase.completed`, `gate.approval.required`). They are optimised for timeline rendering and are indexed on `(run_id, created_at ASC)` to support ordered replay. Unlike `audit_logs`, `run_events` rows may be queried and filtered by `event_name` without decoding JSONB. + +**`plan_tasks`** maps 1:N to both `healing_actions` and `rollback_actions`, allowing post-run analysis of which specific steps required recovery and what strategies were attempted. + +--- + +### 4.
Audit Integrity — SHA256 Hash Chain in `audit_logs` + +The `audit_logs` table provides governance-grade immutability through a **SHA256 hash chain**, implemented in `packages/audit/src/write-audit-event.ts`: + +``` +┌─────────────────────────────────────────────────────────┐ +│ audit_logs row N-1 │ +│ previous_hash: │ +│ this_hash: sha256(content_N-1 + previous_hash_N-1) │ +└─────────────────────────────────────────────────────────┘ + │ + │ previous_hash_N = this_hash_N-1 + ▼ +┌─────────────────────────────────────────────────────────┐ +│ audit_logs row N │ +│ previous_hash: │ +│ this_hash: sha256(content_N + previous_hash_N) │ +└─────────────────────────────────────────────────────────┘ +``` + +The genesis event uses `previous_hash = '0'.repeat(64)`. + +Any tampering with a historical row will invalidate all subsequent `previous_hash` values in the chain, making tampering detectable by a chain verification scan. The hash covers the full event content (`id`, `event_type`, `payload`, `actor_id`, `created_at`) plus the prior hash. + +**Known limitation:** The `lastHash` state is held in module-level memory in the current implementation. On process restart, the last hash must be loaded from the DB before writing new events, and multi-replica deployments require a DB-level advisory lock or sequence to prevent chain forks. This is tracked as risk R-09 in `docs/04_tracking/risk-log.md`. + +**Immutability guarantees:** +- No `UPDATE` or `DELETE` paths exist in `write-audit-event.ts`. +- The `audit_logs` table has no application-level soft-delete column. +- Row-level security in Supabase should be configured to deny `UPDATE`/`DELETE` for the application role. + +--- + +### 5. 
Healing and Rollback Traceability + +Two tables capture operational recovery events: + +**`healing_actions`** records every attempt by `packages/healing/src/healing-engine.ts` to recover a failed step: + +- One row per attempt (not per episode) — `attempt` column distinguishes retries within a single episode. +- `strategy` identifies which `HealingStrategy` from `healing-strategy-registry.ts` was applied. +- `status` progresses: `pending → running → success | failed | exhausted`. +- `input` / `output` JSONB columns store the full action parameters and result for forensic replay. + +**`rollback_actions`** records compensating actions executed by `rollback-engine.ts` when healing is exhausted or a manual rollback command is issued: + +- One row per compensating action (one per completed `plan_tasks` step, executed in reverse order). +- `triggered_by` distinguishes automatic rollback (`'system'`) from operator-initiated rollback (actorId). +- `status` tracks whether each individual compensating action succeeded. + +Together, these tables allow a post-incident investigator to reconstruct the exact sequence: which step failed, what healing was attempted, how many attempts were made, which strategy succeeded or failed, and exactly which compensating actions were executed to restore system state. 
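The `healing_actions.status` progression described above (`pending → running → success | failed | exhausted`) can be guarded with a transition table. The sketch below is illustrative only: `HEALING_TRANSITIONS` and `canTransition` are hypothetical names, and treating the three end states as terminal is an assumption, since the commentary does not say whether a `failed` row may be reopened.

```typescript
type HealingStatus = "pending" | "running" | "success" | "failed" | "exhausted";

// Allowed forward transitions, read off the documented progression.
// Assumption: success, failed, and exhausted have no outgoing transitions.
const HEALING_TRANSITIONS: Record<HealingStatus, HealingStatus[]> = {
  pending: ["running"],
  running: ["success", "failed", "exhausted"],
  success: [],
  failed: [],
  exhausted: [],
};

// True when a row may move from `from` to `to` under the table above.
function canTransition(from: HealingStatus, to: HealingStatus): boolean {
  return HEALING_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Because each attempt gets its own row, a retry would be written as a new `pending` row with an incremented `attempt` value rather than as a transition out of `failed`.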
+ +**Example forensic query (healing episode for a run):** + +```sql +-- Full healing and rollback timeline for run 'run_20260404_0042' +SELECT + 'healing' AS record_type, + ha.step_id, + ha.strategy, + ha.attempt, + ha.status, + ha.created_at +FROM healing_actions ha +WHERE ha.run_id = 'run_20260404_0042' + +UNION ALL + +SELECT + 'rollback' AS record_type, + ra.step_id, + ra.action_type AS strategy, + NULL AS attempt, + ra.status, + ra.created_at +FROM rollback_actions ra +WHERE ra.run_id = 'run_20260404_0042' + +ORDER BY created_at ASC; +``` + +--- + +## Key Index Summary + +| Index | Purpose | +|---|---| +| `idx_runs_project_status (project_id, status, created_at DESC)` | Dashboard and CLI run-list queries | +| `idx_runs_org (organization_id, created_at DESC)` | Org-level run history | +| `idx_run_gates_run (run_id, status)` | Gate status lookups per run | +| `idx_run_gates_pending (status) WHERE status = 'pending'` | Approval queue queries | +| `idx_audit_logs_org_type (organization_id, event_type, created_at DESC)` | Governance audit queries | +| `idx_audit_logs_correlation (correlation_id)` | Cross-event correlation chain reconstruction | +| `idx_run_events_run (run_id, created_at ASC)` | Ordered timeline rendering for UI | +| `idx_outcome_records_success (success, created_at DESC)` | Learning engine analytics | + +--- + +## Table Ownership Summary + +| Table | Owner Package | Written by | Read by | +|---|---|---|---| +| `organizations` | `packages/core` | Control Service (org create) | Auth, Policy | +| `workspaces` | `packages/core` | Control Service | Auth, Policy | +| `projects` | `packages/core` | Control Service | Auth, Policy | +| `users` | `packages/core` | InsForge sync | Auth, Policy | +| `organization_memberships` | `packages/policy` | Control Service | Policy | +| `project_memberships` | `packages/policy` | Control Service | Policy | +| `service_accounts` | `packages/auth` | `service-account.ts` | Auth | +| `permissions` | `packages/policy` | 
Migrations only | Policy | +| `role_permissions` | `packages/policy` | Migrations only | Policy | +| `runs` | `packages/memory` | `run-store.ts` | Orchestrator, Command Engine | +| `plan_tasks` | `packages/memory` | `run-store.ts` | Orchestrator, Planner | +| `run_gates` | `packages/governance` | `gate-manager.ts` | Gate handlers, Approval API | +| `run_events` | `packages/events` | `publish-event.ts` | Realtime, Observability | +| `audit_logs` | `packages/audit` | `write-audit-event.ts` | Audit API, Compliance | +| `outcome_records` | `packages/learning` | `outcome-engine.ts` | Learning Engine | +| `healing_actions` | `packages/healing` | `healing-engine.ts` | Observability, Audit | +| `rollback_actions` | `packages/orchestrator` | `rollback-engine.ts` | Observability, Audit | diff --git a/docs/02_architecture/SEQUENCE_DIAGRAMS.md b/docs/02_architecture/SEQUENCE_DIAGRAMS.md new file mode 100644 index 0000000..5bb28cf --- /dev/null +++ b/docs/02_architecture/SEQUENCE_DIAGRAMS.md @@ -0,0 +1,476 @@ +# Sequence Diagrams — Code Kit Ultra + +**Status:** Authoritative +**Version:** 1.2.0 +**Last reviewed:** 2026-04-04 +**See also:** `docs/02_architecture/AUTH_ARCHITECTURE.md`, `docs/02_architecture/SYSTEM_ARCHITECTURE.md`, `docs/02_architecture/C4_SYSTEM_DIAGRAM.md` + +--- + +## Overview + +This document provides detailed sequence diagrams for the four most critical flows in Code Kit Ultra: + +1. **Auth & Session Resolution** — how every authenticated request is verified. +2. **Run Lifecycle (Happy Path)** — the end-to-end flow from CLI submission to run completion. +3. **Gate Approval Flow** — how a gate pause is raised, reviewed, and resolved. +4. **Healing Loop (Phase 10.5)** — how step failures are classified, healed, or rolled back. + +All diagrams use [Mermaid `sequenceDiagram`](https://mermaid.js.org/syntax/sequenceDiagram.html) syntax. + +--- + +## Flow 1 — Auth & Session Resolution + +This flow executes on **every authenticated API request**. 
The `authenticate.ts` middleware in `apps/control-service` is the entry point. It delegates to `packages/auth/src/resolve-session.ts`, which fans out to the appropriate strategy module. + +```mermaid +sequenceDiagram + autonumber + participant Client as Client
(CLI / Web UI / Service Account) + participant API as Control Service
authenticate.ts + participant RS as resolve-session.ts
packages/auth + participant VIT as verify-insforge-token.ts
packages/auth + participant SA as service-account.ts
packages/auth + participant JWKS as InsForge JWKS Endpoint
(external) + participant Redis as Redis
(jti blacklist) + participant Policy as resolve-permissions.ts
packages/policy + participant IET as issue-execution-token.ts
packages/auth + participant Orch as Orchestrator
packages/orchestrator + + Client->>API: HTTP request
Authorization: Bearer + + API->>RS: resolveSession(token) + + alt InsForge Session JWT (Primary — RS256) + RS->>VIT: verifyInsforgeToken(token) + + VIT->>JWKS: GET /.well-known/jwks.json + note over VIT,JWKS: Response cached for 10 minutes.
Subsequent requests use in-memory cache. + JWKS-->>VIT: { keys: [...] } + + VIT->>VIT: Verify RS256 signature
Validate iss === INSFORGE_ISSUER
Validate exp not expired
Extract sub (actorId), aud (tenancy claims) + + VIT->>Redis: SISMEMBER jti_blacklist + Redis-->>VIT: 0 (not revoked) or 1 (revoked) + + alt jti is revoked + VIT-->>RS: Error: TOKEN_REVOKED + RS-->>API: 401 Unauthorized + API-->>Client: 401 { error: "Token has been revoked" } + end + + VIT-->>RS: { actorId: sub, tenancy, authMode: 'session', claims } + + else Service Account JWT (Secondary — HS256) + RS->>SA: verifyServiceAccountToken(token) + + SA->>SA: Verify HS256 with SERVICE_ACCOUNT_JWT_SECRET
Validate exp not expired
Extract serviceAccountId, orgId, scopes + + SA->>Redis: SISMEMBER jti_blacklist + Redis-->>SA: 0 (not revoked) + + SA-->>RS: { actorId: serviceAccountId, tenancy, authMode: 'service-account', scopes } + + else Legacy API Key (Deprecated — X-Api-Key header) + RS->>RS: Lookup key in DB → resolve orgId / actorId + note over RS: ⚠ Deprecated. Planned for removal.
No jti tracking. Revocation via DB delete only. + RS-->>RS: { actorId, tenancy, authMode: 'legacy-api-key' } + end + + RS->>Policy: resolvePermissions(authMode, role, scopes) + Policy-->>RS: PermissionSet + + RS-->>API: ResolvedSession { actor, tenant, permissions, authMode, correlationId } + + API->>API: Attach session to req.auth
Proceed to command handler + + note over API,Orch: When orchestrator starts a new run,
it issues a scoped execution token. + + API->>Orch: startRun(runInput, session) + Orch->>IET: issueExecutionToken({ actorId, runId, orgId, exp: +10min }) + IET->>IET: Sign HS256 { sub: actorId, runId, orgId, scope: 'run:execute' } + IET-->>Orch: executionToken (10-min HS256 JWT) + note over Orch: Token stored in run context only.
Not persisted to DB. Expires automatically. + Orch->>Orch: Attach executionToken to all adapter calls
within this run +``` + +### Auth Notes + +| Aspect | Detail | +|---|---| +| JWKS Cache TTL | 10 minutes (in-memory). First request per instance fetches from InsForge. | +| jti Revocation | Redis `SISMEMBER` on `jti_blacklist` set. Falls back to in-memory set if Redis unavailable. | +| Execution Token | HS256, 10-min expiry, scoped to `{ runId, orgId, scope: 'run:execute' }`. Never persisted. | +| Legacy Key | Deprecated. No jti — revocation requires DB row deletion. Removed in a future release. | +| Auth Failure | Returns HTTP 401 with structured `{ error, code }` body. No partial session built. | + +--- + +## Flow 2 — Run Lifecycle (Happy Path) + +This flow covers the full end-to-end journey of a run from CLI submission through all 8 phases to completion. The "happy path" assumes all gates pass and no step failures occur. + +```mermaid +sequenceDiagram + autonumber + participant CLI as CLI
apps/cli + participant API as Control Service
apps/control-service + participant Auth as resolve-session.ts + participant CMD as execute.ts
command-engine + participant Orch as phase-engine.ts
orchestrator + participant Intake as intake.ts + participant Planner as planner.ts + participant Skills as skill-engine selector + participant Gates as gate-manager.ts + participant Gov as governance
9 gates + participant ExecEng as execution-engine.ts + participant Adapters as Adapters
claude / gemini / etc. + participant Audit as write-audit-event.ts + participant Events as publish-event.ts
(SSE) + participant Memory as run-store.ts + + CLI->>API: POST /v1/runs { idea, mode, projectId } + + API->>Auth: resolveSession(bearerToken) + Auth-->>API: ResolvedSession + + API->>CMD: execute(runInput, session) + CMD->>Memory: createRun({ id, status: 'planned', ... }) + Memory-->>CMD: RunState + + CMD->>Audit: writeAuditEvent(run.created, { runId, actorId }) + CMD->>Events: publishEvent(run.created, { runId }) + note over Events: SSE stream delivers run.created to
subscribed CLI / Web UI clients. + + CMD->>Orch: startPhaseEngine(run, session) + Orch->>Memory: updateRun({ status: 'running' }) + Orch->>Audit: writeAuditEvent(run.started, { runId }) + Orch->>Events: publishEvent(run.started, { runId }) + + rect rgb(240, 248, 255) + note right of Orch: PHASE: intake + Orch->>Intake: runIntakePhase(run) + Intake->>Adapters: normalizeIdeaText(idea) → structured summary + Adapters-->>Intake: normalized idea + Intake->>Adapters: inferSolutionCategory(summary) → category + Adapters-->>Intake: { category, confidence } + Intake->>Adapters: generateClarifyingQuestions(summary, category) + Adapters-->>Intake: clarifyingQuestions[] + Intake-->>Orch: IntakeResult { summary, category, questions } + Orch->>Events: publishEvent(run.phase.completed, { phase: 'intake' }) + end + + rect rgb(240, 255, 240) + note right of Orch: PHASE: planning + Orch->>Planner: runPlanningPhase(intakeResult) + Planner->>Adapters: buildTaskPlan(clarificationAnswers) + Adapters-->>Planner: PlanTask[] (ordered tasks per phase) + Planner-->>Orch: PlanningResult { tasks } + Orch->>Memory: updateRun({ planTasks }) + Orch->>Events: publishEvent(run.phase.completed, { phase: 'planning' }) + end + + rect rgb(255, 255, 240) + note right of Orch: PHASE: skills + Orch->>Skills: selectSkills(planTasks) + Skills->>Skills: resolveManifest + validateSchema per skill + Skills-->>Orch: SkillSelection { selectedSkills } + Orch->>Events: publishEvent(run.phase.completed, { phase: 'skills' }) + end + + rect rgb(255, 245, 230) + note right of Orch: PHASE: gating (9 gates evaluated) + Orch->>Gates: evaluateGates(run, plan, skills) + loop For each of 9 gates + Gates->>Gov: evaluateGate(gateType, context) + Gov-->>Gates: GateDecision { status, reason, shouldPause } + Gates->>Audit: writeAuditEvent(gate.evaluated, { gateType, status }) + Gates->>Events: publishEvent(gate.evaluated, { gateType, status }) + end + note over Gates: If any gate returns NEEDS_REVIEW → pause.
(See Flow 3 for gate approval detail.) + Gates-->>Orch: GateResult { allPassed: true } + Orch->>Events: publishEvent(run.phase.completed, { phase: 'gating' }) + end + + rect rgb(245, 230, 255) + note right of Orch: PHASE: building — executeRunBundle + Orch->>ExecEng: executeRunBundle(run, session, executionToken) + + ExecEng->>Audit: writeAuditEvent(execution.started, { runId }) + + loop For each step/action in plan + ExecEng->>Adapters: executeAction(action, executionToken) + Adapters-->>ExecEng: ActionResult { success, output } + ExecEng->>Audit: writeAuditEvent(action.executed, { actionId, success }) + ExecEng->>Events: publishEvent(run.step.completed, { stepId, status: 'success' }) + ExecEng->>Memory: updateStep({ status: 'success' }) + end + + ExecEng-->>Orch: ExecutionResult { success: true } + Orch->>Events: publishEvent(run.phase.completed, { phase: 'building' }) + end + + rect rgb(230, 255, 245) + note right of Orch: PHASES: testing / reviewing / deployment (simulated) + Orch->>Orch: runSimulatedPhase('testing') + Orch->>Events: publishEvent(run.phase.completed, { phase: 'testing' }) + Orch->>Orch: runSimulatedPhase('reviewing') + Orch->>Events: publishEvent(run.phase.completed, { phase: 'reviewing' }) + Orch->>Orch: runSimulatedPhase('deployment') + Orch->>Events: publishEvent(run.phase.completed, { phase: 'deployment' }) + end + + Orch->>Memory: updateRun({ status: 'completed' }) + Orch->>Audit: writeAuditEvent(run.completed, { runId, durationMs }) + Orch->>Events: publishEvent(run.completed, { runId, status: 'completed' }) + + note over Orch: outcome-engine.ts runs post-completion + Orch->>Orch: outcomeEngine.record(run) + note over Orch: learning-engine.ts updates reliability scores + + CMD-->>API: { runId, status: 'completed' } + API-->>CLI: 200 { runId, status: 'completed' } +``` + +### Run Lifecycle Notes + +| Phase | Handler | AI Call | Gate Check | +|---|---|---|---| +| `intake` | `intake.ts` | Yes (normalize, categorize, questions) | No | +| 
`planning` | `planner.ts` | Yes (task plan) | No | +| `skills` | `skill-engine/selector.ts` | No | No | +| `gating` | `gate-manager.ts` | Depends on gate type | Yes (9 gates) | +| `building` | `execution-engine.ts` | Yes (via adapters) | Yes (approval gate, step 5) | +| `testing` | `phase-engine.ts` | No (simulated) | No | +| `reviewing` | `phase-engine.ts` | No (simulated) | No | +| `deployment` | `phase-engine.ts` | No (simulated) | No | + +--- + +## Flow 3 — Gate Approval Flow + +This flow covers the case where a governance gate returns `NEEDS_REVIEW`, pausing the run until a human operator approves or rejects via the API. The flow branches on approval vs. rejection. + +```mermaid +sequenceDiagram + autonumber + participant Gates as gate-manager.ts
orchestrator + participant Gov as governance gate
(any of 9) + participant Memory as run-store.ts + participant Audit as write-audit-event.ts + participant Events as publish-event.ts
(SSE) + participant CLI as CLI / Web UI
(operator) + participant API as Control Service
approve handler + participant Resume as resume-run.ts
orchestrator + participant Orch as phase-engine.ts
orchestrator + + Gates->>Gov: evaluateGate(gateType, context) + Gov-->>Gates: GateDecision { status: 'needs-review', reason, shouldPause: true } + + Gates->>Memory: updateGateDecision({ status: 'needs-review' }) + Gates->>Memory: updateRun({ status: 'paused' }) + + Gates->>Audit: writeAuditEvent(gate.paused, { gateType, reason, runId }) + Gates->>Events: publishEvent(gate.approval.required, { runId, gateType, reason }) + note over Events: SSE stream pushes gate.approval.required
to all subscribers on this run channel. + + Gates-->>Orch: GateResult { needsReview: true, gateType } + Orch->>Orch: Suspend phase-engine execution
(awaiting resume signal) + + CLI->>CLI: Operator receives SSE notification:
"Gate [type] requires review" + CLI->>CLI: Operator reviews reason and context + + alt Operator APPROVES + + CLI->>API: POST /v1/gates/:gateId/approve { note } + API->>API: Authenticate + authorize request + API->>Memory: updateGateDecision({ status: 'approved', decidedBy, decidedAt, decisionNote }) + API->>Memory: updateRun({ status: 'running' }) + + API->>Audit: writeAuditEvent(gate.approved, { gateId, gateType, decidedBy, note }) + API->>Events: publishEvent(gate.approved, { runId, gateType, decidedBy }) + + API->>Resume: resumeRun(runId, session) + Resume->>Memory: loadRun(runId) → RunState with checkpoint + Resume->>Orch: reenterPhaseEngine(run, checkpoint) + + Orch->>Orch: Continue execution from paused checkpoint + Orch->>Events: publishEvent(run.resumed, { runId }) + + note over Orch: Run continues with the remaining gates
and then proceeds to the building phase. + + else Operator REJECTS + + CLI->>API: POST /v1/gates/:gateId/reject { reason } + API->>API: Authenticate + authorize request + API->>Memory: updateGateDecision({ status: 'rejected', decidedBy, decidedAt, decisionNote: reason }) + API->>Memory: updateRun({ status: 'cancelled' }) + + API->>Audit: writeAuditEvent(gate.rejected, { gateId, gateType, decidedBy, reason }) + API->>Events: publishEvent(gate.rejected, { runId, gateType, reason }) + + note over Events: SSE stream pushes gate.rejected.
CLI / Web UI marks run as cancelled. + + API-->>CLI: 200 { runId, status: 'cancelled' } + end +``` + +### Gate Type Reference + +The 9 governance gates evaluated during the `gating` phase, in evaluation order: + +| # | Gate Type | Evaluator | Pause Trigger | +|---|---|---|---| +| 1 | Risk Threshold Gate | `gate-controller.ts` | Risk score exceeds mode threshold | +| 2 | Policy Compliance Gate | `governed-pipeline.ts` + `constraint-engine.ts` | Policy violation detected | +| 3 | Confidence Score Gate | `confidence-engine.ts` | Score below mode minimum | +| 4 | Kill Switch Gate | `kill-switch.ts` | Kill switch active for org/workspace | +| 5 | Consensus Gate | `consensus-engine.ts` + `adaptive-consensus.ts` | Consensus not reached across adapters | +| 6 | Constraint Gate | `constraint-engine.ts` | Hard constraint violated | +| 7 | Validation Gate | `validation-engine.ts` | Output fails validation schema | +| 8 | Intent Alignment Gate | `intent-engine.ts` | Plan intent diverges from idea | +| 9 | Approval Gate | `gate-controller.ts` | Mode requires explicit human approval | + +### GateStatus Transitions + +``` +pending → pass (gate evaluated and passed — run continues) +pending → needs-review (gate requires human decision — run paused) +pending → blocked (gate hard-blocked — run fails immediately) +needs-review → approved (human approved — run resumes) +needs-review → rejected (human rejected — run cancelled) +``` + +--- + +## Flow 4 — Healing Loop (Phase 10.5) + +This flow executes within `execution-engine.ts` at step 10.5 — between a step failure (step 7) and the final rollback decision (step 10). It is triggered automatically whenever an action returns a failure result. + +```mermaid +sequenceDiagram + autonumber + participant ExecEng as execution-engine.ts
(step 7 → 10.5) + participant HealInt as healing-integration.ts
orchestrator + participant FC as failure-classifier.ts
packages/healing + participant Registry as healing-strategy-registry.ts
packages/healing + participant HealEng as healing-engine.ts
packages/healing + participant Reval as revalidation.ts
packages/healing + participant Adapters as Adapters
(retry target) + participant Rollback as rollback-engine.ts
orchestrator + participant Audit as write-audit-event.ts + participant Events as publish-event.ts
(SSE) + participant Memory as run-store.ts + + ExecEng->>ExecEng: Action fails (step 7 — execution with retry exhausted) + ExecEng->>Audit: writeAuditEvent(action.failed, { actionId, error, attempt }) + ExecEng->>Events: publishEvent(run.step.failed, { stepId, error }) + ExecEng->>Memory: updateStep({ status: 'failed' }) + + ExecEng->>HealInt: invokeHealingPipeline(failure, runContext) + note over HealInt: Phase 10.5 — healing-integration bridges
execution-engine to packages/healing. + + HealInt->>FC: classifyFailure(failure) + note over FC: Analyses error type, stack trace, and context.
Maps to a FailureCategory enum. + FC-->>HealInt: FailureClassification { category, severity, retryable } + + alt Not retryable (e.g. auth failure, schema violation) + HealInt-->>ExecEng: HealingResult { healed: false, reason: 'not-retryable' } + ExecEng->>Rollback: triggerRollback(run, failedStep) + note over Rollback: Skip healing loop — go straight to rollback. + end + + HealInt->>Registry: resolveStrategy(classification) + note over Registry: Matches classification to registered
healing strategies (retry, fallback-adapter,
partial-replan, prompt-revision, etc.) + Registry-->>HealInt: HealingStrategy { strategyId, maxAttempts, actions } + + loop Healing attempts (up to maxAttempts per strategy) + HealInt->>HealEng: executeHealingStrategy(strategy, failure, runContext) + + HealEng->>HealEng: Apply strategy actions
(e.g. swap adapter, revise prompt,
reduce scope, add context) + + HealEng->>Adapters: Retry action with modified parameters + Adapters-->>HealEng: ActionResult { success, output } + + alt Action succeeds + HealEng->>Reval: revalidate(output, step.doneDefinition) + Reval-->>HealEng: RevalidationResult { valid, score } + + alt Revalidation passes + HealEng-->>HealInt: HealingResult { healed: true, attempt, strategy } + HealInt->>Audit: writeAuditEvent(healing.succeeded, { stepId, strategy, attempt }) + HealInt->>Events: publishEvent(run.step.healed, { stepId, strategy }) + HealInt->>Memory: updateHealingAction({ status: 'success', runId, stepId }) + HealInt-->>ExecEng: HealingResult { healed: true } + ExecEng->>ExecEng: Continue with next step + else Revalidation fails + HealEng->>HealEng: Increment attempt counter + note over HealEng: Output did not meet doneDefinition.
Try next healing attempt. + end + + else Action fails again + HealEng->>HealEng: Increment attempt counter + HealEng->>Audit: writeAuditEvent(healing.attempt.failed, { attempt, error }) + end + end + + note over HealInt: All healing attempts exhausted without success. + + HealInt->>Audit: writeAuditEvent(healing.exhausted, { runId, stepId, attempts }) + HealInt->>Events: publishEvent(run.healing.exhausted, { runId, stepId }) + HealInt->>Memory: updateHealingAction({ status: 'exhausted' }) + HealInt-->>ExecEng: HealingResult { healed: false, reason: 'exhausted' } + + ExecEng->>Rollback: triggerRollback(run, failedStep) + note over Rollback: rollback-engine executes compensating actions
in reverse order for all completed steps. + + loop For each completed step (reverse order) + Rollback->>Adapters: executeCompensatingAction(step) + Adapters-->>Rollback: CompensationResult + Rollback->>Memory: updateStep({ status: 'rolled-back' }) + Rollback->>Audit: writeAuditEvent(rollback.action.executed, { stepId }) + Rollback->>Memory: writeRollbackAction({ runId, stepId, status }) + end + + Rollback->>Memory: updateRun({ status: 'failed' }) + Rollback->>Audit: writeAuditEvent(run.failed, { runId, reason: 'healing-exhausted' }) + Rollback->>Events: publishEvent(run.failed, { runId, reason: 'healing-exhausted' }) +``` + +### Healing Strategy Types + +| Strategy | Trigger Condition | Action | +|---|---|---| +| `retry-same` | Transient network/timeout error | Retry identical action with exponential backoff | +| `fallback-adapter` | AI provider error or low-confidence output | Switch to next AI adapter in priority order | +| `prompt-revision` | Output failed validation but adapter responded | Revise prompt with additional constraints | +| `partial-replan` | Step scope too large for single action | Decompose step into smaller sub-actions | +| `add-context` | Insufficient context in original action | Inject additional context from run memory | +| `escalate-mode` | Low confidence across all adapters | Temporarily elevate execution mode | + +### Healing Loop Limits + +| Mode | Max Healing Attempts per Step | +|---|---| +| `turbo` | 1 | +| `builder` | 2 | +| `pro` | 3 | +| `expert` | 3 | +| `safe` | 5 | +| `balanced` | 3 | +| `god` | 5 | + +When `maxAttempts` is exhausted, the healing integration returns `{ healed: false }` and `execution-engine.ts` immediately invokes `rollback-engine.ts`. 
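To make the per-mode limit concrete, the following sketch drives a bounded healing loop. The attempt counts are copied from the Healing Loop Limits table; everything else (`runHealingLoop`, `HealingOutcome`, the synchronous `attemptHeal` callback standing in for one strategy execution plus revalidation) is a hypothetical simplification, not the API of `healing-integration.ts`.

```typescript
type ExecutionMode = "turbo" | "builder" | "pro" | "expert" | "safe" | "balanced" | "god";

// Values copied from the Healing Loop Limits table.
const MAX_HEALING_ATTEMPTS: Record<ExecutionMode, number> = {
  turbo: 1,
  builder: 2,
  pro: 3,
  expert: 3,
  safe: 5,
  balanced: 3,
  god: 5,
};

interface HealingOutcome {
  healed: boolean;
  attempts: number;
  reason?: "exhausted";
}

// Hypothetical driver: calls `attemptHeal` until it reports success
// or the mode's attempt budget is used up.
function runHealingLoop(
  mode: ExecutionMode,
  attemptHeal: (attempt: number) => boolean
): HealingOutcome {
  const maxAttempts = MAX_HEALING_ATTEMPTS[mode];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (attemptHeal(attempt)) {
      return { healed: true, attempts: attempt };
    }
  }
  // The caller is expected to trigger rollback on this result.
  return { healed: false, attempts: maxAttempts, reason: "exhausted" };
}
```

For example, a failure that only heals on the third attempt recovers in `safe` mode but exhausts its budget in `builder` mode, which then falls through to rollback.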
+ +### Audit Events in Healing + +| Event Type | When Emitted | +|---|---| +| `action.failed` | Initial step failure (execution-engine, step 7) | +| `healing.attempt.started` | Each healing attempt begins | +| `healing.attempt.failed` | A healing attempt produces failure | +| `healing.succeeded` | Healing attempt produces passing revalidation | +| `healing.exhausted` | All attempts used without success | +| `rollback.action.executed` | Each compensating action runs | +| `run.failed` | Run status transitions to `failed` after rollback | diff --git a/docs/03_specs/SPEC_EXECUTION_ENGINE.md b/docs/03_specs/SPEC_EXECUTION_ENGINE.md new file mode 100644 index 0000000..1963ff0 --- /dev/null +++ b/docs/03_specs/SPEC_EXECUTION_ENGINE.md @@ -0,0 +1,402 @@ +# SPEC — Execution Engine +**Status:** Draft +**Version:** 1.0 +**Linked to:** packages/orchestrator/src/execution-engine.ts +**Implements:** executeRunBundle pipeline, per-task execution contract, retry policy, healing integration, rollback, and manual retry/rollback operations + +--- + +## Objective + +Define the complete behavioral contract for the Execution Engine — the component responsible for running a `RunBundle` through a sequential task pipeline. This spec covers the 6-stage per-task pipeline (within `executeTask`), the outer bundle loop (within `executeRunBundle`), retry semantics, adapter selection, risk simulation, approval gating, outcome capture, healing integration, rollback mechanics, and the manual `retryTask` and `rollbackTask` operations. 
+ +--- + +## Scope + +- `executeRunBundle` function: bundle-level orchestration loop +- `executeTask` function: per-task 6-stage pipeline +- Retry policy: configurable attempts, healing extension +- Adapter selection: `createProviderAdapters` + `findAdapter` +- Simulation and risk assessment per task +- Approval gating within task execution +- Learning optimizer: `optimizeTasks` applied before the task loop +- Outcome capture after bundle completion or failure +- Healing integration: `healFailedStep` invocation and result handling +- Automatic rollback via `adapter.rollback` +- `retryTask`: manual single-step retry +- `rollbackTask`: manual single-step rollback +- Audit events and step log entries emitted at each stage + +Out of scope: phase engine coordination, gate evaluation, and post-deployment observability. + +--- + +## Inputs / Outputs + +| Direction | Item | Type | Description | +|-----------|------|------|-------------| +| Input | `bundle` | `RunBundle` | Complete run bundle including plan, state, and existing logs | +| Input | `actor` | `string` | Identity string for audit records (default: `"system"`) | +| Output | `RunBundle` | `RunBundle` | Mutated bundle with updated `state`, `executionLog`, `adapters`, and `auditLog` | + +--- + +## Data Structures + +```typescript +// packages/shared/src/types.ts + +interface RunBundle { + state: RunState; // mutable run state + plan: PlanArtifact; // tasks to execute + intake: IntakeArtifact; + adapters: AdapterLog; // per-task adapter execution summaries + executionLog: ExecutionLog; // ordered step execution log + auditLog: AuditLog; // append-only audit entries + gates: GateDecision[]; + reportMarkdown: string; +} + +interface PlanTask { + id: string; + title: string; + adapterId: string; + payload: Record; + rollbackPayload?: Record; + requiresApproval?: boolean; + retryPolicy?: { maxAttempts: number }; + dependencies?: string[]; +} + +interface StepExecutionLog { + stepId: string; + title: string; + adapter: 
string; + attempt: number; + status: StepStatus; // "pending" | "running" | "success" | "failed" | "paused" | "rolled-back" + startedAt: string; + finishedAt?: string; + output?: string; + error?: string; + rollbackAvailable: boolean; + risk?: ExecutionRisk; // "low" | "medium" | "high" + simulationSummary?: string; + verificationStatus?: "passed" | "failed"; + verificationSummary?: string; + fixSuggestion?: string; +} + +interface AdapterExecutionSummary { + taskId: string; + adapter: string; + status: "success" | "failed" | "rolled-back"; + attempts: number; + output: string; +} +``` + +--- + +## Interfaces / APIs + +### `executeRunBundle` + +```typescript +export async function executeRunBundle( + bundle: RunBundle, + actor: string = "system" +): Promise +``` + +### `retryTask` + +```typescript +export async function retryTask( + runId: string, + targetStepId?: string, + actor: string = "system" +): Promise +``` + +### `rollbackTask` + +```typescript +export async function rollbackTask( + runId: string, + targetStepId?: string, + actor: string = "system" +): Promise +``` + +--- + +## `executeRunBundle`: Full Pipeline + +### Pre-loop: Learning Optimizer + +Before the task loop begins, the execution engine applies the learning optimizer: + +1. `loadLearningStore()` retrieves historical run outcome data. +2. `optimizeTasks(bundle.plan.tasks, store)` returns an optimized task list with adjusted `retryLimit` values and a list of `suggestions`. +3. Any task with an adjusted `retryLimit` has its `retryPolicy.maxAttempts` overwritten in `bundle.plan.tasks`. +4. If suggestions are non-empty, an `OPTIMIZER_SUGGESTIONS_APPLIED` audit event is written.
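Step 3 of the list above, overwriting `retryPolicy.maxAttempts` from the optimizer's output, might look roughly like the sketch below. The shapes are assumptions: `TaskLike` keeps only the fields needed here, `OptimizedTask` guesses that `retryLimit` is present only on adjusted tasks, and `applyRetryLimits` is a hypothetical helper rather than a function in `execution-engine.ts`.

```typescript
// Minimal, hypothetical shapes; the real types in packages/shared may differ.
interface TaskLike {
  id: string;
  retryPolicy?: { maxAttempts: number };
}

interface OptimizedTask {
  id: string;
  retryLimit?: number; // assumed to be set only when the optimizer adjusted it
}

// Copies each adjusted retryLimit onto the matching plan task, in place,
// and returns the ids of the tasks that were changed.
function applyRetryLimits(tasks: TaskLike[], optimized: OptimizedTask[]): string[] {
  const changed: string[] = [];
  for (const opt of optimized) {
    if (opt.retryLimit === undefined) continue;
    for (const task of tasks) {
      if (task.id === opt.id) {
        task.retryPolicy = { maxAttempts: opt.retryLimit };
        changed.push(task.id);
      }
    }
  }
  return changed;
}
```

Returning the changed ids gives the caller something concrete to attach to the audit event written in step 4.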
+ +### Task Loop + +``` +for index = bundle.state.currentStepIndex to bundle.plan.tasks.length - 1: + result = await executeTask(bundle, task, index, actor) + if result.completed === false: + if bundle.state.status === "failed": + recordRunOutcome(success=false) + return bundle // exits early on pause or failure +// all tasks completed: +writeAuditEvent("RUN_COMPLETED") +markState(bundle, "completed", { currentStepIndex: tasks.length }) +recordRunOutcome(success=true) +return bundle +``` + +The loop starts at `bundle.state.currentStepIndex`, enabling resume from a checkpoint without re-executing already-completed steps. + +--- + +## `executeTask`: 6-Stage Per-Task Pipeline + +Each invocation of `executeTask` runs the following stages in strict sequence: + +### Stage 1: Audit Start + +`writeAuditEvent` with action `TASK_EXECUTION_ATTEMPT`. Fields included: `runId`, `actorName`, `actorId`, `actorType`, `orgId`, `workspaceId`, `projectId`, `correlationId`, `role`, `stepId`, `details.index`, `details.title`, `details.adapter`. + +### Stage 2: Policy Evaluation + +`evaluatePolicy(task)` is called. If `policyResult.allowed === false`: +- Writes `POLICY_BLOCK` audit event with `details.reason` +- Appends `StepExecutionLog` with `status: "failed"` +- Upserts `AdapterExecutionSummary` with `status: "failed"` +- Calls `markState(bundle, "failed", { currentStepIndex: index })` +- Returns `{ completed: false }` + +### Stage 3: Adapter Lookup + +`findAdapter(createProviderAdapters(), task.adapterId)` resolves the adapter. If `null` is returned: +- Writes `ADAPTER_NOT_FOUND` audit event +- Appends step log with `status: "failed"`, upserts adapter summary +- Calls `markState(bundle, "failed")` +- Returns `{ completed: false }` + +### Stage 4: Simulation and Risk Assessment + +If the adapter exposes a `simulate` method, `adapter.simulate(task.payload)` is called. The result's `risk` field determines `estimatedRisk`. 
If no `simulate` method, `adapter.estimateRisk(task.payload)` is tried. If neither exists, `estimatedRisk` defaults to `"medium"`. + +`requiresApproval` is determined as: +``` +requiresApproval = task.requiresApproval + || policyResult.requiresApproval + || simulation?.requiresApproval + || estimatedRisk === "high" +``` + +### Stage 4b: Approval Gating + +If `requiresApproval === true` and `bundle.state.approved === false`: +- `markState(bundle, "paused", { currentStepIndex: index, approvalRequired: true, pauseReason: })` +- Appends step log with `status: "paused"`, `attempt: 0` +- Writes `APPROVAL_REQUIRED` audit event +- Returns `{ completed: false, paused: true }` + +### Stage 5: Validation + +`adapter.validate(task.payload)` must return truthy. On `false`: +- `adapter.suggestFix` is called if available; result stored as `fixSuggestion` +- Writes `VALIDATION_FAILED` audit event with `fixSuggestion` +- Appends step log with `status: "failed"` and `fixSuggestion` +- Upserts adapter summary, calls `markState(bundle, "failed")` +- Returns `{ completed: false }` + +### Stage 6: Execution with Retry and Outcome Verification + +A loop runs from `attempt = 1` to `maxAttempts` (from `task.retryPolicy?.maxAttempts ?? 1`): + +**On each attempt:** +1. Writes `STEP_EXECUTION_STARTED` audit event with `attempt` and `risk`. +2. Calls `adapter.execute(task.payload)`. If `result.success === false`, throws `result.error`. +3. Calls `adapter.verify(task.payload, result)`. If `!verification.ok`, throws `"Verification failed: {summary}"`. +4. On success: writes `STEP_EXECUTION_SUCCEEDED`, appends step log `status: "success"`, upserts adapter summary, increments `currentStepIndex`, resets `approved = false`. + +**On catch (error):** +1. Writes `STEP_EXECUTION_FAILED` audit event with attempt, error, and `fixSuggestion`. +2. Appends step log with `status: "failed"`. +3. If this is the final attempt, invokes healing integration (Stage 6b). 
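
The Stage 4 risk fallback and the Stage 4b pause decision can be expressed as pure functions. The sketch below is illustrative only: `ApprovalInputs`, `resolveRisk`, `requiresApproval`, and `shouldPause` are hypothetical names, and the inputs stand in for values the engine would obtain from the task, the policy result, and the adapter.

```typescript
type ExecutionRisk = "low" | "medium" | "high";

// Hypothetical bundle of the values Stage 4 has collected for one task.
interface ApprovalInputs {
  taskRequiresApproval?: boolean;
  policyRequiresApproval?: boolean;
  simulation?: { risk: ExecutionRisk; requiresApproval?: boolean };
  adapterEstimate?: ExecutionRisk; // result of adapter.estimateRisk when no simulate hook exists
}

// Simulation risk wins, then the adapter estimate, then the "medium" default.
function resolveRisk(inputs: ApprovalInputs): ExecutionRisk {
  return inputs.simulation?.risk ?? inputs.adapterEstimate ?? "medium";
}

// Mirrors the OR-chain in Stage 4: any single source can demand approval.
function requiresApproval(inputs: ApprovalInputs): boolean {
  return Boolean(
    inputs.taskRequiresApproval ||
      inputs.policyRequiresApproval ||
      inputs.simulation?.requiresApproval ||
      resolveRisk(inputs) === "high"
  );
}

// Stage 4b: pause only when approval is needed and has not been granted yet.
function shouldPause(inputs: ApprovalInputs, alreadyApproved: boolean): boolean {
  return requiresApproval(inputs) && !alreadyApproved;
}
```

Keeping the decision separate from the state mutation makes each branch (task flag, policy flag, simulation flag, high risk) straightforward to unit test.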
+ +### Stage 6b: Healing Integration (final attempt only) + +`healFailedStep(context)` is called with `runId`, `stepId`, `adapterId`, `errorMessage`, `payload`, and `scope`. + +| Healing result | Action | +|----------------|--------| +| `status === "verified"` | Writes `HEALING_APPLIED_AND_VERIFIED`; increments `maxAttempts` by 1; continues loop for one additional retry | +| `approvalRequired === true` | `markState(bundle, "paused", { ... })`; returns `{ completed: false, paused: true }` | +| Any other status | Writes `HEALING_ATTEMPTED_BUT_ESCALATED`; falls through to automatic rollback | +| `healFailedStep` throws | Writes `HEALING_ENGINE_ERROR`; falls through to automatic rollback | + +### Stage 6c: Automatic Rollback (after final failed attempt) + +If `task.rollbackPayload` exists and `adapter.rollback` is defined: +1. Calls `adapter.rollback(task.rollbackPayload)`. +2. Writes `ROLLBACK_COMPLETED` audit event with `details.automatic: true`. +3. Appends step log with `status: "rolled-back"`. + +After rollback (or if rollback is not available): +- Upserts adapter summary with `status: "failed"`, output includes `fixSuggestion` if present. +- `markState(bundle, "failed", { currentStepIndex: index })`. +- Returns `{ completed: false }`. + +--- + +## Retry Policy + +| Property | Source | Default | +|----------|--------|---------| +| `maxAttempts` | `task.retryPolicy.maxAttempts` | `1` | +| Learning-adjusted limit | `optimizeTasks` → `retryLimit` field | Overrides task default | +| Healing extension | On `healFailedStep` returning `"verified"`, `maxAttempts += 1` | One additional attempt granted | + +**Retryable errors:** All thrown errors are retried up to `maxAttempts`. There is no per-error-type allow-list; the retry decision is purely count-based. Non-retryable conditions (policy block, adapter not found, validation failure) exit before the retry loop. + +**Backoff:** Not currently implemented. All retries execute immediately with no delay. 
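
A compact sketch of the count-based retry loop with the healing extension may help make the table concrete. It is an approximation under stated assumptions: `runWithRetry`, `HealStatus`, and `RetryOutcome` are hypothetical, audit writes and rollback are omitted, and the single-use `healed` guard is an assumption added here so the loop cannot be extended more than once.

```typescript
type HealStatus = "verified" | "escalated"; // simplified; the real HealingAttempt carries more fields

interface RetryOutcome {
  succeeded: boolean;
  attempts: number;
  healed: boolean;
}

async function runWithRetry(
  execute: (attempt: number) => Promise<void>, // throws on execution or verification failure
  heal: (errorMessage: string) => Promise<HealStatus>,
  configuredMaxAttempts = 1
): Promise<RetryOutcome> {
  let maxAttempts = configuredMaxAttempts;
  let healed = false;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await execute(attempt);
      return { succeeded: true, attempts: attempt, healed };
    } catch (err) {
      if (attempt < maxAttempts) continue; // purely count-based, no backoff
      if (!healed) {
        const message = err instanceof Error ? err.message : String(err);
        // A throwing healer is treated like an escalation.
        const status = await heal(message).catch((): HealStatus => "escalated");
        if (status === "verified") {
          healed = true;
          maxAttempts += 1; // healing extension: one additional attempt
          continue;
        }
      }
      return { succeeded: false, attempts: attempt, healed };
    }
  }
  return { succeeded: false, attempts: 0, healed }; // only reached when maxAttempts < 1
}
```

With `configuredMaxAttempts = 2` and a healer that reports `"verified"`, a task that fails twice gets exactly one third attempt before the failure path takes over.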
+ +--- + +## Adapter Selection + +```typescript +const adapters = createProviderAdapters(); // from packages/adapters/src +const adapter = findAdapter(adapters, task.adapterId); +``` + +`createProviderAdapters` returns all registered adapters. `findAdapter` performs a lookup by `adapterId`. If no adapter is found, the Stage 3 failure path fires. The adapter interface requires: +- `validate(payload): Promise<boolean>` +- `execute(payload): Promise<{ success: boolean; output?: unknown; error?: string }>` + +Optional methods that enhance behavior: +- `simulate(payload): Promise<{ risk: ExecutionRisk; summary: string; requiresApproval?: boolean }>` +- `estimateRisk(payload): Promise<ExecutionRisk>` +- `verify(payload, result): Promise<{ ok: boolean; summary: string }>` +- `suggestFix(error, payload): Promise<string>` +- `rollback(rollbackPayload): Promise` + +--- + +## Outcome Capture + +`recordRunOutcome` from `outcome-engine.ts` is called in two places: + +| When | `success` | `dominantFailureType` | +|------|-----------|----------------------| +| Bundle loop exits early with `status === "failed"` | `false` | `"step-failed"` | +| All tasks complete successfully | `true` | `undefined` | + +`computeMetrics(bundle, success)` calculates: +- `timeTakenMs`: sum of `finishedAt - startedAt` across all steps +- `retryCount`: count of steps where `attempt > 1` +- `qualityScore`: `1` if success, `0` otherwise +- `adaptersUsed`: deduplicated list of adapter IDs from step logs + +`recordRunOutcome` delegates to `learnFromOutcome` in the learning engine to update the learning store for future optimizer runs. + +--- + +## Concurrency Model + +Execution is **strictly sequential**. The task loop processes one task at a time. `executeRunBundle` does not use `Promise.all` or the batch queue for task execution. The batch queue (`batch-queue.ts`) is a separate utility used by the action runner for agent-generated action batches; it does not influence the core execution engine loop. + +--- + +## `retryTask` Specification + +1.
`loadRunBundle(runId)` — throws if not found. +2. Resolves `stepId`: uses `targetStepId` if provided; otherwise uses `bundle.plan.tasks[bundle.state.currentStepIndex].id`. +3. Finds task index by `id`. Throws if not found. +4. Writes `RunState`: `currentStepIndex = index`, `status = "running"`, `pauseReason = undefined`, calls `updateRunState`. +5. Writes `STEP_RETRY_REQUESTED` audit event. +6. Calls `executeTask(bundle, task, index, actor)`. +7. Reloads bundle from store via `loadRunBundle(runId)` and returns it. + +Note: `retryTask` re-runs the full 6-stage pipeline for the single target task. It does not continue to subsequent tasks after the retry. + +--- + +## `rollbackTask` Specification + +1. `loadRunBundle(runId)` — throws if not found. +2. Resolves `stepId`: uses `targetStepId` if provided; otherwise uses the last entry in `bundle.executionLog.steps`. +3. Finds `PlanTask` by `stepId`. Throws if not found. +4. Resolves adapter via `findAdapter`. Throws if `adapter.rollback` is absent or `task.rollbackPayload` is absent. +5. Calls `adapter.rollback(task.rollbackPayload)`. +6. Writes `TASK_ROLLBACK_MANUAL` audit event with `role = "operator"` if actor is not `"system"`. +7. Appends step log with `status: "rolled-back"`, `title: "{title} manual rollback"`. +8. Upserts adapter summary with `status: "rolled-back"`. +9. Decrements `bundle.state.currentStepIndex` by 1 (if > 0). +10. Calls `updateRunState`. +11. Reloads and returns updated bundle. 
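
The step-resolution rules in the two specifications above can be sketched as small pure helpers. The names `resolveRetryIndex`, `resolveRollbackStepId`, and `indexAfterRollback` are hypothetical, the types are trimmed to the fields used, and the retry error message is an assumption (only the rollback message appears in this spec).

```typescript
// Hypothetical, simplified shapes for illustration only.
interface TaskRef {
  id: string;
}

interface StepRef {
  stepId: string;
}

// retryTask: explicit target wins, otherwise the task at currentStepIndex.
function resolveRetryIndex(tasks: TaskRef[], currentStepIndex: number, targetStepId?: string): number {
  const stepId = targetStepId ?? tasks[currentStepIndex]?.id;
  const index = tasks.findIndex((task) => task.id === stepId);
  if (index === -1) {
    throw new Error(`Step not found for retry: ${stepId}`); // message text is assumed
  }
  return index;
}

// rollbackTask: explicit target wins, otherwise the last step log entry.
function resolveRollbackStepId(tasks: TaskRef[], steps: StepRef[], targetStepId?: string): string {
  const stepId = targetStepId ?? steps[steps.length - 1]?.stepId;
  const task = tasks.find((candidate) => candidate.id === stepId);
  if (!task) {
    throw new Error(`Step not found for rollback: ${stepId}`);
  }
  return task.id;
}

// rollbackTask step 9: decrement by one, but never below zero.
function indexAfterRollback(currentStepIndex: number): number {
  return currentStepIndex > 0 ? currentStepIndex - 1 : currentStepIndex;
}
```

Calling `resolveRollbackStepId` with an empty step log and no target reproduces the `"Step not found for rollback: undefined"` case.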
+ +--- + +## Dependencies + +| Dependency | Package | Purpose | +|-----------|---------|---------| +| `adapters/src` | `packages/adapters` | `createProviderAdapters`, `findAdapter` | +| `memory/src/run-store` | `packages/memory` | `loadRunBundle`, `updateAdapters`, `updateExecutionLog`, `updateRunState` | +| `core/src/policy-engine` | `packages/core` | `evaluatePolicy` — allows/blocks tasks | +| `audit/src` | `packages/audit` | `writeAuditEvent` — append-only audit trail | +| `learning/src/store` | `packages/learning` | `loadLearningStore` — historical outcome data | +| `learning/src/execution-optimizer` | `packages/learning` | `optimizeTasks` — adjusts retry limits | +| `healing-integration.ts` | `packages/orchestrator` | `healFailedStep` — invokes healing engine | +| `outcome-engine.ts` | `packages/orchestrator` | `recordRunOutcome` → `learnFromOutcome` | + +--- + +## Edge Cases + +- **Zero tasks in plan:** The task loop exits immediately; `RUN_COMPLETED` is written and `status = "completed"`. +- **Resume at last step:** `currentStepIndex === tasks.length - 1`; only that one step re-runs. +- **`adapter.verify` absent:** Treated as verified; step log records `"No verification hook; accepting successful execution."`. +- **Healing throws:** Caught internally; `HEALING_ENGINE_ERROR` written; automatic rollback proceeds normally. +- **`rollbackTask` with no prior step log:** `bundle.executionLog.steps.at(-1)` returns `undefined`; `stepId` is `undefined`; `task` lookup fails; throws `"Step not found for rollback: undefined"`. +- **`retryTask` on a completed run:** Allowed by implementation — the step is re-executed. Callers should check `bundle.state.status` before calling to avoid redundant retries. +- **Healing extends `maxAttempts` beyond the loop bounds:** The `continue` statement after `maxAttempts += 1` re-enters the `for` loop with the new limit, so execution correctly attempts one more time. 
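
The "`adapter.verify` absent" edge case above amounts to a small fallback. The sketch below is one way to write it; `verifyOrAccept` and the trimmed `VerifiableAdapter` interface are hypothetical, while the summary string is the one quoted in the edge case.

```typescript
interface Verification {
  ok: boolean;
  summary: string;
}

// Hypothetical slice of the adapter interface: only the optional verify hook.
interface VerifiableAdapter {
  verify?: (payload: unknown, result: unknown) => Promise<Verification>;
}

// Adapters without a verify hook are treated as verified.
async function verifyOrAccept(
  adapter: VerifiableAdapter,
  payload: unknown,
  result: unknown
): Promise<Verification> {
  if (!adapter.verify) {
    return { ok: true, summary: "No verification hook; accepting successful execution." };
  }
  return adapter.verify(payload, result);
}
```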
+ +--- + +## Risks + +| Risk | Likelihood | Impact | Mitigation | +|------|-----------|--------|-----------| +| No backoff between retries causes rapid failure cascade | Medium | Medium | Add configurable backoff to `retryPolicy`; expose `backoffMs` in `PlanTask` | +| `approved` reset after each step may cause re-pause on next high-risk step | Low | Medium | Expected behavior; document that approval is per-step, not per-run | +| Healing extending `maxAttempts` indefinitely if healing keeps succeeding | Low | High | Cap total attempts at a hard limit (e.g., `maxAttempts + 1`, never more) | +| Learning optimizer applying incorrect retry limits from stale store | Medium | Medium | Version the learning store; include store timestamp in optimizer suggestions | +| Manual rollback decrementing index incorrectly when step was not the current one | Medium | Low | `rollbackTask` decrements unconditionally; validate that `currentStepIndex > 0` before decrement | + +--- + +## Definition of Done + +- [ ] `executeRunBundle` processes all tasks sequentially and returns completed bundle when all succeed +- [ ] Policy block at stage 2 produces `status: failed` and correct audit events +- [ ] Adapter not found at stage 3 produces `status: failed` with `ADAPTER_NOT_FOUND` event +- [ ] Approval gating at stage 4b pauses bundle with correct `pauseReason` and `APPROVAL_REQUIRED` event +- [ ] Validation failure at stage 5 includes `fixSuggestion` in step log and audit +- [ ] Retry loop runs `maxAttempts` times before triggering healing +- [ ] Healing `"verified"` result causes one additional retry attempt +- [ ] Healing `approvalRequired` result pauses the run +- [ ] Automatic rollback fires when `rollbackPayload` is present after final failure +- [ ] `recordRunOutcome` called with `success=false` on early loop exit and `success=true` on completion +- [ ] `retryTask` resumes from exact step index with `STEP_RETRY_REQUESTED` audit event +- [ ] `rollbackTask` throws when adapter has no 
`rollback` method or task has no `rollbackPayload` +- [ ] Learning optimizer suggestions are logged as `OPTIMIZER_SUGGESTIONS_APPLIED` when non-empty +- [ ] Zero-task bundle completes immediately with `RUN_COMPLETED` event diff --git a/docs/03_specs/SPEC_ORCHESTRATOR.md b/docs/03_specs/SPEC_ORCHESTRATOR.md new file mode 100644 index 0000000..188a62f --- /dev/null +++ b/docs/03_specs/SPEC_ORCHESTRATOR.md @@ -0,0 +1,480 @@ +# SPEC — Orchestrator +**Status:** Draft +**Version:** 1.0 +**Linked to:** packages/orchestrator/src/index.ts +**Implements:** System-level orchestration wiring, entry points, mode control, batch queue, trust pipeline, healing hooks, resume/rollback flows, log writing, and canonical event contract + +--- + +## Objective + +Define the complete behavioral contract for the Orchestrator package — the central coordination layer that ties together intake, planning, skill selection, gate evaluation, execution, outcome capture, healing, resume, and rollback into coherent pipelines. This spec describes how all sub-components are connected, which entry points exist, how mode policy propagates, and which events the orchestrator must emit in which order. + +--- + +## Scope + +- System overview and component wiring +- Entry points: `runVerticalSlice`, `runOrchestrationStep`, `resumeRun` +- Mode controller: `getModePolicy`, `trimQuestionsByMode` +- Batch queue: `QueuedBatch` lifecycle and storage +- Trust pipeline: `prepareTrustedBatch` and what it evaluates +- Healing integration hook within execution loop +- Resume flow: state reconstruction and continuation +- Rollback engine: `rollbackRun` and `rollbackTask` +- Log writer: artifact and log file conventions +- Canonical events contract: ordering, required fields, scoping rules + +Out of scope: individual adapter implementations, authentication infrastructure, database schema migrations. 
+ +--- + +## Inputs / Outputs + +| Direction | Item | Type | Description | +|-----------|------|------|-------------| +| Input | `RunVerticalSliceInput` | `{ idea, mode?, dryRun?, approvedGates?, currentRun? }` | Full pipeline trigger | +| Input | `RunReport` | Full report | Single-step trigger via `runOrchestrationStep` | +| Input | `runId` + `approve` flag | Strings | Resume trigger via `resumeRun` | +| Output | `RunVerticalSliceResult` | `{ report, artifactDirectory, artifactReportPath, memoryPath, overallGateStatus, currentPhase }` | Full result after pipeline completes or halts | +| Output | `RunBundle` | Persisted in run-store | State snapshot after any execution step | + +--- + +## Data Structures + +```typescript +// packages/orchestrator/src/run-vertical-slice.ts +interface RunVerticalSliceInput { + idea: string; + mode?: Mode; // default: "builder" + dryRun?: boolean; + approvedGates?: string[]; + currentRun?: RunReport; // inject existing report to continue from checkpoint +} + +interface RunVerticalSliceResult { + report: RunReport; + artifactDirectory: string; + artifactReportPath: string; + memoryPath: string; + overallGateStatus: string; + currentPhase: string; +} + +// packages/orchestrator/src/batch-queue.ts +interface QueuedBatch { + id: string; // "batch_" + random 8-char hex + runId: string; + phase: string; + createdAt: string; + status: QueueStatus; // "pending" | "approved" | "executed" | "blocked" + riskSummary: { low: number; medium: number; high: number }; + generatedBy: string; + summary: string; + batch: BuilderActionBatch; +} + +// packages/orchestrator/src/rollback-metadata.ts +interface RollbackEntry { + type: "file_write" | "file_append" | "dir_create" | "command"; + target: string; // relative path from workspaceRoot + timestamp: string; + note: string; +} + +interface RollbackMetadata { + runId: string; + entries: RollbackEntry[]; +} +``` + +--- + +## Interfaces / APIs + +### Public Exports from `packages/orchestrator/src/index.ts` 
+ +```typescript +// Mode control +export { getModePolicy, trimQuestionsByMode } from "./mode-controller"; + +// Intake +export { runIntake } from "./intake"; + +// Gate evaluation +export { evaluateGates } from "./gate-manager"; + +// Full pipeline + single-step +export { runVerticalSlice } from "./run-vertical-slice"; + +// Task execution +export * from "./execution-engine"; // executeRunBundle, retryTask, rollbackTask + +// Run management +export * from "./resume-run"; // resumeRun, inspectRun +export * from "./rollback-engine"; // rollbackRun +export * from "./outcome-engine"; // recordRunOutcome +export * from "./healing-integration"; // healFailedStep + +// Utility +export * from "./action-runner"; // runActionBatch +export * from "./log-writer"; // writeArtifact, writeJsonRecord, writeActionLog +export * from "./rollback-metadata"; // RollbackEntry, RollbackMetadata types +export * from "./batch-queue"; // createQueuedBatch, listQueuedBatches, etc. +``` + +--- + +## System Overview: Component Wiring + +The following diagram shows how orchestrator sub-components are invoked during a standard `runVerticalSlice` call. 
+ +``` +runVerticalSlice(input) + │ + ├─► getModePolicy(mode) + │ └─► Returns ModePolicy (gateThresholds, execution config) + │ + ├─► runOrchestrationStep(report) [loop until blocked/finished/expert-mode-pause] + │ │ + │ ├─[intake phase]──► runIntake(idea, mode) + │ │ ├─► normalizeIdeaText + │ │ ├─► inferSolutionCategory + │ │ ├─► deriveAssumptions + │ │ ├─► generateClarifyingQuestions + │ │ └─► trimQuestionsByMode(questions, mode) + │ │ + │ ├─[planning phase]─► buildPlanFromClarification(intakeResult) + │ │ + │ ├─[skills phase]──► selectSkills(clarification, plan) + │ │ + │ ├─[gating phase]──► evaluateGates(intakeResult, plan, skills, mode, approvedGates) + │ │ └─► [if needs-review] emitGateAwaitingApproval(state, reason) + │ │ + │ └─[building phase]─► executeRunBundle(bundle, actor) + │ ├─► [pre-loop] optimizeTasks(tasks, learningStore) + │ ├─► [per-task] executeTask(bundle, task, index, actor) + │ │ ├─► writeAuditEvent(TASK_EXECUTION_ATTEMPT) + │ │ ├─► evaluatePolicy(task) + │ │ ├─► findAdapter(adapters, task.adapterId) + │ │ ├─► adapter.simulate / adapter.estimateRisk + │ │ ├─► [if approval needed] markState(paused) + │ │ ├─► adapter.validate + │ │ ├─► adapter.execute + adapter.verify [retry loop] + │ │ ├─► [on final failure] healFailedStep(context) + │ │ │ └─► attemptHealing (healing-engine) + │ │ └─► [on heal fail] adapter.rollback(rollbackPayload) + │ │ + │ └─► [on completion] recordRunOutcome → learnFromOutcome + │ + └─► recordRun(report) + └─► Persists to memory; returns artifactDirectory, reportPath, memoryPath +``` + +--- + +## Mode Controller + +`getModePolicy(mode: Mode): ModePolicy` returns a `ModePolicy` object containing: + +- `maxClarifyingQuestions`: upper bound on questions surfaced to user +- `gateThresholds`: numeric thresholds used by all 5 gate evaluators +- `execution`: flags controlling approval requirements and dry-run behavior + +`trimQuestionsByMode(questions: T[], mode: Mode): T[]`: +- Sorts questions by priority weight: `required` 
(100) > `critical` (90) > `high` (70) > `medium` (50) > default (40) > `low` (30) +- Slices to `policy.maxClarifyingQuestions` +- Used in `intake.ts` to limit clarifying questions surfaced per mode + +### Mode Policy Summary + +| Mode | Max Questions | Medium Risk Approval | High Risk Approval | Dry Run Default | Command Exec | +|------|--------------|---------------------|-------------------|-----------------|-------------| +| turbo | 2 | No | Yes | No | Yes | +| builder | 5 | Yes | Yes | No | Yes | +| pro | 8 | Yes | Yes | Yes | Yes | +| expert | 15 | Yes | Yes | Yes | Yes | +| safe | 20 | Yes | Yes | Yes | No | +| balanced | 10 | Yes | Yes | Yes | Yes | +| god | 0 | No | No | No | Yes | + +--- + +## Batch Queue + +The batch queue is a file-backed queue stored at `{workspaceRoot}/.ck/queue/{id}.json`. It is used by the action runner to stage agent-generated `BuilderActionBatch` objects for approval or execution. + +### Lifecycle + +``` +createQueuedBatch(params) → writes {id}.json with status "pending" +updateQueuedBatchStatus(id) → overwrites file with new status +getQueuedBatch(id) → reads and parses {id}.json +listQueuedBatches(runId?) → reads all .json files, filters by runId, sorts by createdAt +``` + +### Status Transitions + +``` +pending → approved (human approves the batch) +pending → blocked (gate evaluation or policy rejects) +approved → executed (action runner processes the batch) +``` + +### Risk Summary + +Each `QueuedBatch` carries a `riskSummary: { low, medium, high }` count summarizing the actions in its `BuilderActionBatch`. This is used to surface a concise approval prompt to operators. + +### Concurrency + +The queue has no locking mechanism. Concurrent writes to the same batch file resolve as last-writer-wins. Consumers must not assume atomic updates across multiple batches.
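
The status transitions listed above can be captured in a lookup table. This is a sketch only: the spec does not say that `updateQueuedBatchStatus` enforces these transitions, so `canTransition` and `queueFilePath` are hypothetical helpers a caller might use before writing.

```typescript
type QueueStatus = "pending" | "approved" | "executed" | "blocked";

// Transitions taken from the "Status Transitions" listing; terminal states map to [].
const ALLOWED_TRANSITIONS: Record<QueueStatus, readonly QueueStatus[]> = {
  pending: ["approved", "blocked"],
  approved: ["executed"],
  executed: [],
  blocked: [],
};

function canTransition(from: QueueStatus, to: QueueStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Builds the documented storage location for one queued batch.
function queueFilePath(workspaceRoot: string, id: string): string {
  return `${workspaceRoot}/.ck/queue/${id}.json`;
}
```

Because the queue has no locking, a guard like this reduces invalid writes but does not make concurrent updates safe.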
+ +--- + +## Trust Pipeline + +`prepareTrustedBatch` in `trust-pipeline.ts` is called before a `BuilderActionBatch` is executed to establish cryptographic provenance and a diff preview. + +### What It Evaluates and Produces + +1. **Diff preview:** `writeDiffPreview(workspaceRoot, batch)` generates a human-readable preview of all file changes in the batch. The preview is written to the artifact store. +2. **Provenance record:** `createBatchProvenance({ batch, sourcePhase, sourceArtifact, actor })` captures who generated the batch and from which phase/artifact. +3. **Batch signing:** `signBatch({ batch, provenance, secret })` produces a signed envelope (`BatchSignedEnvelope`) using the provided `signingSecret`. The envelope is written to the workspace as a signature file. + +### Return Values + +```typescript +{ + diffArtifactPath: string; // path to generated diff preview + provenancePath: string; // path to provenance JSON + signaturePath: string; // path to signed batch envelope + envelope: BatchSignedEnvelope; +} +``` + +### When It Is Invoked + +`prepareTrustedBatch` is called by components that generate and stage action batches before execution (e.g., agent-generated file write batches). It is not automatically invoked by `executeRunBundle`; it is an opt-in call from the action runner or phase-level code that produces batch artifacts. + +--- + +## Healing Integration Hook + +`healFailedStep(context: FailedStepContext): Promise<HealingAttempt>` is the orchestrator's integration point with the healing engine. It is invoked exclusively from within `executeTask` in `execution-engine.ts`, on the final retry attempt of a failing task. + +```typescript +interface FailedStepContext { + runId: string; + stepId: string; + adapterId: string; + errorMessage: string; + payload?: Record; + workingDirectory?: string; + scope?: ExecutionScope; +} +``` + +`healFailedStep` delegates to `attemptHealing` from `packages/healing/src/healing-engine`.
The `HealingAttempt` return type (from `packages/shared/src/phase10_5-types`) carries: + +| Field | Meaning | +|-------|---------| +| `status: "verified"` | Healing applied and re-validated; execution engine grants one more retry | +| `approvalRequired: true` | Healing strategy selected but requires human approval; run pauses | +| Any other status | Healing could not resolve; execution engine proceeds to automatic rollback | + +The orchestrator does not retry healing more than once per step failure. Healing outcome is recorded via `writeAuditEvent` with one of: `HEALING_APPLIED_AND_VERIFIED`, `HEALING_ATTEMPTED_BUT_ESCALATED`, or `HEALING_ENGINE_ERROR`. + +--- + +## Resume Flow + +`resumeRun(runId: string, approve: boolean, actor: string): Promise<RunBundle>` reconstructs and continues a paused run. + +### Steps + +1. `loadRunBundle(runId)` loads the full `RunBundle` from the memory store. Throws `"Run not found: {runId}"` if absent. +2. If `approve === true`: + - Sets `bundle.state.approved = true` + - Sets `bundle.state.approvalRequired = false` + - Sets `bundle.state.updatedAt = now()` + - Calls `updateRunState(runId, bundle.state)` to persist the approval +3. If `bundle.state.status === "completed"`, returns immediately without re-executing. +4. Calls `executeRunBundle(bundle, actor)`, which resumes from `bundle.state.currentStepIndex`. + +### Phase-level Resume + +`runVerticalSlice` supports phase-level resume via `input.currentRun`. When provided: +- The report's `completedPhases`, `currentPhase`, `approvedGates`, and all artifacts are preserved. +- The orchestration loop begins at `currentPhase` without re-running already-completed phases. +- This enables resuming a run that was blocked at gating after adding new gate approvals. + +--- + +## Rollback Engine + +`rollbackRun(workspaceRoot: string, runId: string): RollbackOutcome` coordinates a full multi-step rollback based on persisted rollback metadata files. + +### How It Works + +1.
Reads all files matching `{workspaceRoot}/.ck/logs/{runId}/*-rollback.json`. +2. Processes files in **reverse chronological order** (most recent first). +3. For each file, processes entries in **reverse order** (last action undone first). + +### Entry Type Handling + +| Entry Type | Action | Can Be Reverted? | +|-----------|--------|-----------------| +| `file_write` | `fs.unlinkSync(target)` if file exists | Yes | +| `dir_create` | `fs.rmdirSync(target)` if empty | Conditional (non-empty dirs skipped) | +| `file_append` | Skipped with note | No (manual only) | +| `command` | Skipped with note | No (manual only) | +| Unknown | Skipped with note | No | + +### Return Value + +```typescript +interface RollbackOutcome { + runId: string; + attempted: number; // total entries processed + reverted: number; // successfully undone + skipped: number; // not undoable automatically + notes: string[]; // human-readable log of each action +} +``` + +### Relationship to `rollbackTask` + +`rollbackRun` (in `rollback-engine.ts`) operates on filesystem metadata records — it is a coarse-grained undo of file system changes. `rollbackTask` (in `execution-engine.ts`) calls `adapter.rollback(rollbackPayload)` which is a fine-grained, adapter-aware undo of a single task's side effects. Both can be used independently. 
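
The ordering and entry-type rules for `rollbackRun` can be illustrated without touching the filesystem. The sketch below only plans the actions; `planRollback`, `actionFor`, and `PlannedAction` are hypothetical names, and it assumes the caller passes the rollback files oldest first (how the real engine determines chronology is not specified here).

```typescript
interface RollbackEntry {
  type: string; // "file_write" | "file_append" | "dir_create" | "command", or an unknown value
  target: string;
  timestamp: string;
  note: string;
}

type RollbackAction = "unlink" | "rmdir-if-empty" | "skip";

interface PlannedAction {
  target: string;
  action: RollbackAction;
}

// Maps an entry type to the action from the "Entry Type Handling" table.
function actionFor(type: string): RollbackAction {
  switch (type) {
    case "file_write":
      return "unlink";
    case "dir_create":
      return "rmdir-if-empty";
    default:
      return "skip"; // file_append, command, and unknown types are manual only
  }
}

// Newest file first, and within each file the last entry first.
function planRollback(filesOldestFirst: RollbackEntry[][]): PlannedAction[] {
  const plan: PlannedAction[] = [];
  for (const entries of [...filesOldestFirst].reverse()) {
    for (const entry of [...entries].reverse()) {
      plan.push({ target: entry.target, action: actionFor(entry.type) });
    }
  }
  return plan;
}
```

Counting the `"skip"` actions in the plan corresponds to the `skipped` field of `RollbackOutcome`; the other actions may still be skipped at execution time (for example, a non-empty directory).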
+ +--- + +## Log Writer + +`log-writer.ts` provides three file-writing utilities: + +### `writeArtifact(workspaceRoot, runId, phase, markdown): string` +- Path: `{workspaceRoot}/.ck/artifacts/{runId}/{phase}.md` +- Used to persist markdown reports for each phase execution +- Returns the full written path + +### `writeJsonRecord(workspaceRoot, bucket, runId, filename, payload): string` +- Path: `{workspaceRoot}/.ck/{bucket}/{runId}/{filename}` +- Used for structured JSON records (e.g., gate results, plan artifacts) +- Returns the full written path + +### `writeActionLog(workspaceRoot, runId, filename, payload): string` +- Path: `{workspaceRoot}/.ck/logs/{runId}/{filename}` +- Accepts string or JSON-serializable payload +- Used for rollback metadata files and action execution logs +- Returns the full written path + +All three functions call `fs.mkdirSync(dir, { recursive: true })` before writing, so directories are always created as needed. + +--- + +## Canonical Events Contract + +The orchestrator must emit the following events in the following order during a standard run. All events are published via `publishEvent(eventType, payload)` from `packages/events/src`. + +### Event Emission Rules + +1. Events are only emitted when `RunState.orgId` and `RunState.workspaceId` are present. Missing tenant scope silently skips emission. +2. Every event payload includes: `runId`, `tenant: { orgId, workspaceId, projectId }`, `actor: { id, type, authMode }`, `correlationId`. +3. Events are fire-and-forget from the orchestrator's perspective; emission errors do not halt execution. 
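
The three emission rules above can be sketched as a single guard around the publisher. This is a hedged illustration: `emitIfScoped`, `ScopedState`, and `Publish` are hypothetical, the actor field names on the state are assumptions, and the real helper is `emitOrchestratorEvent` calling `publishEvent(eventType, payload)`.

```typescript
// Hypothetical, simplified state shape; actor-related field names are assumptions.
interface ScopedState {
  runId: string;
  orgId?: string;
  workspaceId?: string;
  projectId?: string;
  actorId: string;
  actorType: string;
  authMode: string;
  correlationId: string;
}

type Publish = (eventType: string, payload: unknown) => void | Promise<void>;

// Returns true when an emission was attempted, false when it was skipped.
function emitIfScoped(state: ScopedState, eventType: string, publish: Publish): boolean {
  // Rule 1: missing tenant scope silently skips emission.
  if (!state.orgId || !state.workspaceId) return false;

  // Rule 2: every payload carries run, tenant, actor, and correlation fields.
  const payload = {
    runId: state.runId,
    tenant: { orgId: state.orgId, workspaceId: state.workspaceId, projectId: state.projectId },
    actor: { id: state.actorId, type: state.actorType, authMode: state.authMode },
    correlationId: state.correlationId,
  };

  // Rule 3: fire-and-forget; neither sync throws nor async rejections propagate.
  try {
    Promise.resolve(publish(eventType, payload)).catch(() => undefined);
  } catch {
    // Swallowed on purpose: emission errors must not halt execution.
  }
  return true;
}
```

Returning a boolean is a choice made for this sketch so that skipped emissions can be counted, which is one way to address the silent-skip risk.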
+ +### Canonical Sequence for a Successful Run + +| Order | Event Type | Emitted By | Trigger | +|-------|-----------|-----------|---------| +| 1 | `execution.started` | phase-engine.ts building handler | Before `executeRunBundle` is called | +| 2 | `execution.completed` | phase-engine.ts building handler | After `executeRunBundle` returns with `status: "completed"` | + +### Canonical Sequence for a Paused Run (approval required) + +| Order | Event Type | Emitted By | Trigger | +|-------|-----------|-----------|---------| +| 1 | `execution.started` | phase-engine.ts building handler | Before `executeRunBundle` is called | +| 2 | `gate.awaiting_approval` | phase-engine.ts building handler | After `executeRunBundle` returns with `status: "paused"` | + +### Canonical Sequence for a Gating Pause + +| Order | Event Type | Emitted By | Trigger | +|-------|-----------|-----------|---------| +| 1 | `gate.awaiting_approval` | phase-engine.ts gating handler | When `evaluateGates` returns `overallStatus: "needs-review"` | + +### Canonical Sequence for a Failed Run + +| Order | Event Type | Emitted By | Trigger | +|-------|-----------|-----------|---------| +| 1 | `execution.started` | phase-engine.ts building handler | Before `executeRunBundle` is called | +| 2 | `execution.failed` | phase-engine.ts building handler | After `executeRunBundle` returns with `status: "failed"` | + +### Additional Events (verification) + +| Event Type | Emitted By | Trigger | +|-----------|-----------|---------| +| `verification.completed` | Callers using `emitVerificationCompleted` helper | After adapter verification step, if caller opts in | + +--- + +## Dependencies + +| Dependency | Package | Purpose | +|-----------|---------|---------| +| `intake.ts` | `packages/orchestrator` | Phase 1: idea normalization and clarification | +| `planner.ts` | `packages/orchestrator` | Phase 2: task plan generation | +| `skill-engine` | `packages/skill-engine` | Phase 3: skill selection | +| `gate-manager.ts` 
| `packages/orchestrator` | Phase 4: gate evaluation | +| `execution-engine.ts` | `packages/orchestrator` | Phase 5: task execution | +| `mode-controller.ts` | `packages/orchestrator` | Policy per mode | +| `events.ts` | `packages/orchestrator` | Orchestrator-scoped event emission helpers | +| `memory/src` | `packages/memory` | `recordRun`, `loadRunBundle`, `updateRunState` | +| `healing/src` | `packages/healing` | `attemptHealing` via `healing-integration.ts` | +| `learning/src` | `packages/learning` | `learnFromOutcome` via `outcome-engine.ts` | +| `security/src` | `packages/security` | Batch provenance and signing via `trust-pipeline.ts` | +| `audit/src` | `packages/audit` | `writeAuditEvent` | +| `events/src` | `packages/events` | `publishEvent` | + +--- + +## Edge Cases + +- **`runVerticalSlice` with `mode === "expert"`:** Only one phase executes per call. The caller must re-invoke with the returned report as `currentRun` to advance. `shouldContinue` returns `false` immediately in expert mode. +- **`runVerticalSlice` with `mode === "turbo"`:** The inner `while` loop continues until `isFinished === true` or `status !== "in-progress"`, resulting in a fully autonomous single-call pipeline. +- **`prepareTrustedBatch` called without signing secret:** `signBatch` receives an empty string; signing behavior depends on the security package's handling of empty secrets — this should be treated as an error. +- **`rollbackRun` with no rollback log directory:** Returns `RollbackOutcome` with `attempted=0`, `reverted=0`, and note `"No rollback logs found."` — a safe no-op. +- **Event emission with partial tenant scope (orgId present but workspaceId missing):** `emitOrchestratorEvent` silently skips; no warning is logged by default (there is a commented-out `console.warn` in the source). +- **`resumeRun` called on a completed run:** Returns the bundle immediately without re-execution. Idempotent. 
+- **Batch queue file corruption:** `getQueuedBatch` will throw a JSON parse error. No error recovery is built in; the file must be manually repaired or deleted. + +--- + +## Risks + +| Risk | Likelihood | Impact | Mitigation | +|------|-----------|--------|-----------| +| Event emission skipped silently when tenant scope is missing | High | Medium | Enable the commented-out `console.warn`; add telemetry for skipped event count | +| Batch queue race condition on concurrent `createQueuedBatch` with identical random IDs | Very Low | Low | `Math.random().toString(36).slice(2,10)` collision probability is negligible; add UUID for production | +| `rollbackRun` deletes files without confirming they were created by this run | Low | High | Rollback metadata entries should include a run-scoped checksum or creation marker | +| Trust pipeline signing secret passed as plain string in function call | Medium | High | Move `signingSecret` to environment variable retrieval inside `trust-pipeline.ts` | +| `runVerticalSlice` has no timeout; turbo mode loops indefinitely on phase errors | Low | Medium | Add a maximum phase iteration count guard inside the `while` loop | + +--- + +## Definition of Done + +- [ ] `runVerticalSlice` completes full 8-phase pipeline in builder mode with a valid idea +- [ ] `runVerticalSlice` stops after phase 1 in expert mode and returns correct `currentPhase` +- [ ] `runVerticalSlice` loops to completion in turbo mode without requiring multiple calls +- [ ] `resumeRun(runId, approve=true)` resumes a paused bundle from the correct `currentStepIndex` +- [ ] `resumeRun` on a completed run returns the bundle without re-executing +- [ ] `rollbackRun` processes rollback files in reverse chronological order with entries in reverse order +- [ ] `rollbackRun` skips `file_append` and `command` entries with appropriate notes +- [ ] `prepareTrustedBatch` produces a diff artifact, provenance file, and signature file +- [ ] All 5 canonical event types emit with correct 
`runId`, `tenant`, `actor`, `correlationId` fields +- [ ] Events are not emitted when `orgId` or `workspaceId` is missing from `RunState` +- [ ] Batch queue `listQueuedBatches` returns results sorted by `createdAt` ascending +- [ ] `getModePolicy("god")` returns policy with `requireApprovalForHighRisk: false` +- [ ] `trimQuestionsByMode` returns no more than `maxClarifyingQuestions` items sorted by priority weight +- [ ] `writeArtifact`, `writeJsonRecord`, and `writeActionLog` create parent directories automatically diff --git a/docs/03_specs/SPEC_PHASE_ENGINE.md b/docs/03_specs/SPEC_PHASE_ENGINE.md new file mode 100644 index 0000000..2bd0801 --- /dev/null +++ b/docs/03_specs/SPEC_PHASE_ENGINE.md @@ -0,0 +1,319 @@ +# SPEC — Phase Engine +**Status:** Draft +**Version:** 1.0 +**Linked to:** packages/orchestrator/src/phase-engine.ts +**Implements:** Sequential phase execution pipeline, per-phase contracts, runOrchestrationStep function, and mode-influenced behavior + +--- + +## Objective + +Define the complete behavioral contract for the Phase Engine — the component that orchestrates a run through eight sequential phases from idea intake to deployment. This spec covers the function signature and return semantics of `runOrchestrationStep`, the interface every phase handler must satisfy, how mode policy shapes each phase, how progress is checkpointed for resume, and which audit events are emitted at each phase boundary. 
+ +--- + +## Scope + +- 8-phase sequence definition and ordering +- `PhaseHandler` interface contract +- `runOrchestrationStep` function specification +- Per-phase: inputs, outputs, preconditions, postconditions +- Gating phase: 5-gate sequential evaluation and short-circuit logic +- Building phase: full execution delegation to execution engine +- Mode influence table per phase +- Checkpointing and resume behavior +- Audit events emitted per phase transition + +Out of scope: individual gate logic (see `SPEC_RUN_LIFECYCLE.md`), adapter implementation, and post-deployment observability. + +--- + +## Inputs / Outputs + +| Direction | Item | Type | Description | +|-----------|------|------|-------------| +| Input | `RunReport` | `RunReport` | Full report snapshot at the start of each step | +| Output | `RunReport & { isFinished: boolean }` | Extended report | Updated report with new phase, status, and finished flag | + +--- + +## Data Structures + +```typescript +// Defined in packages/orchestrator/src/phase-engine.ts + +export interface PhaseContext { + idea: string; // normalized idea text from report.input.idea + mode: Mode; // execution mode from report.input.mode + approvedGates: string[]; // list of gate IDs manually approved + report: Partial<RunReport>; // full report snapshot passed into the step +} + +export type PhaseHandler = (context: PhaseContext) => Promise<{ + nextPhase: Phase | null; // null signals the pipeline is finished + updates: Partial<RunReport>; // fields to merge into RunReport + status: "success" | "blocked" | "awaiting-approval"; +}>; + +// Phase sequence order (PHASE_HANDLERS keys, evaluated left to right) +type Phase = "intake" | "planning" | "skills" | "gating" | "building" + | "testing" | "reviewing" | "deployment"; +``` + +--- + +## Interfaces / APIs + +### `runOrchestrationStep` + +```typescript +export async function runOrchestrationStep( + report: RunReport +): Promise<RunReport & { isFinished: boolean }> +``` + +**Behavior:** +1. 
Look up the handler for `report.currentPhase` in `PHASE_HANDLERS`. +2. Build a `PhaseContext` from the report (idea, mode, approvedGates, report). +3. Invoke the handler and await the result. +4. Merge `result.updates` into `report` and set `updatedAt = now()`. +5. Apply status and phase advancement logic: + - `result.status === "success"` + `nextPhase != null` → advance `currentPhase`, push completed phase into `completedPhases`, set `status = "in-progress"`, `isFinished = false` + - `result.status === "success"` + `nextPhase === null` → set `status = "success"`, `isFinished = true` + - `result.status === "blocked"` → set `status = "blocked"`, `isFinished = false` + - `result.status === "awaiting-approval"` → set `status = "awaiting-approval"`, `isFinished = false` +6. Return the merged report with `isFinished` attached. + +**Error handling:** If `PHASE_HANDLERS[report.currentPhase]` is undefined, throws `Error("Unknown phase: {phase}")`. Individual phase handlers may throw; errors propagate to the caller (`runVerticalSlice`). + +--- + +## Phase Sequence Definition + +``` +intake → planning → skills → gating → building → testing → reviewing → deployment +``` + +Phases are strictly sequential. No phase can be skipped by the engine itself (only mode policy can reduce their work). Each phase reads exclusively from `PhaseContext` and writes exclusively to `Partial<RunReport>` via its `updates` return value. 
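The merge and advancement rules in steps 4 and 5 of `runOrchestrationStep` can be sketched as a pure function. This is an illustrative sketch only: `SketchReport`, `HandlerResult`, and `applyHandlerResult` are hypothetical names, and the real `RunReport` carries many more fields than shown here.

```typescript
// Hypothetical, simplified shapes; not the actual types from phase-engine.ts.
type Phase =
  | "intake" | "planning" | "skills" | "gating"
  | "building" | "testing" | "reviewing" | "deployment";

type ReportStatus = "in-progress" | "success" | "blocked" | "awaiting-approval";

interface SketchReport {
  currentPhase: Phase;
  completedPhases: Phase[];
  status: ReportStatus;
  updatedAt: string;
  summary?: string;
}

interface HandlerResult {
  nextPhase: Phase | null;
  updates: Partial<SketchReport>;
  status: "success" | "blocked" | "awaiting-approval";
}

function applyHandlerResult(
  report: SketchReport,
  result: HandlerResult,
): SketchReport & { isFinished: boolean } {
  // Step 4: merge handler updates and refresh the timestamp.
  const merged: SketchReport = {
    ...report,
    ...result.updates,
    updatedAt: new Date().toISOString(),
  };

  if (result.status !== "success") {
    // Blocked or awaiting approval: the run stays in its current phase.
    return {
      ...merged,
      currentPhase: report.currentPhase,
      status: result.status,
      isFinished: false,
    };
  }

  if (result.nextPhase === null) {
    // The final phase succeeded: the pipeline is finished.
    return { ...merged, status: "success", isFinished: true };
  }

  // Success with a next phase: record the completed phase and advance.
  return {
    ...merged,
    currentPhase: result.nextPhase,
    completedPhases: [...report.completedPhases, report.currentPhase],
    status: "in-progress",
    isFinished: false,
  };
}
```

Keeping this logic free of I/O makes each of the four status branches straightforward to unit test in isolation from the handlers themselves.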
+ +--- + +## Per-Phase Contracts + +### Phase 1: intake + +| | Detail | +|---|---| +| **Preconditions** | `context.idea` must be a non-empty string | +| **Inputs consumed** | `context.idea`, `context.mode` | +| **Operations** | `runIntake({ idea, mode })` → calls `normalizeIdeaText`, `inferSolutionCategory`, `deriveAssumptions`, `generateClarifyingQuestions`, `trimQuestionsByMode` | +| **Outputs** | `updates.intakeResult`, `updates.assumptions`, `updates.clarifyingQuestions` | +| **Next phase** | `"planning"` | +| **Postconditions** | `report.intakeResult` is populated; `report.clarifyingQuestions` is trimmed to mode limit | +| **Can block?** | No — always returns `status: "success"` | + +### Phase 2: planning + +| | Detail | +|---|---| +| **Preconditions** | `context.report.intakeResult` must exist; throws otherwise | +| **Inputs consumed** | `context.report.intakeResult` | +| **Operations** | `buildPlanFromClarification(intakeResult)` → produces task list | +| **Outputs** | `updates.plan` | +| **Next phase** | `"skills"` | +| **Postconditions** | `report.plan` is a non-empty `Task[]` | +| **Can block?** | No | + +### Phase 3: skills + +| | Detail | +|---|---| +| **Preconditions** | `context.report.intakeResult` and `context.report.plan` must exist; throws otherwise | +| **Inputs consumed** | `context.report.intakeResult`, `context.report.plan` | +| **Operations** | `selectSkills({ clarification, plan })` from skill-engine | +| **Outputs** | `updates.selectedSkills` | +| **Next phase** | `"gating"` | +| **Postconditions** | `report.selectedSkills` is a `SelectedSkill[]` | +| **Can block?** | No | + +### Phase 4: gating + +| | Detail | +|---|---| +| **Preconditions** | `intakeResult`, `plan`, `selectedSkills` must all be populated | +| **Inputs consumed** | All three artifacts plus `context.mode`, `context.approvedGates` | +| **Operations** | `evaluateGates(...)` → evaluates 5 gates sequentially | +| **Outputs** | `updates.gates`, `updates.overallGateStatus`, 
`updates.status` | +| **Next phase** | `"building"` if `overallStatus === "pass"`; stays `"gating"` otherwise | +| **Postconditions** | `report.gates` contains 5 `GateDecision` entries; `overallGateStatus` is one of `pass \| needs-review \| blocked` | +| **Can block?** | Yes — `blocked` propagates as handler `status: "blocked"` | +| **Can pause?** | Yes — `needs-review` propagates as `status: "awaiting-approval"`, emits `gate.awaiting_approval` | + +### Phase 5: building + +| | Detail | +|---|---| +| **Preconditions** | `context.report.id` or auto-generated runId; `context.report.input` required to initialize a new bundle | +| **Inputs consumed** | Existing `RunBundle` from `loadRunBundle(runId)` or initialized from context | +| **Operations** | `emitExecutionStarted` → `executeRunBundle(bundle)` → emit completion/failure/pause event | +| **Outputs** | `updates.id`, `updates.summary`, `updates.status` | +| **Next phase** | `"testing"` if `isCompleted`; stays `"building"` if paused or failed | +| **Postconditions** | `bundle.state.status` reflects actual execution outcome; `RunBundle` persisted to run-store | +| **Can block?** | No — delegates failure to execution engine | +| **Can pause?** | Yes — when `bundle.state.status === "paused"`, returns `awaiting-approval` | + +### Phase 6: testing + +| | Detail | +|---|---| +| **Preconditions** | Building phase must have completed | +| **Operations** | Simulated — returns immediately with placeholder summary | +| **Outputs** | `updates.summary = "Testing phase completed (SIMULATED)."` | +| **Next phase** | `"reviewing"` | +| **Can block?** | No | + +### Phase 7: reviewing + +| | Detail | +|---|---| +| **Operations** | Simulated — returns immediately with placeholder summary | +| **Next phase** | `"deployment"` | +| **Can block?** | No | + +### Phase 8: deployment + +| | Detail | +|---|---| +| **Operations** | Simulated — returns immediately with success summary | +| **Outputs** | `updates.status = "success"`, 
`updates.summary = "Deployment completed (SIMULATED). Pipeline finished."` | +| **Next phase** | `null` — signals `isFinished = true` to `runOrchestrationStep` | +| **Can block?** | No | + +--- + +## Phase Transition Rules + +A phase advances to its `nextPhase` if and only if its handler returns `status: "success"`. The following conditions gate advancement: + +- **Blocked:** The handler returned `status: "blocked"`. `currentPhase` does not change. `runOrchestrationStep` returns with `status: "blocked"` and `isFinished: false`. The caller must resolve the block before re-invoking. +- **Awaiting approval:** The handler returned `status: "awaiting-approval"`. `currentPhase` does not change. The run persists in its current phase until a resume call grants approval. +- **Finished:** `nextPhase === null` and `status: "success"`. `isFinished = true`; `runReport.status = "success"`. + +--- + +## Gating Phase: 5-Gate Sequential Evaluation + +`evaluateGates` in `gate-manager.ts` runs the following 5 evaluators in order. Short-circuit semantics apply at the `getOverallGateStatus` aggregation level, not at individual gate evaluation — all 5 gates always evaluate, but the first `blocked` result wins overall. 
+ +| # | Gate ID | Evaluator | Block Condition | Review Condition | +|---|---------|-----------|-----------------|-----------------| +| 1 | `objective-clarity` | `evaluateObjectiveClarityGate` | No normalized idea | Category is `"unknown"` or `"unclear"` | +| 2 | `requirements-completeness` | `evaluateRequirementsCompletenessGate` | Questions ≥ `maxQuestionsBeforeBlock` | Questions ≥ `maxQuestionsBeforeReview` | +| 3 | `plan-readiness` | `evaluatePlanReadinessGate` | Zero tasks in plan | Tasks < `minimumPlanTasks` or no dependencies | +| 4 | `skill-coverage` | `evaluateSkillCoverageGate` | Zero skills selected | Skills < `minimumSelectedSkills` or no specialist skills | +| 5 | `ambiguity-risk` | `evaluateAmbiguityRiskGate` | Questions ≥ `ambiguityBlockThreshold` | Assumptions > 6 or questions ≥ `ambiguityReviewThreshold` | + +**Turbo mode override:** After all 5 decisions are computed, any gate with `status === "needs-review"` is rewritten to `status === "pass"` with reason suffix `"(AUTO-PASSED VIA TURBO)"` and `shouldPause = false`. + +**Manual approval override:** Any gate whose `gate` ID appears in `context.approvedGates` is rewritten to `status === "pass"` with reason suffix `"(MANUALLY APPROVED)"` and `shouldPause = false`. This override is applied before the turbo override. + +--- + +## Building Phase: Execution Delegation + +When the building phase handler runs: + +1. **Load or initialize `RunBundle`:** `loadRunBundle(runId)` is attempted. On miss, a new bundle is constructed from context (intake artifact, plan artifact, run state initialized to `status: "planned"`, `currentStepIndex: 0`). +2. **Persist artifacts:** `updateIntake`, `updatePlan`, `updateRunState` are called before execution begins. +3. **Emit start event:** `emitExecutionStarted(bundle.state)` fires `execution.started` event (requires `orgId` + `workspaceId`). +4. 
**Delegate to execution engine:** `executeRunBundle(bundle)` runs the full 6-step per-task pipeline (see `SPEC_EXECUTION_ENGINE.md`). +5. **Emit outcome event:** Based on returned bundle state: `emitExecutionCompleted`, `emitExecutionFailed`, or `emitGateAwaitingApproval`. +6. **Return phase result:** Status maps as — completed → `"success"` + nextPhase `"testing"`; paused → `"awaiting-approval"` + nextPhase `"building"`; failed → handler `status: "failure"` (treated as non-success by engine) + nextPhase `"building"`. + +--- + +## Mode Influence Per Phase + +| Phase | turbo | builder | pro | expert | safe | balanced | god | +|-------|-------|---------|-----|--------|------|----------|-----| +| intake | max 2 questions | max 5 | max 8 | max 15 | max 20 | max 10 | max 0 | +| planning | no change | no change | no change | no change | no change | no change | no change | +| skills | min 1 skill required | min 2 | min 2 | min 3 | min 3 | min 2 | min 0 | +| gating | needs-review auto-passed | standard | stricter thresholds | strictest thresholds | most strict | moderate | all gates effectively bypassed | +| building | medium-risk no approval | medium+high require approval | dry-run default | dry-run default | no commands allowed | dry-run + approval | no approval required for any risk | +| testing/reviewing/deployment | simulated | simulated | simulated | simulated | simulated | simulated | simulated | + +--- + +## Audit Events Per Phase + +| Phase | Audit Action | Source | +|-------|-------------|--------| +| gating (needs-review) | `gate.awaiting_approval` via `emitGateAwaitingApproval` | phase-engine.ts gating handler | +| building (start) | `execution.started` via `emitExecutionStarted` | phase-engine.ts building handler | +| building (complete) | `execution.completed` via `emitExecutionCompleted` | phase-engine.ts building handler | +| building (failed) | `execution.failed` via `emitExecutionFailed` | phase-engine.ts building handler | +| building (paused) | 
`gate.awaiting_approval` via `emitGateAwaitingApproval` | phase-engine.ts building handler | + +Per-step audit events (`TASK_EXECUTION_ATTEMPT`, `STEP_EXECUTION_STARTED`, `STEP_EXECUTION_SUCCEEDED`, `STEP_EXECUTION_FAILED`, etc.) are emitted by `execution-engine.ts` via `writeAuditEvent`, not the phase engine. + +--- + +## Checkpointing + +Partial progress is preserved for resume through the following mechanism: + +1. **Per-task checkpoint:** After each successful `executeTask`, `bundle.state.currentStepIndex` is incremented and `updateRunState` is called immediately. +2. **On pause:** `markState(bundle, "paused", { currentStepIndex: index, ... })` persists the exact index of the paused task. +3. **On resume:** `resumeRun(runId, approve=true)` calls `loadRunBundle(runId)` to reload state from the store, then calls `executeRunBundle(bundle)` which starts the loop at `bundle.state.currentStepIndex` — the exact step that was paused. +4. **`runVerticalSlice` continuation:** Supports `input.currentRun` parameter to inject an existing `RunReport` (carrying `completedPhases`, `currentPhase`, etc.) for phase-level resume without re-running completed phases. 
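The checkpoint and resume mechanism above can be illustrated with a small in-memory sketch. Everything here is a hypothetical stand-in: `SketchBundle`, `executeBundleSketch`, and `resumeSketch` are not the real `RunBundle`, `executeRunBundle`, or `resumeRun`, and the sketch omits persistence, retries, and healing. It also assumes that a resume without approval leaves the run paused.

```typescript
// Hypothetical in-memory stand-ins for the run-store and execution engine.
type RunStatus = "planned" | "running" | "paused" | "completed" | "failed" | "cancelled";

interface SketchTask {
  id: string;
  requiresApproval: boolean;
}

interface SketchBundle {
  tasks: SketchTask[];
  state: { currentStepIndex: number; status: RunStatus; approved: boolean };
  executed: string[]; // ids of tasks that actually ran, for illustration only
}

function executeBundleSketch(bundle: SketchBundle): SketchBundle {
  bundle.state.status = "running";
  // Start at the checkpoint, not at step 0.
  for (let i = bundle.state.currentStepIndex; i < bundle.tasks.length; i++) {
    const task = bundle.tasks[i];
    if (task.requiresApproval && !bundle.state.approved) {
      // Record the exact index of the paused task before returning.
      bundle.state.currentStepIndex = i;
      bundle.state.status = "paused";
      return bundle;
    }
    bundle.executed.push(task.id);
    // Per-task checkpoint; approval does not carry over to later steps.
    bundle.state.currentStepIndex = i + 1;
    bundle.state.approved = false;
  }
  bundle.state.status = "completed";
  return bundle;
}

function resumeSketch(bundle: SketchBundle, approve: boolean): SketchBundle {
  // A completed run is returned as-is, so resume is idempotent.
  if (bundle.state.status === "completed") return bundle;
  // Assumption: without approval the run simply stays paused.
  if (!approve) return bundle;
  bundle.state.approved = true;
  return executeBundleSketch(bundle);
}
```

With three tasks where only the second requires approval, the first call pauses at index 1, and a resume with approval continues from that index rather than re-running the first task.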
+ +--- + +## Dependencies + +| Dependency | Package | Purpose | +|-----------|---------|---------| +| `intake.ts` | `packages/orchestrator` | Phase 1 operations | +| `planner.ts` | `packages/orchestrator` | Phase 2 operations | +| `skill-engine` | `packages/skill-engine` | Phase 3 skill selection | +| `gate-manager.ts` | `packages/orchestrator` | Phase 4 gate evaluation | +| `execution-engine.ts` | `packages/orchestrator` | Phase 5 task execution | +| `events.ts` | `packages/orchestrator` | Orchestrator-scoped event emission | +| `run-store` | `packages/memory` | Bundle persistence and retrieval | +| `mode-controller.ts` | `packages/orchestrator` | Policy per mode for question trimming | + +--- + +## Edge Cases + +- **Unknown phase value:** `PHASE_HANDLERS[report.currentPhase]` returns `undefined`; `runOrchestrationStep` throws immediately before any state mutation. +- **Missing `intakeResult` at planning phase:** Handler throws `"Intake result missing"` — this indicates an out-of-sequence call; caller must ensure phases run in order. +- **Building phase with no `context.report.input`:** Throws `"Cannot initialize RunBundle: ctx.report.input is undefined."` — only safe when bundle already exists in the store. +- **`runVerticalSlice` in expert mode:** Executes exactly one phase and returns, regardless of `isFinished`. The caller must call `runVerticalSlice` again with the returned report as `input.currentRun` to advance. +- **Re-entering gating after manual approval:** If `POST /v1/gates/{id}/approve` adds a gate to `approvedGates`, the caller must re-invoke the run from the gating phase with the updated `approvedGates` list; the phase engine does not automatically re-evaluate. 
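The expert-mode edge case above means the caller owns the loop. A minimal caller-side driver might look like the following sketch. It assumes a synchronous `step` callback standing in for a call to `runVerticalSlice` with the previous report passed as `input.currentRun` (the real function is presumably asynchronous), and `driveExpertRun`, `StepReport`, and the `maxSteps` default are hypothetical.

```typescript
// Hypothetical caller-side driver for single-step (expert mode) runs.
interface StepReport {
  currentPhase: string;
  status: "in-progress" | "success" | "blocked" | "awaiting-approval";
  isFinished: boolean;
}

function driveExpertRun(
  step: (current: StepReport) => StepReport, // stand-in for runVerticalSlice
  initial: StepReport,
  maxSteps: number = 16, // assumed guard value, not taken from the source
): StepReport {
  let report = initial;
  for (let i = 0; i < maxSteps; i++) {
    report = step(report);
    // Stop when finished, or when the run needs outside intervention.
    if (report.isFinished || report.status !== "in-progress") {
      return report;
    }
  }
  // Guard against a step function that never makes progress.
  throw new Error(`Run did not finish within ${maxSteps} steps`);
}
```

The iteration cap is a design choice for the sketch: it turns a stuck run into an explicit error instead of an unbounded loop.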
+ +--- + +## Risks + +| Risk | Likelihood | Impact | Mitigation | +|------|-----------|--------|-----------| +| Phase state divergence if handler throws mid-update | Medium | High | Ensure all `updates` are built before any async operation; wrap handler calls in try/catch at `runOrchestrationStep` level | +| Testing/reviewing/deployment phases returning success without real work | High | Medium | These phases are explicitly marked `(SIMULATED)` — replace with real implementations before production readiness | +| `buildInitialPlan` in building phase diverging from `buildPlanFromClarification` used in planning | Medium | High | Planning phase result must be propagated into building phase via `RunBundle.plan`; avoid re-planning from raw input | + +--- + +## Definition of Done + +- [ ] All 8 phases have unit tests asserting correct `nextPhase` and `updates` shape +- [ ] `runOrchestrationStep` tested with each possible handler return status: success/blocked/awaiting-approval +- [ ] Gating phase tests cover all 5 gates individually and in combination (block + needs-review) +- [ ] Turbo auto-pass override tested against a `needs-review` gate result +- [ ] Manual approval override tested against a gate that would otherwise block +- [ ] Building phase tested for completed, paused, and failed execution engine outcomes +- [ ] Checkpointing test: pause at step N, resume, verify execution continues at step N not step 0 +- [ ] Expert mode single-step test: `runVerticalSlice` returns after first phase with `isFinished: false` +- [ ] `runVerticalSlice` with `currentRun` input correctly skips already-completed phases +- [ ] Unknown phase throws typed error — confirmed not a silent no-op diff --git a/docs/03_specs/SPEC_RUN_LIFECYCLE.md b/docs/03_specs/SPEC_RUN_LIFECYCLE.md new file mode 100644 index 0000000..391d9df --- /dev/null +++ b/docs/03_specs/SPEC_RUN_LIFECYCLE.md @@ -0,0 +1,311 @@ +# SPEC — Run Lifecycle State Machine +**Status:** Draft +**Version:** 1.0 +**Linked to:** 
docs/02_architecture/ARCHITECTURE.md +**Implements:** Complete state machine for RunStatus, StepStatus, and GateStatus across all phases of execution + +--- + +## Objective + +Define the canonical state machine governing the lifecycle of a Run in Code-Kit-Ultra. This spec establishes every valid state, every legal transition, the trigger that causes each transition, the side effects that must occur (events emitted, DB writes, tokens issued), and the invariants that must hold at all times. All runtime components — orchestrator, execution engine, gate manager, and API layer — must conform to this machine. + +--- + +## Scope + +- `RunStatus` state machine (top-level run lifecycle) +- `StepStatus` state machine (per-task execution within a run) +- `GateStatus` state machine (per-gate evaluation result) +- Phase-to-status mapping +- API endpoints that drive state transitions +- Error states and recovery paths +- Invariants and consistency rules + +Out of scope: individual adapter behavior, billing lifecycle, authentication flows. 
+ +--- + +## Inputs / Outputs + +| Direction | Item | Type | Description | +|-----------|------|------|-------------| +| Input | Run creation request | `RunVerticalSliceInput` | idea, mode, dryRun, approvedGates | +| Input | Resume request | runId + approve flag | Triggers paused → running transition | +| Input | Gate approval | gateId, actorId | Satisfies a `needs-review` gate | +| Input | Cancel request | runId, actorId | Triggers running/paused → cancelled | +| Output | `RunBundle` | Persistent bundle in run-store | Full snapshot of run state at any point | +| Output | Audit events | `writeAuditEvent` calls | Append-only audit log per transition | +| Output | Orchestrator events | `publishEvent` via events.ts | Scoped to orgId + workspaceId | + +--- + +## Data Structures + +```typescript +// Defined in packages/shared/src/types.ts +type RunStatus = "planned" | "running" | "paused" | "completed" | "failed" | "cancelled"; +type StepStatus = "pending" | "running" | "success" | "failed" | "paused" | "skipped" | "rolled-back"; +type GateStatus = "pass" | "fail" | "needs-review" | "blocked" | "pending"; + +interface RunState { + runId: string; + createdAt: string; + updatedAt: string; + currentStepIndex: number; + status: RunStatus; + approvalRequired: boolean; + approved: boolean; + pauseReason?: string; + orgId?: string; + workspaceId?: string; + projectId?: string; + actorId?: string; + actorType?: ActorType; + correlationId?: string; +} + +interface StepExecutionLog { + stepId: string; + title: string; + adapter: string; + attempt: number; + status: StepStatus; + startedAt: string; + finishedAt?: string; + output?: string; + error?: string; + rollbackAvailable: boolean; + risk?: ExecutionRisk; + simulationSummary?: string; + verificationStatus?: "passed" | "failed"; + verificationSummary?: string; + fixSuggestion?: string; +} +``` + +--- + +## Interfaces / APIs + +### API Endpoints That Trigger Transitions + +| Endpoint | Transition | Required Role | 
+|----------|-----------|---------------| +| `POST /v1/runs` | `planned → running` | operator, admin | +| `POST /v1/runs/{id}/resume` | `paused → running` | operator, admin | +| `POST /v1/runs/{id}/cancel` | `running/paused → cancelled` | operator, admin | +| `POST /v1/gates/{id}/approve` | gate: `needs-review → pass` | reviewer, admin | +| `POST /v1/gates/{id}/reject` | gate: `needs-review → blocked` | reviewer, admin | + +--- + +## RunStatus State Machine + +```mermaid +stateDiagram-v2 + [*] --> planned : POST /v1/runs created + + planned --> running : runVerticalSlice or executeRunBundle called + running --> paused : approvalRequired gate encountered + running --> completed : all tasks succeed + running --> failed : task fails after all retries and healing + running --> cancelled : POST /v1/runs/{id}/cancel + + paused --> running : POST /v1/runs/{id}/resume (approve=true) + paused --> cancelled : POST /v1/runs/{id}/cancel + + completed --> [*] + failed --> [*] + cancelled --> [*] +``` + +### Transition Details + +| From | To | Trigger | Actions | +|------|----|---------|---------| +| `planned` | `running` | `executeRunBundle` called; `markState(bundle, "running")` | Write `RunState.status = running`, audit `TASK_EXECUTION_ATTEMPT`, emit `execution.started` | +| `running` | `paused` | Task requires approval and `bundle.state.approved === false` | Write `RunState.status = paused`, `approvalRequired = true`, `pauseReason = `, audit `APPROVAL_REQUIRED`, emit `gate.awaiting_approval` | +| `running` | `completed` | All tasks in `bundle.plan.tasks` complete successfully | Write `RunState.status = completed`, `currentStepIndex = tasks.length`, audit `RUN_COMPLETED`, emit `execution.completed`, call `recordRunOutcome(success=true)` | +| `running` | `failed` | `executeTask` returns `completed: false` after retry exhaustion | Write `RunState.status = failed`, audit `STEP_EXECUTION_FAILED`, emit `execution.failed`, call `recordRunOutcome(success=false)` | +| `running` | 
`cancelled` | API call to cancel endpoint | Write `RunState.status = cancelled`, audit `RUN_CANCELLED` | +| `paused` | `running` | `resumeRun(runId, approve=true)` called | Write `RunState.approved = true`, `approvalRequired = false`, re-invoke `executeRunBundle` | +| `paused` | `cancelled` | API call to cancel endpoint while paused | Write `RunState.status = cancelled` | + +--- + +## Phase-to-Status Mapping + +The phase engine (`phase-engine.ts`) runs phases sequentially. The following table shows which `RunStatus` values are valid at each phase boundary. + +| Phase | Entry Status | Exit on Success | Exit on Blocked | Exit on Awaiting-Approval | +|-------|-------------|-----------------|-----------------|--------------------------| +| intake | `in-progress` | `in-progress` | `blocked` | — | +| planning | `in-progress` | `in-progress` | — | — | +| skills | `in-progress` | `in-progress` | — | — | +| gating | `in-progress` | `in-progress` | `blocked` | `awaiting-approval` | +| building | `in-progress` | `in-progress` | — | `awaiting-approval` | +| testing | `in-progress` | `in-progress` | — | — | +| reviewing | `in-progress` | `in-progress` | — | — | +| deployment | `in-progress` | `success` | — | — | + +Note: `runOrchestrationStep` maps phase handler `status` to `RunReport.status` as follows: +- handler `"success"` + `nextPhase != null` → report status `"in-progress"` +- handler `"success"` + `nextPhase === null` → report status `"success"`, `isFinished = true` +- handler `"blocked"` → report status `"blocked"` +- handler `"awaiting-approval"` → report status `"awaiting-approval"` + +--- + +## StepStatus State Machine + +```mermaid +stateDiagram-v2 + [*] --> pending : task created in plan + + pending --> running : executeTask begins attempt + running --> success : adapter.execute succeeds and verification passes + running --> failed : adapter.execute throws after all retries + running --> paused : requiresApproval=true and bundle.state.approved=false + running --> 
rolled-back : healing fails and rollbackPayload present + failed --> rolled-back : automatic rollback via adapter.rollback + paused --> running : resumeRun(approve=true) restarts from currentStepIndex + success --> [*] + rolled-back --> [*] + failed --> [*] +``` + +### StepStatus Transition Details + +| From | To | Trigger | +|------|----|---------| +| `pending` | `running` | `executeTask` invoked for task at `currentStepIndex` | +| `running` | `success` | `adapter.execute` returns `{ success: true }` and `adapter.verify` returns `{ ok: true }` | +| `running` | `failed` | Error thrown on final retry attempt; healing does not produce `"verified"` status | +| `running` | `paused` | `requiresApproval === true` and `bundle.state.approved === false` | +| `running` | `rolled-back` | Execution fails; `task.rollbackPayload` exists; `adapter.rollback(task.rollbackPayload)` called | +| `paused` | `running` | `resumeRun` sets `approved = true`, re-enters task loop at saved `currentStepIndex` | +| `failed` | `rolled-back` | Automatic rollback path in `executeTask` after retry exhaustion | + +--- + +## GateStatus State Machine + +```mermaid +stateDiagram-v2 + [*] --> pending : gate registered + + pending --> pass : evaluateGates returns pass + pending --> needs-review : evaluateGates returns needs-review + pending --> blocked : evaluateGates returns blocked + + needs-review --> pass : POST /v1/gates/{id}/approve (manual approval) + needs-review --> blocked : POST /v1/gates/{id}/reject + needs-review --> pass : mode=turbo auto-pass applied + + pass --> [*] + blocked --> [*] +``` + +### Gate Resolution Priority + +`getOverallGateStatus` applies the following precedence across all 5 gate decisions: +1. If any gate is `"blocked"` → overall is `"blocked"` (run cannot proceed) +2. If any gate is `"needs-review"` → overall is `"needs-review"` (run pauses for human) +3. 
If all gates are `"pass"` → overall is `"pass"` (run advances to building phase) + +Manual approvals are tracked in `RunReport.approvedGates: string[]`. Any gate whose ID appears in that array is force-set to `"pass"` regardless of evaluation result. + +--- + +## Invariants + +The following rules must hold at all times across the system: + +1. **Terminal states are final.** A run with status `completed`, `failed`, or `cancelled` must never transition to another status without an explicit retry mechanism creating a new run. +2. **`currentStepIndex` monotonicity.** `RunState.currentStepIndex` only decreases during explicit `rollbackTask` calls; it never decreases automatically during forward execution. +3. **`approved` resets after each step.** Upon successful step completion, `bundle.state.approved` is reset to `false` to prevent approval bleed to subsequent steps. +4. **Audit events are append-only.** `writeAuditEvent` is never called to overwrite or delete prior events. +5. **Events require tenant scope.** `emitOrchestratorEvent` silently skips emission if `orgId` or `workspaceId` is missing from `RunState`, preventing orphan events. +6. **Gate approval list is cumulative.** `approvedGates` only grows during a run; approved gates are never un-approved during a single run lifecycle. +7. **Paused run state is persisted before returning.** `markState(bundle, "paused", {...})` calls `updateRunState` before the function returns, ensuring the checkpoint is durable. 
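The precedence described under Gate Resolution Priority, together with the manual-approval and turbo overrides, can be sketched as two small functions. This is a sketch under assumptions, not the `gate-manager.ts` implementation: `GateDecisionSketch`, `applyGateOverrides`, and `overallGateStatus` are hypothetical names, the reason-suffix formatting is assumed, and leaving already-passing gates untouched is a simplification.

```typescript
// Hypothetical shapes; the real GateDecision type may differ.
type GateStatus = "pass" | "needs-review" | "blocked";

interface GateDecisionSketch {
  gate: string;
  status: GateStatus;
  reason: string;
  shouldPause: boolean;
}

function applyGateOverrides(
  decisions: GateDecisionSketch[],
  approvedGates: string[],
  mode: string,
): GateDecisionSketch[] {
  return decisions.map((d) => {
    let out: GateDecisionSketch = d;
    // Manual approval is applied first and wins over any evaluation result.
    if (approvedGates.indexOf(out.gate) !== -1 && out.status !== "pass") {
      out = { ...out, status: "pass", reason: `${out.reason} (MANUALLY APPROVED)`, shouldPause: false };
    }
    // The turbo override only rewrites needs-review, never blocked.
    if (mode === "turbo" && out.status === "needs-review") {
      out = { ...out, status: "pass", reason: `${out.reason} (AUTO-PASSED VIA TURBO)`, shouldPause: false };
    }
    return out;
  });
}

function overallGateStatus(decisions: GateDecisionSketch[]): GateStatus {
  // Precedence: blocked, then needs-review, then pass.
  if (decisions.some((d) => d.status === "blocked")) return "blocked";
  if (decisions.some((d) => d.status === "needs-review")) return "needs-review";
  return "pass";
}
```

Because every gate is evaluated before aggregation, the overall status depends only on which statuses are present, not on gate order.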
+ +--- + +## Error States and Recovery Paths + +### Policy Block (`POLICY_BLOCK`) +- **Cause:** `evaluatePolicy(task)` returns `allowed: false` +- **State written:** `RunStatus = failed`, `StepStatus = failed` +- **Recovery:** Fix the policy rule or request an exemption; no automatic recovery + +### Adapter Not Found (`ADAPTER_NOT_FOUND`) +- **Cause:** `findAdapter(adapters, task.adapterId)` returns null +- **State written:** `RunStatus = failed`, `StepStatus = failed` +- **Recovery:** Register the missing adapter in the adapter registry; then call `retryTask` + +### Validation Failure (`VALIDATION_FAILED`) +- **Cause:** `adapter.validate(task.payload)` returns `false` +- **State written:** `RunStatus = failed`, audit includes `fixSuggestion` if adapter provides one +- **Recovery:** Use `fixSuggestion` to correct payload; call `retryTask` + +### Execution Failure with Healing +- **Cause:** `adapter.execute` throws on final retry attempt +- **Path:** `healFailedStep` invoked → if `status === "verified"`, `maxAttempts` incremented and execution resumes; if `approvalRequired`, run pauses; otherwise run fails +- **State written:** `RunStatus = failed` if healing cannot recover +- **Recovery:** Call `retryTask(runId, stepId)` after fixing root cause + +### Resume After Approval +- **Cause:** Run is `paused` due to `requiresApproval` +- **Path:** `POST /v1/runs/{id}/resume` → `resumeRun(runId, approve=true)` → `bundle.state.approved = true` → `executeRunBundle` resumes from `currentStepIndex` +- **Side effect:** `approved` resets to `false` after the approved step completes + +--- + +## Dependencies + +| Dependency | Package | Purpose | +|-----------|---------|---------| +| `run-store` | `packages/memory` | `loadRunBundle`, `updateRunState`, persist state | +| `execution-engine` | `packages/orchestrator` | `executeRunBundle`, transitions running/paused/completed/failed | +| `gate-manager` | `packages/orchestrator` | `evaluateGates`, GateStatus transitions | +| 
`events.ts` | `packages/orchestrator` | `emitExecutionStarted/Completed/Failed`, `emitGateAwaitingApproval` | +| `audit` | `packages/audit` | `writeAuditEvent`, append-only audit record | +| `policy-engine` | `packages/core` | `evaluatePolicy`, blocks or allows task execution | + +--- + +## Edge Cases + +- **God mode with no tenant scope:** `emitOrchestratorEvent` skips event emission; run still executes but is unobservable via event stream. +- **Expert mode exits after each phase:** `runVerticalSlice` returns after the first `runOrchestrationStep` call in expert mode, requiring the caller to re-invoke for each phase. +- **Turbo mode auto-passes `needs-review` gates:** In `evaluateGates`, any gate returning `needs-review` is rewritten to `pass` when `mode === "turbo"`, bypassing the approval pause. +- **Resume of a completed run:** `resumeRun` checks `bundle.state.status === "completed"` and returns early without re-executing, preventing double-execution. +- **Rollback of step with no rollback payload:** `rollbackTask` throws `"Rollback not available for step: {id}"` — callers must check `StepExecutionLog.rollbackAvailable` before calling. +- **Concurrent approval and cancellation:** No lock is held on `RunState`; last writer wins. API layer must serialize concurrent state mutations per runId. 
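
The last edge case asks the API layer to serialize state mutations per `runId`. One possible approach, sketched below under the assumption of a single service instance, is a per-run promise queue. The helper `withRunLock` and the `tails` map are hypothetical names, not existing code, and a multi-instance deployment would need a shared lock (for example in the database or Redis) instead.

```typescript
// Hypothetical in-process helper: queues async mutations per runId so that
// an approval and a cancellation for the same run never interleave.
const tails = new Map<string, Promise<unknown>>();

function withRunLock<T>(runId: string, mutate: () => Promise<T>): Promise<T> {
  // Start from the tail of this run's queue, or an already-resolved promise.
  const previous = tails.get(runId) ?? Promise.resolve();

  // Stored tails never reject (see below), so chaining with then() is enough.
  const next = previous.then(() => mutate());

  // Swallow errors on the stored tail so one failed mutation does not
  // block later ones; the caller still sees the error through `next`.
  const tail = next.catch(() => undefined);
  tails.set(runId, tail);

  // Remove the entry once this run's queue is idle to avoid unbounded growth.
  tail.then(() => {
    if (tails.get(runId) === tail) {
      tails.delete(runId);
    }
  });

  return next;
}
```

A route handler could then wrap its state change, for instance `withRunLock(runId, () => resumeRun(runId, true))`, so that concurrent requests for the same run are applied one after another while different runs proceed in parallel.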
+ +--- + +## Risks + +| Risk | Likelihood | Impact | Mitigation | +|------|-----------|--------|-----------| +| State desync between memory store and audit log | Medium | High | Wrap `markState` and `writeAuditEvent` in the same call path; do not split them | +| `approved` flag persisting across steps due to resume edge cases | Low | High | Invariant #3 enforced in `executeTask` success path; covered by tests | +| Orphan paused runs (no actor to resume) | Medium | Medium | `gate.awaiting_approval` event must trigger notification; add TTL on paused state | +| Terminal state re-entry via concurrent API calls | Low | High | Idempotency guard needed at API layer before `executeRunBundle` | + +--- + +## Definition of Done + +- [ ] All `RunStatus` transitions are covered by integration tests with state assertions +- [ ] All `StepStatus` transitions emit the correct audit event with expected fields +- [ ] Gate approval flow tested end-to-end: `needs-review → approved → run resumes` +- [ ] `resumeRun` with `approve=false` does not re-approve the paused step +- [ ] `rollbackTask` throws a typed error when `rollbackPayload` is absent +- [ ] Turbo mode auto-pass behavior validated against all 5 gates +- [ ] Expert mode single-step behavior validated in `runVerticalSlice` +- [ ] All 5 gate IDs (`objective-clarity`, `requirements-completeness`, `plan-readiness`, `skill-coverage`, `ambiguity-risk`) are reflected in `approvedGates` tracking +- [ ] Terminal state guard tested: completed run re-submitted returns bundle without re-executing +- [ ] Mermaid diagrams in this spec render correctly in the project documentation site diff --git a/docs/06_validation/GO_NO_GO_CHECKLIST.md b/docs/06_validation/GO_NO_GO_CHECKLIST.md new file mode 100644 index 0000000..a678ca5 --- /dev/null +++ b/docs/06_validation/GO_NO_GO_CHECKLIST.md @@ -0,0 +1,159 @@ +# Go/No-Go Release Gate — v1.3.0 + +| Field | Value | +|-------|-------| +| Release | v1.3.0 | +| Target date | [TBD] | +| Decision date | 
[to be filled at review meeting] | +| Decision makers | Engineering Lead, Security Lead, Product Owner | +| Meeting format | Synchronous review of this document | +| Document status | Draft — all gates open | +| Last updated | 2026-04-04 | + +--- + +## Purpose + +This document is the formal gate that controls whether Code-Kit-Ultra v1.3.0 may +proceed to production release. It is reviewed in a synchronous meeting attended +by all decision makers listed above. No release may proceed unless the outcome +recorded in the Decision Log is "GO" or "CONDITIONAL GO" with documented +exceptions approved by the Security Lead. + +This document is completed fresh for each release. Items are verified — not +assumed — before being checked. Evidence (test run URLs, coverage reports, or +audit screenshots) must be linked in the notes for every Security Gate item. + +--- + +## Gate 1 — Security Gate + +> **HARD BLOCK — the release cannot proceed if any item in this gate is unchecked.** +> Evidence of verification is required for every item. 
+ +- [ ] Zero P0 (Critical) open security vulnerabilities + - _Evidence:_ link to security scan results +- [ ] Zero P1 (High) open security vulnerabilities + - _Evidence:_ link to security scan results +- [ ] **R-01 verified:** SA secret loaded from env var (`SA_SECRET`); service startup + throws and refuses to start if the env var is absent or empty + - _Test:_ start service without `SA_SECRET` set → must exit with non-zero code and + log `FATAL: SA_SECRET is required` + - _Evidence:_ CI run link +- [ ] **R-02 verified:** default org bypass removed from `resolveSession`; cross-tenant + access blocked at API layer + - _Test:_ `POST /v1/runs` with `orgId="default"` → `400 INVALID_ORG_ID` + - _Evidence:_ passing test case in `TEST_PLAN_RUN_SCOPING.md §4.5` +- [ ] **R-03 verified:** Redis jti blacklist implemented; revoked session tokens return 401 + - _Test:_ issue token, revoke it via logout endpoint, reuse token → `401 TOKEN_REVOKED` + - _Evidence:_ passing security test `auth/revocation.test.ts` +- [ ] **R-04 verified:** execution token validated on every protected API call; + expired or missing execution token returns 401 + - _Test:_ call `POST /v1/runs/{id}/resume` with expired exec token → `401` + - _Evidence:_ passing security test `exec-token-validation.test.ts` +- [ ] **R-05 verified:** audit hash chain is restart-safe (uses DB-persisted `lastHash`, + not module-level variable); chain integrity survives service restart + - _Test:_ append 50 events, restart service, append 10 more, run chain verifier → no + mismatch + - _Evidence:_ passing test in `SECURITY_TESTING_PLAN.md §3 Audit Integrity` + +--- + +## Gate 2 — Quality Gate + +> **HARD BLOCK — the release cannot proceed if any item in this gate is unchecked.** + +- [ ] All smoke tests pass on staging environment + - _Command:_ `pnpm test:smoke --env=staging` + - _Evidence:_ CI run link +- [ ] `packages/auth` test coverage ≥ 90% (measured, not estimated) + - _Command:_ `pnpm test --coverage --filter=auth` + 
- _Evidence:_ coverage report screenshot or artifact link +- [ ] `packages/orchestrator` test coverage ≥ 80% + - _Command:_ `pnpm test --coverage --filter=orchestrator` + - _Evidence:_ coverage report +- [ ] Zero P0 functional bugs open + - _Evidence:_ link to issue tracker filtered by P0 + open +- [ ] Zero regressions from v1.2.0 verified by regression test suite + - _Command:_ `pnpm test:regression` + - _Evidence:_ CI run link + +--- + +## Gate 3 — Operations Gate + +> **HARD BLOCK — the release cannot proceed if any item in this gate is unchecked.** + +- [ ] Staging deployment successful: Dockerfile built and service started without errors + - _Evidence:_ deployment log link +- [ ] DB migrations ran cleanly on staging against a clean schema (no pre-existing tables) + - _Evidence:_ migration runner output in deployment log +- [ ] Rollback tested: deployed v1.3.0 on staging, rolled back to v1.2.0, verified core + functionality remained intact + - _Evidence:_ rollback test log link +- [ ] Health and readiness endpoints functional on staging + - `GET /health` → `200 {"status":"healthy"}` + - `GET /ready` → `200` when DB and Redis are reachable; `503` when either is down + - _Evidence:_ curl output +- [ ] Alerts configured and tested for P0 errors (5xx bursts, auth failures) + - _Evidence:_ alert rule screenshot + test notification confirmation + +--- + +## Gate 4 — Product Gate + +> **CONDITIONAL** — release may proceed with documented exceptions approved by +> the Product Owner and Engineering Lead. Any unchecked item must be logged in +> the Decision Log with a resolution date. 
+ +- [ ] Product Owner sign-off on feature completeness for v1.3.0 scope + - _Sign-off by:_ [name, date] +- [ ] Customer-facing changelog reviewed and approved for accuracy + - _Evidence:_ link to reviewed `CHANGELOG.md` diff +- [ ] Documentation complete: OpenAPI 3.1 spec generated and matches implementation + - _Evidence:_ spec file path + validation command output +- [ ] `README.md` and quickstart guide updated for v1.3.0 changes + - _Evidence:_ PR link + +--- + +## Outcome + +| Outcome | Condition | Action | +|---------|-----------|--------| +| GO | All Gate 1 + 2 + 3 items checked; Gate 4 items checked | Proceed to production release | +| CONDITIONAL GO | All Gate 1 + 2 + 3 items checked; one or more Gate 4 items pending with Product Owner approval | Release with documented limitations; Gate 4 items tracked as follow-up | +| NO-GO | Any Gate 1, 2, or 3 item unchecked | Release blocked — schedule remediation sprint, re-convene for re-review | + +--- + +## Decision Log + +| Date | Release | Outcome | Blocker (if NO-GO) | Resolved By | Sign-off | +|------|---------|---------|-------------------|-------------|----------| +| — | v1.3.0 | Pending | — | — | — | + +--- + +## Current Status — v1.3.0 + +> Status as of 2026-04-04. All gates open; work in progress. 
+ +| Gate | Items | Checked | Remaining | Status | +|------|-------|---------|-----------|--------| +| Gate 1 — Security | 7 | 0 | 7 | OPEN (HARD BLOCK) | +| Gate 2 — Quality | 5 | 0 | 5 | OPEN (HARD BLOCK) | +| Gate 3 — Operations | 5 | 0 | 5 | OPEN (HARD BLOCK) | +| Gate 4 — Product | 4 | 0 | 4 | OPEN (CONDITIONAL) | +| **Overall** | **21** | **0** | **21** | **NO-GO** | + +--- + +## Related Documents + +- `docs/06_validation/PRODUCTION_READINESS.md` — detailed effort estimates and owners +- `docs/06_validation/SECURITY_TESTING_PLAN.md` — security test cases and evidence requirements +- `docs/06_validation/TEST_PLAN_RUN_SCOPING.md` — run isolation test plan (required for Gate 1 R-02) +- `docs/06_validation/TEST_PLAN_AUTH.md` — auth package test plan (required for Gate 2 coverage) +- `docs/SECURITY_AUDIT.md` — open risk register (R-01 through R-08) +- `docs/ROLLBACK.md` — rollback procedure (required for Gate 3) diff --git a/docs/06_validation/PRODUCTION_READINESS.md b/docs/06_validation/PRODUCTION_READINESS.md new file mode 100644 index 0000000..6b70363 --- /dev/null +++ b/docs/06_validation/PRODUCTION_READINESS.md @@ -0,0 +1,135 @@ +# Production Readiness Checklist + +- **Document type**: Release Checklist +- **Version target**: v1.3.0 +- **Last updated**: 2026-04-04 +- **Status**: In progress — all items open + +--- + +## Purpose + +This checklist must be completed in full before any production release. Work +through each item with the designated owner, mark it checked, and record the +date and reviewer in the notes column. Categories 1–3 are hard gates: a single +unchecked item in Security, Reliability, or Observability blocks release. +Categories 4–6 are strong recommendations; any exception requires explicit +sign-off from the Engineering Lead. + +**How to use this checklist** + +1. Open this file at the start of each release milestone. +2. Assign owners to unchecked items. +3. 
Work items in priority order: Security first, then Reliability, then + Observability, then Testing, then Deployment, then Documentation. +4. Mark each item `[x]` when verified — not just implemented. +5. Bring outstanding items to the Go/No-Go review meeting + (`GO_NO_GO_CHECKLIST.md`). + +--- + +## Category 1 — Security + +> **HARD GATE — release cannot proceed if any item in this category is unchecked.** + +| # | Item | Effort | Owner | Done | +|---|------|--------|-------|------| +| S-01 | R-01: SA secret loaded from env var (`SA_SECRET`); service throws on startup if env var is absent or empty | 1h | eng | [ ] | +| S-02 | R-02: Default org bypass removed from `resolveSession`; cross-tenant access blocked at API layer | 2h | eng | [ ] | +| S-03 | R-03: Redis-backed session revocation via jti blacklist; revoked tokens return 401 | 4h | eng | [ ] | +| S-04 | R-04: Execution token validated on every protected API call; expired/missing exec token returns 401 | 3h | eng | [ ] | +| S-05 | R-05: Audit hash chain uses DB-persisted `lastHash` (not module-level variable); chain survives service restart | 3h | eng | [ ] | +| S-06 | R-06: Rate limiting enforced — 100 req/min per actor globally, 10/min for token creation endpoint | 4h | eng | [ ] | +| S-07 | Secrets (tokens, keys, passwords) are never written to application logs; audit log sanitization verified | 2h | eng | [ ] | +| S-08 | HTTPS enforced in production (HTTP → HTTPS redirect at load balancer or app layer) | 1h | infra | [ ] | +| S-09 | CORS policy configured with an explicit origin allowlist — wildcard (`*`) not permitted in production | 1h | eng | [ ] | +| S-10 | CSP headers configured on the web control plane (`Content-Security-Policy` response header present) | 2h | eng | [ ] | + +--- + +## Category 2 — Reliability + +> **HARD GATE — release cannot proceed if any item in this category is unchecked.** + +| # | Item | Effort | Owner | Done | +|---|------|--------|-------|------| +| R-01 | PostgreSQL runtime 
wired: `run-store.ts` reads and writes to DB (not in-memory map) | 8h | eng | [ ] | +| R-02 | Connection pooling configured: pg pool `min=2`, `max=10`; connections reused across requests | 1h | eng | [ ] | +| R-03 | DB migrations run automatically on service startup (via migration runner in entrypoint) | 2h | eng | [ ] | +| R-04 | Graceful shutdown: `SIGTERM` handler drains in-flight requests and closes DB pool before exit | 3h | eng | [ ] | +| R-05 | `GET /health` returns `200 {"status":"healthy"}` and does not gate on DB connectivity | 1h | eng | [ ] | +| R-06 | `GET /ready` gates on both DB and Redis connectivity; returns `503` if either is unreachable | 2h | eng | [ ] | + +--- + +## Category 3 — Observability + +> **HARD GATE — release cannot proceed if any item in this category is unchecked.** + +| # | Item | Effort | Owner | Done | +|---|------|--------|-------|------| +| O-01 | Structured JSON logging in place (`logger.ts`); all `console.log` / `console.error` calls replaced | 4h | eng | [ ] | +| O-02 | Trace ID (`X-Trace-ID` header) injected on every inbound request and included in all log lines | 2h | eng | [ ] | +| O-03 | `GET /metrics` Prometheus endpoint exposed; request count, latency histograms, error rates present | 6h | eng | [ ] | +| O-04 | Error alerting configured: critical errors (5xx bursts, auth failures) route to alert channel | 3h | infra | [ ] | +| O-05 | Audit log persisted to DB for every material action (run create/cancel/resume, gate approve/reject) | 4h | eng | [ ] | + +--- + +## Category 4 — Testing + +| # | Item | Effort | Owner | Done | +|---|------|--------|-------|------| +| T-01 | `packages/auth` test coverage ≥ 90% (measured via `pnpm test --coverage`, not estimated) | 8h | eng | [ ] | +| T-02 | `packages/orchestrator` test coverage ≥ 80% | 12h | eng | [ ] | +| T-03 | Governance gates package test coverage ≥ 80% | 8h | eng | [ ] | +| T-04 | All smoke tests pass on staging environment (`pnpm test:smoke --env=staging`) | 2h | qa | 
[ ] | +| T-05 | Zero P0/P1 security vulnerabilities open (`npm audit --audit-level=high` returns clean) | varies | security | [ ] | + +--- + +## Category 5 — Deployment + +| # | Item | Effort | Owner | Done | +|---|------|--------|-------|------| +| D-01 | Dockerfile exists, builds successfully (`docker build .` passes with no errors) | 4h | eng | [ ] | +| D-02 | `.env.example` documents every required environment variable with description and example value | 1h | eng | [ ] | +| D-03 | DB migrations tested on a completely clean schema (no pre-existing tables) | 1h | eng | [ ] | +| D-04 | Rollback procedure documented (`docs/ROLLBACK.md`) and tested: v1.3.0 → v1.2.0 verified working | 2h | eng | [ ] | +| D-05 | Zero-downtime deploy strategy defined and documented (blue/green or rolling; no forced restarts mid-request) | 4h | infra | [ ] | + +--- + +## Category 6 — Documentation + +| # | Item | Effort | Owner | Done | +|---|------|--------|-------|------| +| Doc-01 | OpenAPI 3.1 spec generated and validated against implementation (all routes, request/response schemas present) | 8h | eng | [ ] | +| Doc-02 | `CHANGELOG.md` updated with v1.3.0 entries (new features, bug fixes, breaking changes, security fixes) | 1h | eng | [ ] | +| Doc-03 | `SECURITY.md` current with accurate vulnerability contact information and disclosure timeline | 30m | security | [ ] | + +--- + +## Summary Table + +> Update this table manually at each milestone review. 
+ +| Category | Total Items | Checked | Remaining | Effort Remaining | +|----------|-------------|---------|-----------|-----------------| +| Security | 10 | 0 | 10 | 23h | +| Reliability | 6 | 0 | 6 | 17h | +| Observability | 5 | 0 | 5 | 19h | +| Testing | 5 | 0 | 5 | 30h | +| Deployment | 5 | 0 | 5 | 12h | +| Documentation | 3 | 0 | 3 | 9.5h | +| **Total** | **34** | **0** | **34** | **110.5h** | + +--- + +## Related Documents + +- `docs/06_validation/GO_NO_GO_CHECKLIST.md` +- `docs/06_validation/SECURITY_TESTING_PLAN.md` +- `docs/06_validation/TEST_PLAN_RUN_SCOPING.md` +- `docs/SECURITY_AUDIT.md` +- `docs/RELEASE_CHECKLIST.md` diff --git a/docs/06_validation/SECURITY_TESTING_PLAN.md b/docs/06_validation/SECURITY_TESTING_PLAN.md new file mode 100644 index 0000000..ca558ca --- /dev/null +++ b/docs/06_validation/SECURITY_TESTING_PLAN.md @@ -0,0 +1,499 @@ +# Security Testing Plan + +- **Document type**: Security Test Plan +- **Version target**: v1.3.0 +- **Last updated**: 2026-04-04 +- **Status**: Draft; Phase 1 required before v1.3.0 release + +--- + +## 1. Scope + +### In scope + +| System | What is tested | +|--------|---------------| +| `apps/control-service` API | All authenticated endpoints, rate limiting, input validation | +| `packages/auth` | JWT validation, session resolution, token revocation | +| Tenant isolation layer | Cross-org access prevention, run scoping | +| SSE realtime stream (`GET /v1/stream`) | Cross-tenant event leakage | +| Audit log subsystem | Hash chain integrity, restart safety | + +### Out of scope + +- InsForge platform internals (JWT issuance, JWKS endpoint); not operated by this team +- Third-party AI provider APIs (OpenAI, Anthropic, etc.) +- Network-layer controls (TLS termination, DDoS mitigation); handled by infrastructure +- Client-side / browser security of any frontend applications + +--- + +## 2. 
Testing Phases + +### Phase 1 — Pre-release (required for v1.3.0) + +| Activity | Owner | Timing | +|----------|-------|--------| +| Manual security review of open risks R-01 through R-07 | Security Lead | 2 weeks before release | +| Automated security test suite (`pnpm test:security`) | Engineering | CI, every PR to main | +| OWASP ZAP API scan against staging | Security Lead | 1 week before release | +| `npm audit` dependency vulnerability scan | Engineering | CI, weekly | +| Static analysis: `eslint-plugin-security` | Engineering | CI, every PR | + +**Exit criteria for Phase 1:** Zero High or Critical findings from ZAP; all automated +security tests pass; all P0/P1 risks from the open risk register resolved. + +### Phase 2 — Post-release (scheduled) + +| Activity | Frequency | Owner | +|----------|-----------|-------| +| Automated `npm audit` | Weekly (CI cron) | Engineering | +| OWASP ZAP scan against production (read-only, non-destructive) | Monthly | Security Lead | +| Dependency update review | Monthly | Engineering | + +### Phase 3 — Ongoing (CI enforcement) + +- All tests in `pnpm test:security` run on every PR to `main`. +- Any PR that removes or skips a security test requires Security Lead approval. +- New security bug fixes must be accompanied by a regression test before merge. + +--- + +## 3. Test Cases by Category + +All automated test cases live under `packages/auth/tests/security/` and +`apps/control-service/tests/security/`. Run with: + +```bash +pnpm test:security +``` + +--- + +### 3.1 Authentication Attacks + +**JWT algorithm confusion** +- Send a token signed with HS256 using the RS256 public key as the HMAC secret. + Server expects RS256 (from JWKS). Must reject with `401 INVALID_TOKEN`. + +**JWT "none" algorithm** +- Craft a token with header `{"alg":"none","typ":"JWT"}` and no signature. + Must reject with `401 INVALID_TOKEN` — server must never accept `alg: "none"`. 
+ +**Expired token** +- Issue a valid RS256 token with `exp` set 1 hour in the past. + Must reject with `401 TOKEN_EXPIRED`. + +**Wrong issuer** +- Issue a token with `iss: "https://evil.example.com"` signed with a valid key. + Must reject with `401 INVALID_ISSUER`. + +**Tampered payload** +- Take a valid token, decode the payload, flip one character in `sub`, re-encode + with the original signature (now invalid). + Must reject with `401 INVALID_SIGNATURE`. + +**Revoked token (jti blacklist)** +- Issue a valid token. Call the logout endpoint to revoke it (adds jti to Redis + blacklist). Immediately reuse the same token on a protected endpoint. + Must reject with `401 TOKEN_REVOKED`. + +**Brute force rate limit** +- Send 11 login/token-creation requests within 60 seconds from the same actor. + The 11th request must receive `429 Too Many Requests` with a `Retry-After` + header. + +```typescript +describe('Authentication Attacks', () => { + it('rejects HS256 algorithm confusion token', async () => { + const token = buildAlgConfusionToken({ alg: 'HS256', secret: RS256_PUBLIC_KEY }); + const res = await api.get('/v1/runs').set('Authorization', `Bearer ${token}`); + expect(res.status).toBe(401); + expect(res.body.code).toBe('INVALID_TOKEN'); + }); + + it('rejects alg:none token', async () => { + const token = buildNoneAlgToken({ sub: 'user-a-admin' }); + const res = await api.get('/v1/runs').set('Authorization', `Bearer ${token}`); + expect(res.status).toBe(401); + }); + + it('rejects expired token', async () => { + const token = buildExpiredToken({ sub: 'user-a-admin', ageSeconds: 3600 }); + const res = await api.get('/v1/runs').set('Authorization', `Bearer ${token}`); + expect(res.status).toBe(401); + expect(res.body.code).toBe('TOKEN_EXPIRED'); + }); + + it('rejects wrong issuer', async () => { + const token = buildToken({ sub: 'user-a-admin', iss: 'https://evil.example.com' }); + const res = await api.get('/v1/runs').set('Authorization', `Bearer ${token}`); + 
expect(res.status).toBe(401); + expect(res.body.code).toBe('INVALID_ISSUER'); + }); + + it('rejects tampered payload', async () => { + const token = buildTamperedPayloadToken({ sub: 'user-a-admin' }); + const res = await api.get('/v1/runs').set('Authorization', `Bearer ${token}`); + expect(res.status).toBe(401); + }); + + it('rejects revoked token (jti in Redis blacklist)', async () => { + const { token } = await issueToken({ sub: 'user-a-admin' }); + await api.post('/v1/auth/logout').set('Authorization', `Bearer ${token}`); + const res = await api.get('/v1/runs').set('Authorization', `Bearer ${token}`); + expect(res.status).toBe(401); + expect(res.body.code).toBe('TOKEN_REVOKED'); + }); + + it('rate limits token creation at 11 attempts per minute', async () => { + for (let i = 0; i < 10; i++) { + await api.post('/v1/auth/token').send({ clientId: 'x', clientSecret: 'bad' }); + } + const res = await api.post('/v1/auth/token').send({ clientId: 'x', clientSecret: 'bad' }); + expect(res.status).toBe(429); + expect(res.headers['retry-after']).toBeDefined(); + }); +}); +``` + +--- + +### 3.2 Authorization Bypass + +**Cross-tenant run access** +- Authenticate as an orgA user with a valid token. Request `GET /v1/runs/{id}` where + `id` belongs to orgB. Must receive `404` — not `200` (leak) or `403` (information + disclosure that the run exists). + +**Gate approval without permission** +- Authenticate as a user without `gate:approve` permission. Call the gate approval + endpoint. Must receive `403 FORBIDDEN`. + +**Privilege escalation via crafted JWT** +- Craft a token with `roles: ["org:admin"]` for an account that is only + `workspace:member`. Because the token is not signed by InsForge's private key, + it must be rejected with `401 INVALID_SIGNATURE`. + +**Service account accessing admin endpoint** +- Authenticate as a service account (which has `actorType: service_account`). + Call an admin-only endpoint (e.g., `GET /v1/admin/orgs`). + Must receive `403 FORBIDDEN`. 
+ +**Forged execution token** +- Issue an execution token for run-a1. Substitute run-a2's ID into the token payload + without re-signing. Call `POST /v1/runs/run-a2/resume` with this token. + Must receive `401 INVALID_EXEC_TOKEN`. + +```typescript +describe('Authorization Bypass', () => { + it('returns 404 (not 200 or 403) for cross-tenant run access', async () => { + const res = await authed(orgAToken).get('/v1/runs/run-b1'); + expect(res.status).toBe(404); + }); + + it('returns 403 for gate approval without gate:approve permission', async () => { + const res = await authed(viewerToken).post(`/v1/runs/run-a1/gates/1/approve`); + expect(res.status).toBe(403); + }); + + it('rejects crafted admin JWT with invalid signature', async () => { + const token = craftAdminToken({ sub: 'user-a-member' }); // not signed by InsForge + const res = await api.get('/v1/admin/orgs').set('Authorization', `Bearer ${token}`); + expect(res.status).toBe(401); + }); + + it('returns 403 for service account accessing admin endpoint', async () => { + const res = await authed(saToken).get('/v1/admin/orgs'); + expect(res.status).toBe(403); + }); + + it('rejects forged execution token for different run', async () => { + const forgedToken = forgeExecToken({ originalRunId: 'run-a1', targetRunId: 'run-a2' }); + const res = await api + .post('/v1/runs/run-a2/resume') + .set('X-Execution-Token', forgedToken); + expect(res.status).toBe(401); + expect(res.body.code).toBe('INVALID_EXEC_TOKEN'); + }); +}); +``` + +--- + +### 3.3 Input Validation + +**SQL injection in runId** +- Send `GET /v1/runs/'; DROP TABLE runs; --` as the run ID path parameter. + The query must use parameterized statements; the response must be `400` or `404` + with no DB error. The `runs` table must still exist after the request. + +**XSS in idea text** +- Submit `POST /v1/runs` with `idea: ""`. 
+ The stored and returned value must be sanitized or escaped — the literal + `' }); + expect(res.status).toBe(201); + expect(res.body.run.idea).not.toMatch(/