diff --git a/docs/plans/2026-03-27-kilroy-stabilization.md b/docs/plans/2026-03-27-kilroy-stabilization.md
new file mode 100644
index 00000000..f2a93872
--- /dev/null
+++ b/docs/plans/2026-03-27-kilroy-stabilization.md
@@ -0,0 +1,840 @@
+# Kilroy Stabilization Plan
+
+Date: 2026-03-27
+
+## Problem Statement
+
+Kilroy is an attractor-pattern graph runner that orchestrates multi-stage LLM workflows.
+It works, but it's fragile, hard to set up, hard to debug, and fails in ways that are
+difficult to diagnose. The root causes are architectural:
+
+1. **Three concerns in one layer.** Graph execution, LLM orchestration, and software-development
+   workflow opinions (git worktrees, artifact policies, per-node commits) are inseparable. You
+   can't run a simple graph without buying into the full stack.
+
+2. **Heavy where it should be light; light where it should be heavy.** Configuration requires
+   specifying everything (CXDB, provider backends, model catalogs, artifact policies) with no
+   sensible defaults. But validation is shallow — credentials are checked for presence not
+   validity, and the system fails 2 hours into a run for things it could catch in 10 seconds.
+
+3. **Invisible decisions.** Edge routing uses 4+ interacting mechanisms (conditions,
+   preferred_label, suggested_next_ids, retry_target chains). Provider selection, fidelity
+   resolution, failure classification, and implicit fan-out detection all happen silently.
+   When a run takes an unexpected path, there's no way to understand why without reading code.
+
+4. **File-based state assembly.** Run state is spread across progress.ndjson, checkpoint.json,
+   final.json, status.json per node, and CXDB events. Piecing together what happened requires
+   reading multiple files and mentally replaying the run. No queryable operational state exists.
+
+5. **Graphs encode too much.** Prompts, failure policies, verification scripts, status contracts,
+   model routing, and fidelity modes all live in DOT files. Graphs are hard to write, hard to
+   read, and easy to get subtly wrong.
+
+## Design Principles
+
+These guide every task in this plan:
+
+- **Fail fast.** Validate everything checkable before execution starts. 10 seconds, not 2 hours.
+- **Explicit over implicit.** Every routing decision, provider selection, and state transition
+  should be visible in the graph or in decision logs. No silent fallbacks.
+- **Layered architecture.** The graph runner is the core. Git, CXDB, artifact policies, and
+  software-development workflow patterns are composable layers on top, not baked in.
+- **Zero-config start.** `kilroy run --graph pipeline.dot` should work with no config file.
+  Everything else is opt-in.
+- **Observable decisions.** The system should answer "what happened and why" from a single
+  query, not a filesystem crawl.
+
+## Phasing
+
+Each phase delivers standalone value and builds on the previous one. Phase 0 can start today.
+Script-based test graphs (no LLM involvement) validate each change to isolate engine behavior
+from model variability.
+
+---
+
+## Phase 0: Immediate Stabilization
+
+Goal: Make the current system more predictable without structural changes. Each task is
+independently shippable. Use script-based test graphs to validate — nodes run shell commands
+that produce deterministic output, so we're testing the engine plumbing, not LLM behavior.
+
+### 0.1 Create Script-Based Test Graphs
+
+**What:** Create a small suite of DOT graphs where every node runs a shell script instead of
+an LLM. These become the validation harness for all subsequent Phase 0 work and beyond.
+
+**Context:** Currently there's no way to test the engine's traversal, routing, and checkpoint
+behavior without involving LLMs. We need deterministic graphs that exercise the engine's core
+codepaths: linear traversal, conditional routing, retry loops, failure handling.
+
+**Done when:**
+- A `testdata/` directory contains at least 4 graphs:
+  - `linear.dot`: start → A → B → C → done (basic traversal)
+  - `linear_verify.dot`: start → implement → verify → done, with verify→implement on fail
+    (hill-climber pattern with a verify script that succeeds on Nth attempt)
+  - `conditional.dot`: start → process → (path_a | path_b based on outcome) → done
+  - `fail_fast.dot`: start → node_that_fails → done (tests failure handling)
+- Each graph uses `type="command"` or `shape=parallelogram` nodes with shell scripts
+- Shell scripts are in `testdata/scripts/`, produce deterministic outcomes via status.json
+- All graphs pass `kilroy attractor validate`
+- All graphs execute successfully with `kilroy attractor run`
+
+### 0.2 Make CXDB Truly Optional
+
+**What:** Remove the assertion that CXDB configuration is required. Users should be able to
+run graphs without CXDB configured, running, or even installed.
+
+**Context:** `config.go:313` asserts CXDB fields are required ("required in v1"). The engine
+already nil-checks CXDB access everywhere. The `--no-cxdb` CLI flag exists but config
+validation blocks before it's consulted when no CXDB config is provided. This means even a
+user who doesn't want CXDB has to provide dummy CXDB config values.
+
+**Done when:**
+- A run config with no `cxdb` section is accepted without error
+- `kilroy attractor run --graph testdata/linear.dot` works with a minimal config (repo path only)
+- Running without CXDB produces no CXDB-related errors or warnings
+- Running WITH CXDB still works as before
+- Existing tests pass (`go test ./...`)
+
+### 0.3 Strict Outcome Validation
+
+**What:** Reject unrecognized status values from node outcomes instead of silently passing
+them through. Typos in status names currently cause silent routing failures.
+
+**Context:** `runtime/status.go` ParseStageStatus accepts any non-empty string as a valid
+status. If an LLM writes `"outcome": "proccess"` (typo), it's treated as a custom outcome.
+No edge condition matches it, and routing silently falls through to default behavior. The
+canonical values are: success, partial_success, retry, fail, skipped.
+
+**Done when:**
+- Unknown status values produce a clear error with the invalid value and the list of valid values
+- The canonical status set is documented in one place in code
+- The test graph suite from 0.1 exercises both valid and invalid outcomes
+- Existing graphs that intentionally use custom outcomes (if any in the test suite) are migrated
+  to use `preferred_label` for conditional routing instead
+- Existing tests pass
+
+### 0.4 Decision Logging
+
+**What:** Add structured events to the progress stream for every routing decision the engine
+makes. When a run takes an unexpected path, the decision log should explain exactly why.
+
+**Context:** `next_hop.go` evaluates edge conditions, matches preferred labels, applies weight
+tiebreaking, and falls through retry_target chains — but most of these decisions are invisible.
+`progress.go` captures stage start/end events but not the reasoning behind edge selection.
+Provider selection and retry decisions are also unlogged.
+
+**Done when:**
+- Edge condition evaluation logs: which edges were evaluated, what their conditions were,
+  whether each condition matched, and which edge was selected
+- Provider/model selection logs: which provider and model were chosen for each node
+- Retry decision logs: when a retry occurs, the attempt number, backoff delay, and why
+- All new events appear in progress.ndjson with structured fields
+- The test graphs from 0.1 produce readable decision trails that explain their routing
+- Running a conditional graph and reading progress.ndjson clearly shows why each path was taken
+
+### 0.5 Real Credential Validation
+
+**What:** Actually test provider credentials during preflight by making lightweight API calls,
+rather than just checking if environment variables are set.
+
+**Context:** `provider_preflight.go` has prompt-probe infrastructure that already makes test
+API calls with retry logic. But it's disabled by default and runs in warn-only mode. The
+common failure mode: a run starts, executes for hours, then fails at a node that needs a
+provider whose API key is missing, invalid, or expired.
+
+**Done when:**
+- Preflight makes a real API call to every provider/model combination declared in the graph
+- Invalid or missing credentials fail preflight with a clear message naming the provider,
+  model, and what's wrong
+- Preflight reports ALL credential issues at once, not one at a time
+- A graph referencing a provider with a bad key fails in preflight, not at execution time
+- Preflight can be skipped with an explicit flag for offline/CI use
+- The fail_fast test graph from 0.1 validates this behavior
+
+### 0.6 Dry-Run Mode
+
+**What:** Add `--dry-run` that validates everything — graph structure, provider credentials,
+model availability — and reports results without executing any nodes.
+
+**Context:** The current `--preflight` flag runs validation but doesn't do the full bootstrap
+(repo checks, artifact policy resolution, effective provider routing). A dry-run should go
+further: confirm that the run WOULD succeed by doing everything except actually executing
+handlers.
+
+**Done when:**
+- `kilroy attractor run --graph test.dot --dry-run` validates graph, resolves all providers,
+  tests all credentials, and reports a summary of what would happen
+- No git branches are created, no worktrees made, no nodes executed
+- Output shows: graph structure summary, provider assignments per node, credential status,
+  any warnings or errors
+- Exit code reflects whether the run would succeed (0) or fail (non-zero)
+- Works with the test graph suite from 0.1
+
+### 0.7 Zero-Config Startup
+
+**What:** Make `kilroy attractor run --graph pipeline.dot` work with no config file at all.
+Auto-detect providers from environment, use sensible defaults for everything else.
+
+**Context:** Currently you must write a YAML config file specifying CXDB, provider backends,
+model catalogs, artifact policies, and more. A new user has to understand the entire system
+architecture to even start. Most of this has one reasonable default.
+
+**Done when:**
+- Running with only `--graph` and no `--config` succeeds when provider API keys are in the
+  environment (auto-detect: ANTHROPIC_API_KEY → anthropic provider, OPENAI_API_KEY → openai, etc.)
+- No CXDB required (builds on 0.2)
+- Default model selected per provider (the provider's current best general-purpose model)
+- Default backend selected based on what's available (API key present → api; CLI binary found → cli)
+- Repo path defaults to current directory if it's a git repo, or a temp workspace if not
+- Config file remains available for overriding any default
+- The test graph suite runs with zero config
+
+---
+
+## Phase 0.8: CLI-Only Agent Backend (Deprecate API)
+
+Goal: Simplify the agent backend to CLI-only. Use the appropriate CLI tool for each provider
+instead of maintaining both API and CLI execution paths.
+
+### Design Context
+
+Team feedback: use `opencode` for all providers that don't have a dedicated CLI agent.
+The provider mapping becomes:
+
+- Claude → `claude` CLI (with `--dangerously-skip-permissions`)
+- OpenAI → `codex` CLI (with `--sandbox workspace-write`)
+- Google → `gemini` CLI (with `--yolo`)
+- Everything else → `opencode` CLI (multi-provider, open-source)
+
+This eliminates the entire API backend: the `internal/agent/` package (session management,
+tool registry, profiles, output truncation), the `internal/llm/` package (provider adapters,
+unified client), and the complexity of maintaining two fundamentally different execution
+models. The codergen handler becomes: "invoke a CLI binary with a prompt, capture output."
+
+This also resolves the CLI-vs-API convergence gap — CLI converges better for structural
+reasons (purpose-built agents with native tools), so going all-CLI means convergence
+improves everywhere. It sidesteps Anthropic ToS concerns since we're using official/community
+CLI tooling as intended.
+
+### 0.8.1 Add opencode as a Provider Backend
+
+**What:** Add opencode CLI support alongside the existing claude/codex/gemini CLI backends.
+opencode becomes the default backend for any provider that doesn't have a dedicated CLI.
+
+**Context:** `internal/providerspec/builtin.go` defines CLI invocation templates per provider.
+Adding opencode follows the same pattern. opencode supports multiple LLM providers through
+its own configuration.
+
+**Done when:**
+- opencode is a recognized provider backend in the provider spec system
+- Graphs can specify `llm_provider=opencode` or have it selected as default
+- The invocation template handles opencode's flags for headless/non-interactive mode
+- A test graph using opencode validates the integration
+- opencode availability is checked in preflight
+
+### 0.8.2 Deprecate API Backend
+
+**What:** Mark the API backend as deprecated. New graphs should use CLI backends only.
+The API codepath remains functional but is no longer the recommended path.
+
+**Context:** The API backend lives in `internal/agent/` (session, tool registry, profiles)
+and `internal/llm/` (provider adapters). Deprecation means: warning when API backend is
+selected, documentation updated, no new features added to the API path.
+
+**Done when:**
+- Using `backend: api` in config produces a deprecation warning
+- Documentation recommends CLI backends for all providers
+- The warning names the recommended CLI alternative for the provider
+- Existing API-backend graphs continue to work (no breaking change)
+
+---
+
+## Phase 1: Run Database
+
+Goal: Introduce SQLite as the single source of truth for run-level operational state. Replace
+the file-based state assembly (progress.ndjson, checkpoint.json, final.json, per-node
+status.json) with queryable structured data.
+
+### Design Context
+
+CXDB is a conversation-history store optimized for turn-level LLM interaction data. It's good
+at recording what happened inside a node's LLM session. It is not designed for run-level
+operational state — it can't query across runs, filter by outcome, or aggregate decisions.
+
+SQLite fills the operational gap: which nodes ran, what they returned, which edges were taken,
+why routing decisions were made. CXDB remains the opt-in deep-dive tool for inspecting LLM
+conversations within a node.
+
+The database lives at the XDG state path (the codebase already uses `XDG_STATE_HOME` for
+state). Default: `~/.local/state/kilroy/runs.db`. Shared across runs with isolation by run_id.
+
+**Schema management:** Numbered migration files applied automatically on DB open. Migrations
+are append-only (add columns and tables, never remove or rename). Each migration is a single
+SQL file checked into the repo. The migration runner and the schema contract must be clearly
+documented in AGENTS.md or equivalent so that future work (human or agent) follows the pattern.
+
+### 1.1 Schema Design and Migration Infrastructure
+
+**What:** Design the initial schema and build the migration runner. The schema must support
+the queries that matter: "what's the status of run X?", "why did node Y take edge Z?",
+"which runs failed this week?", "what provider was used for node W?"
+
+**Context:** Today this information is scattered across files that must be parsed and
+cross-referenced. The schema should make these queries trivial.
+
+**Done when:**
+- A migration runner exists that auto-applies numbered SQL files on DB open
+- The initial schema is documented with purpose annotations per table and column
+- The schema supports: runs, nodes (with attempts), outcomes, edge decisions, provider selections
+- A Go package provides typed read/write access to the schema
+- Migration tests verify forward migration from empty DB
+- AGENTS.md or equivalent documents the migration pattern and rules
+
+### 1.2 Engine Writes to Run DB
+
+**What:** Wire the engine to write operational events to the run DB alongside (not instead of)
+existing file-based output. This is additive — nothing breaks.
+
+**Context:** The engine currently writes progress events via `appendProgress()` in engine.go.
+The DB writes should happen at the same points, with the same information, but into structured
+tables instead of ndjson lines.
+
+**Done when:**
+- Every run creates a row in the runs table
+- Every node execution creates rows in nodes and outcomes tables
+- Every edge decision creates a row in decisions table
+- Existing file-based output (progress.ndjson, checkpoint.json, final.json) still works
+- Test graphs from Phase 0 produce correct DB entries
+- DB contents match what progress.ndjson reports (cross-validation)
+
+### 1.3 Status Command Reads from Run DB
+
+**What:** Update `kilroy attractor status` to query the run DB as its primary data source,
+with filesystem fallback for older runs.
+
+**Context:** The status command currently parses progress.ndjson, checkpoint.json, and final.json
+to assemble run state. This is fragile and slow for large runs. The DB makes these queries
+instant and reliable.
+
+**Done when:**
+- `kilroy attractor status --run <id>` shows run state from the DB
+- `kilroy attractor status --list` shows all known runs with their status
+- `kilroy attractor status --run <id> --decisions` shows the edge decision log
+- Falls back to file-based parsing if run predates the DB
+- Works with test graphs from Phase 0
+
+---
+
+## Phase 2: Lifecycle Hooks
+
+Goal: Decouple git, CXDB, artifact policies, and other concerns from the engine core. Replace
+hardcoded behavior with a hook system where the engine fires lifecycle events and registered
+hooks respond.
+
+### Design Context
+
+Currently the engine directly calls gitutil functions, CXDB sinks, and artifact policy logic.
+This means you can't run a graph without git, and every engine change risks breaking these
+subsystems. Lifecycle hooks make these composable: a minimal setup has zero hooks (engine just
+runs), a full coding-workflow setup registers hooks for git, CXDB, filesystem snapshots, etc.
+
+Hook points: `run.before`, `run.after`, `node.before`, `node.after`, `node.on_fail`,
+`node.on_retry`, `run.on_blocked`. These are the natural lifecycle boundaries where
+cross-cutting concerns attach.
+
+### 2.1 Define Hook Interface and Registration
+
+**What:** Design and implement the hook interface, registration mechanism, and the lifecycle
+event points in the engine where hooks fire.
+
+**Context:** The engine currently has these concerns hardcoded in its main loop: git branch
+creation (run.before), git worktree setup (run.before), CXDB context creation (run.before),
+per-node git commits (node.after), CXDB event emission (node.after), checkpoint writes
+(node.after), stall detection (node.during).
+
+**Done when:**
+- A hook interface exists with methods for each lifecycle point
+- Hooks can be registered on the engine before a run starts
+- The engine calls registered hooks at the correct lifecycle points
+- A no-op hook can be registered and fires correctly (verified by test)
+- Multiple hooks can be registered and fire in order
+- Hook errors are logged but don't crash the engine (configurable: warn vs fail)
+
+### 2.2 Extract Git Operations into Hooks
+
+**What:** Move git branch creation, worktree management, per-node commits, and SHA tracking
+out of the engine core and into a git lifecycle hook. The engine doesn't know about git. The
+git hook provides worktree isolation, per-node checkpointing, and branch management as an
+opt-in capability.
+
+**Context:** Team feedback: "Don't make kilroy understand worktrees." The engine should work
+in whatever directory it's given. Git operations (worktree creation, per-node commits, SHA
+tracking) are valuable but should be composable — register the git hook and you get them,
+don't register it and the engine still works fine in a plain directory.
+
+The recommended workflow becomes: the caller creates a worktree (or the git hook does it on
+`run.before`), kilroy executes inside it, the git hook commits per node and tracks state.
+But none of that is required — kilroy can run in a temp directory with no git at all.
+
+Git operations are currently spread across engine.go (~15 callsites),
+parallel_handlers.go (branch isolation), and resume.go (SHA validation).
+
+**Done when:**
+- The engine core has zero direct gitutil imports
+- A GitHook implements the hook interface and provides all current git behavior
+- Runs WITH the GitHook behave identically to current behavior (worktrees, commits, SHAs)
+- Runs WITHOUT the GitHook succeed in any directory (no git required)
+- Parallel branch isolation works both ways: git hook uses worktrees, no-git uses temp dirs
+- The test graph suite from Phase 0 works both with and without the GitHook
+
+### 2.3 Extract CXDB into Hooks
+
+**What:** Move CXDB context creation, event emission, and turn appending out of the engine
+core and into a CXDB lifecycle hook.
+
+**Context:** CXDB integration is already mostly nil-safe, but the plumbing (sink creation,
+context ID management, turn appending) is still wired directly into the engine. Extracting it
+into a hook makes the nil-checking unnecessary — if no CXDB hook is registered, no CXDB code
+runs.
+
+**Done when:**
+- The engine core has zero direct CXDB imports
+- A CXDBHook implements the hook interface
+- Runs with the CXDBHook behave identically to current CXDB behavior
+- Runs without the CXDBHook have zero CXDB overhead
+- Existing CXDB integration tests pass
+
+### 2.4 Make Checkpoint/Resume Optional via Hooks
+
+**What:** Checkpoint and resume become hook-provided capabilities, not engine-mandated
+behavior. Some workflows (exploration, report generation, multi-repo searches) don't benefit
+from checkpointing. Making it optional removes overhead and complexity for those cases.
+
+**Context:** Currently the engine always writes checkpoint.json after each node and always
+supports resume. For workflows that don't modify a repo or produce incremental artifacts,
+this is pure overhead. With the hook system, checkpoint becomes a registered hook that can
+be omitted.
+
+**Done when:**
+- Checkpoint writes happen in a hook, not in the engine core
+- Runs without a checkpoint hook complete successfully (no checkpoint files written)
+- Runs with the checkpoint hook behave identically to current behavior
+- Resume requires the checkpoint hook to have been active during the original run
+- A clear error message if resume is attempted on a run that had no checkpointing
+
+### 2.5 Filesystem Change Tracking Hook
+
+**What:** Provide a hook that captures what filesystem changes each node made during execution.
+This is useful for understanding what an agent did without reading its full conversation log.
+
+**Context:** Currently there's no structured record of what files a node created, modified,
+or deleted. The git hook captures this implicitly (via commits), but without git there's no
+visibility. A dedicated hook can snapshot before/after and produce a diff.
+
+**Done when:**
+- A filesystem tracking hook can be registered that records file changes per node
+- Changes are recorded in the run DB (or a structured output) as: path, operation (create/modify/delete), size
+- The hook works independently of git (doesn't require worktrees or commits)
+- Change records are visible via `kilroy attractor status --run <id> --node <node_id>`
+
+### 2.6 Extract Artifact Policy into Hooks
+
+**What:** Move artifact policy resolution, checkpoint exclusion globs, and managed root
+configuration out of the engine core and into a hook.
+
+**Context:** Artifact policies are resolved once per run and applied as configuration to node
+execution. Currently this is tightly coupled to the engine's setup phase and config parsing.
+
+**Done when:**
+- Artifact policy is applied via a hook, not hardcoded in the engine
+- Runs without the hook use no artifact policy (simple default behavior)
+- Runs with the hook behave identically to current behavior
+- Configuration for artifact profiles still works through the run config
+
+---
+
+## Phase 2.5: Run Supervisor
+
+Goal: Introduce a supervisor layer that sits above the graph runner and manages run lifecycle,
+health monitoring, blocker detection, and user notification. This is the "babysitter" that
+watches runs and knows when to surface issues vs when to stay quiet.
+
+### Design Context
+
+Currently the engine tries to be both executor and supervisor. The stall watchdog, loop_restart
+circuit breakers, and failure classification heuristics are all supervisor concerns jammed into
+the traversal loop. This makes the engine complex and the supervision logic hard to tune.
+
+The supervisor sits outside the engine. It monitors runs by querying the run DB (Phase 1) and
+intervenes through lifecycle hooks (Phase 2). It can manage multiple concurrent runs. Its
+policies are configurable: "if stuck for 10 minutes, try a different model; if stuck for 30
+minutes, notify me; if the same node fails 3 times with the same error, stop and ask."
+
+This is also the layer that fulfills the "quickly determine if user input is needed" goal.
+The supervisor watches early run progress and surfaces blockers (missing auth, missing data,
+infrastructure failures) to the user as fast as possible, then stays quiet while the run
+proceeds unattended.
+
+### 2.5.1 Supervisor Core and Health Monitoring
+
+**What:** Build the supervisor as a wrapper around the engine that monitors run health by
+polling the run DB and applying configurable policies.
+
+**Context:** The engine's current stall watchdog (hardcoded thresholds, embedded in the main
+loop) is a primitive version of this. The supervisor replaces it with a clean separation:
+the engine runs, the supervisor observes and reacts.
+
+**Done when:**
+- A supervisor can start a run and monitor its progress via the run DB
+- Configurable health policies: stall detection, repeated failure detection, progress thresholds
+- The supervisor can detect and report: stalled nodes, repeated failures on the same node,
+  runs that haven't made progress in N minutes
+- Health status is queryable: `kilroy attractor status --run <id>` shows supervisor assessment
+- The supervisor works with the script-based test graphs (e.g., a graph with a deliberately
+  slow node triggers stall detection)
+
+### 2.5.2 Blocker Detection and User Notification
+
+**What:** The supervisor identifies situations that require user attention and surfaces them
+clearly, distinguishing "stuck and retrying" from "truly blocked."
+
+**Context:** The current system doesn't distinguish between transient failures (worth retrying)
+and blockers (need human input). A rate-limited API call looks the same to the user as a
+missing database. The supervisor should classify blockers and notify the user with actionable
+context.
+
+**Done when:**
+- The supervisor classifies run states: healthy, degraded (retrying), blocked (needs user),
+  failed (terminal)
+- Blocked state includes: what's blocking, which node, what the user can do about it
+- A notification mechanism exists (at minimum: CLI output, structured event in DB; extensible
+  to webhooks, etc.)
+- The "quickly determine if user input needed" pattern works: supervisor watches early nodes
+  and surfaces prerequisite failures (auth, missing resources) within the first few minutes
+- Test graphs exercise the blocked detection (e.g., a node that requires a file that doesn't exist)
+
+### 2.5.3 Self-Introspection
+
+**What:** The system should be able to explain its own state, decisions, and plan in
+human-readable terms. Not just "what happened" but "what is the system's model of what's
+happening and why."
+
+**Context:** Team feedback: "needs to introspect itself." This means the supervisor can
+answer questions like:
+- "Why are you stuck?" → "Node X has failed 3 times with the same build error. The failure
+  is deterministic — retrying won't help."
+- "What's your plan?" → "I'm on node 4 of 7. Next is verify, then review."
+- "What have you tried?" → "Attempt 1: error A. Attempt 2: same. Attempt 3: different error B."
+
+This is built on the run DB (Phase 1) for data and the supervisor for interpretation.
+
+**Done when:**
+- `kilroy attractor status --run <id> --explain` produces a human-readable narrative of
+  the run's current state, recent decisions, and what's expected next
+- The explanation includes: current node, progress through graph, recent failures with
+  reasons, what the system plans to do next
+- For stuck/failed runs: the explanation diagnoses why (deterministic failure, missing
+  resource, repeated error) and suggests what the user can do
+- The explanation is generated from run DB data, not from parsing log files
+
+### 2.5.4 Supervisor Intervention Policies
+
+**What:** Allow the supervisor to take corrective action based on configurable policies, using
+lifecycle hooks as the intervention mechanism.
+
+**Context:** Today, model escalation and retry decisions are hardcoded in the engine. The
+supervisor should own these decisions: "this node failed 3 times with model X, try model Y"
+or "this run has been stuck for an hour, pause and notify." The engine provides the hooks;
+the supervisor decides when and how to use them.
+
+**Done when:**
+- The supervisor can pause and resume a run
+- The supervisor can request a model/provider change for a failing node (via hook or context update)
+- Policies are configurable (not hardcoded): retry limits, escalation models, stall thresholds,
+  notification preferences
+- A default policy set works out of the box (sane defaults, not overwhelming configuration)
+- The engine's current hardcoded stall watchdog and escalation logic can be removed in favor
+  of supervisor-driven policies
+
+---
+
+## Phase 3: Explicit Routing
+
+Goal: Replace the multi-mechanism routing system with a single, predictable mechanism.
+Every routing decision should be visible in the graph as an explicit edge with an explicit
+condition.
+
+### Design Context
+
+The current routing system has 4+ interacting mechanisms evaluated in priority order:
+1. Condition expressions on edges
+2. preferred_label matching
+3. suggested_next_ids targeting
+4. retry_target chains (node → node fallback → graph → graph fallback)
+
+Plus implicit fan-out detection (BFS for convergence points), goal gate enforcement at exit,
+and failure classification that affects routing eligibility. A graph author cannot predict
+which mechanism will fire without mentally simulating the engine.
+
+The target: **edges have conditions, conditions are evaluated in declaration order, first
+match wins.** Retry routing, failure handling, and default paths are all expressed as explicit
+edges in the graph. The engine evaluates and logs; the graph declares.
+
+### 3.1 Consolidate Routing to Condition-Based Edges
+
+**What:** Make condition-based edge evaluation the single routing mechanism. Remove
+preferred_label matching, suggested_next_ids, and implicit weight-based tiebreaking from
+the engine's routing logic.
+
+**Context:** `next_hop.go` implements the 5-step priority cascade. The condition evaluator
+in `cond/cond.go` already handles the important cases. The other mechanisms exist as fallbacks
+for cases where conditions aren't specified, but they make routing unpredictable.
+
+**Done when:**
+- Edge routing evaluates conditions in edge declaration order, selects first match
+- An unconditional edge (no condition attribute) matches anything (serves as default/fallback)
+- preferred_label and suggested_next_ids are no longer consulted for routing
+- The decision log clearly shows: which edges were candidates, which conditions matched,
+  which edge was selected
+- Test graphs from Phase 0 validate the new routing behavior
+- A migration guide documents how to convert existing graphs (retry_target → explicit edges)
+
+### 3.2 Remove Implicit Fan-Out Detection
+
+**What:** Remove the engine's BFS-based convergence detection that silently enables or
+disables parallel execution based on graph topology.
+
+**Context:** When a node has multiple eligible outgoing edges, the engine runs `findJoinNode()`
+to detect topological convergence. If found, it dispatches parallel branches. If not, it
+silently falls back to single-edge selection. This means the same graph structure can produce
+either parallel or sequential execution depending on subtle topology differences.
+
+**Done when:**
+- Parallel execution only happens at explicit parallel nodes (shape=component)
+- Nodes with multiple outgoing edges where multiple conditions match produce a clear error
+  or warning, not silent single-edge fallback
+- The engine never silently converts sequential routing into parallel execution
+- Validation warns when multiple unconditional edges leave a non-parallel node
+- Test graphs verify both parallel (explicit) and multi-edge (error) cases
+
+### 3.3 Retire retry_target Chains
+
+**What:** Remove the retry_target and fallback_retry_target attributes from nodes and graphs.
+Replace with explicit retry edges in the graph.
+
+**Context:** retry_target is an invisible routing mechanism: when a node fails and no
+fail-condition edge matches, the engine checks node.retry_target, then node.fallback_retry_target,
+then graph.retry_target, then graph.fallback_retry_target. This 4-level chain is invisible in
+the graph and interacts with failure classification in non-obvious ways.
+
+**Done when:**
+- retry_target and fallback_retry_target attributes are no longer consulted by the engine
+- Retry routing is expressed as edges: `node_a -> retry_node [condition="outcome=fail"]`
+- Validation warns if a node has outgoing edges but no edge handles `outcome=fail`
+- A migration guide documents the conversion pattern
+- Existing test graphs are migrated to explicit retry edges
+
+### 3.4 Retire Goal Gate Mechanism
+
+**What:** Remove the goal_gate enforcement that happens at the exit node. Replace with
+explicit edges — if a critical node must succeed before the pipeline can exit, express that
+as graph structure, not as a metadata annotation checked at a distance.
+
+**Context:** goal_gate=true on a node means the engine tracks it and checks at exit time
+whether it reached SUCCESS. If not, it routes to the node's retry_target. This is another
+invisible routing layer: the graph author marks a node as critical, but the enforcement
+happens at the exit node, not at the marked node. With explicit routing (Phase 3.1), the
+graph can express "don't proceed unless this succeeded" directly as edge conditions.
+
+**Done when:**
+- goal_gate attribute is no longer consulted by the engine
+- Critical-path enforcement is expressed as edge conditions on the relevant nodes
+- Validation no longer requires goal_gate nodes to have retry_targets
+- A migration guide documents the conversion pattern
+
+### 3.5 Prompt File Support
+
+**What:** Support `prompt_file` attribute on nodes to reference external prompt files. Prompts
+can be inline (current behavior) or external (file reference) — author's choice per node.
+
+**Context:** Production graphs embed multi-paragraph prompts as DOT string attributes. This
+makes graphs hard to read and hard to diff. Moving prompts to separate files improves
+readability without losing the ability to have simple inline prompts for small nodes. The
+graph remains the single source of truth for structure; prompt files are referenced by path
+and resolved at parse time.
+
+**Done when:**
+- Nodes can use `prompt_file="path/to/prompt.md"` instead of or alongside `prompt="..."`
+- prompt_file paths are resolved relative to the graph file's directory
+- Validation checks that referenced prompt files exist
+- Variable expansion ($goal, etc.) works in prompt files the same as inline prompts
+- Inline `prompt` and `prompt_file` can coexist (prompt_file takes precedence, or they
+  concatenate — decide during implementation)
+- Existing graphs with inline prompts continue to work unchanged
+
+### 3.6 Simplify Fidelity Model
+
+**What:** Collapse the 6 fidelity modes (full, truncate, compact, summary:low/medium/high)
+and 4-level precedence chain into a simpler model that the engine manages automatically.
+Graph authors shouldn't need to think about token budgets.
+
+**Context:** The fidelity system controls how much conversation context carries between nodes.
+It has 6 modes, a 4-level precedence chain (edge → node → graph → default), and a separate
+thread_id resolution chain. This is infrastructure-level complexity exposed as an authoring
+concern. In practice, there are two meaningful modes: "continue the conversation" (the agent
+keeps its context) and "fresh start" (new context with a summary of what happened).
+
+**Done when:**
+- The graph-authoring surface is reduced to at most 2-3 fidelity options with clear semantics
+- The engine handles context budgeting automatically (compaction, summarization) without
+  graph author involvement
+- The precedence chain is simplified (node attribute or graph default, no edge-level override)
+- Thread management is handled by the agent handler, not the engine
+- Existing graphs with fidelity attributes still work (backward compat during transition)
+
+---
+
+## Phase 4: Clean Handler Interface
+
+Goal: Establish a clean boundary between the engine (graph traversal, routing, lifecycle) and
+handlers (the work each node does). Handlers own their dependencies. The engine provides
+minimal context.
+
+### Design Context
+
+Currently the codergen handler receives the full engine, the graph, the logs root, provider
+runtimes, the model catalog, and the CXDB sink. This couples handlers to engine internals.
+The target: handlers receive node attributes and a run context (key-value store), and return
+an outcome. How they invoke LLMs, manage tools, or interpret results is their concern.
+
+The parallel/fan-in boundary also needs cleanup: the engine owns the concurrency lifecycle
+(recognizing parallel structure, managing worker pools, evaluating join policies), while
+hooks handle isolation (git worktrees, temp directories) and handlers do the actual branch work.
+
+### 4.1 Define Minimal Handler Interface
+
+**What:** Design the handler interface that handlers implement. It should provide enough
+context for handlers to do their work without coupling them to engine internals.
+
+**Context:** The current handler interface varies by handler type. Codergen handlers receive
+substantially more context than tool handlers. The goal is a uniform interface that works for
+all handler types while allowing handlers to bring their own dependencies via closure or
+constructor injection.
+
+**Done when:**
+- A handler interface is defined with a single Execute method
+- The method receives: node attributes (read-only), run context (read/write), workspace path,
+  and a logger/event emitter
+- The method returns: an Outcome (status, failure_reason, context_updates)
+- The three core handler types (agent, command, gate) implement this interface
+- The engine dispatches to handlers through this interface only
+- Test graphs validate all three handler types through the new interface
+
+### 4.2 Reform Status Contract
+
+**What:** Replace the file-based status contract (LLM writes JSON to a specific path) with
+handler-interpreted outcomes. The handler extracts success/fail from the agent's behavior,
+not from a file the agent was instructed to write.
+
+**Context:** Currently, every codergen node's prompt must include instructions to write
+status.json to `$KILROY_STAGE_STATUS_PATH` with specific fields. LLMs get the format wrong,
+write to the wrong path, or forget entirely. This is the most fragile part of the pipeline —
+a typo in a JSON field name silently breaks routing.
+
+With a reformed contract, the handler owns interpretation: for CLI agents, exit code + output
+parsing. For API agents, the final message content. For command/tool nodes, exit code. The
+LLM doesn't need to know about status.json at all. The handler reports a structured outcome
+to the engine through the handler interface, which writes it to the run DB.
+
+**Done when:**
+- The agent handler can determine success/fail without the LLM writing a status file
+- Command handlers use exit code (0 = success, non-zero = fail)
+- Agent handlers interpret the agent's final output or behavior to determine outcome
+- Status file writing is optional (an agent CAN still write one to provide richer outcome
+  data like preferred_label or context_updates, but it's not required for basic routing)
+- Prompts no longer need the KILROY_STAGE_STATUS_PATH boilerplate
+- The run DB records the outcome regardless of how it was determined
+
+### 4.3 Clean Parallel/Fan-In Boundary
+
+**What:** Separate the engine's concurrency management from the handler-level isolation and
+merge strategies. The engine manages worker pools and join policies. Hooks manage isolation.
+Handlers do the branch work.
+
+**Context:** The parallel handler currently creates git worktrees, clones engine instances,
+manages worker pools, evaluates join policies, and handles fan-in winner selection — all in
+one handler. The engine-level concerns (concurrency, join policy) should be in the engine.
+The isolation concerns (worktrees, temp dirs) should be in hooks. The winner selection should
+be in the fan-in handler.
+
+**Done when:**
+- The engine recognizes parallel nodes and manages concurrent branch dispatch
+- Join policy evaluation (wait_all, first_success, k_of_n, quorum) is engine-level
+- Error policy (continue, fail_fast, ignore) is engine-level
+- Branch isolation is provided by lifecycle hooks (git hook creates worktrees; no-git hook
+  creates temp directories)
+- Fan-in winner selection remains handler-level
+- Parallel execution works with and without git hooks
+- Test graphs with parallel nodes validate the new boundary
+
+---
+
+## Test Strategy
+
+Every phase is validated with the script-based test graph suite from 0.1. No LLM involvement
+in engine validation.
+
+For each completed task:
+1. `go test ./...` passes (existing tests)
+2. All test graphs in the suite execute correctly
+3. The specific behavior change is demonstrated (e.g., decision log shows edge evaluation
+   for 0.4, bad credentials fail preflight for 0.5)
+
+LLM-backed validation (real graphs with real models) happens after Phase 0 is complete and
+the engine plumbing is proven solid with deterministic graphs.
+
+## Progressive Delivery
+
+This plan is designed to build while the plane is flying:
+
+- **Phase 0** (8 tasks) — each independently shippable, immediately improves the current system;
+  includes CLI-only backend shift (opencode for non-claude/codex providers, deprecate API)
+- **Phase 1** (3 tasks) — additive; the DB writes alongside existing files, nothing breaks
+- **Phase 2** (6 tasks) — decouples concerns; git removed from engine (caller manages worktrees),
+  CXDB/artifacts into hooks, checkpoint optional
+- **Phase 2.5** (4 tasks) — adds the supervisor; run monitoring, blocker detection,
+  self-introspection, intervention policies
+- **Phase 3** (6 tasks) — changes graph semantics and authoring model; requires graph migration
+- **Phase 4** (3 tasks) — cleans interfaces; handler contract, status reform, parallel boundary
+
+At any point, we can pause and use the system as-is. Each completed phase leaves the system
+strictly better than before.
+
+---
+
+## Future Work (Beyond This Plan)
+
+These are known gaps that this plan doesn't address but should be tackled eventually:
+
+### Remove API Backend Entirely
+
+Phase 0.8 deprecates the API backend in favor of CLI-only (claude, codex, gemini, opencode).
+Once the deprecation has been in place and CLI-only is proven reliable, the API backend code
+(`internal/agent/`, `internal/llm/`) can be removed entirely. This is a significant code
+reduction that simplifies maintenance and testing.
+
+### Configuration Surface Simplification
+
+Zero-config startup (Phase 0.7) addresses the new-user experience, but the config file
+itself remains a combinatorial explosion for advanced users. The config model should be
+reviewed and simplified: collapse redundant options, remove options that have only one
+reasonable value, group related settings, and provide clear documentation of what each
+option does and when you'd change it.
+
+### Graph Authoring Tooling
+
+Writing DOT files by hand is error-prone. Future tooling could include: a graph visualizer
+that shows the execution topology, a graph editor with validation feedback, graph templates
+for common patterns (linear+verify, fan-out consensus), and prompt scaffolding that generates
+the boilerplate for new nodes.
diff --git a/internal/attractor/engine/config.go b/internal/attractor/engine/config.go
index f1037dea..1a67bb38 100644
--- a/internal/attractor/engine/config.go
+++ b/internal/attractor/engine/config.go
@@ -309,9 +309,6 @@ func validateConfig(cfg *RunConfigFile) error {
 	if strings.TrimSpace(cfg.Repo.Path) == "" {
 		return fmt.Errorf("repo.path is required")
 	}
-	if strings.TrimSpace(cfg.CXDB.BinaryAddr) == "" || strings.TrimSpace(cfg.CXDB.HTTPBaseURL) == "" {
-		return fmt.Errorf("cxdb.binary_addr and cxdb.http_base_url are required in v1")
-	}
 	if cfg.CXDB.Autostart.WaitTimeoutMS < 0 {
 		return fmt.Errorf("cxdb.autostart.wait_timeout_ms must be >= 0")
 	}
diff --git a/internal/attractor/engine/phase0_tool_graph_test.go b/internal/attractor/engine/phase0_tool_graph_test.go
new file mode 100644
index 00000000..72512849
--- /dev/null
+++ b/internal/attractor/engine/phase0_tool_graph_test.go
@@ -0,0 +1,283 @@
+// Integration tests for script-based (tool-node-only) graphs.
+// Validates engine traversal, routing, and failure handling without LLM involvement.
+
+package engine
+
+import (
+	"context"
+	"encoding/json"
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/danshapiro/kilroy/internal/attractor/runtime"
+)
+
+func TestToolGraph_Linear(t *testing.T) {
+	t.Setenv("XDG_STATE_HOME", t.TempDir())
+	repo := initTestRepo(t)
+	logsRoot := t.TempDir()
+	pinned := writePinnedCatalog(t)
+
+	dot := []byte(`digraph linear {
+  graph [goal="Test linear traversal"]
+  start [shape=Mdiamond]
+  step_a [shape=parallelogram, tool_command="echo step_a_done"]
+  step_b [shape=parallelogram, tool_command="echo step_b_done"]
+  step_c [shape=parallelogram, tool_command="echo step_c_done"]
+  done [shape=Msquare]
+  start -> step_a -> step_b -> step_c -> done
+}`)
+	cfg := minimalToolGraphConfig(repo, pinned)
+	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+	defer cancel()
+
+	res, err := RunWithConfig(ctx, dot, cfg, RunOptions{
+		RunID:       "linear-test",
+		LogsRoot:    logsRoot,
+		DisableCXDB: true,
+	})
+	if err != nil {
+		t.Fatalf("RunWithConfig: %v", err)
+	}
+	if res.FinalStatus != runtime.FinalSuccess {
+		t.Fatalf("expected success, got %q", res.FinalStatus)
+	}
+
+	// Verify all three steps executed
+	for _, nodeID := range []string{"step_a", "step_b", "step_c"} {
+		stdout := filepath.Join(logsRoot, nodeID, "stdout.log")
+		data, err := os.ReadFile(stdout)
+		if err != nil {
+			t.Fatalf("read %s stdout: %v", nodeID, err)
+		}
+		if !strings.Contains(string(data), nodeID+"_done") {
+			t.Fatalf("%s stdout: got %q, want to contain %q", nodeID, string(data), nodeID+"_done")
+		}
+	}
+}
+
+func TestToolGraph_LinearVerify_HillClimber(t *testing.T) {
+	t.Setenv("XDG_STATE_HOME", t.TempDir())
+	repo := initTestRepo(t)
+	logsRoot := t.TempDir()
+	pinned := writePinnedCatalog(t)
+
+	// Create a counter file in the repo — verify succeeds on 3rd attempt
+	counterFile := filepath.Join(repo, "attempt_counter")
+	if err := os.WriteFile(counterFile, []byte("0"), 0o644); err != nil {
+		t.Fatal(err)
+	}
+	commitAll(t, repo, "add counter")
+
+	dot := []byte(`digraph hill_climber {
+  graph [goal="Test hill-climber verify loop"]
+  start [shape=Mdiamond]
+  implement [
+    shape=parallelogram,
+    tool_command="count=$(cat attempt_counter); count=$((count + 1)); echo $count > attempt_counter; echo implemented_attempt_$count"
+  ]
+  verify [
+    shape=parallelogram,
+    tool_command="count=$(cat attempt_counter); if [ $count -ge 3 ]; then echo 'all checks pass'; exit 0; else echo 'checks failed on attempt '$count; exit 1; fi"
+  ]
+  done [shape=Msquare]
+  start -> implement
+  implement -> verify
+  verify -> done [condition="outcome=success"]
+  verify -> implement
+}`)
+	cfg := minimalToolGraphConfig(repo, pinned)
+	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+	defer cancel()
+
+	res, err := RunWithConfig(ctx, dot, cfg, RunOptions{
+		RunID:       "hill-climber-test",
+		LogsRoot:    logsRoot,
+		DisableCXDB: true,
+	})
+	if err != nil {
+		t.Fatalf("RunWithConfig: %v", err)
+	}
+	if res.FinalStatus != runtime.FinalSuccess {
+		t.Fatalf("expected success, got %q", res.FinalStatus)
+	}
+}
+
+func TestToolGraph_Conditional(t *testing.T) {
+	t.Setenv("XDG_STATE_HOME", t.TempDir())
+	repo := initTestRepo(t)
+	logsRoot := t.TempDir()
+	pinned := writePinnedCatalog(t)
+
+	// process writes a marker file; path_a and path_b check for it
+	dot := []byte(`digraph conditional {
+  graph [goal="Test conditional routing"]
+  start [shape=Mdiamond]
+  process [shape=parallelogram, tool_command="echo processed"]
+  check [shape=diamond]
+  path_a [shape=parallelogram, tool_command="echo took_path_a"]
+  done [shape=Msquare]
+  start -> process -> check
+  check -> path_a
+  path_a -> done
+}`)
+	cfg := minimalToolGraphConfig(repo, pinned)
+	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+	defer cancel()
+
+	res, err := RunWithConfig(ctx, dot, cfg, RunOptions{
+		RunID:       "conditional-test",
+		LogsRoot:    logsRoot,
+		DisableCXDB: true,
+	})
+	if err != nil {
+		t.Fatalf("RunWithConfig: %v", err)
+	}
+	if res.FinalStatus != runtime.FinalSuccess {
+		t.Fatalf("expected success, got %q", res.FinalStatus)
+	}
+
+	// Verify path_a was taken
+	stdout := filepath.Join(logsRoot, "path_a", "stdout.log")
+	data, err := os.ReadFile(stdout)
+	if err != nil {
+		t.Fatalf("read path_a stdout: %v", err)
+	}
+	if !strings.Contains(string(data), "took_path_a") {
+		t.Fatalf("path_a stdout: got %q, want to contain 'took_path_a'", string(data))
+	}
+}
+
+func TestToolGraph_FailFast(t *testing.T) {
+	t.Setenv("XDG_STATE_HOME", t.TempDir())
+	repo := initTestRepo(t)
+	logsRoot := t.TempDir()
+	pinned := writePinnedCatalog(t)
+
+	// A node that fails with no outgoing edge causes the run to fail.
+	dot := []byte(`digraph fail_fast {
+  graph [goal="Test failure handling"]
+  start [shape=Mdiamond]
+  failing_step [shape=parallelogram, tool_command="echo 'something went wrong' >&2; exit 1"]
+  done [shape=Msquare]
+  start -> failing_step -> done
+}`)
+	cfg := minimalToolGraphConfig(repo, pinned)
+	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+	defer cancel()
+
+	res, err := RunWithConfig(ctx, dot, cfg, RunOptions{
+		RunID:       "fail-fast-test",
+		LogsRoot:    logsRoot,
+		DisableCXDB: true,
+	})
+	if err != nil {
+		t.Fatalf("RunWithConfig: %v", err)
+	}
+
+	// The node fails, but the unconditional edge routes to done.
+	// Verify the node itself recorded a failure.
+	statusPath := filepath.Join(logsRoot, "failing_step", "status.json")
+	data, err := os.ReadFile(statusPath)
+	if err != nil {
+		t.Fatalf("read status.json: %v", err)
+	}
+	var out runtime.Outcome
+	if err := json.Unmarshal(data, &out); err != nil {
+		t.Fatalf("parse status.json: %v", err)
+	}
+	if out.Status != runtime.StatusFail {
+		t.Fatalf("expected fail status, got %q", out.Status)
+	}
+	_ = res
+}
+
+func TestToolGraph_NoCXDBConfig(t *testing.T) {
+	t.Setenv("XDG_STATE_HOME", t.TempDir())
+	repo := initTestRepo(t)
+	logsRoot := t.TempDir()
+	pinned := writePinnedCatalog(t)
+
+	dot := []byte(`digraph no_cxdb {
+  graph [goal="Test running without CXDB config"]
+  start [shape=Mdiamond]
+  step [shape=parallelogram, tool_command="echo works_without_cxdb"]
+  done [shape=Msquare]
+  start -> step -> done
+}`)
+	// Config with NO CXDB section at all
+	cfg := &RunConfigFile{}
+	cfg.Version = 1
+	cfg.Repo.Path = repo
+	cfg.ModelDB.OpenRouterModelInfoPath = pinned
+	cfg.ModelDB.OpenRouterModelInfoUpdatePolicy = "pinned"
+	cfg.LLM.CLIProfile = "test_shim"
+
+	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+	defer cancel()
+
+	res, err := RunWithConfig(ctx, dot, cfg, RunOptions{
+		RunID:    "no-cxdb-test",
+		LogsRoot: logsRoot,
+		// Note: NOT setting DisableCXDB — relying on empty config being treated as disabled
+	})
+	if err != nil {
+		t.Fatalf("RunWithConfig: %v", err)
+	}
+	if res.FinalStatus != runtime.FinalSuccess {
+		t.Fatalf("expected success, got %q", res.FinalStatus)
+	}
+}
+
+func TestToolGraph_WorkspaceLifecycle(t *testing.T) {
+	t.Setenv("XDG_STATE_HOME", t.TempDir())
+	repo := initTestRepo(t)
+	logsRoot := t.TempDir()
+	pinned := writePinnedCatalog(t)
+
+	dot := []byte(`digraph workspace {
+  graph [goal="Test workspace setup and cleanup in flow"]
+  start [shape=Mdiamond]
+  setup [shape=parallelogram, tool_command="mkdir -p workspace && echo setup_done > workspace/state.txt"]
+  work [shape=parallelogram, tool_command="cat workspace/state.txt && echo work_done >> workspace/state.txt"]
+  verify [shape=parallelogram, tool_command="grep -q work_done workspace/state.txt"]
+  done [shape=Msquare]
+  start -> setup -> work -> verify -> done
+}`)
+	cfg := minimalToolGraphConfig(repo, pinned)
+	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+	defer cancel()
+
+	res, err := RunWithConfig(ctx, dot, cfg, RunOptions{
+		RunID:       "workspace-test",
+		LogsRoot:    logsRoot,
+		DisableCXDB: true,
+	})
+	if err != nil {
+		t.Fatalf("RunWithConfig: %v", err)
+	}
+	if res.FinalStatus != runtime.FinalSuccess {
+		t.Fatalf("expected success, got %q", res.FinalStatus)
+	}
+}
+
+// minimalToolGraphConfig returns a RunConfigFile suitable for tool-node-only graphs.
+func minimalToolGraphConfig(repoPath, pinnedCatalogPath string) *RunConfigFile {
+	cfg := &RunConfigFile{}
+	cfg.Version = 1
+	cfg.Repo.Path = repoPath
+	cfg.ModelDB.OpenRouterModelInfoPath = pinnedCatalogPath
+	cfg.ModelDB.OpenRouterModelInfoUpdatePolicy = "pinned"
+	cfg.LLM.CLIProfile = "test_shim"
+	return cfg
+}
+
+// commitAll stages and commits all changes in the repo.
+func commitAll(t *testing.T, repo, msg string) {
+	t.Helper()
+	runCmd(t, repo, "git", "add", "-A")
+	runCmd(t, repo, "git", "commit", "-m", msg)
+}
diff --git a/internal/attractor/engine/run_with_config.go b/internal/attractor/engine/run_with_config.go
index 9e0ca63f..ebac12bb 100644
--- a/internal/attractor/engine/run_with_config.go
+++ b/internal/attractor/engine/run_with_config.go
@@ -330,8 +330,8 @@ func bootstrapRunWithConfig(ctx context.Context, dotSource []byte, cfg *RunConfi
 		bin        *cxdb.BinaryClient
 		startup    *CXDBStartupInfo
 	)
-	if !overrides.DisableCXDB {
-		// CXDB is required in v1 and must be reachable.
+	cxdbConfigured := strings.TrimSpace(cfg.CXDB.BinaryAddr) != "" && strings.TrimSpace(cfg.CXDB.HTTPBaseURL) != ""
+	if !overrides.DisableCXDB && cxdbConfigured {
 		var cxdbStartup *CXDBStartupInfo
 		cxdbClient, bin, cxdbStartup, err = ensureCXDBReady(ctx, cfg, opts.LogsRoot, opts.RunID)
 		if err != nil {