Agent Routing, Permissions, and Model Registry

Every chat resolves to a provider, a model, and a permission mode before a turn runs. This document describes how those choices are made and where the machinery lives.

Source file map

| Path | Role |
| --- | --- |
| apps/desktop/src/shared/modelRegistry.ts | Single source of truth for model descriptors. Defines MODEL_REGISTRY, ModelDescriptor, resolution helpers. |
| apps/desktop/src/shared/modelProfiles.ts | Curated selection helpers (task routing, default pickers). |
| apps/desktop/src/shared/chatModelSwitching.ts | canSwitchChatSessionModel / filterChatModelIdsForSession -- rules for mid-session model changes. |
| apps/desktop/src/main/services/chat/agentChatService.ts | handoffSession, permission translation, per-provider adapter. |
| apps/desktop/src/main/services/ai/providerRuntimeHealth.ts | Tracks provider readiness/auth/network failures so the UI can surface degraded states. |
| apps/desktop/src/main/services/ai/providerOptions.ts | Normalises provider-native options (Claude permission mode, Codex approval + sandbox, OpenCode permission). |
| apps/desktop/src/main/services/ai/authDetector.ts | Discovers available credentials (CLI, API key, OAuth) and reports auth status. |
| apps/desktop/src/main/services/ai/codexExecutable.ts / droidExecutable.ts | CLI resolution for runtimes that still need an external binary (looks on PATH, in the app bundle, then in configured install paths where supported). Claude uses the bundled Claude Agent SDK binary; Cursor runs through the embedded @cursor/sdk. |
| apps/desktop/src/main/services/ai/tools/systemPrompt.ts | Adjusts the system prompt per mode (chat, coding, planning) and permission mode. |

Supported providers

AgentChatProvider is "codex" | "claude" | "cursor" | "droid" | "opencode" | (string & {}). The final branch exists so local discovery can populate provider keys for vendored runtimes without changing the union.
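The open-ended branch can be illustrated with a minimal sketch. The union below is copied from the text; the BUILT_IN_PROVIDERS list and isBuiltInProvider helper are hypothetical names introduced only for this example and are not taken from the ADE source.

```typescript
// The provider union as documented. `(string & {})` keeps the literal members
// visible to editor autocomplete while still accepting arbitrary provider keys.
type AgentChatProvider =
  | "codex"
  | "claude"
  | "cursor"
  | "droid"
  | "opencode"
  | (string & {});

// Hypothetical list of the built-in keys (an assumption for this sketch).
const BUILT_IN_PROVIDERS: string[] = ["codex", "claude", "cursor", "droid", "opencode"];

// Hypothetical helper: distinguishes built-in providers from discovered ones.
function isBuiltInProvider(provider: AgentChatProvider): boolean {
  return BUILT_IN_PROVIDERS.indexOf(provider) !== -1;
}

// A vendored runtime key type-checks without touching the union.
const vendoredProvider: AgentChatProvider = "my-vendored-runtime";
```

A plain `string` member would collapse the union to `string` and lose the literal suggestions, which is presumably why the intersection form is used.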

| Provider | Runtime | Adapter location |
| --- | --- | --- |
| claude | @anthropic-ai/claude-agent-sdk query() stream with an ADE async input pump, startup() warmup, bundled Claude Code binary, SDK sessions, hooks, output styles, plugins, context usage, rewind, and slash-command dispatch. | agentChatService.ts (inline; the file carries the full Claude adapter). |
| codex | codex app-server subprocess, JSON-RPC protocol. Spawn failures surface as error events. | agentChatService.ts (Codex adapter); config via codexAppServerConfig.ts. |
| opencode | OpenCode server runtime: Anthropic/OpenAI/Google/Mistral/DeepSeek/xAI/Groq/Together AI API keys, OpenRouter, and local (Ollama, LM Studio, vLLM). | agentChatService.ts (OpenCode adapter); model discovery in localModelDiscovery.ts and modelsDevService.ts. |
| cursor | Official @cursor/sdk running in a Node worker pool. ADE owns permissions, hooks, and the system prompt; the SDK owns the model + tool execution. | cursorSdkPool.ts, cursorSdkWorker.ts, cursorSdkProtocol.ts, cursorSdkPolicy.ts, cursorSdkSystemPrompt.ts, cursorSdkEventMapper.ts. |
| droid | Factory Droid CLI models exposed as dynamic droid/<modelId> descriptors and driven through the Droid ACP bridge. | agentChatService.ts (Droid adapter); model helpers in modelRegistry.ts. |

Model registry

MODEL_REGISTRY is a static catalogue of ModelDescriptor records:

type ModelDescriptor = {
  id: string;             // stable ADE id
  shortId: string;        // CLI-facing token
  aliases?: string[];     // user-facing aliases (e.g. "sonnet", "opus")
  displayName: string;
  family: ProviderFamily; // anthropic | openai | opencode | google | ...
  authTypes: AuthType[];  // cli-subscription | api-key | oauth | openrouter | local
  contextWindow: number;
  maxOutputTokens: number;
  capabilities: { tools: boolean; vision: boolean; reasoning: boolean; streaming: boolean }; // flag types assumed boolean
  reasoningTiers?: string[];
  color: string;
  providerRoute: string;
  providerModelId: string;
  cliCommand?: string;
  isCliWrapped: boolean;
  deprecated?: boolean;
  inputPricePer1M?: number;
  outputPricePer1M?: number;
  costTier?: "low" | "medium" | "high" | "very_high";
  harnessProfile?: "verified" | "guarded" | "read_only"; // local models
  discoverySource?: "lmstudio-rest" | "lmstudio-openai" | "ollama";
  openCodeProviderId?: string;
  openCodeModelId?: string;
  serviceTiers?: string[]; // optional service tiers a model accepts (today only "fast")
};

Helpers (also re-exported through shared/modelRegistry.ts):

  • getModelById(id) -- exact id match.
  • resolveModelAlias(alias) -- resolves user-facing aliases.
  • getDefaultModelDescriptor() -- default model.
  • resolveModelDescriptorForProvider(provider, modelId?) -- fallback resolution when an agent requests a model not available under a specific provider.
  • resolveChatProviderForDescriptor(descriptor) -- chooses the appropriate provider for a given model.
  • resolveProviderGroupForModel(modelId) -- groups models by family/provider-group for handoff decisions.
  • getAvailableModels(opts) -- filters by auth, discovery, and feature flags.
  • getDynamicOpenCodeModelDescriptors() / listModelDescriptorsForProvider -- discovery-aware lists.
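As a rough illustration of how the two lookup helpers differ, the sketch below runs an exact-id lookup and an alias lookup over a tiny stand-in registry. MiniDescriptor, MINI_REGISTRY, and the example ids are placeholders invented for this sketch; the real helpers in modelRegistry.ts may normalize input or behave differently.

```typescript
// Trimmed-down descriptor: only the fields this sketch needs.
type MiniDescriptor = { id: string; shortId: string; aliases?: string[] };

// Placeholder entries, not real registry rows.
const MINI_REGISTRY: MiniDescriptor[] = [
  { id: "example/model-a", shortId: "model-a", aliases: ["a", "alpha"] },
  { id: "example/model-b", shortId: "model-b" },
];

// Exact id match only; aliases and shortIds are not consulted.
function getModelById(id: string): MiniDescriptor | undefined {
  return MINI_REGISTRY.filter((m) => m.id === id)[0];
}

// Alias match; entries without aliases never match.
function resolveModelAlias(alias: string): MiniDescriptor | undefined {
  return MINI_REGISTRY.filter((m) => (m.aliases ?? []).indexOf(alias) !== -1)[0];
}
```

Keeping the two paths separate means an alias can never shadow a canonical id, at the cost of callers having to try both when the input's origin is unknown.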

Dynamic local-model discovery (localModelDiscovery.ts) mutates the registry at runtime when LM Studio or Ollama report available models. These descriptors carry discoverySource and a harnessProfile that defaults to guarded unless explicitly whitelisted.
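The guarded-by-default rule can be sketched as follows. The HarnessProfile and DiscoverySource unions come from the ModelDescriptor type above; VERIFIED_LOCAL_MODELS, describeDiscoveredModel, and the example ids are hypothetical and only stand in for whatever whitelist mechanism localModelDiscovery.ts actually uses.

```typescript
type HarnessProfile = "verified" | "guarded" | "read_only";
type DiscoverySource = "lmstudio-rest" | "lmstudio-openai" | "ollama";

type LocalDescriptor = {
  id: string;
  discoverySource: DiscoverySource;
  harnessProfile: HarnessProfile;
};

// Hypothetical whitelist keyed by model id (an assumption for this sketch).
const VERIFIED_LOCAL_MODELS: Record<string, boolean> = {
  "ollama/example-verified": true,
};

// Builds a descriptor for a discovered model; anything not whitelisted is guarded.
function describeDiscoveredModel(id: string, source: DiscoverySource): LocalDescriptor {
  const harnessProfile: HarnessProfile = VERIFIED_LOCAL_MODELS[id] ? "verified" : "guarded";
  return { id, discoverySource: source, harnessProfile };
}
```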

Reasoning tiers (Claude)

Claude's reasoning-tier vocabulary is low | medium | high | max (CLAUDE_THINKING_LEVELS in shared/modelProfiles.ts). The max tier was added alongside the Claude Opus 4.7 1M entry (anthropic/claude-opus-4-7-1m; aliases opus[1m] and claude-opus-4-7[1m]; 1,000,000-token context; 128k-token output; cost tier very_high), which is the first registry entry to advertise the full low | medium | high | max set. Passthrough to the provider config is unchanged: the tier string is forwarded directly to the CLI or SDK, with no synthesized token budgets.

Auth and credentials

authDetector.ts (detectAllAuth) probes every provider:

  • CLI-wrapped providers (claude, codex) check for the binary on PATH and then for the app's auth token cache.
  • Cursor authenticates through the SDK (API key / managed credential). The optional Cursor CLI is inspected only as a diagnostic (paid-plan inference); runtime auth comes from the SDK, not the CLI.
  • API-key providers check the keychain via apiKeyStore.ts and then the ANTHROPIC_API_KEY / OPENAI_API_KEY / etc. env vars.
  • OAuth providers trigger the OAuth redirect flow in services/lanes/oauthRedirectService.ts.
  • Local providers (ollama, lmstudio) probe the configured endpoint for model availability.

Results feed into the UI's AiProviderConnectionStatus / AiRuntimeConnectionStatus (see providerConnectionStatus.ts).
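The keychain-then-environment order for API-key providers can be sketched like this. KeychainLookup, Env, ApiKeyResult, resolveApiKey, and the provider-to-variable table are assumptions made for illustration; apiKeyStore.ts and authDetector.ts may expose a different API and cover more providers.

```typescript
// Hypothetical shapes for the two credential sources.
type KeychainLookup = (provider: string) => string | undefined;
type Env = Record<string, string | undefined>;

// Only the two variables named in the text; the provider keys are assumed.
const ENV_VAR_BY_PROVIDER: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

type ApiKeyResult = { source: "keychain" | "env" | "none"; key?: string };

// Checks the keychain first, then falls back to the provider's env var.
function resolveApiKey(provider: string, keychain: KeychainLookup, env: Env): ApiKeyResult {
  const stored = keychain(provider);
  if (stored) return { source: "keychain", key: stored };

  const envVar = ENV_VAR_BY_PROVIDER[provider];
  const fromEnv = envVar ? env[envVar] : undefined;
  if (fromEnv) return { source: "env", key: fromEnv };

  return { source: "none" };
}
```

Returning the source along with the key lets a status surface report where a credential came from without exposing the key itself.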

Permission modes

Permission controls are provider-native. The session carries an abstract permissionMode alongside provider-native fields.

Claude

AgentChatClaudePermissionMode:

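| Mode | Behavior |
| --- | --- |
| default | Claude CLI built-in permission flow. |
| auto | Permission decisions are handed to the SDK's automatic gate. |
| plan | Read-only; writing/executing blocked. |
| acceptEdits | Writes allowed; shell commands require approval. |
| bypassPermissions | Proceed without asking. |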

Claude permission mode can be changed mid-session via the SDK (query.setPermissionMode(...)).

Codex

Two independent controls, plus a config-source switch that can override both:

  • AgentChatCodexApprovalPolicy -- untrusted | on-request | on-failure | never.
  • AgentChatCodexSandbox -- read-only | workspace-write | danger-full-access.
  • AgentChatCodexConfigSource -- flags | config-toml. When config-toml, ADE defers both controls to the project's .codex/config.toml.

The chat adapter translates ADE's persisted kebab-case approval/sandbox values into the Codex app-server wire format at the JSON-RPC boundary: on-request -> onRequest, untrusted -> unlessTrusted, on-failure -> onFailure, and workspace-write -> workspaceWrite. Every thread/start and thread/resume call passes { model, cwd, reasoningEffort, ...codexPolicyArgs, ...codexServiceTierArgs(session), persistExtendedHistory: true }.

The return envelope is consumed by applyCodexEffectiveThreadState, which normalizes approvalPolicy, sandbox (including the camel-case aliases readOnly / workspaceWrite / dangerFullAccess that the server emits), and reasoningEffort. That snapshot becomes the session state, so the picker chips always show what the runtime actually applied. On resume, the persisted chat state is rewritten after normalization instead of being copied back from the on-disk file, so the server's reading of .codex/config.toml wins over a stale persisted pair.

Turns use the Codex-native effort key (turn/start({ threadId, input, effort?, serviceTier? })) instead of the lifecycle reasoningEffort name.
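The kebab-case to wire-format translation, and the reverse normalization of the return envelope, can be sketched as two lookup tables. The four mappings named in the text are used as given. Everything else is an assumption for this sketch: that never passes through unchanged, that readOnly / dangerFullAccess are also the request-side spellings, the approvalPolicy / sandbox argument keys, and the helper names buildCodexPolicyArgs and normalizeSandbox.

```typescript
type ApprovalPolicy = "untrusted" | "on-request" | "on-failure" | "never";
type Sandbox = "read-only" | "workspace-write" | "danger-full-access";

// Persisted (kebab-case) value -> app-server wire value.
const APPROVAL_TO_WIRE: Record<ApprovalPolicy, string> = {
  untrusted: "unlessTrusted", // stated in the text
  "on-request": "onRequest", // stated in the text
  "on-failure": "onFailure", // stated in the text
  never: "never", // ASSUMED to pass through unchanged
};

const SANDBOX_TO_WIRE: Record<Sandbox, string> = {
  "read-only": "readOnly", // ASSUMED on the request side
  "workspace-write": "workspaceWrite", // stated in the text
  "danger-full-access": "dangerFullAccess", // ASSUMED on the request side
};

// Hypothetical builder for the policy part of thread/start and thread/resume.
function buildCodexPolicyArgs(approval: ApprovalPolicy, sandbox: Sandbox) {
  return { approvalPolicy: APPROVAL_TO_WIRE[approval], sandbox: SANDBOX_TO_WIRE[sandbox] };
}

// Accepts either spelling from the server and returns the kebab-case form.
function normalizeSandbox(value: string): Sandbox | undefined {
  const keys = Object.keys(SANDBOX_TO_WIRE) as Sandbox[];
  for (const key of keys) {
    if (value === key || value === SANDBOX_TO_WIRE[key]) return key;
  }
  return undefined;
}
```

Normalizing on the way back in is what allows the persisted state to stay in one spelling regardless of which form the server echoes.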

Codex service tiers (Fast Mode)

ModelDescriptor.serviceTiers?: string[] advertises the optional service tiers a model accepts (today only "fast"). The composer's Fast toggle (a yellow Lightning chip next to the model picker) shows whenever modelSupportsFastMode(descriptor) is true for the selected model and the session provider is Codex. AgentChatSession carries codexFastMode?: boolean and the chat adapter forwards it as serviceTier: "fast" | null on every turn/start and thread/start JSON-RPC call (an explicit null clears any app-server default). The flag persists with the session, survives reload through PersistedChatState, and is forwarded to remote devices through the sync command service. Parallel-model rows track Fast mode per slot (ParallelModelRowState.codexFastMode) so launching multiple Codex runs side-by-side can mix Fast and Standard turns. The discovery layer populates serviceTiers from app-server-reported additionalSpeedTiers / serviceTiers rows; the static registry pre-marks the GPT 5.4 / 5.5 Codex CLI entries.
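A minimal sketch of the Fast-mode gating and forwarding described above. modelSupportsFastMode, codexServiceTierArgs, and codexFastMode are names from the text, but the bodies here are guesses at their behavior; TierDescriptor, FastSession, and showFastToggle are hypothetical.

```typescript
// Only the fields this sketch needs.
type TierDescriptor = { id: string; serviceTiers?: string[] };
type FastSession = { provider: string; codexFastMode?: boolean };

// True when the descriptor advertises the "fast" service tier.
function modelSupportsFastMode(descriptor: TierDescriptor): boolean {
  return (descriptor.serviceTiers ?? []).indexOf("fast") !== -1;
}

// Hypothetical UI predicate: the toggle needs both a Codex session and a capable model.
function showFastToggle(descriptor: TierDescriptor, session: FastSession): boolean {
  return session.provider === "codex" && modelSupportsFastMode(descriptor);
}

// An explicit null (rather than omitting the key) clears any app-server default.
function codexServiceTierArgs(session: FastSession): { serviceTier: "fast" | null } {
  return { serviceTier: session.codexFastMode ? "fast" : null };
}
```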

Codex plan mode uses the native app-server planning flow. ADE passes its runtime guidance as an ordinary system-context input item and keeps collaborationMode.settings.developer_instructions null, then turns completed Codex plan items (including <proposed_plan> wrappers) into ADE plan-approval requests. Accepting that request moves the session to edit/default mode and starts the implementation turn.

Default Codex chats map to the "Default permissions" preset (workspace-write + on-request). The older implicit fallback that mapped CLI edit mode to untrusted was removed so the first-turn picker state matches the documented default; the explicit Codex edit preset still resolves through the picker path.

OpenCode

AgentChatOpenCodePermissionMode:

| Mode | Behavior |
| --- | --- |
| plan | Read-only. |
| edit | Read/write allowed; bash gated. |
| full-auto | Proceed without asking. |

Cursor

Cursor modes (apps/desktop/src/shared/cursorModes.ts) are a list of configurable mode IDs; ADE stores a cursorModeSnapshot on the session carrying the current mode, available mode IDs, and selected config options.

Abstract-to-native mapping

AgentChatPermissionMode is default | plan | edit | full-auto | config-toml. providerOptions.ts exposes mapPermissionModeToNativeFields(), which translates the abstract value into the correct provider-native fields:

  • claude: claudePermissionMode = "default" | "auto" | "plan" | "acceptEdits" | "bypassPermissions". The auto mode hands permission decisions to the SDK's automatic gate and surfaces in the desktop and ade code permission pickers alongside the existing modes.
  • codex: codexApprovalPolicy + codexSandbox pair.
  • opencode: opencodePermissionMode = "plan" | "edit" | "full-auto".
  • droid: droidPermissionMode = "read-only" | "auto-low" | "auto-medium" | "auto-high".

The abstract field is persisted alongside the native fields so the UI can summarize session state consistently, and so legacy flows that only know about the abstract mode still work.
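The shape of the abstract-to-native translation can be sketched for two providers. The mode unions are taken from the text, but the specific pairings in the two tables are assumptions chosen for illustration, and mapPermissionSketch is a hypothetical stand-in; providerOptions.ts remains authoritative for the real mapping.

```typescript
type AbstractMode = "default" | "plan" | "edit" | "full-auto" | "config-toml";
type ClaudeMode = "default" | "auto" | "plan" | "acceptEdits" | "bypassPermissions";
type OpenCodeMode = "plan" | "edit" | "full-auto";

// ASSUMED pairings, for illustration only.
const CLAUDE_BY_ABSTRACT: Record<AbstractMode, ClaudeMode> = {
  default: "default",
  plan: "plan",
  edit: "acceptEdits",
  "full-auto": "bypassPermissions",
  "config-toml": "default", // config-toml is a Codex concept; assumed fallback
};

// ASSUMED pairings, for illustration only.
const OPENCODE_BY_ABSTRACT: Record<AbstractMode, OpenCodeMode> = {
  default: "edit",
  plan: "plan",
  edit: "edit",
  "full-auto": "full-auto",
  "config-toml": "edit", // assumed fallback
};

type SessionPermissionFields = {
  permissionMode: AbstractMode;
  claudePermissionMode?: ClaudeMode;
  opencodePermissionMode?: OpenCodeMode;
};

// Returns the abstract mode together with the native field for the provider,
// mirroring how both are persisted on the session.
function mapPermissionSketch(provider: "claude" | "opencode", mode: AbstractMode): SessionPermissionFields {
  if (provider === "claude") {
    return { permissionMode: mode, claudePermissionMode: CLAUDE_BY_ABSTRACT[mode] };
  }
  return { permissionMode: mode, opencodePermissionMode: OPENCODE_BY_ABSTRACT[mode] };
}
```

Because the tables are typed as Record<AbstractMode, ...>, adding a new abstract mode fails to compile until every provider table handles it.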

Interaction mode

AgentChatInteractionMode is default | plan. When plan, the agent operates in read-only planning mode and proposes changes via ExitPlanMode. Approving the plan transitions the session to edit permission mode automatically. In bypassPermissions or full-auto permission modes, plan approval auto-grants (no UI), since the user has opted out of permission gates.

When the user approves an ExitPlanMode call, the canUseTool handler returns { behavior: "allow", updatedInput: input } so the SDK's native ExitPlanMode handler runs, restores the pre-plan permission mode from its toolPermissionContext.prePlanMode, and emits a normal tool_result back to the model. ADE additionally calls setPermissionMode defensively so the SDK and ADE agree on the target mode even if the SDK's restore path no-ops, but the SDK is still the source of truth. (Previously we returned behavior: "deny" to dodge a ZodError in the SDK's input schema; that is no longer necessary and the deny path made the model hesitate after a "denied" tool call.)
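The approval path can be sketched as a pure decision function. The allow shape is the one quoted above; the PermissionDecision union is a local approximation of the SDK's result type, and decideExitPlanMode, planApprovalNeedsUi, and the deny message are hypothetical names and text introduced for this example.

```typescript
type ToolInput = Record<string, unknown>;

// Local approximation of the permission result returned from canUseTool.
type PermissionDecision =
  | { behavior: "allow"; updatedInput: ToolInput }
  | { behavior: "deny"; message: string };

// On approval, pass the input through unchanged so the SDK's native
// ExitPlanMode handler runs and restores the pre-plan permission mode.
function decideExitPlanMode(approved: boolean, input: ToolInput): PermissionDecision {
  if (approved) return { behavior: "allow", updatedInput: input };
  return { behavior: "deny", message: "Plan was not approved." };
}

// Plan approval auto-grants (no UI) when the user has opted out of gates.
function planApprovalNeedsUi(permissionMode: string): boolean {
  return permissionMode !== "bypassPermissions" && permissionMode !== "full-auto";
}
```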

Model selection flow

  1. User picks a model in ProviderModelSelector (under renderer/components/shared/).
  2. Renderer resolves a ModelDescriptor via getModelById / resolveModelDescriptorForProvider.
  3. The descriptor determines the provider (providerRoute), routing module, and default reasoning tier.
  4. createSession(args) creates the session with both the descriptor's shortId as model and its canonical id as modelId.
  5. The service resolves the correct adapter and spawns the runtime.

For Claude, resolveClaudeCliModel() translates the descriptor into the CLI's expected model token. For Codex, codexAppServerConfig.ts builds the app-server startup options.
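Steps 3 and 4 of the flow above can be sketched as a small mapping from descriptor to session arguments. PickedDescriptor, CreateSessionArgs, and toCreateSessionArgs are hypothetical; the real createSession(args) takes more fields than shown, and deriving the provider directly from providerRoute is a simplification.

```typescript
// Only the descriptor fields this sketch needs.
type PickedDescriptor = { id: string; shortId: string; providerRoute: string };

// Hypothetical subset of the createSession arguments.
type CreateSessionArgs = { provider: string; model: string; modelId: string };

// model carries the CLI-facing shortId; modelId carries the stable ADE id.
function toCreateSessionArgs(descriptor: PickedDescriptor): CreateSessionArgs {
  return {
    provider: descriptor.providerRoute,
    model: descriptor.shortId,
    modelId: descriptor.id,
  };
}
```

Carrying both identifiers lets the runtime receive the token it expects while the rest of ADE keys off the stable id.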

Model switching mid-session

chatModelSwitching.ts rules:

  • ChatModelSwitchPolicy is either "same-family-after-launch" or "any-after-launch".
  • canSwitchChatSessionModel(session, targetDescriptor) returns true only when the policy permits. CTO and persistent-identity sessions default to "any-after-launch"; regular chat defaults to "same-family-after-launch" to avoid spurious handoffs.
  • filterChatModelIdsForSession(ids, session) filters the model picker to the models the user may switch to without triggering a handoff.
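The rules above can be sketched as follows, assuming that "same family" means comparing a family field on the session and the target descriptor. That comparison, the SwitchSession and SwitchTarget shapes, and the helper names are assumptions for this example rather than the actual chatModelSwitching.ts logic.

```typescript
type ChatModelSwitchPolicy = "same-family-after-launch" | "any-after-launch";

// Hypothetical minimal shapes.
type SwitchSession = { family: string; switchPolicy: ChatModelSwitchPolicy };
type SwitchTarget = { id: string; family: string };

// Permits any target under "any-after-launch"; otherwise requires the same family.
function canSwitchSketch(session: SwitchSession, target: SwitchTarget): boolean {
  if (session.switchPolicy === "any-after-launch") return true;
  return target.family === session.family;
}

// Narrows a picker list to the ids the session may switch to.
function filterSwitchableIds(targets: SwitchTarget[], session: SwitchSession): string[] {
  return targets.filter((t) => canSwitchSketch(session, t)).map((t) => t.id);
}
```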

Changing models triggers a handoff (handoffSession), which splits into two strategies depending on whether the source and target both run on the Claude Agent SDK:

  1. Fork (Claude → Claude). When both ends are Claude runtimes, the service pins the source sdkSessionId as the new session's forkFromSdkSessionId and starts the next query() with options.forkSession = true. The SDK forks the SDK session graph server-side so the new chat keeps the full conversation and tool history without a summary round-trip. forkFromSdkSessionId is persisted through PersistedChatState and re-applied on resume so forked descendants survive app restart.
  2. Brief (cross-runtime). When the target leaves the Claude family (or the source is non-Claude), the service falls back to a 12-message handoff brief built by generateHandoffBrief(): summarize the current session, end it gracefully, create a new session with the target model, and inject the brief as a continuity message. buildDeterministicHandoffBrief() provides a deterministic fallback when the LLM summarization call fails or no eligible summarizer is available; AgentChatHandoffResult.usedFallbackSummary surfaces which path was taken.
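The choice between the two strategies, and the fallback inside the brief path, can be sketched like this. chooseHandoffStrategy and buildBriefSketch are hypothetical helpers; the summarizer is modelled as a synchronous callback for brevity, whereas the real generateHandoffBrief() presumably awaits an LLM call. Only usedFallbackSummary is a field name taken from the text.

```typescript
type HandoffStrategy = "fork" | "brief";

// Fork only when both ends run on the Claude Agent SDK.
function chooseHandoffStrategy(sourceProvider: string, targetProvider: string): HandoffStrategy {
  return sourceProvider === "claude" && targetProvider === "claude" ? "fork" : "brief";
}

type BriefResult = { brief: string; usedFallbackSummary: boolean };

// Tries the LLM summarizer first; falls back to the deterministic brief when
// the summarizer throws or returns nothing.
function buildBriefSketch(
  summarize: () => string | undefined,
  deterministic: () => string
): BriefResult {
  let summary: string | undefined;
  try {
    summary = summarize();
  } catch {
    summary = undefined;
  }
  if (summary) return { brief: summary, usedFallbackSummary: false };
  return { brief: deterministic(), usedFallbackSummary: true };
}
```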

Auto-title generation

Sessions auto-title through two stages when ai.sessionIntelligence.titles.enabled is true and the runtime is not guest:

  • Initial -- generated early in the conversation from the first user message, providing an immediate label while the session is still brief.
  • Final -- generated once enough transcript has accumulated, producing a more accurate title.

ai.sessionIntelligence.titles.refreshOnComplete (default true) triggers a final refresh after a turn completes.

Manual renaming sets manuallyNamed: true, which permanently suppresses further auto-title generation.
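The gating conditions for auto-titling can be summarized in a short sketch. TitleConfig mirrors the two ai.sessionIntelligence.titles settings named above; TitleSession, isGuestRuntime, and both helper functions are hypothetical names introduced for this example.

```typescript
// Mirrors ai.sessionIntelligence.titles.{enabled, refreshOnComplete}.
type TitleConfig = { enabled: boolean; refreshOnComplete?: boolean };

// Hypothetical session flags relevant to titling.
type TitleSession = { manuallyNamed: boolean; isGuestRuntime: boolean };

// Auto-titling runs only when enabled, not on a guest runtime, and not manually named.
function shouldAutoTitle(config: TitleConfig, session: TitleSession): boolean {
  return config.enabled && !session.isGuestRuntime && !session.manuallyNamed;
}

// refreshOnComplete defaults to true when unset.
function shouldRefreshTitleOnComplete(config: TitleConfig, session: TitleSession): boolean {
  return shouldAutoTitle(config, session) && (config.refreshOnComplete ?? true);
}
```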

CTO vs. regular chat routing

CTO sessions (identityKey: "cto") are routed differently:

  1. sessionProfile: "persistent_identity" drives a distinct ChatSurfaceProfile in the UI.
  2. Core memory is reconstructed from ctoStateService on session start and re-injected via buildReconstructionContext().
  3. The CTO system prompt includes the immutable CTO doctrine, memory operating model, environment knowledge, and active personality overlay (CtoPersonalityPreset). See ctoStateService.ts.
  4. Extra tooling: CTO sessions receive ctoOperatorTools, Linear tools (if connected), and memoryUpdateCore.
  5. Guarded permission defaults: Claude defaults to "default" (ask before dangerous ops); OpenCode defaults to "edit". full-auto is only applied when explicitly requested.

Worker sessions (identityKey: "agent:<id>") follow a similar pattern through AgentCoreMemory (same five fields) and the workerAgentService.

Fragile and tricky wiring

  • Dynamic model discovery mutates the registry. Local-model probes in localModelDiscovery.ts can add and remove descriptors. Callers that cache the registry must subscribe to the discovery emitter or re-read on each use.
  • Handoff requires a context contract. handoffSession calls the summarizer with the current transcript plus the context contract from contextContract.ts. If the contract can't be resolved (e.g. missing lane context), the handoff falls back to a minimal summary and sets fallbackUsed: true.
  • Claude runtime readiness. claudeRuntimeProbe.ts verifies the bundled Claude Agent SDK binary and auth state before chat launch. Missing binary/auth readiness surfaces as CLAUDE_RUNTIME_AUTH_ERROR before the SDK query() stream is allowed to start.
  • Permission mapping is asymmetric. mapPermissionModeToNativeFields only handles the abstract-to-native direction. The reverse (native-to-abstract) requires provider-specific logic; switching a provider-native field without also updating the abstract field leaves them out of sync.
  • Claude post-compaction re-injection. When a CTO or worker session undergoes context compaction, the service must call refreshReconstructionContext() to re-inject identity. Skipping this call strips the persona mid-session, and the agent forgets its identity (for example, that it is the CTO).
  • OAuth redirect ports. oauthRedirectService.ts binds to an ephemeral port and writes the URI into the provider config. If another process grabs that port between detection and callback, the OAuth flow fails silently from the user's perspective.

Related docs