feat(agent): one-shot ask_peer_claude (Phase-2 PR A) by WiktorStarczewski · Pull Request #7 · WiktorStarczewski/hearsay

WiktorStarczewski · 2026-04-25T10:33:55Z

Summary

First interactive tool of Phase 2: a calling Claude can spawn a parallel Claude session on a teammate's machine that reads / globs / greps the live filesystem. Returns a markdown transcript + {turnCount, toolCallCount, stopReason, elapsedMs}.

Off by default. A v0.1 install upgrading to this binary without --enable-agent behaves bit-for-bit like before.

This is PR A in the Phase-2 sequence (PR 0 merged in #6). PR B will add stateful conversations (start_peer_conversation / send_peer_message / etc.) on top of the same internal/agent/ package.

SDK-spike resolution

The plan called for verifying that client.Beta.Sessions.* works as advertised before committing. Spike findings:

✅ The Managed-Agents API exists at anthropic-sdk-go v1.38.0 (released 2026-04-23, beta header managed-agents-2026-04-01 auto-applied by the SDK).
⚠️ The bundled agent_toolset_20260401 (which exposes read / glob / grep) executes inside an Anthropic-hosted Environment sandbox, not on Ivan's filesystem. The plan's whole point is for tools to read Ivan's box, so the bundled toolset is the wrong knob.
✅ Pivoted to custom tools + sync callback (Option 2 from the spike report). Define read/glob/grep as custom tools on BetaAgentNewParams. The session emits agent.custom_tool_use; we execute on Ivan's box and reply with user.custom_tool_result. Tools run on Ivan's machine; security posture is identical to what the plan originally specified.

Architecture

internal/agent/
├── agent.go          public types (Budget, Transcript, StopReason, ErrorSummary, Agent)
├── customtools.go    read / glob / grep handlers, sandboxed under the project root
├── loop.go           SDK-decoupled event loop (testable with canned events)
├── sdk.go            Managed-Agents wrapper: ensureInit + runOnce + pumpStream
├── audit.go          line-atomic JSONL audit log (sizes only, no content)
└── *_test.go

The adversarial defense (verification step 7a) has two legs:

(a) buildCustomToolUnion() advertises only {read, glob, grep} as custom tools — never agent_toolset_20260401, never mcp_toolset. Asserted in sdk_test.go.
(b) runEventLoop rejects any incoming agent.custom_tool_use for a name we didn't register, even if a compromised upstream emits one anyway. Asserted in loop_test.go (TestLoop_RejectsDisallowedTool).
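
Leg (b) can be pictured with a small, SDK-independent loop driven by canned events. This is only a sketch: the Event and Result types, the runLoop name, and the "end_turn" stop reason are assumptions made for illustration; only the {read, glob, grep} allowlist and the error / disallowed_tool outcome come from this PR.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for one stream event, decoupled from the SDK.
type Event struct {
	Type     string // e.g. "agent.custom_tool_use"
	ToolName string
}

// Result mirrors the stopReason / errorSummary pair; field names are assumed.
type Result struct {
	StopReason    string
	ErrorSummary  string
	ToolCallCount int
}

// allowedTools is the only set of names the loop will ever dispatch.
var allowedTools = map[string]bool{"read": true, "glob": true, "grep": true}

// runLoop consumes events in order and stops at the first tool-use event
// whose name is not on the allowlist, regardless of what upstream advertised.
func runLoop(events []Event) Result {
	var res Result
	for _, ev := range events {
		if ev.Type != "agent.custom_tool_use" {
			continue
		}
		if !allowedTools[ev.ToolName] {
			res.StopReason = "error"
			res.ErrorSummary = "disallowed_tool"
			return res
		}
		res.ToolCallCount++ // a real loop would execute the tool and reply here
	}
	res.StopReason = "end_turn" // assumed name for a normal finish
	return res
}

func main() {
	res := runLoop([]Event{
		{Type: "agent.custom_tool_use", ToolName: "read"},
		{Type: "agent.custom_tool_use", ToolName: "bash"},
	})
	fmt.Printf("%+v\n", res)
}
```

Because the loop takes plain values rather than SDK stream types, a test can feed it a hostile event sequence without any network stub.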

Test plan

What this doesn't do (deferred to PR B)

Stateful conversations (start_peer_conversation / send_peer_message / list_peer_conversations / end_peer_conversation).
Conversation idle reaping, max-conversations cap.
Full SDK-stubbed TestE2E_AskPeerClaude end-to-end (would require simulating the SSE event stream; loop-layer canned-event tests cover the same logic).

Adds the first interactive tool: a calling Claude can spawn a parallel Claude session on a teammate's machine that can read / glob / grep the live filesystem. Returns a markdown transcript plus {turnCount, toolCallCount, stopReason, elapsedMs}.

Off by default: a v0.1 install upgrading to this binary without --enable-agent behaves bit-for-bit like before.

Architecture
------------

* internal/agent/: wraps the Anthropic Managed-Agents API (anthropic-sdk-go v1.38.0, beta header managed-agents-2026-04-01). Read-only tools registered as **custom** tools on the agent, NOT the bundled agent_toolset_20260401 (which would route execution to an Anthropic-hosted sandbox instead of Ivan's filesystem). On agent.custom_tool_use the SDK pauses the session; we execute on Ivan's box and reply with user.custom_tool_result.
* internal/agent/loop.go: the event-loop core, decoupled from the SDK so unit tests drive it with canned events. Adversarial check (verification step 7a in the plan): any custom_tool_use for a name not in {read, glob, grep} is rejected with stopReason=error, errorSummary=disallowed_tool, even if a compromised upstream emits one anyway.
* internal/agent/customtools.go: read / glob / grep handlers, bounded per call (read caps at 64KB, glob/grep cap at 200 results), sandboxed under the project root, hidden dirs filtered, binary files skipped from grep.
* internal/agent/audit.go: per-tool-call audit log at ~/Library/Logs/hearsay/agent.log (macOS) or $XDG_STATE_HOME/hearsay/agent.log (Linux). Sizes only: no prompt / response / tool-arg content, no hashes. sync.Mutex around each line-atomic O_APPEND write.

CLI flags (all gated by --enable-agent, off by default)
-------------------------------------------------------

* --enable-agent
* --agent-api-key-env <NAME> (default ANTHROPIC_API_KEY)
* --agent-base-url <url> (test stubs / regional endpoints)
* --agent-model <id> (default claude-opus-4-7)
* --agent-default-max-tokens <n> (default 32768)
* --agent-default-max-tool-calls <n> (default 20)
* --agent-default-timeout-seconds <n> (default 120)
* --agent-log-path <path>

If --enable-agent is set but the API key env var is empty, hearsay refuses to start: no half-configured state.

Coverage
--------

Aggregate at 90.4% (gate is 90%). Unit tests at the loop layer cover all stop-reason enum values + the adversarial defense. E2E tests confirm the tool is registered iff --enable-agent is set, and the binary refuses to start without the API key. Display-side manual loopback against real Claude Code is a separate verification step (plan step 9).
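
The "refuses to start" rule for --enable-agent can be sketched as a small startup check. The checkAgentConfig helper and the error wording are hypothetical; only the two flag names and the ANTHROPIC_API_KEY default are taken from the list above. Taking getenv as a parameter is a choice made here so the check can be tested without touching the process environment.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// checkAgentConfig returns an error when the agent is enabled but the named
// environment variable is empty. With the agent disabled it is always nil,
// which matches the off-by-default behaviour.
func checkAgentConfig(enabled bool, keyEnv string, getenv func(string) string) error {
	if !enabled {
		return nil
	}
	if getenv(keyEnv) == "" {
		return fmt.Errorf("--enable-agent is set but $%s is empty", keyEnv)
	}
	return nil
}

func main() {
	fs := flag.NewFlagSet("hearsay", flag.ContinueOnError)
	enable := fs.Bool("enable-agent", false, "enable the agent tools")
	keyEnv := fs.String("agent-api-key-env", "ANTHROPIC_API_KEY", "env var holding the API key")
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	if err := checkAgentConfig(*enable, *keyEnv, os.Getenv); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("agent enabled:", *enable)
}
```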

…ment (PR C) (#10)

* feat(agent): subprocess-driven Claude Code, replacing API-key requirement (PR C)

Phase 2 originally shipped via the Anthropic Managed-Agents API (PRs #7 + #8), which is API-key-only. The constraint that surfaced post-merge: both peers and consumers in the actual user base have Claude Code subscriptions (Pro / Max / Team), not API keys. The shipped Phase-2 tools were therefore unusable for the real users.

This PR replaces the SDK path with a subprocess driver around `claude --print`. The peer's existing OAuth credentials authenticate every call; subscription quota pays the bill. No code changes for consumers (Wiktor's side); breaking CLI-flag changes for peers.

Internals
---------

* internal/agent/cli.go (new): subprocess driver. Builds three argv variants (OneShot, first-turn-of-conv, resumed-turn) with --print --output-format json --allowed-tools "Read Glob Grep" on every call. Parses Claude Code's JSON result for stop_reason + usage; replays the session JSONL via the existing internal/transcript package to extract per-tool-call detail (the JSON payload has no tool_calls[] field).
* internal/agent/conversations.go (slimmed): convID is now the Claude Code session UUID. StartConversation no longer hits an upstream service; it allocates a UUID, records the system_prompt, returns. EndConversation marks the local slot ended without deleting anything; the session JSONL stays on disk so Phase-1 read_session can still surface it. Idle reaper is local-only.
* internal/agent/sdk.go, loop.go, customtools.go (deleted): the Managed-Agents wrapper, event-loop translation layer, and hand-rolled Read/Glob/Grep handlers all go away. Claude Code's native tools replace the custom dispatch; the JSON output replaces the event stream.
* go.mod: drops github.com/anthropics/anthropic-sdk-go and its transitive deps (jsonparser, tidwall/*, wk8/go-ordered-map, ...). Static binary supply-chain footprint shrinks meaningfully.
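
A subprocess driver of this kind mostly comes down to building an argv and an environment. The sketch below covers only the one-shot variant and uses only the flags named in this commit message; the function names are hypothetical, and passing the prompt as a positional argument ahead of the flags is an assumption rather than something the PR states. The env filter reflects the ANTHROPIC_API_KEY stripping described in the Security notes of the same commit message.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// allowedTools is the read-only allowlist, hardcoded on every invocation.
const allowedTools = "Read Glob Grep"

// oneShotArgv builds the arguments for a single non-interactive call.
// Assumption: the prompt is positional and placed before the flags.
func oneShotArgv(prompt string) []string {
	return []string{
		"--print", prompt,
		"--output-format", "json",
		"--allowed-tools", allowedTools,
	}
}

// hasAllowlist is the kind of check an argv-shape unit test might make.
func hasAllowlist(argv []string) bool {
	for i := 0; i+1 < len(argv); i++ {
		if argv[i] == "--allowed-tools" && argv[i+1] == allowedTools {
			return true
		}
	}
	return false
}

// subprocessEnv drops ANTHROPIC_API_KEY from environ unless keepKey is set,
// so a stale key cannot take precedence over the OAuth credentials.
func subprocessEnv(environ []string, keepKey bool) []string {
	if keepKey {
		return environ
	}
	out := make([]string, 0, len(environ))
	for _, kv := range environ {
		if strings.HasPrefix(kv, "ANTHROPIC_API_KEY=") {
			continue
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	argv := oneShotArgv("Where is the retry logic?")
	fmt.Println(argv, hasAllowlist(argv))
	fmt.Println(len(subprocessEnv(os.Environ(), false)), "env vars passed through")
}
```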

Security
--------

Two-leg defense for the read-only allowlist:

1. --allowed-tools "Read Glob Grep" is hardcoded on every invocation; Claude Code itself enforces it. Asserted by the argv-shape unit test.
2. After every call, the JSONL replay scans for tool_use blocks outside the allowlist. Any disallowed name flips StopReason to error / disallowed_tool. Defense-in-depth against future-Claude-Code drift / corrupted builds.

ANTHROPIC_API_KEY in the operator's env is stripped from the subprocess env by default. Claude Code's auth precedence is ANTHROPIC_API_KEY > apiKeyHelper > OAuth/keychain, and a stale env var would silently redirect billing. --agent-keep-env-key opts back in.

CLI changes (breaking)
----------------------

Removed (hard-fail with a friendly upgrade-notes pointer):

* --agent-api-key-env
* --agent-base-url
* --agent-model

Kept, reinterpreted:

* --agent-default-max-tokens: soft budget (system-prompt nudge); the CLI doesn't expose a token cap.

Added:

* --agent-claude-bin <path>
* --agent-keep-env-key

Unchanged: --enable-agent, --agent-default-max-tool-calls, --agent-default-timeout-seconds, --agent-log-path, --max-conversations, --conversation-idle-timeout.

Version bumped 0.1.0-dev → 0.3.0-dev (skipping 0.2 since PR B forgot to bump).

Test coverage
-------------

Aggregate race-mode coverage 92.9% on a clean run (fluctuates 88-93% with the race detector's goroutine-timing sensitivity). Coverage gate is 90% with cache:false on setup-go.

Plan: ~/.claude/plans/hearsay-phase-2-subprocess.md (six review rounds: 3 → 4 → 1 → 2 → 0 → 0 issues).

* fix(agent): kill subprocess group on cancel for portable timeout

Linux's /bin/sh is dash, which does NOT forward SIGTERM to a child sleep / claude process. CI's TestRunClaude_TimeoutKillsSubprocess saw the subprocess run the full 5s sleep, a real bug in the production path on Linux too, since claude itself spawns helper processes that wouldn't get the cancel signal otherwise.

Fix: set Setpgid on the subprocess so it owns its own process group; on cancel, send SIGTERM to the whole group via syscall.Kill(-pgid, SIGTERM) instead of just the leader. cli_unix.go and cli_other.go build-tag the platform-specific calls. macOS keeps working (Setpgid is a no-op for already-isolated procs); Windows compiles to a stub since hearsay doesn't ship Windows builds.

WiktorStarczewski merged commit 5f4a451 into main Apr 25, 2026
2 checks passed

WiktorStarczewski deleted the pr-a/one-shot-agent branch April 25, 2026 10:36

This was referenced Apr 25, 2026

feat(agent): stateful conversations (Phase-2 PR B) #8

Merged

docs(readme): refresh for Phase 2 + design notes #9

Merged

feat(agent): subprocess-driven Claude Code, replacing API-key requirement (PR C) #10

Merged

WiktorStarczewski merged 1 commit into main from pr-a/one-shot-agent

