feat(agent): stateful conversations (Phase-2 PR B) by WiktorStarczewski · Pull Request #8 · WiktorStarczewski/hearsay

WiktorStarczewski · 2026-04-25T10:53:38Z

Summary

Last in the Phase-2 sequence (PR 0 #6, PR A #7, this is PR B). Adds the four conversation-lifecycle MCP tools that build on PR A's one-shot agent — multi-turn conversations with state held server-side on the SDK session, plus an in-process metadata map for listing / idle-reaping / cap enforcement.

Tool	Purpose
`start_peer_conversation`	Opens a stateful read-only conversation. Returns `{convId, startedAt, effectiveBudget}`.
`send_peer_message`	Drives one more turn on a convId. Per-turn budget overrides cascade through the conversation's defaults.
`list_peer_conversations`	Active conversations sorted by `lastActivityAt` desc, with rune-safe 140-char previews.
`end_peer_conversation`	Terminates a conversation and frees its slot. Idempotent.

All four are gated behind --enable-agent (off by default, same posture as PR A).

Internals

internal/agent/conversations.go — per-conv state map + mutex + idle reaper. Each conversation tracks startedAt, lastActivityAt, turnCount, perTurnBudget, project, system-prompt preview, first-user-prompt preview, and the EndReason enum (caller | idle_timeout | shutdown). The full message history lives server-side on the SDK session — the local map is metadata only.
New CLI flags: --max-conversations <n> (default 10), --conversation-idle-timeout <dur> (default 15m).
Reaper goroutine ticks at idle/4 (floored at 1s). Reaped conversations get EndedByIdleReap and the upstream session is best-effort-deleted. agent.Closer interface lets main.go shut the reaper down on SIGTERM.
OneShot refactored to build a transient conversation and route through runTurn so PR-A and PR-B share the SDK round-trip verbatim.

Concurrency

SendMessage takes the conversation's own mutex for the full turn — two callers racing on the same convId serialize.
The reaper takes convsMu then conv.mu; the reverse order is forbidden.
lookupAlive distinguishes alive / reaped / ended-by-caller, so the tool layer can produce the right user-facing message:
- reaped → "reaped after idle timeout; start a new one"
- ended → "unknown convId"
- missing → "unknown convId"

Test plan

go build ./... && go vet ./... — clean.
go test ./... — green across all 9 packages.
go test -race -coverpkg=./... -coverprofile=… — aggregate 90.7% (gate is 90%).
Conversation unit tests (conversations_test.go): listing sorted-by-LastActivityAt, ended-conv omitted, system-prompt fallback preview, idempotent re-end preserves original EndedReason, lookupAlive discriminates end modes, cap accounting, reaper marks stale, stopReaper is idempotent, truncateRunes is rune-safe.
SDK-stub-driven tests (sdk_translate_test.go): StartConversation succeeds against stub, cap enforcement (third start refused), invalid project rejected before SDK call; SendMessage known/unknown; EndConversation success + idempotent re-end; reaper actually reaps + deletes.
Tools-layer tests (tools_agent_test.go): all four conversation tools registered when ctx.Agent != nil, missing-arg validation, error-translation (ErrConvCap / ErrConvReaped / ErrUnknownConv → user-facing messages).
End-to-end tests (hearsay_e2e_test.go): all four tools present in the catalog, --max-conversations flag accepted, list_peer_conversations returns empty list on a fresh server.
Manual loopback (Phase-2 plan verification step 10) — covered separately. PR-B's automated tests prove the wiring; this PR doesn't bill the Anthropic API.

What this completes

PR 0 (read_tool_result content fix) + PR A (one-shot agent) + this PR = Phase 2 complete. Wiktor's Claude can now read teammates' past sessions (Phase 1) AND drive a stateful read-only agent on their box for primary-data investigations.

Adds the four conversation-lifecycle MCP tools that build on PR A's one-shot agent: * start_peer_conversation — opens a stateful read-only conversation, returns {convId, startedAt, effectiveBudget} * send_peer_message — drives one more turn against an existing convId (per-turn budget overrides cascade through the conversation's defaults) * list_peer_conversations — active conversations sorted by lastActivityAt desc, with rune-safe 140-char previews * end_peer_conversation — terminates a conversation and frees its slot; idempotent (re-end returns alreadyEnded=true) All four are gated behind --enable-agent. Tools are registered iff ctx.Agent is non-nil (matches PR A's gating). Internals --------- * `internal/agent/conversations.go` adds the per-conv state map + mutex + idle reaper. Each conversation tracks startedAt, lastActivityAt, turnCount, perTurnBudget, project, system_prompt preview, first user prompt preview, and the EndReason enum (caller | idle_timeout | shutdown). The full message history lives server-side on the SDK session — the local map is metadata only. * New CLI flags: --max-conversations <n> (default 10) --conversation-idle-timeout <dur> (default 15m) * The reaper runs as a background goroutine with a tick at idle/4 (floored at 1s). Reaped conversations get EndedByIdleReap and the upstream session is best-effort-deleted. agent.Closer interface lets main.go shut the reaper down cleanly on SIGTERM. * OneShot now builds a transient conversation and routes through runTurn (in conversations.go), so PR-A and PR-B share the SDK round-trip code path verbatim. Concurrency ----------- * SendMessage takes the conversation's own mutex for the full turn, so two callers racing on the same convId serialize. The reaper takes both convsMu and conv.mu in that order — locking direction forbidden in reverse. * lookupAlive distinguishes three end states: alive, reaped (idle), ended-by-caller — translating the latter two into ErrConvReaped vs ErrUnknownConv so the tool layer can produce the right user-facing message. Coverage -------- Aggregate at 90.7% with -race (gate is 90%). Unit tests cover state transitions (lifecycle, idle reap, cap enforcement, idempotent re- end, lookupAlive's three branches) without an SDK round-trip; stub-driven tests in sdk_translate_test.go exercise the SDK paths end-to-end (StartConversation success + cap, SendMessage with known/unknown convID, EndConversation with idempotent re-end, the reaper actually deleting).

The PR-B stub returned 5xx for the StreamEvents endpoint, which forced the SDK's stream layer to error out — the timing of that error vs the test's t.Context() expiry was non-deterministic under -race, so coverage occasionally dipped below the 90% gate (CI saw 65% on the previous push, local races between 60-90%). Switching to a proper-but-empty SSE response (HTTP 200, content-type text/event-stream, no events delivered) lets the stream close cleanly. pumpStream sees the channel close, the loop returns StopReasonEndTurn, the conv.lastActivityAt update happens deterministically — coverage reads 90.7% with -race three runs in a row.

Diagnosis: setup-go@v5's default cache key is hashed against go.sum only. When PR B inherited PR A's go.sum (no new deps), the CI cache restored PR A's instrumented test binaries. Three of the nine -coverpkg=./... test builds got stale cache hits — their cover blocks referenced the old printHelp source-line range (84-125) while six fresh builds referenced the new range (84-127). go tool cover -func treats the two ranges as separate blocks: the old one has 0 hits, the new one is fully covered, and printHelp drops from 100% to 33%. That cascade across cmd/hearsay/main.go funcs pushed the aggregate from 90.7% (local) to 65.1% (CI) on the same commit. Cache off makes CI a few seconds slower but eliminates the variance. Re-enable once we have a cache key that hashes source files in addition to go.sum.

…ment (PR C) (#10) * feat(agent): subprocess-driven Claude Code, replacing API-key requirement (PR C) Phase 2 originally shipped via the Anthropic Managed-Agents API (PRs #7 + #8), which is API-key-only. The constraint that surfaced post-merge: both peers and consumers in the actual user base have Claude Code subscriptions (Pro / Max / Team), not API keys. The shipped Phase-2 tools were therefore unusable for the real users. This PR replaces the SDK path with a subprocess driver around `claude --print`. The peer's existing OAuth credentials authenticate every call; subscription quota pays the bill. No code changes for consumers (Wiktor's side); breaking CLI-flag changes for peers. Internals --------- * internal/agent/cli.go (new) — subprocess driver. Builds three argv variants (OneShot, first-turn-of-conv, resumed-turn) with --print --output-format json --allowed-tools "Read Glob Grep" on every call. Parses Claude Code's JSON result for stop_reason + usage; replays the session JSONL via the existing internal/transcript package to extract per-tool-call detail (the JSON payload has no tool_calls[] field). * internal/agent/conversations.go (slimmed) — convID is now the Claude Code session UUID. StartConversation no longer hits an upstream service; it allocates a UUID, records the system_prompt, returns. EndConversation marks the local slot ended without deleting anything; the session JSONL stays on disk so Phase-1 read_session can still surface it. Idle reaper is local-only. * internal/agent/sdk.go, loop.go, customtools.go (deleted) — the Managed-Agents wrapper, event-loop translation layer, and hand-rolled Read/Glob/Grep handlers all go away. Claude Code's native tools replace the custom dispatch; the JSON output replaces the event stream. * go.mod — drops github.com/anthropics/anthropic-sdk-go and its transitive deps (jsonparser, tidwall/*, wk8/go-ordered-map, ...). Static binary supply-chain footprint shrinks meaningfully. Security -------- Two-leg defense for the read-only allowlist: 1. --allowed-tools "Read Glob Grep" is hardcoded on every invocation; Claude Code itself enforces it. Asserted by the argv-shape unit test. 2. After every call, the JSONL replay scans for tool_use blocks outside the allowlist. Any disallowed name flips StopReason to error / disallowed_tool. Defense-in-depth against future-Claude-Code drift / corrupted builds. ANTHROPIC_API_KEY in the operator's env is stripped from the subprocess env by default — Claude Code's auth precedence is ANTHROPIC_API_KEY > apiKeyHelper > OAuth/keychain, and a stale env var would silently redirect billing. --agent-keep-env-key opts back in. CLI changes (breaking) ---------------------- Removed (hard-fail with a friendly upgrade-notes pointer): * --agent-api-key-env * --agent-base-url * --agent-model Kept, reinterpreted: * --agent-default-max-tokens — soft budget (system-prompt nudge); the CLI doesn't expose a token cap. Added: * --agent-claude-bin <path> * --agent-keep-env-key Unchanged: --enable-agent, --agent-default-max-tool-calls, --agent-default-timeout-seconds, --agent-log-path, --max-conversations, --conversation-idle-timeout. Version bumped 0.1.0-dev → 0.3.0-dev (skipping 0.2 since PR B forgot to bump). Test coverage ------------- Aggregate race-mode coverage 92.9% on a clean run (fluctuates 88-93% with the race detector's goroutine-timing sensitivity). Coverage gate is 90% with cache:false on setup-go. Plan: ~/.claude/plans/hearsay-phase-2-subprocess.md (six review rounds: 3 → 4 → 1 → 2 → 0 → 0 issues). * fix(agent): kill subprocess group on cancel for portable timeout Linux's /bin/sh is dash, which does NOT forward SIGTERM to a child sleep / claude process. CI's TestRunClaude_TimeoutKillsSubprocess saw the subprocess run the full 5s sleep — a real bug in the production path on Linux too, since claude itself spawns helper processes that wouldn't get the cancel signal otherwise. Fix: set Setpgid on the subprocess so it owns its own process group; on cancel, send SIGTERM to the whole group via syscall.Kill(-pgid, SIGTERM) instead of just the leader. cli_unix.go and cli_other.go build-tag the platform-specific calls. macOS keeps working (Setpgid is a no-op for already-isolated procs); Windows compiles to a stub since hearsay doesn't ship Windows builds.

WiktorStarczewski added 3 commits April 25, 2026 12:53

WiktorStarczewski merged commit e7e12eb into main Apr 25, 2026
2 checks passed

WiktorStarczewski deleted the pr-b/stateful-conversations branch April 25, 2026 11:07

This was referenced Apr 25, 2026

docs(readme): refresh for Phase 2 + design notes #9

Merged

feat(agent): subprocess-driven Claude Code, replacing API-key requirement (PR C) #10

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agent): stateful conversations (Phase-2 PR B)#8

feat(agent): stateful conversations (Phase-2 PR B)#8
WiktorStarczewski merged 3 commits intomainfrom
pr-b/stateful-conversations

WiktorStarczewski commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant