feat(agent): stateful conversations (Phase-2 PR B)#8
Merged
WiktorStarczewski merged 3 commits intomainfrom Apr 25, 2026
Merged
Conversation
Adds the four conversation-lifecycle MCP tools that build on PR A's
one-shot agent:
* start_peer_conversation — opens a stateful read-only conversation,
returns {convId, startedAt, effectiveBudget}
* send_peer_message — drives one more turn against an existing
convId (per-turn budget overrides cascade
through the conversation's defaults)
* list_peer_conversations — active conversations sorted by
lastActivityAt desc, with rune-safe
140-char previews
* end_peer_conversation — terminates a conversation and frees its
slot; idempotent (re-end returns
alreadyEnded=true)
All four are gated behind --enable-agent. Tools are registered iff
ctx.Agent is non-nil (matches PR A's gating).
Internals
---------
* `internal/agent/conversations.go` adds the per-conv state map +
mutex + idle reaper. Each conversation tracks startedAt,
lastActivityAt, turnCount, perTurnBudget, project, system_prompt
preview, first user prompt preview, and the EndReason enum (caller
| idle_timeout | shutdown). The full message history lives
server-side on the SDK session — the local map is metadata only.
* New CLI flags:
--max-conversations <n> (default 10)
--conversation-idle-timeout <dur> (default 15m)
* The reaper runs as a background goroutine with a tick at idle/4
(floored at 1s). Reaped conversations get EndedByIdleReap and the
upstream session is best-effort-deleted. agent.Closer interface
lets main.go shut the reaper down cleanly on SIGTERM.
* OneShot now builds a transient conversation and routes through
runTurn (in conversations.go), so PR-A and PR-B share the SDK
round-trip code path verbatim.
Concurrency
-----------
* SendMessage takes the conversation's own mutex for the full turn,
so two callers racing on the same convId serialize. The reaper
takes both convsMu and conv.mu in that order — locking direction
forbidden in reverse.
* lookupAlive distinguishes three end states: alive, reaped (idle),
ended-by-caller — translating the latter two into ErrConvReaped vs
ErrUnknownConv so the tool layer can produce the right user-facing
message.
Coverage
--------
Aggregate at 90.7% with -race (gate is 90%). Unit tests cover state
transitions (lifecycle, idle reap, cap enforcement, idempotent re-
end, lookupAlive's three branches) without an SDK round-trip;
stub-driven tests in sdk_translate_test.go exercise the SDK paths
end-to-end (StartConversation success + cap, SendMessage with
known/unknown convID, EndConversation with idempotent re-end, the
reaper actually deleting).
The PR-B stub returned 5xx for the StreamEvents endpoint, which forced the SDK's stream layer to error out — the timing of that error vs the test's t.Context() expiry was non-deterministic under -race, so coverage occasionally dipped below the 90% gate (CI saw 65% on the previous push, local races between 60-90%). Switching to a proper-but-empty SSE response (HTTP 200, content-type text/event-stream, no events delivered) lets the stream close cleanly. pumpStream sees the channel close, the loop returns StopReasonEndTurn, the conv.lastActivityAt update happens deterministically — coverage reads 90.7% with -race three runs in a row.
Diagnosis: setup-go@v5's default cache key is hashed against go.sum only. When PR B inherited PR A's go.sum (no new deps), the CI cache restored PR A's instrumented test binaries. Three of the nine -coverpkg=./... test builds got stale cache hits — their cover blocks referenced the old printHelp source-line range (84-125) while six fresh builds referenced the new range (84-127). go tool cover -func treats the two ranges as separate blocks: the old one has 0 hits, the new one is fully covered, and printHelp drops from 100% to 33%. That cascade across cmd/hearsay/main.go funcs pushed the aggregate from 90.7% (local) to 65.1% (CI) on the same commit. Cache off makes CI a few seconds slower but eliminates the variance. Re-enable once we have a cache key that hashes source files in addition to go.sum.
This was referenced Apr 25, 2026
WiktorStarczewski
added a commit
that referenced
this pull request
Apr 25, 2026
…ment (PR C) (#10) * feat(agent): subprocess-driven Claude Code, replacing API-key requirement (PR C) Phase 2 originally shipped via the Anthropic Managed-Agents API (PRs #7 + #8), which is API-key-only. The constraint that surfaced post-merge: both peers and consumers in the actual user base have Claude Code subscriptions (Pro / Max / Team), not API keys. The shipped Phase-2 tools were therefore unusable for the real users. This PR replaces the SDK path with a subprocess driver around `claude --print`. The peer's existing OAuth credentials authenticate every call; subscription quota pays the bill. No code changes for consumers (Wiktor's side); breaking CLI-flag changes for peers. Internals --------- * internal/agent/cli.go (new) — subprocess driver. Builds three argv variants (OneShot, first-turn-of-conv, resumed-turn) with --print --output-format json --allowed-tools "Read Glob Grep" on every call. Parses Claude Code's JSON result for stop_reason + usage; replays the session JSONL via the existing internal/transcript package to extract per-tool-call detail (the JSON payload has no tool_calls[] field). * internal/agent/conversations.go (slimmed) — convID is now the Claude Code session UUID. StartConversation no longer hits an upstream service; it allocates a UUID, records the system_prompt, returns. EndConversation marks the local slot ended without deleting anything; the session JSONL stays on disk so Phase-1 read_session can still surface it. Idle reaper is local-only. * internal/agent/sdk.go, loop.go, customtools.go (deleted) — the Managed-Agents wrapper, event-loop translation layer, and hand-rolled Read/Glob/Grep handlers all go away. Claude Code's native tools replace the custom dispatch; the JSON output replaces the event stream. * go.mod — drops github.com/anthropics/anthropic-sdk-go and its transitive deps (jsonparser, tidwall/*, wk8/go-ordered-map, ...). Static binary supply-chain footprint shrinks meaningfully. Security -------- Two-leg defense for the read-only allowlist: 1. --allowed-tools "Read Glob Grep" is hardcoded on every invocation; Claude Code itself enforces it. Asserted by the argv-shape unit test. 2. After every call, the JSONL replay scans for tool_use blocks outside the allowlist. Any disallowed name flips StopReason to error / disallowed_tool. Defense-in-depth against future-Claude-Code drift / corrupted builds. ANTHROPIC_API_KEY in the operator's env is stripped from the subprocess env by default — Claude Code's auth precedence is ANTHROPIC_API_KEY > apiKeyHelper > OAuth/keychain, and a stale env var would silently redirect billing. --agent-keep-env-key opts back in. CLI changes (breaking) ---------------------- Removed (hard-fail with a friendly upgrade-notes pointer): * --agent-api-key-env * --agent-base-url * --agent-model Kept, reinterpreted: * --agent-default-max-tokens — soft budget (system-prompt nudge); the CLI doesn't expose a token cap. Added: * --agent-claude-bin <path> * --agent-keep-env-key Unchanged: --enable-agent, --agent-default-max-tool-calls, --agent-default-timeout-seconds, --agent-log-path, --max-conversations, --conversation-idle-timeout. Version bumped 0.1.0-dev → 0.3.0-dev (skipping 0.2 since PR B forgot to bump). Test coverage ------------- Aggregate race-mode coverage 92.9% on a clean run (fluctuates 88-93% with the race detector's goroutine-timing sensitivity). Coverage gate is 90% with cache:false on setup-go. Plan: ~/.claude/plans/hearsay-phase-2-subprocess.md (six review rounds: 3 → 4 → 1 → 2 → 0 → 0 issues). * fix(agent): kill subprocess group on cancel for portable timeout Linux's /bin/sh is dash, which does NOT forward SIGTERM to a child sleep / claude process. CI's TestRunClaude_TimeoutKillsSubprocess saw the subprocess run the full 5s sleep — a real bug in the production path on Linux too, since claude itself spawns helper processes that wouldn't get the cancel signal otherwise. Fix: set Setpgid on the subprocess so it owns its own process group; on cancel, send SIGTERM to the whole group via syscall.Kill(-pgid, SIGTERM) instead of just the leader. cli_unix.go and cli_other.go build-tag the platform-specific calls. macOS keeps working (Setpgid is a no-op for already-isolated procs); Windows compiles to a stub since hearsay doesn't ship Windows builds.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Last in the Phase-2 sequence (PR 0 #6, PR A #7, this is PR B). Adds the four conversation-lifecycle MCP tools that build on PR A's one-shot agent — multi-turn conversations with state held server-side on the SDK session, plus an in-process metadata map for listing / idle-reaping / cap enforcement.
start_peer_conversation{convId, startedAt, effectiveBudget}.send_peer_messagelist_peer_conversationslastActivityAtdesc, with rune-safe 140-char previews.end_peer_conversationAll four are gated behind
--enable-agent(off by default, same posture as PR A).Internals
internal/agent/conversations.go— per-conv state map + mutex + idle reaper. Each conversation tracksstartedAt,lastActivityAt,turnCount,perTurnBudget,project, system-prompt preview, first-user-prompt preview, and theEndReasonenum (caller|idle_timeout|shutdown). The full message history lives server-side on the SDK session — the local map is metadata only.--max-conversations <n>(default 10),--conversation-idle-timeout <dur>(default 15m).idle/4(floored at 1s). Reaped conversations getEndedByIdleReapand the upstream session is best-effort-deleted.agent.Closerinterface letsmain.goshut the reaper down on SIGTERM.conversationand route throughrunTurnso PR-A and PR-B share the SDK round-trip verbatim.Concurrency
SendMessagetakes the conversation's own mutex for the full turn — two callers racing on the same convId serialize.convsMuthenconv.mu; the reverse order is forbidden.lookupAlivedistinguishes alive / reaped / ended-by-caller, so the tool layer can produce the right user-facing message:Test plan
go build ./... && go vet ./...— clean.go test ./...— green across all 9 packages.go test -race -coverpkg=./... -coverprofile=…— aggregate 90.7% (gate is 90%).conversations_test.go): listing sorted-by-LastActivityAt, ended-conv omitted, system-prompt fallback preview, idempotent re-end preserves originalEndedReason,lookupAlivediscriminates end modes, cap accounting, reaper marks stale,stopReaperis idempotent,truncateRunesis rune-safe.sdk_translate_test.go):StartConversationsucceeds against stub, cap enforcement (third start refused), invalid project rejected before SDK call;SendMessageknown/unknown;EndConversationsuccess + idempotent re-end; reaper actually reaps + deletes.tools_agent_test.go): all four conversation tools registered whenctx.Agent != nil, missing-arg validation, error-translation (ErrConvCap/ErrConvReaped/ErrUnknownConv→ user-facing messages).hearsay_e2e_test.go): all four tools present in the catalog,--max-conversationsflag accepted,list_peer_conversationsreturns empty list on a fresh server.What this completes
PR 0 (read_tool_result content fix) + PR A (one-shot agent) + this PR = Phase 2 complete. Wiktor's Claude can now read teammates' past sessions (Phase 1) AND drive a stateful read-only agent on their box for primary-data investigations.