Promote 7 experimental packages to stable paths by myzie · Pull Request #141 · deepnoodle-ai/dive

myzie · 2026-04-12T19:19:18Z

Summary

Move subagent, compaction, todo, a2a, toolkit/firecrawl, toolkit/google, and toolkit/kagi from experimental/ to top-level stable paths
Update all import paths across the main module, experimental/cmd/dive, and examples
Move documentation guides from docs/guides/experimental/ to docs/guides/ for promoted packages
Fix production-readiness issues found during audit:
- toolkit/google & toolkit/kagi: Data race — baseURL was a package-level var mutated by option functions; moved to Client struct field. Retry sleeps now respect context cancellation.
- toolkit/kagi: Renamed stuttering types (KagiClient → Client, WithKagiAPIKey → WithAPIKey, etc.)
- toolkit/google: Search no longer mutates caller's input struct. Removed dead commented-out code.
- a2a/server.go: Handler() allocated a new http.ServeMux per call; now built once and cached.
- subagent/loader.go: Consolidated double os.Getwd() call. MapLoader.Load() returns a defensive copy.

settings deferred from this batch — it imports experimental/sandbox which is staying experimental, and nothing currently imports it.

Test plan

go vet ./... clean
go test ./... all passing
cd experimental/cmd/dive && go build succeeds
cd examples && go vet ./... clean
No stale experimental/ references in code, guides, or README (historical PRD/plan docs left as-is)
docs/guides/experimental/ only contains still-experimental guides (mcp, sandboxing, media-generation)

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- New Todo package: TodoWrite tool and a Todo extension with automated reminders and state persistence.
Improvements
- Promoted A2A, Compaction, Sub-Agents, and Todo to stable top-level packages.
- Google and Kagi search clients now use per-client configuration and backoff that respects cancellation.
- Firecrawl fetch/parsing and error handling hardened.
- A2A server now uses a cached request handler for more efficient dispatch.
Tests
- Expanded tests for Firecrawl, Todo tool/extension, and compaction.
Documentation
- Guides updated to reflect package reorganization and experimental vs. stable labeling.

Move subagent, compaction, todo, a2a, toolkit/firecrawl, toolkit/google, and toolkit/kagi from experimental/ to top-level stable paths. Update all import paths, documentation, and CLAUDE.md. Fix production-readiness issues: data races in google/kagi search clients (package-level baseURL mutation), input mutation in google Search, kagi type name stuttering, a2a handler allocation per request, subagent MapLoader returning internal map. Settings deferred from this batch due to experimental/sandbox dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-12T19:19:46Z

📝 Walkthrough

Walkthrough

Promotes several experimental packages to top-level (a2a, compaction, subagent, todo), updates imports/docs, introduces a new todo package and Extension with reminder/state hooks, adds todo/compaction integration tests, caches A2A server handler, and refactors toolkit clients (per-client baseURLs and cancellation-aware backoff); hardens Firecrawl handling.

Changes

Cohort / File(s)	Summary
Docs & Guides `CLAUDE.md`, `docs/README.md`, `docs/guides/a2a.md`, `docs/guides/agents.md`, `docs/guides/compaction.md`, `docs/guides/hooks.md`, `docs/guides/subagents.md`, `docs/guides/todo-lists.md`	Re-scoped documentation from `experimental/*` to top-level packages; marked promoted vs experimental guides; updated navigation and examples.
A2A package & examples `a2a/doc.go`, `a2a/a2a_test.go`, `a2a/interop_test.go`, `examples/a2a_example/main.go`	Updated package docs and imports to `github.com/deepnoodle-ai/dive/a2a`; tests and examples now reference stable `a2a`.
A2A server handler caching `a2a/server.go`	Cache built `http.Handler` at `NewServer` via `buildHandler()`; `Handler()` returns cached handler; `ServeHTTP` dispatches to cached handler.
Experimental CLI & toolkit imports `experimental/cmd/dive/.go`, `experimental/cmd/dive/_test.go`, `experimental/toolkit/extended/task.go`, `experimental/toolkit/extended/task_test.go`	Updated imports to use stable `compaction`, `subagent`, `todo`, and `dive/toolkit/*` locations; adjusted call sites accordingly.
Removed experimental extended todo `experimental/toolkit/extended/todo.go`, `experimental/toolkit/extended/todo_test.go`	Deleted prior in-memory `TodoWrite` tool and its tests from experimental extended toolkit.
New todo package & extension `todo/tool.go`, `todo/extension.go`, `todo/state.go`, `todo/extension_test.go`, `todo/tool_test.go`, `todo/extension_integration_test.go`	Added `todo` package: `TodoWrite` typed tool, `WriteInput`, `ToolName`, state persistence (`StateBlock`/LatestState), and `Extension` wiring PreGeneration/PreIteration/PostToolUse hooks; extensive unit and integration tests.
Subagent loader changes `subagent/loader.go`	Adjusted directory resolution when including Claude agents; `MapLoader.Load` now returns a defensive copy of definitions to prevent external mutation.
Compaction changes & tests `compaction/compaction.go`, `compaction/compaction_test.go`, `compaction/todo_integration_test.go`	`CompactMessages` now preserves appended todo state block in generated summary; added unit and integration tests validating state preservation.
Toolkit — Google client `toolkit/google/search.go`	Removed package-level mutable `baseURL`/`SetBaseURL`; added per-Client `baseURL` and `WithBaseURL`; made 429 backoff cancellation-aware; adjusted pagination/limit handling; removed SetBaseURL API.
Toolkit — Kagi client `toolkit/kagi/kagi.go`	Refactored to per-client config: `KagiClient`→`Client`, new `ClientOption`/`WithAPIKey`/`WithBaseURL`/`WithHTTPClient`, per-client `baseURL`, cancellation-aware retries, and renamed internal response types.
Toolkit — Firecrawl `toolkit/firecrawl/firecrawl.go`, `toolkit/firecrawl/firecrawl_types.go`, `toolkit/firecrawl/firecrawl_test.go`	Changed request format mapping (`raw_html`→`rawHtml`), pruned forwarded defaults, tightened response validation (fail when success==false or data==nil), expanded HTTP error mappings, added tolerant `keywordsField` unmarshal, and expanded tests.
Misc tests & examples various updated tests and examples	Added/updated tests (todo, compaction, firecrawl) and updated example/test import paths to stable packages.

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant TodoExt as TodoExtension
    participant History as MessageHistory
    participant Tool as TodoWriteTool
    Agent->>TodoExt: PreGeneration/PreIteration hook
    TodoExt->>History: scan messages (newest→oldest) for <todo-state> / tool-use
    alt latest state found
        History-->>TodoExt: returns todos + turnsSinceWrite
        TodoExt->>TodoExt: compute staleness vs threshold
        alt stale
            TodoExt->>Agent: inject <system-reminder name="todos"> into first user message
        else recent
            TodoExt->>Agent: remove any existing reminder
        end
    else no state
        TodoExt->>Agent: ensure reminder removed
    end
    Agent->>Tool: may call TodoWrite tool during generation
    Tool-->>TodoExt: successful tool result captured by PostToolUse hook
    TodoExt->>Agent: append hidden <todo-state> block to AdditionalContext

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Draft: experimental A2A server + client (PRD-05 phase 1) #138: directly related — overlapping changes promoting a2a from experimental to stable (server/client/tests/examples).
Add Extension interface for composable agent capabilities #136: related — introduces the Extension/agent hook surfaces that todo/extension.go integrates with.
Improve API clarity and code organization #98: related — prior work on todo types and reorganization that this PR continues and replaces.

Poem

🐰 I hopped through imports, neat and light,

From experimental burrows to stable daylight.
Cached handlers hum, clients bow and wait,
Reminders whisper, todos annotate.
A tiny thump for this tidy merge tonight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.99% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: promoting 7 experimental packages to stable paths, which is the primary objective of the PR.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch promote-experimental-packages

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

experimental/toolkit/extended/todo.go (1)

19-23: ⚠️ Potential issue | 🟡 Minor

Stale comment: update reference to canonical types location.

The comment says "canonical types are in experimental/todo" but they have been moved to the stable todo package.

📝 Proposed fix

-// Type aliases — canonical types are in experimental/todo
+// Type aliases — canonical types are in todo
 type (
 	TodoStatus = todo.TodoStatus
 	TodoItem   = todo.TodoItem
 )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@experimental/toolkit/extended/todo.go` around lines 19 - 23, The file comment
above the type aliases TodoStatus and TodoItem is stale; update it to reflect
that the canonical types now live in the stable todo package (not
experimental/todo). Edit the comment text near the type alias block (the line
above the type (...) declaration) to reference the stable "todo" package or
remove the "experimental" qualifier so it accurately documents that TodoStatus
and TodoItem are aliases to the canonical types in todo.

docs/guides/a2a.md (1)

386-393: ⚠️ Potential issue | 🟡 Minor

Fix contradiction about package stability in the boundaries section.

This section says A2A is stable, but then says both A2A and MCP live in experimental packages.

✏️ Suggested doc fix

-MCP support remains a separate story. MCP is for tools and data; A2A is
-for agents. They live in different experimental packages and are
-independently adoptable.
+MCP support remains a separate story. MCP is for tools and data; A2A is
+for agents. MCP currently lives under `experimental/`, while A2A is in
+stable `a2a/`; they are independently adoptable.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/guides/a2a.md` around lines 386 - 393, The paragraph contradicts itself
about stability; decide which is correct and make the wording consistent: either
(A) mark both a2a and MCP as experimental and state that a2a lives in an
experimental package that intentionally does not leak types into the stable dive
package (so *dive.Agent, *dive.Response, and dive.Session remain unchanged), or
(B) if a2a is intended to be stable, change the sentence that calls it
experimental to say MCP is experimental while clarifying that a2a’s public
surface does not force core-API changes. Update the sentences mentioning a2a,
dive, and MCP so they all reflect the chosen stability status consistently.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@docs/guides/a2a.md`:
- Around line 386-393: The paragraph contradicts itself about stability; decide
which is correct and make the wording consistent: either (A) mark both a2a and
MCP as experimental and state that a2a lives in an experimental package that
intentionally does not leak types into the stable dive package (so *dive.Agent,
*dive.Response, and dive.Session remain unchanged), or (B) if a2a is intended to
be stable, change the sentence that calls it experimental to say MCP is
experimental while clarifying that a2a’s public surface does not force core-API
changes. Update the sentences mentioning a2a, dive, and MCP so they all reflect
the chosen stability status consistently.

In `@experimental/toolkit/extended/todo.go`:
- Around line 19-23: The file comment above the type aliases TodoStatus and
TodoItem is stale; update it to reflect that the canonical types now live in the
stable todo package (not experimental/todo). Edit the comment text near the type
alias block (the line above the type (...) declaration) to reference the stable
"todo" package or remove the "experimental" qualifier so it accurately documents
that TodoStatus and TodoItem are aliases to the canonical types in todo.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1997a133-bef3-4f00-b648-cb6a82956b5a

📥 Commits

Reviewing files that changed from the base of the PR and between 7c0cdd2 and 8800614.

📒 Files selected for processing (40)

CLAUDE.md
a2a/a2a_test.go
a2a/client.go
a2a/doc.go
a2a/interop_test.go
a2a/remoteagent.go
a2a/rpc.go
a2a/server.go
a2a/task_store.go
a2a/types.go
compaction/compaction.go
compaction/compaction_test.go
compaction/hooks.go
docs/README.md
docs/guides/a2a.md
docs/guides/agents.md
docs/guides/compaction.md
docs/guides/hooks.md
docs/guides/subagents.md
docs/guides/todo-lists.md
examples/a2a_example/main.go
experimental/cmd/dive/app.go
experimental/cmd/dive/app_interactive_test.go
experimental/cmd/dive/app_test.go
experimental/cmd/dive/main.go
experimental/cmd/dive/render.go
experimental/toolkit/extended/task.go
experimental/toolkit/extended/task_test.go
experimental/toolkit/extended/todo.go
subagent/loader.go
subagent/subagent.go
subagent/subagent_test.go
todo/todo_tracker.go
todo/types.go
toolkit/firecrawl/firecrawl.go
toolkit/firecrawl/firecrawl_test.go
toolkit/firecrawl/firecrawl_types.go
toolkit/google/search.go
toolkit/google/types.go
toolkit/kagi/kagi.go

- Translate wonton's "raw_html" format to Firecrawl's "rawHtml" (the v2 API rejects snake_case format names). - Drop the redundant "v2 specific defaults" block (blockAds, removeBase64Images, proxy: auto, storeInCache); these match Firecrawl's own defaults and forcing them removed any future ability for callers to override. - Map metadata.keywords (which Firecrawl returns as either a string or array) into fetch.Metadata.Keywords via a custom unmarshaler. - Nil-check scrapeResp.Data and Data.Metadata before dereferencing. - Add 400, 401, 404, and 502/503/504 to the friendly error branches. - Cover the new behavior with unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The TodoWrite tool was the last piece left in experimental/toolkit/extended/ after the previous promotion batch. Move it to the stable todo/ package and restructure it the way the rest of Dive's tools work — as a dive.Extension modeled after skill.Loader. Architecture changes: - Tool is stateless. The previous TodoWriteTool stored todos []TodoItem on its struct, which meant a single instance reused across sessions or agents bled state between them. The conversation message history is now the sole source of truth for the current list. GetTodos / GetCurrentTask are removed (zero callers in-tree outside the tool's own tests). - New todo.Extension implements dive.Extension. It returns the TodoWrite tool plus a PreGenerationHook that walks message history, finds the most recent TodoWrite tool_use, and injects a <system-reminder name="todos"> block via dive.SetSystemReminder when the model has not used the tool in the last N assistant turns (default 10, matching Claude Code's TODO_REMINDER_CONFIG.TURNS_SINCE_WRITE). When the model uses TodoWrite again, the next iteration removes the block. - Both tool and hook are fully stateless, so a single Extension instance is safe to share across agents, sessions, and subagents. No parallel store, no per-tool-instance bleed. This is structurally simpler than Claude Code's appState.todos[agentId ?? sessionId] map because Dive does not have a process-singleton app state to begin with. Behavioral changes: - Tool result text now matches Claude's canonical TodoWrite acknowledgment ("Todos have been modified successfully. Ensure that you continue to use the todo list to track your progress...") followed by a one-line progress summary, instead of returning a JSON counts blob. Claude is tuned on the canonical text shape. - Empty list with all tasks complete appends a one-line nudge to either write a fresh list or wrap up, but the list itself is retained (not auto-cleared as Claude Code does — Dive's persistence is more useful). API changes: - experimental.NewTodoWriteTool(...) -> todo.NewTool(...) or todo.New(...) - TodoWriteToolOptions{OnUpdate: fn} -> todo.WithOnUpdate(fn) / todo.WithExtensionOnUpdate(fn) - TodoWriteInput -> todo.WriteInput Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

todo/tool_test.go (1)
36-137: Add a regression case for multiple in_progress todos.

Given the tool contract says only one active item at a time, please add a test that submits two in_progress items and asserts an error result. This will lock behavior and prevent future regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/tool_test.go` around lines 36 - 137, Add a new subtest inside
TestTool_Call named "MultipleInProgress" that constructs a NewTool(), builds a
WriteInput with two TodoItem entries both using TodoStatusInProgress, calls
tool.Call(ctx, input), and asserts the call returns no error but result.IsError
is true and result.Content[0].Text contains the expected validation message
(e.g., "only one active item" or "only one active item at a time") to lock the
behavior that only one in_progress item is allowed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@todo/extension.go`:
- Around line 142-152: The loop over msg.Content currently scans forward and may
return an older TodoWrite call; change the iteration in the block that inspects
msg.Content (looking for *llm.ToolUseContent with tu.Name == ToolName and
calling parseTodoInput) to iterate from the end to the start (for i :=
len(msg.Content)-1; i >= 0; i--) so the function picks the latest TodoWrite
within the same assistant message and returns parsed, turnsSince, true for the
first matching (most recent) tool use.

In `@todo/tool.go`:
- Around line 148-158: Both PreviewCall and Call dereference input without
checking for nil; add a nil guard at the start of PreviewCall and Call to return
an appropriate error/result when input == nil. In PreviewCall (func (t *Tool)
PreviewCall) check input == nil and return a ToolCallPreview that indicates the
missing payload (or an error preview) instead of calling countByStatus; in Call
(func (t *Tool) Call) check input == nil and return
dive.NewToolResultError("todos array is required") (or a clearer error message)
to avoid panics when accessing input.Todos. Ensure both guards run before any
access to input or calling countByStatus.
- Around line 160-170: The loop over input.Todos currently validates each item's
fields but does not enforce the rule that only one todo may be in the
TodoStatusInProgress state; add validation (after or during the loop) that
counts occurrences of TodoStatusInProgress in input.Todos and if count > 1
return dive.NewToolResultError with a clear message like "only one todo item may
have status 'in_progress'". Use the existing input.Todos slice and the
TodoStatusInProgress constant to locate the check and return the error in the
same style as the other validations.
- Around line 107-135: The Item object in Tool.Schema() lacks a Required list so
the LLM can't rely on schema validation; update the Items property inside
Schema() (the nested *schema.Property under "todos" with Type "object") to
include Required: []string{"content","status","activeForm"} so each todo item
enforces those fields during validation; locate the Items property in the
Schema() method and add the Required slice to that nested object property.

---

Nitpick comments:
In `@todo/tool_test.go`:
- Around line 36-137: Add a new subtest inside TestTool_Call named
"MultipleInProgress" that constructs a NewTool(), builds a WriteInput with two
TodoItem entries both using TodoStatusInProgress, calls tool.Call(ctx, input),
and asserts the call returns no error but result.IsError is true and
result.Content[0].Text contains the expected validation message (e.g., "only one
active item" or "only one active item at a time") to lock the behavior that only
one in_progress item is allowed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63acf304-592f-402f-a042-d3b9a53e964f

📥 Commits

Reviewing files that changed from the base of the PR and between 9fd8f37 and 6b6d547.

📒 Files selected for processing (8)

CLAUDE.md
docs/guides/todo-lists.md
experimental/toolkit/extended/todo.go
experimental/toolkit/extended/todo_test.go
todo/extension.go
todo/extension_test.go
todo/tool.go
todo/tool_test.go

💤 Files with no reviewable changes (2)

experimental/toolkit/extended/todo_test.go
experimental/toolkit/extended/todo.go

✅ Files skipped from review due to trivial changes (2)

CLAUDE.md
docs/guides/todo-lists.md

coderabbitai · 2026-04-12T19:58:23Z

+		for _, c := range msg.Content {
+			tu, ok := c.(*llm.ToolUseContent)
+			if !ok || tu.Name != ToolName {
+				continue
+			}
+			parsed, err := parseTodoInput(tu.Input)
+			if err != nil {
+				continue
+			}
+			return parsed, turnsSince, true
+		}


⚠️ Potential issue | 🟡 Minor

Pick the latest TodoWrite call within the same assistant message.

Line 142 iterates content in forward order. If one assistant message contains multiple TodoWrite tool uses, this can return an older call from that message. Iterate message content from end to start.

🔧 Suggested fix

- for _, c := range msg.Content { - tu, ok := c.(*llm.ToolUseContent) + for j := len(msg.Content) - 1; j >= 0; j-- { + tu, ok := msg.Content[j].(*llm.ToolUseContent) if !ok || tu.Name != ToolName { continue } parsed, err := parseTodoInput(tu.Input) if err != nil { continue } return parsed, turnsSince, true }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for _, c := range msg.Content {

tu, ok := c.(*llm.ToolUseContent)

if !ok || tu.Name != ToolName {

continue

}

parsed, err := parseTodoInput(tu.Input)

if err != nil {

continue

}

return parsed, turnsSince, true

}

for j := len(msg.Content) - 1; j >= 0; j-- {

tu, ok := msg.Content[j].(*llm.ToolUseContent)

if !ok || tu.Name != ToolName {

continue

}

parsed, err := parseTodoInput(tu.Input)

if err != nil {

continue

}

return parsed, turnsSince, true

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@todo/extension.go` around lines 142 - 152, The loop over msg.Content currently scans forward and may return an older TodoWrite call; change the iteration in the block that inspects msg.Content (looking for *llm.ToolUseContent with tu.Name == ToolName and calling parseTodoInput) to iterate from the end to the start (for i := len(msg.Content)-1; i >= 0; i--) so the function picks the latest TodoWrite within the same assistant message and returns parsed, turnsSince, true for the first matching (most recent) tool use.

coderabbitai · 2026-04-12T19:58:23Z

+func (t *Tool) Schema() *schema.Schema {
+	return &schema.Schema{
+		Type:     "object",
+		Required: []string{"todos"},
+		Properties: map[string]*schema.Property{
+			"todos": {
+				Type:        "array",
+				Description: "The complete updated todo list",
+				Items: &schema.Property{
+					Type: "object",
+					Properties: map[string]*schema.Property{
+						"content": {
+							Type:        "string",
+							Description: "The task description in imperative form (e.g., 'Run tests')",
+						},
+						"status": {
+							Type:        "string",
+							Enum:        []any{"pending", "in_progress", "completed"},
+							Description: "The task status: pending, in_progress, or completed",
+						},
+						"activeForm": {
+							Type:        "string",
+							Description: "The task in present continuous form (e.g., 'Running tests')",
+						},
+					},
+				},
+			},
+		},
+	}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "tool.go" -path "*/todo/*" -type f

Repository: deepnoodle-ai/dive

Length of output: 76

🏁 Script executed:

cat -n ./todo/tool.go

Repository: deepnoodle-ai/dive

Length of output: 9016

Add Required slice to Items property schema to align with validation.

Lines 161–169 reject todo items missing content, status, or activeForm, but the schema at lines 115–131 does not declare these fields as required on the Items object. This forces the LLM to learn the constraints through rejection loops rather than schema inspection.

♻️ Schema alignment fix

"todos": { Type: "array", Description: "The complete updated todo list", Items: &schema.Property{ Type: "object", + Required: []string{"content", "status", "activeForm"}, Properties: map[string]*schema.Property{

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (t *Tool) Schema() *schema.Schema {

return &schema.Schema{

Type: "object",

Required: []string{"todos"},

Properties: map[string]*schema.Property{

"todos": {

Type: "array",

Description: "The complete updated todo list",

Items: &schema.Property{

Type: "object",

Properties: map[string]*schema.Property{

"content": {

Type: "string",

Description: "The task description in imperative form (e.g., 'Run tests')",

},

"status": {

Type: "string",

Enum: []any{"pending", "in_progress", "completed"},

Description: "The task status: pending, in_progress, or completed",

},

"activeForm": {

Type: "string",

Description: "The task in present continuous form (e.g., 'Running tests')",

},

},

},

},

},

}

func (t *Tool) Schema() *schema.Schema {

return &schema.Schema{

Type: "object",

Required: []string{"todos"},

Properties: map[string]*schema.Property{

"todos": {

Type: "array",

Description: "The complete updated todo list",

Items: &schema.Property{

Type: "object",

Required: []string{"content", "status", "activeForm"},

Properties: map[string]*schema.Property{

"content": {

Type: "string",

Description: "The task description in imperative form (e.g., 'Run tests')",

},

"status": {

Type: "string",

Enum: []any{"pending", "in_progress", "completed"},

Description: "The task status: pending, in_progress, or completed",

},

"activeForm": {

Type: "string",

Description: "The task in present continuous form (e.g., 'Running tests')",

},

},

},

},

},

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@todo/tool.go` around lines 107 - 135, The Item object in Tool.Schema() lacks a Required list so the LLM can't rely on schema validation; update the Items property inside Schema() (the nested *schema.Property under "todos" with Type "object") to include Required: []string{"content","status","activeForm"} so each todo item enforces those fields during validation; locate the Items property in the Schema() method and add the Required slice to that nested object property.

coderabbitai · 2026-04-12T19:58:23Z

+func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview {
+	pending, inProgress, completed, _ := countByStatus(input.Todos)
+	return &dive.ToolCallPreview{
+		Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed",
+			pending, inProgress, completed),
+	}
+}
+
+func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {
+	if input.Todos == nil {
+		return dive.NewToolResultError("todos array is required"), nil


⚠️ Potential issue | 🟠 Major

Guard nil input to avoid panics in preview/call paths.

Line 149 and Line 157 dereference input without a nil check. A nil payload can panic before producing a tool error result.

🛡️ Suggested nil guards

func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview { + if input == nil { + return &dive.ToolCallPreview{ + Summary: "Update todos: 0 pending, 0 in progress, 0 completed", + } + } pending, inProgress, completed, _ := countByStatus(input.Todos) return &dive.ToolCallPreview{ Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed", pending, inProgress, completed), } } func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) { + if input == nil { + return dive.NewToolResultError("input is required"), nil + } if input.Todos == nil { return dive.NewToolResultError("todos array is required"), nil }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview {

pending, inProgress, completed, _ := countByStatus(input.Todos)

return &dive.ToolCallPreview{

Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed",

pending, inProgress, completed),

}

}

func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {

if input.Todos == nil {

return dive.NewToolResultError("todos array is required"), nil

func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview {

if input == nil {

return &dive.ToolCallPreview{

Summary: "Update todos: 0 pending, 0 in progress, 0 completed",

}

}

pending, inProgress, completed, _ := countByStatus(input.Todos)

return &dive.ToolCallPreview{

Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed",

pending, inProgress, completed),

}

}

func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {

if input == nil {

return dive.NewToolResultError("input is required"), nil

}

if input.Todos == nil {

return dive.NewToolResultError("todos array is required"), nil

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@todo/tool.go` around lines 148 - 158, Both PreviewCall and Call dereference input without checking for nil; add a nil guard at the start of PreviewCall and Call to return an appropriate error/result when input == nil. In PreviewCall (func (t *Tool) PreviewCall) check input == nil and return a ToolCallPreview that indicates the missing payload (or an error preview) instead of calling countByStatus; in Call (func (t *Tool) Call) check input == nil and return dive.NewToolResultError("todos array is required") (or a clearer error message) to avoid panics when accessing input.Todos. Ensure both guards run before any access to input or calling countByStatus.

coderabbitai · 2026-04-12T19:58:23Z

+	for i, item := range input.Todos {
+		if item.Content == "" {
+			return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil
+		}
+		if item.ActiveForm == "" {
+			return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil
+		}
+		if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted {
+			return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil
+		}
+	}


⚠️ Potential issue | 🟡 Minor

Enforce the “one in_progress item” contract.

The description/rules state only one active task at a time, but Line 160-170 accepts multiple in_progress items. This weakens consistency of the todo state.

✅ Suggested validation

func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) { if input.Todos == nil { return dive.NewToolResultError("todos array is required"), nil } + inProgressCount := 0 for i, item := range input.Todos { if item.Content == "" { return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil } if item.ActiveForm == "" { return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil } if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted { return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil } + if item.Status == TodoStatusInProgress { + inProgressCount++ + } } + if inProgressCount > 1 { + return dive.NewToolResultError("only one todo may be 'in_progress'"), nil + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for i, item := range input.Todos {

if item.Content == "" {

return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil

}

if item.ActiveForm == "" {

return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil

}

if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted {

return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil

}

}

func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {

if input.Todos == nil {

return dive.NewToolResultError("todos array is required"), nil

}

inProgressCount := 0

for i, item := range input.Todos {

if item.Content == "" {

return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil

}

if item.ActiveForm == "" {

return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil

}

if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted {

return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil

}

if item.Status == TodoStatusInProgress {

inProgressCount++

}

}

if inProgressCount > 1 {

return dive.NewToolResultError("only one todo may be 'in_progress'"), nil

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@todo/tool.go` around lines 160 - 170, The loop over input.Todos currently validates each item's fields but does not enforce the rule that only one todo may be in the TodoStatusInProgress state; add validation (after or during the loop) that counts occurrences of TodoStatusInProgress in input.Todos and if count > 1 return dive.NewToolResultError with a clear message like "only one todo item may have status 'in_progress'". Use the existing input.Todos slice and the TodoStatusInProgress constant to locate the check and return the error in the same style as the other validations.

The todo.Extension now captures each successful TodoWrite as a hidden <todo-state> block on the tool_result message via a PostToolUse hook, so the stale-list reminder and downstream tooling can read authoritative state from conversation history instead of re-parsing raw tool_use input. compaction.CompactMessages carries the latest block forward into the summary message so todo state survives context rewriting. The state capture hook refuses to emit a block for failed tool results as a defensive guard against future dispatch refactors. Adds end-to-end integration tests covering multi-turn persistence, reminder self-clearing after a second write, empty-list handling, malformed-block fail-safe behavior, suspend/resume preservation, and compaction-driven state carry-forward. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds table cases for 400, 401, 404 and additional status codes to pin the client's HTTP error surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (3)

compaction/compaction.go (1)

212-226: Token estimate doesn't account for appended todo-state block.

The tokensAfter calculation on lines 222-223 only uses summaryPrefix + summaryText length, but if a <todo-state> block is appended (lines 214-216), this underestimates the actual token count. Consider including the state block length:

🔧 Suggested fix

-	// Step 7: Build compaction event
-	// TokensAfter is estimated from full summary message length (rough heuristic: ~4 chars per token)
-	fullSummaryLen := len(summaryPrefix) + len(summaryText)
+	// Step 7: Build compaction event
+	// TokensAfter is estimated from full summary message length (rough heuristic: ~4 chars per token)
+	fullSummaryLen := 0
+	for _, c := range summaryMsg.Content {
+		if tc, ok := c.(*llm.TextContent); ok {
+			fullSummaryLen += len(tc.Text)
+		}
+	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@compaction/compaction.go` around lines 212 - 226, The token estimate ignores
any appended todo-state block; update the tokensAfter calculation to include the
length of text added to summaryMsg (including the todo.StateBlock content)
rather than only fullSummaryLen derived from summaryPrefix+summaryText. Locate
summaryMsg and, after appending possible todo.StateBlock (todo.StateBlock /
summaryMsg.Content), compute the total summary length by summing
summaryPrefix+summaryText plus any Text from summaryMsg.Content (or iterate
summaryMsg.Content to accumulate Text lengths), then derive tokensAfter from
that total and keep the existing minimum cap (tokensAfter = max(totalLen/4,
100)).

todo/extension_integration_test.go (1)

22-93: Consider extracting shared test helpers.

scriptedLLM and todoWriteToolUse are duplicated between todo/extension_integration_test.go and compaction/todo_integration_test.go. Consider extracting to a shared internal/testutil package if more tests are added.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@todo/extension_integration_test.go` around lines 22 - 93, The test helpers
scriptedLLM (type and methods: Name, Generate, calls, lastReceived) and
todoWriteToolUse are duplicated across tests; extract them into a shared
internal test helper package (e.g., internal/testutil) and update tests to
import and use the helpers. Move the scriptedLLM type and its methods plus
todoWriteToolUse and any dependent types (TodoItem, WriteInput, ToolName
references) into that package, keep their APIs identical, update imports in
todo/extension_integration_test.go and compaction/todo_integration_test.go to
use testutil.scriptedLLM and testutil.todoWriteToolUse, and run tests to ensure
no missing dependencies or visibility issues (export symbols if needed).

todo/state.go (1)

54-66: Consider iterating message content from end to start for consistency.

If a single message contains multiple TextContent blocks with <todo-state> markers (edge case but possible), the forward iteration returns the first (oldest) block in that message. For consistency with the newest-to-oldest message scan, consider iterating content in reverse:
🔧 Suggested fix
 	for i := len(messages) - 1; i >= 0; i-- {
 		msg := messages[i]
-		for _, c := range msg.Content {
-			tc, ok := c.(*llm.TextContent)
+		for j := len(msg.Content) - 1; j >= 0; j-- {
+			tc, ok := msg.Content[j].(*llm.TextContent)
 			if !ok {
 				continue
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/state.go` around lines 54 - 66, The current scan walks messages
newest-to-oldest but iterates a single message's Content forward, which can
return an older <todo-state> inside a message; change the inner loop over
msg.Content to iterate from end to start (e.g., index from len(msg.Content)-1
down to 0) so you examine the newest TextContent first; keep the existing checks
and returns that use llm.TextContent, parseStateBlock, cloneTodos, snap.Todos,
snap.TurnsSinceWrite and turnsAfterState unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@compaction/compaction.go`:
- Around line 212-226: The token estimate ignores any appended todo-state block;
update the tokensAfter calculation to include the length of text added to
summaryMsg (including the todo.StateBlock content) rather than only
fullSummaryLen derived from summaryPrefix+summaryText. Locate summaryMsg and,
after appending possible todo.StateBlock (todo.StateBlock / summaryMsg.Content),
compute the total summary length by summing summaryPrefix+summaryText plus any
Text from summaryMsg.Content (or iterate summaryMsg.Content to accumulate Text
lengths), then derive tokensAfter from that total and keep the existing minimum
cap (tokensAfter = max(totalLen/4, 100)).

In `@todo/extension_integration_test.go`:
- Around line 22-93: The test helpers scriptedLLM (type and methods: Name,
Generate, calls, lastReceived) and todoWriteToolUse are duplicated across tests;
extract them into a shared internal test helper package (e.g.,
internal/testutil) and update tests to import and use the helpers. Move the
scriptedLLM type and its methods plus todoWriteToolUse and any dependent types
(TodoItem, WriteInput, ToolName references) into that package, keep their APIs
identical, update imports in todo/extension_integration_test.go and
compaction/todo_integration_test.go to use testutil.scriptedLLM and
testutil.todoWriteToolUse, and run tests to ensure no missing dependencies or
visibility issues (export symbols if needed).

In `@todo/state.go`:
- Around line 54-66: The current scan walks messages newest-to-oldest but
iterates a single message's Content forward, which can return an older
<todo-state> inside a message; change the inner loop over msg.Content to iterate
from end to start (e.g., index from len(msg.Content)-1 down to 0) so you examine
the newest TextContent first; keep the existing checks and returns that use
llm.TextContent, parseStateBlock, cloneTodos, snap.Todos, snap.TurnsSinceWrite
and turnsAfterState unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e91f1e98-b103-463e-9c9f-d01fa7e8fe68

📥 Commits

Reviewing files that changed from the base of the PR and between 6b6d547 and 5ba58d5.

📒 Files selected for processing (10)

CLAUDE.md
compaction/compaction.go
compaction/compaction_test.go
compaction/todo_integration_test.go
docs/guides/todo-lists.md
todo/extension.go
todo/extension_integration_test.go
todo/extension_test.go
todo/state.go
toolkit/firecrawl/firecrawl_test.go

✅ Files skipped from review due to trivial changes (2)

CLAUDE.md
todo/extension_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

docs/guides/todo-lists.md
toolkit/firecrawl/firecrawl_test.go

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

myzie and others added 2 commits April 12, 2026 15:33

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

myzie and others added 2 commits April 12, 2026 17:01

Expand Firecrawl error-handling test coverage

5ba58d5

Adds table cases for 400, 401, 404 and additional status codes to pin the client's HTTP error surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Promote 7 experimental packages to stable paths#141

Promote 7 experimental packages to stable paths#141
myzie wants to merge 5 commits intomainfrom
promote-experimental-packages

myzie commented Apr 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 12, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 12, 2026

Uh oh!

coderabbitai bot Apr 12, 2026

Uh oh!

coderabbitai bot Apr 12, 2026

Uh oh!

coderabbitai bot Apr 12, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

myzie commented Apr 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

myzie commented Apr 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 12, 2026 •

edited

Loading