Skip to content

Promote 7 experimental packages to stable paths#141

Open
myzie wants to merge 5 commits intomainfrom
promote-experimental-packages
Open

Promote 7 experimental packages to stable paths#141
myzie wants to merge 5 commits intomainfrom
promote-experimental-packages

Conversation

@myzie
Copy link
Copy Markdown
Collaborator

@myzie myzie commented Apr 12, 2026

Summary

  • Move subagent, compaction, todo, a2a, toolkit/firecrawl, toolkit/google, and toolkit/kagi from experimental/ to top-level stable paths
  • Update all import paths across the main module, experimental/cmd/dive, and examples
  • Move documentation guides from docs/guides/experimental/ to docs/guides/ for promoted packages
  • Fix production-readiness issues found during audit:
    • toolkit/google & toolkit/kagi: Data race — baseURL was a package-level var mutated by option functions; moved to Client struct field. Retry sleeps now respect context cancellation.
    • toolkit/kagi: Renamed stuttering types (KagiClientClient, WithKagiAPIKeyWithAPIKey, etc.)
    • toolkit/google: Search no longer mutates caller's input struct. Removed dead commented-out code.
    • a2a/server.go: Handler() allocated a new http.ServeMux per call; now built once and cached.
    • subagent/loader.go: Consolidated double os.Getwd() call. MapLoader.Load() returns a defensive copy.

settings deferred from this batch — it imports experimental/sandbox which is staying experimental, and nothing currently imports it.

Test plan

  • go vet ./... clean
  • go test ./... all passing
  • cd experimental/cmd/dive && go build succeeds
  • cd examples && go vet ./... clean
  • No stale experimental/ references in code, guides, or README (historical PRD/plan docs left as-is)
  • docs/guides/experimental/ only contains still-experimental guides (mcp, sandboxing, media-generation)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • New Todo package: TodoWrite tool and a Todo extension with automated reminders and state persistence.
  • Improvements

    • Promoted A2A, Compaction, Sub-Agents, and Todo to stable top-level packages.
    • Google and Kagi search clients now use per-client configuration and backoff that respects cancellation.
    • Firecrawl fetch/parsing and error handling hardened.
    • A2A server now uses a cached request handler for more efficient dispatch.
  • Tests

    • Expanded tests for Firecrawl, Todo tool/extension, and compaction.
  • Documentation

    • Guides updated to reflect package reorganization and experimental vs. stable labeling.

Move subagent, compaction, todo, a2a, toolkit/firecrawl, toolkit/google,
and toolkit/kagi from experimental/ to top-level stable paths. Update all
import paths, documentation, and CLAUDE.md.

Fix production-readiness issues: data races in google/kagi search clients
(package-level baseURL mutation), input mutation in google Search, kagi
type name stuttering, a2a handler allocation per request, subagent
MapLoader returning internal map.

Settings deferred from this batch due to experimental/sandbox dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

Walkthrough

Promotes several experimental packages to top-level (a2a, compaction, subagent, todo), updates imports/docs, introduces a new todo package and Extension with reminder/state hooks, adds todo/compaction integration tests, caches A2A server handler, and refactors toolkit clients (per-client baseURLs and cancellation-aware backoff); hardens Firecrawl handling.

Changes

Cohort / File(s) Summary
Docs & Guides
CLAUDE.md, docs/README.md, docs/guides/a2a.md, docs/guides/agents.md, docs/guides/compaction.md, docs/guides/hooks.md, docs/guides/subagents.md, docs/guides/todo-lists.md
Re-scoped documentation from experimental/* to top-level packages; marked promoted vs experimental guides; updated navigation and examples.
A2A package & examples
a2a/doc.go, a2a/a2a_test.go, a2a/interop_test.go, examples/a2a_example/main.go
Updated package docs and imports to github.com/deepnoodle-ai/dive/a2a; tests and examples now reference stable a2a.
A2A server handler caching
a2a/server.go
Cache built http.Handler at NewServer via buildHandler(); Handler() returns cached handler; ServeHTTP dispatches to cached handler.
Experimental CLI & toolkit imports
experimental/cmd/dive/*.go, experimental/cmd/dive/*_test.go, experimental/toolkit/extended/task.go, experimental/toolkit/extended/task_test.go
Updated imports to use stable compaction, subagent, todo, and dive/toolkit/* locations; adjusted call sites accordingly.
Removed experimental extended todo
experimental/toolkit/extended/todo.go, experimental/toolkit/extended/todo_test.go
Deleted prior in-memory TodoWrite tool and its tests from experimental extended toolkit.
New todo package & extension
todo/tool.go, todo/extension.go, todo/state.go, todo/extension_test.go, todo/tool_test.go, todo/extension_integration_test.go
Added todo package: TodoWrite typed tool, WriteInput, ToolName, state persistence (StateBlock/LatestState), and Extension wiring PreGeneration/PreIteration/PostToolUse hooks; extensive unit and integration tests.
Subagent loader changes
subagent/loader.go
Adjusted directory resolution when including Claude agents; MapLoader.Load now returns a defensive copy of definitions to prevent external mutation.
Compaction changes & tests
compaction/compaction.go, compaction/compaction_test.go, compaction/todo_integration_test.go
CompactMessages now preserves appended todo state block in generated summary; added unit and integration tests validating state preservation.
Toolkit — Google client
toolkit/google/search.go
Removed package-level mutable baseURL/SetBaseURL; added per-Client baseURL and WithBaseURL; made 429 backoff cancellation-aware; adjusted pagination/limit handling; removed SetBaseURL API.
Toolkit — Kagi client
toolkit/kagi/kagi.go
Refactored to per-client config: KagiClientClient, new ClientOption/WithAPIKey/WithBaseURL/WithHTTPClient, per-client baseURL, cancellation-aware retries, and renamed internal response types.
Toolkit — Firecrawl
toolkit/firecrawl/firecrawl.go, toolkit/firecrawl/firecrawl_types.go, toolkit/firecrawl/firecrawl_test.go
Changed request format mapping (raw_htmlrawHtml), pruned forwarded defaults, tightened response validation (fail when success==false or data==nil), expanded HTTP error mappings, added tolerant keywordsField unmarshal, and expanded tests.
Misc tests & examples
various updated tests and examples
Added/updated tests (todo, compaction, firecrawl) and updated example/test import paths to stable packages.

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant TodoExt as TodoExtension
    participant History as MessageHistory
    participant Tool as TodoWriteTool
    Agent->>TodoExt: PreGeneration/PreIteration hook
    TodoExt->>History: scan messages (newest→oldest) for <todo-state> / tool-use
    alt latest state found
        History-->>TodoExt: returns todos + turnsSinceWrite
        TodoExt->>TodoExt: compute staleness vs threshold
        alt stale
            TodoExt->>Agent: inject <system-reminder name="todos"> into first user message
        else recent
            TodoExt->>Agent: remove any existing reminder
        end
    else no state
        TodoExt->>Agent: ensure reminder removed
    end
    Agent->>Tool: may call TodoWrite tool during generation
    Tool-->>TodoExt: successful tool result captured by PostToolUse hook
    TodoExt->>Agent: append hidden <todo-state> block to AdditionalContext
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through imports, neat and light,

From experimental burrows to stable daylight.
Cached handlers hum, clients bow and wait,
Reminders whisper, todos annotate.
A tiny thump for this tidy merge tonight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.99% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: promoting 7 experimental packages to stable paths, which is the primary objective of the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch promote-experimental-packages

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
experimental/toolkit/extended/todo.go (1)

19-23: ⚠️ Potential issue | 🟡 Minor

Stale comment: update reference to canonical types location.

The comment says "canonical types are in experimental/todo" but they have been moved to the stable todo package.

📝 Proposed fix
-// Type aliases — canonical types are in experimental/todo
+// Type aliases — canonical types are in todo
 type (
 	TodoStatus = todo.TodoStatus
 	TodoItem   = todo.TodoItem
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experimental/toolkit/extended/todo.go` around lines 19 - 23, The file comment
above the type aliases TodoStatus and TodoItem is stale; update it to reflect
that the canonical types now live in the stable todo package (not
experimental/todo). Edit the comment text near the type alias block (the line
above the type (...) declaration) to reference the stable "todo" package or
remove the "experimental" qualifier so it accurately documents that TodoStatus
and TodoItem are aliases to the canonical types in todo.
docs/guides/a2a.md (1)

386-393: ⚠️ Potential issue | 🟡 Minor

Fix contradiction about package stability in the boundaries section.

This section says A2A is stable, but then says both A2A and MCP live in experimental packages.

✏️ Suggested doc fix
-MCP support remains a separate story. MCP is for tools and data; A2A is
-for agents. They live in different experimental packages and are
-independently adoptable.
+MCP support remains a separate story. MCP is for tools and data; A2A is
+for agents. MCP currently lives under `experimental/`, while A2A is in
+stable `a2a/`; they are independently adoptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/guides/a2a.md` around lines 386 - 393, The paragraph contradicts itself
about stability; decide which is correct and make the wording consistent: either
(A) mark both a2a and MCP as experimental and state that a2a lives in an
experimental package that intentionally does not leak types into the stable dive
package (so *dive.Agent, *dive.Response, and dive.Session remain unchanged), or
(B) if a2a is intended to be stable, change the sentence that calls it
experimental to say MCP is experimental while clarifying that a2a’s public
surface does not force core-API changes. Update the sentences mentioning a2a,
dive, and MCP so they all reflect the chosen stability status consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@docs/guides/a2a.md`:
- Around line 386-393: The paragraph contradicts itself about stability; decide
which is correct and make the wording consistent: either (A) mark both a2a and
MCP as experimental and state that a2a lives in an experimental package that
intentionally does not leak types into the stable dive package (so *dive.Agent,
*dive.Response, and dive.Session remain unchanged), or (B) if a2a is intended to
be stable, change the sentence that calls it experimental to say MCP is
experimental while clarifying that a2a’s public surface does not force core-API
changes. Update the sentences mentioning a2a, dive, and MCP so they all reflect
the chosen stability status consistently.

In `@experimental/toolkit/extended/todo.go`:
- Around line 19-23: The file comment above the type aliases TodoStatus and
TodoItem is stale; update it to reflect that the canonical types now live in the
stable todo package (not experimental/todo). Edit the comment text near the type
alias block (the line above the type (...) declaration) to reference the stable
"todo" package or remove the "experimental" qualifier so it accurately documents
that TodoStatus and TodoItem are aliases to the canonical types in todo.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1997a133-bef3-4f00-b648-cb6a82956b5a

📥 Commits

Reviewing files that changed from the base of the PR and between 7c0cdd2 and 8800614.

📒 Files selected for processing (40)
  • CLAUDE.md
  • a2a/a2a_test.go
  • a2a/client.go
  • a2a/doc.go
  • a2a/interop_test.go
  • a2a/remoteagent.go
  • a2a/rpc.go
  • a2a/server.go
  • a2a/task_store.go
  • a2a/types.go
  • compaction/compaction.go
  • compaction/compaction_test.go
  • compaction/hooks.go
  • docs/README.md
  • docs/guides/a2a.md
  • docs/guides/agents.md
  • docs/guides/compaction.md
  • docs/guides/hooks.md
  • docs/guides/subagents.md
  • docs/guides/todo-lists.md
  • examples/a2a_example/main.go
  • experimental/cmd/dive/app.go
  • experimental/cmd/dive/app_interactive_test.go
  • experimental/cmd/dive/app_test.go
  • experimental/cmd/dive/main.go
  • experimental/cmd/dive/render.go
  • experimental/toolkit/extended/task.go
  • experimental/toolkit/extended/task_test.go
  • experimental/toolkit/extended/todo.go
  • subagent/loader.go
  • subagent/subagent.go
  • subagent/subagent_test.go
  • todo/todo_tracker.go
  • todo/types.go
  • toolkit/firecrawl/firecrawl.go
  • toolkit/firecrawl/firecrawl_test.go
  • toolkit/firecrawl/firecrawl_types.go
  • toolkit/google/search.go
  • toolkit/google/types.go
  • toolkit/kagi/kagi.go

myzie and others added 2 commits April 12, 2026 15:33
- Translate wonton's "raw_html" format to Firecrawl's "rawHtml" (the
  v2 API rejects snake_case format names).
- Drop the redundant "v2 specific defaults" block (blockAds,
  removeBase64Images, proxy: auto, storeInCache); these match
  Firecrawl's own defaults and forcing them removed any future ability
  for callers to override.
- Map metadata.keywords (which Firecrawl returns as either a string or
  array) into fetch.Metadata.Keywords via a custom unmarshaler.
- Nil-check scrapeResp.Data and Data.Metadata before dereferencing.
- Add 400, 401, 404, and 502/503/504 to the friendly error branches.
- Cover the new behavior with unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The TodoWrite tool was the last piece left in experimental/toolkit/extended/
after the previous promotion batch. Move it to the stable todo/ package and
restructure it the way the rest of Dive's tools work — as a dive.Extension
modeled after skill.Loader.

Architecture changes:

- Tool is stateless. The previous TodoWriteTool stored todos []TodoItem on
  its struct, which meant a single instance reused across sessions or agents
  bled state between them. The conversation message history is now the sole
  source of truth for the current list. GetTodos / GetCurrentTask are
  removed (zero callers in-tree outside the tool's own tests).

- New todo.Extension implements dive.Extension. It returns the TodoWrite
  tool plus a PreGenerationHook that walks message history, finds the most
  recent TodoWrite tool_use, and injects a <system-reminder name="todos">
  block via dive.SetSystemReminder when the model has not used the tool in
  the last N assistant turns (default 10, matching Claude Code's
  TODO_REMINDER_CONFIG.TURNS_SINCE_WRITE). When the model uses TodoWrite
  again, the next iteration removes the block.

- Both tool and hook are fully stateless, so a single Extension instance is
  safe to share across agents, sessions, and subagents. No parallel store,
  no per-tool-instance bleed. This is structurally simpler than Claude
  Code's appState.todos[agentId ?? sessionId] map because Dive does not
  have a process-singleton app state to begin with.

Behavioral changes:

- Tool result text now matches Claude's canonical TodoWrite acknowledgment
  ("Todos have been modified successfully. Ensure that you continue to use
  the todo list to track your progress...") followed by a one-line progress
  summary, instead of returning a JSON counts blob. Claude is tuned on the
  canonical text shape.

- Empty list with all tasks complete appends a one-line nudge to either
  write a fresh list or wrap up, but the list itself is retained (not
  auto-cleared as Claude Code does — Dive's persistence is more useful).

API changes:

- experimental.NewTodoWriteTool(...) -> todo.NewTool(...) or todo.New(...)
- TodoWriteToolOptions{OnUpdate: fn} -> todo.WithOnUpdate(fn) /
  todo.WithExtensionOnUpdate(fn)
- TodoWriteInput -> todo.WriteInput

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
todo/tool_test.go (1)

36-137: Add a regression case for multiple in_progress todos.

Given the tool contract says only one active item at a time, please add a test that submits two in_progress items and asserts an error result. This will lock behavior and prevent future regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/tool_test.go` around lines 36 - 137, Add a new subtest inside
TestTool_Call named "MultipleInProgress" that constructs a NewTool(), builds a
WriteInput with two TodoItem entries both using TodoStatusInProgress, calls
tool.Call(ctx, input), and asserts the call returns no error but result.IsError
is true and result.Content[0].Text contains the expected validation message
(e.g., "only one active item" or "only one active item at a time") to lock the
behavior that only one in_progress item is allowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@todo/extension.go`:
- Around line 142-152: The loop over msg.Content currently scans forward and may
return an older TodoWrite call; change the iteration in the block that inspects
msg.Content (looking for *llm.ToolUseContent with tu.Name == ToolName and
calling parseTodoInput) to iterate from the end to the start (for i :=
len(msg.Content)-1; i >= 0; i--) so the function picks the latest TodoWrite
within the same assistant message and returns parsed, turnsSince, true for the
first matching (most recent) tool use.

In `@todo/tool.go`:
- Around line 148-158: Both PreviewCall and Call dereference input without
checking for nil; add a nil guard at the start of PreviewCall and Call to return
an appropriate error/result when input == nil. In PreviewCall (func (t *Tool)
PreviewCall) check input == nil and return a ToolCallPreview that indicates the
missing payload (or an error preview) instead of calling countByStatus; in Call
(func (t *Tool) Call) check input == nil and return
dive.NewToolResultError("todos array is required") (or a clearer error message)
to avoid panics when accessing input.Todos. Ensure both guards run before any
access to input or calling countByStatus.
- Around line 160-170: The loop over input.Todos currently validates each item's
fields but does not enforce the rule that only one todo may be in the
TodoStatusInProgress state; add validation (after or during the loop) that
counts occurrences of TodoStatusInProgress in input.Todos and if count > 1
return dive.NewToolResultError with a clear message like "only one todo item may
have status 'in_progress'". Use the existing input.Todos slice and the
TodoStatusInProgress constant to locate the check and return the error in the
same style as the other validations.
- Around line 107-135: The Item object in Tool.Schema() lacks a Required list so
the LLM can't rely on schema validation; update the Items property inside
Schema() (the nested *schema.Property under "todos" with Type "object") to
include Required: []string{"content","status","activeForm"} so each todo item
enforces those fields during validation; locate the Items property in the
Schema() method and add the Required slice to that nested object property.

---

Nitpick comments:
In `@todo/tool_test.go`:
- Around line 36-137: Add a new subtest inside TestTool_Call named
"MultipleInProgress" that constructs a NewTool(), builds a WriteInput with two
TodoItem entries both using TodoStatusInProgress, calls tool.Call(ctx, input),
and asserts the call returns no error but result.IsError is true and
result.Content[0].Text contains the expected validation message (e.g., "only one
active item" or "only one active item at a time") to lock the behavior that only
one in_progress item is allowed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63acf304-592f-402f-a042-d3b9a53e964f

📥 Commits

Reviewing files that changed from the base of the PR and between 9fd8f37 and 6b6d547.

📒 Files selected for processing (8)
  • CLAUDE.md
  • docs/guides/todo-lists.md
  • experimental/toolkit/extended/todo.go
  • experimental/toolkit/extended/todo_test.go
  • todo/extension.go
  • todo/extension_test.go
  • todo/tool.go
  • todo/tool_test.go
💤 Files with no reviewable changes (2)
  • experimental/toolkit/extended/todo_test.go
  • experimental/toolkit/extended/todo.go
✅ Files skipped from review due to trivial changes (2)
  • CLAUDE.md
  • docs/guides/todo-lists.md

Comment thread todo/extension.go Outdated
Comment on lines +142 to +152
for _, c := range msg.Content {
tu, ok := c.(*llm.ToolUseContent)
if !ok || tu.Name != ToolName {
continue
}
parsed, err := parseTodoInput(tu.Input)
if err != nil {
continue
}
return parsed, turnsSince, true
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Pick the latest TodoWrite call within the same assistant message.

Line 142 iterates content in forward order. If one assistant message contains multiple TodoWrite tool uses, this can return an older call from that message. Iterate message content from end to start.

🔧 Suggested fix
-		for _, c := range msg.Content {
-			tu, ok := c.(*llm.ToolUseContent)
+		for j := len(msg.Content) - 1; j >= 0; j-- {
+			tu, ok := msg.Content[j].(*llm.ToolUseContent)
 			if !ok || tu.Name != ToolName {
 				continue
 			}
 			parsed, err := parseTodoInput(tu.Input)
 			if err != nil {
 				continue
 			}
 			return parsed, turnsSince, true
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for _, c := range msg.Content {
tu, ok := c.(*llm.ToolUseContent)
if !ok || tu.Name != ToolName {
continue
}
parsed, err := parseTodoInput(tu.Input)
if err != nil {
continue
}
return parsed, turnsSince, true
}
for j := len(msg.Content) - 1; j >= 0; j-- {
tu, ok := msg.Content[j].(*llm.ToolUseContent)
if !ok || tu.Name != ToolName {
continue
}
parsed, err := parseTodoInput(tu.Input)
if err != nil {
continue
}
return parsed, turnsSince, true
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/extension.go` around lines 142 - 152, The loop over msg.Content
currently scans forward and may return an older TodoWrite call; change the
iteration in the block that inspects msg.Content (looking for
*llm.ToolUseContent with tu.Name == ToolName and calling parseTodoInput) to
iterate from the end to the start (for i := len(msg.Content)-1; i >= 0; i--) so
the function picks the latest TodoWrite within the same assistant message and
returns parsed, turnsSince, true for the first matching (most recent) tool use.

Comment thread todo/tool.go
Comment on lines +107 to +135
func (t *Tool) Schema() *schema.Schema {
return &schema.Schema{
Type: "object",
Required: []string{"todos"},
Properties: map[string]*schema.Property{
"todos": {
Type: "array",
Description: "The complete updated todo list",
Items: &schema.Property{
Type: "object",
Properties: map[string]*schema.Property{
"content": {
Type: "string",
Description: "The task description in imperative form (e.g., 'Run tests')",
},
"status": {
Type: "string",
Enum: []any{"pending", "in_progress", "completed"},
Description: "The task status: pending, in_progress, or completed",
},
"activeForm": {
Type: "string",
Description: "The task in present continuous form (e.g., 'Running tests')",
},
},
},
},
},
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "tool.go" -path "*/todo/*" -type f

Repository: deepnoodle-ai/dive

Length of output: 76


🏁 Script executed:

cat -n ./todo/tool.go

Repository: deepnoodle-ai/dive

Length of output: 9016


Add Required slice to Items property schema to align with validation.

Lines 161–169 reject todo items missing content, status, or activeForm, but the schema at lines 115–131 does not declare these fields as required on the Items object. This forces the LLM to learn the constraints through rejection loops rather than schema inspection.

♻️ Schema alignment fix
 			"todos": {
 				Type:        "array",
 				Description: "The complete updated todo list",
 				Items: &schema.Property{
 					Type: "object",
+					Required: []string{"content", "status", "activeForm"},
 					Properties: map[string]*schema.Property{
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (t *Tool) Schema() *schema.Schema {
return &schema.Schema{
Type: "object",
Required: []string{"todos"},
Properties: map[string]*schema.Property{
"todos": {
Type: "array",
Description: "The complete updated todo list",
Items: &schema.Property{
Type: "object",
Properties: map[string]*schema.Property{
"content": {
Type: "string",
Description: "The task description in imperative form (e.g., 'Run tests')",
},
"status": {
Type: "string",
Enum: []any{"pending", "in_progress", "completed"},
Description: "The task status: pending, in_progress, or completed",
},
"activeForm": {
Type: "string",
Description: "The task in present continuous form (e.g., 'Running tests')",
},
},
},
},
},
}
func (t *Tool) Schema() *schema.Schema {
return &schema.Schema{
Type: "object",
Required: []string{"todos"},
Properties: map[string]*schema.Property{
"todos": {
Type: "array",
Description: "The complete updated todo list",
Items: &schema.Property{
Type: "object",
Required: []string{"content", "status", "activeForm"},
Properties: map[string]*schema.Property{
"content": {
Type: "string",
Description: "The task description in imperative form (e.g., 'Run tests')",
},
"status": {
Type: "string",
Enum: []any{"pending", "in_progress", "completed"},
Description: "The task status: pending, in_progress, or completed",
},
"activeForm": {
Type: "string",
Description: "The task in present continuous form (e.g., 'Running tests')",
},
},
},
},
},
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/tool.go` around lines 107 - 135, The Item object in Tool.Schema() lacks
a Required list so the LLM can't rely on schema validation; update the Items
property inside Schema() (the nested *schema.Property under "todos" with Type
"object") to include Required: []string{"content","status","activeForm"} so each
todo item enforces those fields during validation; locate the Items property in
the Schema() method and add the Required slice to that nested object property.

Comment thread todo/tool.go
Comment on lines +148 to +158
func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview {
pending, inProgress, completed, _ := countByStatus(input.Todos)
return &dive.ToolCallPreview{
Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed",
pending, inProgress, completed),
}
}

func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {
if input.Todos == nil {
return dive.NewToolResultError("todos array is required"), nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard nil input to avoid panics in preview/call paths.

Line 149 and Line 157 dereference input without a nil check. A nil payload can panic before producing a tool error result.

🛡️ Suggested nil guards
 func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview {
+	if input == nil {
+		return &dive.ToolCallPreview{
+			Summary: "Update todos: 0 pending, 0 in progress, 0 completed",
+		}
+	}
 	pending, inProgress, completed, _ := countByStatus(input.Todos)
 	return &dive.ToolCallPreview{
 		Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed",
 			pending, inProgress, completed),
 	}
 }
 
 func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {
+	if input == nil {
+		return dive.NewToolResultError("input is required"), nil
+	}
 	if input.Todos == nil {
 		return dive.NewToolResultError("todos array is required"), nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview {
pending, inProgress, completed, _ := countByStatus(input.Todos)
return &dive.ToolCallPreview{
Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed",
pending, inProgress, completed),
}
}
func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {
if input.Todos == nil {
return dive.NewToolResultError("todos array is required"), nil
func (t *Tool) PreviewCall(_ context.Context, input *WriteInput) *dive.ToolCallPreview {
if input == nil {
return &dive.ToolCallPreview{
Summary: "Update todos: 0 pending, 0 in progress, 0 completed",
}
}
pending, inProgress, completed, _ := countByStatus(input.Todos)
return &dive.ToolCallPreview{
Summary: fmt.Sprintf("Update todos: %d pending, %d in progress, %d completed",
pending, inProgress, completed),
}
}
func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {
if input == nil {
return dive.NewToolResultError("input is required"), nil
}
if input.Todos == nil {
return dive.NewToolResultError("todos array is required"), nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/tool.go` around lines 148 - 158, Both PreviewCall and Call dereference
input without checking for nil; add a nil guard at the start of PreviewCall and
Call to return an appropriate error/result when input == nil. In PreviewCall
(func (t *Tool) PreviewCall) check input == nil and return a ToolCallPreview
that indicates the missing payload (or an error preview) instead of calling
countByStatus; in Call (func (t *Tool) Call) check input == nil and return
dive.NewToolResultError("todos array is required") (or a clearer error message)
to avoid panics when accessing input.Todos. Ensure both guards run before any
access to input or calling countByStatus.

Comment thread todo/tool.go
Comment on lines +160 to +170
for i, item := range input.Todos {
if item.Content == "" {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil
}
if item.ActiveForm == "" {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil
}
if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Enforce the “one in_progress item” contract.

The description/rules state only one active task at a time, but Line 160-170 accepts multiple in_progress items. This weakens consistency of the todo state.

✅ Suggested validation
 func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {
 	if input.Todos == nil {
 		return dive.NewToolResultError("todos array is required"), nil
 	}
+	inProgressCount := 0
 	for i, item := range input.Todos {
 		if item.Content == "" {
 			return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil
 		}
 		if item.ActiveForm == "" {
 			return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil
 		}
 		if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted {
 			return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil
 		}
+		if item.Status == TodoStatusInProgress {
+			inProgressCount++
+		}
 	}
+	if inProgressCount > 1 {
+		return dive.NewToolResultError("only one todo may be 'in_progress'"), nil
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for i, item := range input.Todos {
if item.Content == "" {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil
}
if item.ActiveForm == "" {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil
}
if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil
}
}
func (t *Tool) Call(_ context.Context, input *WriteInput) (*dive.ToolResult, error) {
if input.Todos == nil {
return dive.NewToolResultError("todos array is required"), nil
}
inProgressCount := 0
for i, item := range input.Todos {
if item.Content == "" {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].content is required", i)), nil
}
if item.ActiveForm == "" {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].activeForm is required", i)), nil
}
if item.Status != TodoStatusPending && item.Status != TodoStatusInProgress && item.Status != TodoStatusCompleted {
return dive.NewToolResultError(fmt.Sprintf("todo[%d].status must be 'pending', 'in_progress', or 'completed'", i)), nil
}
if item.Status == TodoStatusInProgress {
inProgressCount++
}
}
if inProgressCount > 1 {
return dive.NewToolResultError("only one todo may be 'in_progress'"), nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/tool.go` around lines 160 - 170, The loop over input.Todos currently
validates each item's fields but does not enforce the rule that only one todo
may be in the TodoStatusInProgress state; add validation (after or during the
loop) that counts occurrences of TodoStatusInProgress in input.Todos and if
count > 1 return dive.NewToolResultError with a clear message like "only one
todo item may have status 'in_progress'". Use the existing input.Todos slice and
the TodoStatusInProgress constant to locate the check and return the error in
the same style as the other validations.

myzie and others added 2 commits April 12, 2026 17:01
The todo.Extension now captures each successful TodoWrite as a hidden
<todo-state> block on the tool_result message via a PostToolUse hook, so
the stale-list reminder and downstream tooling can read authoritative
state from conversation history instead of re-parsing raw tool_use input.
compaction.CompactMessages carries the latest block forward into the
summary message so todo state survives context rewriting. The state
capture hook refuses to emit a block for failed tool results as a
defensive guard against future dispatch refactors.

Adds end-to-end integration tests covering multi-turn persistence,
reminder self-clearing after a second write, empty-list handling,
malformed-block fail-safe behavior, suspend/resume preservation, and
compaction-driven state carry-forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds table cases for 400, 401, 404 and additional status codes to pin
the client's HTTP error surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
compaction/compaction.go (1)

212-226: Token estimate doesn't account for appended todo-state block.

The tokensAfter calculation on lines 222-223 only uses summaryPrefix + summaryText length, but if a <todo-state> block is appended (lines 214-216), this underestimates the actual token count. Consider including the state block length:

🔧 Suggested fix
-	// Step 7: Build compaction event
-	// TokensAfter is estimated from full summary message length (rough heuristic: ~4 chars per token)
-	fullSummaryLen := len(summaryPrefix) + len(summaryText)
+	// Step 7: Build compaction event
+	// TokensAfter is estimated from full summary message length (rough heuristic: ~4 chars per token)
+	fullSummaryLen := 0
+	for _, c := range summaryMsg.Content {
+		if tc, ok := c.(*llm.TextContent); ok {
+			fullSummaryLen += len(tc.Text)
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compaction/compaction.go` around lines 212 - 226, The token estimate ignores
any appended todo-state block; update the tokensAfter calculation to include the
length of text added to summaryMsg (including the todo.StateBlock content)
rather than only fullSummaryLen derived from summaryPrefix+summaryText. Locate
summaryMsg and, after appending possible todo.StateBlock (todo.StateBlock /
summaryMsg.Content), compute the total summary length by summing
summaryPrefix+summaryText plus any Text from summaryMsg.Content (or iterate
summaryMsg.Content to accumulate Text lengths), then derive tokensAfter from
that total and keep the existing minimum cap (tokensAfter = max(totalLen/4,
100)).
todo/extension_integration_test.go (1)

22-93: Consider extracting shared test helpers.

scriptedLLM and todoWriteToolUse are duplicated between todo/extension_integration_test.go and compaction/todo_integration_test.go. Consider extracting to a shared internal/testutil package if more tests are added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/extension_integration_test.go` around lines 22 - 93, The test helpers
scriptedLLM (type and methods: Name, Generate, calls, lastReceived) and
todoWriteToolUse are duplicated across tests; extract them into a shared
internal test helper package (e.g., internal/testutil) and update tests to
import and use the helpers. Move the scriptedLLM type and its methods plus
todoWriteToolUse and any dependent types (TodoItem, WriteInput, ToolName
references) into that package, keep their APIs identical, update imports in
todo/extension_integration_test.go and compaction/todo_integration_test.go to
use testutil.scriptedLLM and testutil.todoWriteToolUse, and run tests to ensure
no missing dependencies or visibility issues (export symbols if needed).
todo/state.go (1)

54-66: Consider iterating message content from end to start for consistency.

If a single message contains multiple TextContent blocks with <todo-state> markers (edge case but possible), the forward iteration returns the first (oldest) block in that message. For consistency with the newest-to-oldest message scan, consider iterating content in reverse:

🔧 Suggested fix
 	for i := len(messages) - 1; i >= 0; i-- {
 		msg := messages[i]
-		for _, c := range msg.Content {
-			tc, ok := c.(*llm.TextContent)
+		for j := len(msg.Content) - 1; j >= 0; j-- {
+			tc, ok := msg.Content[j].(*llm.TextContent)
 			if !ok {
 				continue
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@todo/state.go` around lines 54 - 66, The current scan walks messages
newest-to-oldest but iterates a single message's Content forward, which can
return an older <todo-state> inside a message; change the inner loop over
msg.Content to iterate from end to start (e.g., index from len(msg.Content)-1
down to 0) so you examine the newest TextContent first; keep the existing checks
and returns that use llm.TextContent, parseStateBlock, cloneTodos, snap.Todos,
snap.TurnsSinceWrite and turnsAfterState unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@compaction/compaction.go`:
- Around line 212-226: The token estimate ignores any appended todo-state block;
update the tokensAfter calculation to include the length of text added to
summaryMsg (including the todo.StateBlock content) rather than only
fullSummaryLen derived from summaryPrefix+summaryText. Locate summaryMsg and,
after appending possible todo.StateBlock (todo.StateBlock / summaryMsg.Content),
compute the total summary length by summing summaryPrefix+summaryText plus any
Text from summaryMsg.Content (or iterate summaryMsg.Content to accumulate Text
lengths), then derive tokensAfter from that total and keep the existing minimum
cap (tokensAfter = max(totalLen/4, 100)).

In `@todo/extension_integration_test.go`:
- Around line 22-93: The test helpers scriptedLLM (type and methods: Name,
Generate, calls, lastReceived) and todoWriteToolUse are duplicated across tests;
extract them into a shared internal test helper package (e.g.,
internal/testutil) and update tests to import and use the helpers. Move the
scriptedLLM type and its methods plus todoWriteToolUse and any dependent types
(TodoItem, WriteInput, ToolName references) into that package, keep their APIs
identical, update imports in todo/extension_integration_test.go and
compaction/todo_integration_test.go to use testutil.scriptedLLM and
testutil.todoWriteToolUse, and run tests to ensure no missing dependencies or
visibility issues (export symbols if needed).

In `@todo/state.go`:
- Around line 54-66: The current scan walks messages newest-to-oldest but
iterates a single message's Content forward, which can return an older
<todo-state> inside a message; change the inner loop over msg.Content to iterate
from end to start (e.g., index from len(msg.Content)-1 down to 0) so you examine
the newest TextContent first; keep the existing checks and returns that use
llm.TextContent, parseStateBlock, cloneTodos, snap.Todos, snap.TurnsSinceWrite
and turnsAfterState unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e91f1e98-b103-463e-9c9f-d01fa7e8fe68

📥 Commits

Reviewing files that changed from the base of the PR and between 6b6d547 and 5ba58d5.

📒 Files selected for processing (10)
  • CLAUDE.md
  • compaction/compaction.go
  • compaction/compaction_test.go
  • compaction/todo_integration_test.go
  • docs/guides/todo-lists.md
  • todo/extension.go
  • todo/extension_integration_test.go
  • todo/extension_test.go
  • todo/state.go
  • toolkit/firecrawl/firecrawl_test.go
✅ Files skipped from review due to trivial changes (2)
  • CLAUDE.md
  • todo/extension_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/guides/todo-lists.md
  • toolkit/firecrawl/firecrawl_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant