Add custom tool callbacks and e2e tests #157

ashwin-ant · 2025-09-08T01:53:32Z

Summary

This PR adds support for custom tool callbacks and comprehensive e2e testing for MCP calculator functionality.

Key Features Added

Custom tool permission callbacks - Allow dynamic tool permission control via can_use_tool callback
E2E test suite - Real Claude API tests validating MCP tool execution end-to-end
Fixed MCP calculator example - Now properly uses allowed_tools for permission management

Changes

Custom Callbacks

Added ToolPermissionContext and PermissionResult types for tool permission handling
Implemented can_use_tool callback support in SDK client
Added comprehensive tests in tests/test_tool_callbacks.py

E2E Testing Infrastructure

Created e2e-tests/ directory with pytest-based test suite
test_mcp_calculator.py - Tests all calculator operations with real API calls
conftest.py - Pytest config with mandatory API key validation
GitHub Actions workflow for automated e2e testing on main branch
Comprehensive documentation in e2e-tests/README.md

Bug Fixes

Fixed MCP calculator example to use allowed_tools instead of incorrect permission_mode
Resolved tool permission issues preventing MCP tools from executing

Testing

E2E tests require ANTHROPIC_API_KEY environment variable and will fail without it.

Run locally:

export ANTHROPIC_API_KEY=your-key
python -m pytest e2e-tests/ -v -m e2e

Run unit tests including callback tests:

python -m pytest tests/test_tool_callbacks.py -v

🤖 Generated with Claude Code

This reverts commit e4feaf2.

- Pass SDK MCP servers to CLI with instance field stripped - Add initialize method handler for MCP protocol - Handle notifications/initialized message - Fix tools/list response to handle both dict and Pydantic schemas - Wrap MCP responses in mcp_response field for control protocol - Update calculator example to use ClaudeSDKClient streaming mode The SDK MCP servers now properly initialize and list tools when used with the streaming client, enabling the calculator example to work. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Create fresh client session for each query to avoid tool accumulation in persistent sessions. This prevents 'Tool names must be unique' errors when using SDK MCP servers with multiple sequential queries. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Create e2e-tests/ directory with pytest-based tests - Add test_mcp_calculator.py with tests for all calculator operations - Verify actual tool execution with real API calls - Add GitHub Actions workflow for e2e tests on main branch - Fix MCP calculator example to use allowed_tools for proper permissions - Tests require ANTHROPIC_API_KEY and fail if not set (no skipping) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Rename job from e2e-test to test-examples for clarity - Remove conditional to run on all pushes/PRs, not just main branch - Update step descriptions to reflect example testing with real API 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Add matrix strategy to test on Python 3.10-3.13 - Install Claude Code CLI before running e2e tests - Verify Claude Code installation with version check - Ensures e2e tests run with actual Claude Code CLI installed 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Rename test_mcp_calculator.py to test_sdk_mcp_tools.py - Replace calculator-specific tests with generic tool execution tests - Use simple echo/greet tools to test SDK MCP functionality - Add direct prompts and strict assertions on tool execution - Consolidate workflows: delete test-examples.yml, add test-examples job to test.yml - Clean up test_debug_permissions.py helper script Tests now focus on verifying SDK MCP tools work correctly rather than testing calculator logic. This better validates the inline tool system. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Define tools per-test with execution tracking lists - Assert on whether Python functions are actually called, not just message parsing - Removes fixture and shared state between tests - Each test creates its own tools with executions.append() tracking - Much cleaner and more accurate validation of SDK MCP mechanics 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Add explicit type annotation for sdk_config to resolve incompatible types assignment error when filtering out instance field from SDK server config. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Add type annotation for servers_for_cli dict to allow different server config types. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Create test_tool_permissions.py with execution-based validation - Test allow/deny/modify behaviors with actual tool execution tracking - Verify selective permission enforcement by tool name - Follows same pattern as SDK MCP tests - assert on what matters Each test defines its own tools with execution tracking to validate that permission callbacks actually control tool execution. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Reduced test_tool_permissions.py from 4 tests to 1 test that simply verifies the can_use_tool callback gets invoked. Removed all assertions about tool execution or behavior - just checking that the callback mechanism works. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Added print statements and tool execution tracking to diagnose why the can_use_tool callback is not being invoked for SDK MCP tools. Also improved the prompt to be more explicit about using the tool. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

The CLI requires `--permission-prompt-tool stdio` to enable SDK control protocol for permission callbacks. When `can_use_tool` is provided but this flag isn't set, the CLI falls back to its built-in permission system. This fix matches the TypeScript SDK behavior by: - Automatically setting `permission_prompt_tool_name="stdio"` when `can_use_tool` callback is provided - Validating that `can_use_tool` requires streaming mode (AsyncIterable) - Ensuring `can_use_tool` and `permission_prompt_tool_name` are mutually exclusive 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

dicksontsai · 2025-09-08T15:39:01Z

src/claude_code_sdk/_internal/query.py

            # This forces us to manually route methods. When Python MCP adds Transport
            # support, we can refactor to match the TypeScript approach.
-            if method == "tools/list":
+            if method == "initialize":


Note (no action needed): The refactoring mentioned above this line is already done in #139

dicksontsai · 2025-09-08T15:40:04Z

test_debug_permissions.py

@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+"""Debug script to test tool permissions."""


Removed, can you stamp again?

ashwin-ant force-pushed the ashwin/unrevert branch 3 times, most recently from 5dac5d9 to 8dea2a5 Compare September 8, 2025 02:38

ashwin-ant requested a review from km-anthropic September 8, 2025 02:45

ashwin-ant changed the title ~~Add custom tool callbacks and e2e tests for MCP calculator~~ Add custom tool callbacks and e2e tests Sep 8, 2025

ashwin-ant and others added 25 commits September 8, 2025 08:07

Revert "Remove unstable public APIs from SDK (#151)"

bb5af77

This reverts commit e4feaf2.

rename

ca651ed

lint

85485df

tests

0a3a680

tests

e9b18e8

lint

60b309a

log

c0cf97d

log

ac0ef4b

Fix remaining mypy type error in subprocess_cli.py

fdbe188

Add type annotation for servers_for_cli dict to allow different server config types. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

tmp

5f4c0b3

wite

a470e1c

lint

1434cf8

lint

9abc015

ashwin-ant force-pushed the ashwin/unrevert branch from 0e35512 to 9abc015 Compare September 8, 2025 15:08

ashwin-ant requested a review from dicksontsai September 8, 2025 15:19

dicksontsai previously approved these changes Sep 8, 2025

View reviewed changes

rm script

bb5405d

ashwin-ant dismissed dicksontsai’s stale review via bb5405d September 8, 2025 15:43

km-anthropic approved these changes Sep 8, 2025

View reviewed changes

ashwin-ant merged commit 8393004 into main Sep 8, 2025
16 checks passed

ashwin-ant deleted the ashwin/unrevert branch September 8, 2025 15:51

arthurgousset mentioned this pull request Sep 9, 2025

[BUG] Claude Code SDK: SDK MCP server faills to connect due to closed stream anthropics/claude-code#6710

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add custom tool callbacks and e2e tests #157

Add custom tool callbacks and e2e tests #157

Uh oh!

ashwin-ant commented Sep 8, 2025

Uh oh!

dicksontsai Sep 8, 2025

Uh oh!

dicksontsai Sep 8, 2025

Uh oh!

ashwin-ant Sep 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		@@ -0,0 +1,55 @@
		#!/usr/bin/env python3
		"""Debug script to test tool permissions."""

Add custom tool callbacks and e2e tests #157

Add custom tool callbacks and e2e tests #157

Uh oh!

Conversation

ashwin-ant commented Sep 8, 2025

Summary

Key Features Added

Changes

Custom Callbacks

E2E Testing Infrastructure

Bug Fixes

Testing

Uh oh!

dicksontsai Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

dicksontsai Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

ashwin-ant Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants