Skip to content

Conversation

@ashwin-ant
Copy link
Collaborator

Summary

This PR adds support for custom tool callbacks and comprehensive e2e testing for MCP calculator functionality.

Key Features Added

  • Custom tool permission callbacks - Allow dynamic tool permission control via can_use_tool callback
  • E2E test suite - Real Claude API tests validating MCP tool execution end-to-end
  • Fixed MCP calculator example - Now properly uses allowed_tools for permission management

Changes

Custom Callbacks

  • Added ToolPermissionContext and PermissionResult types for tool permission handling
  • Implemented can_use_tool callback support in SDK client
  • Added comprehensive tests in tests/test_tool_callbacks.py

E2E Testing Infrastructure

  • Created e2e-tests/ directory with pytest-based test suite
  • test_mcp_calculator.py - Tests all calculator operations with real API calls
  • conftest.py - Pytest config with mandatory API key validation
  • GitHub Actions workflow for automated e2e testing on main branch
  • Comprehensive documentation in e2e-tests/README.md

Bug Fixes

  • Fixed MCP calculator example to use allowed_tools instead of incorrect permission_mode
  • Resolved tool permission issues preventing MCP tools from executing

Testing

E2E tests require ANTHROPIC_API_KEY environment variable and will fail without it.

Run locally:

export ANTHROPIC_API_KEY=your-key
python -m pytest e2e-tests/ -v -m e2e

Run unit tests including callback tests:

python -m pytest tests/test_tool_callbacks.py -v

🤖 Generated with Claude Code

@ashwin-ant ashwin-ant force-pushed the ashwin/unrevert branch 3 times, most recently from 5dac5d9 to 8dea2a5 Compare September 8, 2025 02:38
@ashwin-ant ashwin-ant changed the title Add custom tool callbacks and e2e tests for MCP calculator Add custom tool callbacks and e2e tests Sep 8, 2025
ashwin-ant and others added 25 commits September 8, 2025 08:07
- Pass SDK MCP servers to CLI with instance field stripped
- Add initialize method handler for MCP protocol
- Handle notifications/initialized message
- Fix tools/list response to handle both dict and Pydantic schemas
- Wrap MCP responses in mcp_response field for control protocol
- Update calculator example to use ClaudeSDKClient streaming mode

The SDK MCP servers now properly initialize and list tools when used
with the streaming client, enabling the calculator example to work.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Create fresh client session for each query to avoid tool accumulation
in persistent sessions. This prevents 'Tool names must be unique' errors
when using SDK MCP servers with multiple sequential queries.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Create e2e-tests/ directory with pytest-based tests
- Add test_mcp_calculator.py with tests for all calculator operations
- Verify actual tool execution with real API calls
- Add GitHub Actions workflow for e2e tests on main branch
- Fix MCP calculator example to use allowed_tools for proper permissions
- Tests require ANTHROPIC_API_KEY and fail if not set (no skipping)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Rename job from e2e-test to test-examples for clarity
- Remove conditional to run on all pushes/PRs, not just main branch
- Update step descriptions to reflect example testing with real API

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add matrix strategy to test on Python 3.10-3.13
- Install Claude Code CLI before running e2e tests
- Verify Claude Code installation with version check
- Ensures e2e tests run with actual Claude Code CLI installed

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Rename test_mcp_calculator.py to test_sdk_mcp_tools.py
- Replace calculator-specific tests with generic tool execution tests
- Use simple echo/greet tools to test SDK MCP functionality
- Add direct prompts and strict assertions on tool execution
- Consolidate workflows: delete test-examples.yml, add test-examples job to test.yml
- Clean up test_debug_permissions.py helper script

Tests now focus on verifying SDK MCP tools work correctly rather than
testing calculator logic. This better validates the inline tool system.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Define tools per-test with execution tracking lists
- Assert on whether Python functions are actually called, not just message parsing
- Removes fixture and shared state between tests
- Each test creates its own tools with executions.append() tracking
- Much cleaner and more accurate validation of SDK MCP mechanics

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Add explicit type annotation for sdk_config to resolve incompatible types
assignment error when filtering out instance field from SDK server config.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Add type annotation for servers_for_cli dict to allow different server config types.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Create test_tool_permissions.py with execution-based validation
- Test allow/deny/modify behaviors with actual tool execution tracking
- Verify selective permission enforcement by tool name
- Follows same pattern as SDK MCP tests - assert on what matters

Each test defines its own tools with execution tracking to validate
that permission callbacks actually control tool execution.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Reduced test_tool_permissions.py from 4 tests to 1 test that simply
verifies the can_use_tool callback gets invoked. Removed all assertions
about tool execution or behavior - just checking that the callback
mechanism works.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Added print statements and tool execution tracking to diagnose why the
can_use_tool callback is not being invoked for SDK MCP tools. Also
improved the prompt to be more explicit about using the tool.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
The CLI requires `--permission-prompt-tool stdio` to enable SDK control
protocol for permission callbacks. When `can_use_tool` is provided but
this flag isn't set, the CLI falls back to its built-in permission system.

This fix matches the TypeScript SDK behavior by:
- Automatically setting `permission_prompt_tool_name="stdio"` when
  `can_use_tool` callback is provided
- Validating that `can_use_tool` requires streaming mode (AsyncIterable)
- Ensuring `can_use_tool` and `permission_prompt_tool_name` are mutually
  exclusive

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
dicksontsai
dicksontsai previously approved these changes Sep 8, 2025
# This forces us to manually route methods. When Python MCP adds Transport
# support, we can refactor to match the TypeScript approach.
if method == "tools/list":
if method == "initialize":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note (no action needed): The refactoring mentioned above this line is already done in #139

@@ -0,0 +1,55 @@
#!/usr/bin/env python3
"""Debug script to test tool permissions."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, can you stamp again?

@ashwin-ant ashwin-ant merged commit 8393004 into main Sep 8, 2025
16 checks passed
@ashwin-ant ashwin-ant deleted the ashwin/unrevert branch September 8, 2025 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants