Conversation

@Ju-usc commented Oct 10, 2025

Description

Adds tool description optimization capability to GEPA optimizer for multi-agent systems.

When optimize_tool_descriptions=True, GEPA now:

  1. Extracts tool descriptions from all nested modules (via named_sub_modules())
  2. Includes them in the optimization process alongside signature instructions
  3. Returns optimized system with improved tool descriptions

This enables holistic optimization where both agent reasoning (signatures) and tool usage (descriptions) are improved based on end-to-end execution traces.

Issue

Closes #8706

Changes

  • Add optimize_tool_descriptions parameter to GEPA (default False)
  • Extract tool descriptions using named_sub_modules() traversal in compile()
  • Apply optimized descriptions in DspyAdapter.build_program()
  • Add 4 comprehensive tests covering single-agent and multi-agent scenarios
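The `named_sub_modules()` traversal described above can be sketched as follows. `Tool` and `Module` here are simplified stand-ins for the dspy classes, and `collect_tool_descriptions` is a hypothetical helper, not the PR's actual code:

```python
# Illustrative stand-ins for dspy.Tool and dspy.Module (not the real classes).
class Tool:
    def __init__(self, name, desc):
        self.name, self.desc = name, desc

class Module:
    def named_sub_modules(self, prefix="self"):
        """Yield (path, module) for this module and every nested sub-module."""
        yield prefix, self
        for attr, value in vars(self).items():
            if isinstance(value, Module):
                yield from value.named_sub_modules(f"{prefix}.{attr}")

def collect_tool_descriptions(program):
    """Gather {component_path: description} for every tool in the program."""
    out = {}
    for path, module in program.named_sub_modules():
        for tool in getattr(module, "tools", {}).values():
            out[f"{path}.tools.{tool.name}"] = tool.desc
    return out

class Researcher(Module):
    def __init__(self):
        self.tools = {"search": Tool("search", "Searches web")}

class Assistant(Module):
    def __init__(self):
        self.researcher = Researcher()

collect_tool_descriptions(Assistant())
# → {"self.researcher.tools.search": "Searches web"}
```

The same traversal reaches tools at any nesting depth, which is what makes the optimization holistic across sub-agents.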

Usage Example

```python
import dspy

# search_fn, calc_fn, my_metric, lm, train, and val are assumed to be
# defined elsewhere.

# Create a multi-agent system
class ResearchAssistant(dspy.Module):
    def __init__(self):
        super().__init__()
        search_tool = dspy.Tool(search_fn, name="search", desc="Searches web")
        self.researcher = dspy.ReAct("query -> findings", tools=[search_tool])

        def delegate_research(query):
            return self.researcher(query=query).findings

        research_tool = dspy.Tool(delegate_research, name="research", desc="Research things")
        calc_tool = dspy.Tool(calc_fn, name="calculator", desc="Does math")
        self.assistant = dspy.ReAct("question -> answer", tools=[research_tool, calc_tool])

    def forward(self, question):
        return self.assistant(question=question)

# Enable tool optimization
optimizer = dspy.GEPA(
    metric=my_metric,
    reflection_lm=lm,
    auto="light",
    optimize_tool_descriptions=True,  # ← Enable tool optimization
)

# Optimizes ALL tools (calculator, research, search) holistically
optimized = optimizer.compile(ResearchAssistant(), trainset=train, valset=val)
```

Backward Compatibility

✅ Fully backward compatible: the default is `optimize_tool_descriptions=False`.

Tests

  • All 16 tests pass (4 new + 12 existing GEPA tests)
  • Tests cover: adapter functionality, single-agent, multi-agent nested discovery, end-to-end optimization

Ju-usc added 3 commits October 9, 2025 20:07
- Add optimize_tool_descriptions parameter (default False) to GEPA
- Extract tool descriptions from all nested modules via named_sub_modules()
- Apply optimized descriptions in DspyAdapter.build_program()
- Enables holistic optimization of tools across main and subagent modules
- Tests: 4 new tests, all 16 pass (4 new + 12 existing)
@Ju-usc commented Oct 10, 2025

Apologies for accidentally closing #8927

Thank you for the thorough review, @LakshyAAAgrawal! I'll address your feedback:

  1. Since tools are categorically different from prompts, they should use a different reflection meta prompt. The default reflection meta prompt is shown at https://dspy.ai/api/optimizers/GEPA/GEPA_Advanced/#default-implementation, whereas I assume the tool must use a somewhat different meta prompt. Can you implement a propose_new_texts method that mimics the default_proposer shown in the link above for all prompts, but calls a tool-description-specific prompt/signature for tool evolution?
  2. Can you also add some description to the documentation, explaining that this feature is beneficial for ReAct agents.
  3. (This is not a requirement to merge the PR) Would it be possible to add a simple and short tutorial demonstrating the use and performance improvement via tool evolution?

I'll start working on items 1 and 2 and update the PR soon. Please let me know if you have any specific preferences for the tutorial format!

@LakshyAAAgrawal (Collaborator) commented:

Thanks a lot! For the tutorial, I think you can follow the current GEPA tutorial format (load a dataset, show an example from the dataset, build a dspy program, evaluate the baseline program on testset, run GEPA with new optimization settings, show the optimized programs' prompts and tool descriptions, and finally evaluate the optimized program).

Hopefully we should be able to see a nice and large gain on agentic tasks with this amazing contribution by you!

- Add ToolProposer with GenerateImprovedToolDescription signature
- Implement routing logic to separate tools from signatures
- Tools use ToolProposer, signatures use custom or parent default
- Backward compatible: preserves existing custom_instruction_proposer behavior
- Add test verifying routing splits components correctly
- Define tool functions outside class for clarity
- Match structure of simple ReAct example
- Add clear comments explaining architecture
- Make code more readable and maintainable
@Ju-usc force-pushed the feature/tool-description-optimization branch from 197f077 to c4f2041 on October 10, 2025 at 09:38
@Ju-usc commented Oct 10, 2025

Hi @LakshyAAAgrawal,

I've implemented the tool-specific proposer as requested! Here's what's included:

1. Tool-Specific Proposer Implementation

  • Added GenerateImprovedToolDescriptionFromFeedback signature with a specialized reflection prompt
  • Implemented ToolProposer and SingleComponentToolProposer following the MultiModalInstructionProposer pattern
  • Routing logic in DspyAdapter that directs tools to ToolProposer and signatures to custom/default proposers
  • Fully backward compatible with existing custom instruction proposers
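The routing described above can be sketched in miniature. The component naming scheme and the proposer call signatures below are illustrative assumptions, not the PR's exact `DspyAdapter` code:

```python
# Hypothetical routing: components identified as tools go to the tool-specific
# proposer, everything else to the custom/default instruction proposer.
def propose_new_texts(components, tool_proposer, instruction_proposer):
    proposals = {}
    for key, current_text in components.items():
        # Assumed convention: tool components are keyed with a "tool:" prefix.
        propose = tool_proposer if key.startswith("tool:") else instruction_proposer
        proposals[key] = propose(key, current_text)
    return proposals

proposals = propose_new_texts(
    {"tool:search": "Searches web", "assistant.predict": "Answer the question."},
    tool_proposer=lambda k, t: f"[tool-reflection] {t}",
    instruction_proposer=lambda k, t: f"[instruction-reflection] {t}",
)
```

Because each component is dispatched independently, a custom instruction proposer can coexist with the tool proposer without either affecting the other.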

2. Documentation

  • Added comprehensive section to GEPA_Advanced.md
  • Explains when to use tool optimization (ReAct agents, multi-agent systems)
  • Includes usage examples for both simple and nested agent architectures
  • Documents how to inspect optimized tool descriptions

Reflection Prompt Design:
The tool-specific prompt is intentionally open-ended to avoid prescriptive patterns that might lead to local minima. It asks the LM to identify patterns in successful/unsuccessful tool usage and extract domain-specific information, without suggesting specific heuristics.

Before I create a short tutorial (item #3), would you have any feedback on:

  • The reflection prompt design - is it general enough? Any improvements you'd suggest?
  • The implementation approach - does the routing logic make sense?
  • The documentation - anything unclear or missing?

Any feedback would be helpful before I invest time in the tutorial. Thank you!

@Ju-usc commented Oct 11, 2025

Wait, there is a bug in the implementation; I'm working on a fix. The tests also need to be fixed.

…euse

Tools now copy ReAct's reflective data with tool-specific annotation
instead of complex trajectory extraction. This 15-line approach reuses
ReAct's existing context (thoughts, tool calls, observations) and adds
focused annotation for each tool.

Implementation:
- Tools receive full ReAct reflective examples (same trajectory context)
- Feedback prefixed: [Optimizing tool: 'X'] for focused optimization
- Reflection LM sees complete multi-step execution traces per tool

Benefits:
- Simpler: 15 lines vs 70+ line extraction approach
- Reuses code: No duplicate trajectory formatting logic
- Same context: Tools see full ReAct execution traces
- Clean: Removed all debug output

Tests:
- 4 focused tests following GEPA patterns (removed 1 redundant)
- 226KB fixture with 34 LM + 6 reflection calls
- All tests passing with gpt-5-nano traces

Documentation:
- Updated GEPA_Advanced.md with implementation details
- Explains reflective dataset construction approach

The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.

Unlike signature instructions that guide reasoning strategies, tool descriptions serve a fundamentally different purpose: they help agents decide **which tool to use** in a given situation. GEPA recognizes this categorical difference and applies a specialized reflection prompt tailored for tool selection decisions.
Collaborator:

which tool to use, when to use it, and how to use it. All three are captured by the description.

Collaborator:

Let's avoid the word "fundamentally". One can imagine that all tool descriptions can be (and many times are) simply included in the system prompt itself.

Collaborator:

Please also add a corresponding entry in GEPA Overview, that links to this file/section.


Consider enabling `optimize_tool_descriptions=True` when:

- **Building ReAct agents**: ReAct agents rely on tool descriptions to make action selection decisions
Collaborator:

One should consider using this whenever dspy.Tool is used anywhere in the DSPy program. Here are a few scenarios for using dspy.Tool:

)
```

**Note:** Tool optimization is fully backward compatible. Existing programs without tools, or with `optimize_tool_descriptions=False`, continue to work exactly as before.
Collaborator:

I don't think we need to inform users about backward compatibility here. It should be implicit that there should be no behaviour changes for any program not containing dspy.Tool.

raised if a mismatch in module-level and predictor-level score is detected.
optimize_tool_descriptions: Whether to optimize tool descriptions for modules with tools
(e.g., ReAct agents). When enabled, tool descriptions are included in the optimization
process alongside signature instructions. Default is False.
Collaborator:

Add a link to GEPA Advanced/Tool section

)

self.propose_new_texts = custom_propose_new_texts
elif self.optimize_tool_descriptions:
Collaborator @LakshyAAAgrawal, Oct 11, 2025:

Edge case: What should happen when user tries to provide both a custom proposer, and enables optimize_tool_descriptions

# Handle signature components - replicate proposer's default behavior
sig_texts = {}
if sig_components:
from gepa.strategies.instruction_proposal import InstructionProposalSignature
Collaborator:

This is a slight deviation from this PR, but would be a large enhancement (feel free to ignore):

  1. Create 2 fields, self.instruction_proposal_signature and self.tool_proposer, which are initialized to the default InstructionProposalSignature and ToolProposerSignature.
  2. Take an argument from dspy.GEPA that can override the default signature values.

# Second pass: Process tools by copying ReAct data with annotation
react_module_name = None
for name in ret_d.keys():
if "react" in name.lower():
Collaborator:

Is this robust? Might it be better to use isinstance or some other way?
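The type-based alternative the reviewer suggests could be sketched as follows. `ReAct` and `ChainOfThought` below are local stand-ins for the dspy classes, used only to show the pattern:

```python
# Sketch of isinstance-based detection instead of substring matching on names.
class Module: pass
class ReAct(Module): pass
class ChainOfThought(Module): pass

def find_react_modules(named_modules):
    """Return paths whose module is a ReAct instance, whatever it is named."""
    return [name for name, mod in named_modules if isinstance(mod, ReAct)]

modules = [
    ("self.my_agent", ReAct()),                   # matched despite no "react" in the name
    ("self.react_summarizer", ChainOfThought()),  # not matched despite the name
]
find_react_modules(modules)  # → ["self.my_agent"]
```

Unlike `"react" in name.lower()`, this cannot be fooled by how a user happens to name their attributes.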

Your task is to write a better description for this tool.
Read the examples carefully and identify patterns in when the tool was used successfully versus when it was misused or overlooked. Identify any domain-specific information about the tool's capabilities or appropriate usage that may not be available to the assistant in the future. The assistant may have developed effective patterns for tool selection - if so, ensure the tool description supports those patterns.
Collaborator:

Tool use. Also suggest identifying any failure modes of the tool?

@LakshyAAAgrawal (Collaborator) commented:

Dear @Ju-usc,

This is a great PR. Thanks a lot! I have tried to be overly critical and made too many nits. Feel free to ignore if you disagree with something. Let me know if you'd like me to address anything!

Regarding the meta prompt, overall I think it looks great. However, I suggest that as you build the tutorial, you may find that the reflection prompt needs tweaking, or that the content exposed in reflective_dataset for the tool is lacking or needs improvement. This is going to be an empirical exercise, which will guide what works in the reflection meta prompts. Looking forward to the tutorial on this too!

You may already have thoughts about what you'd like to show in the tutorial, but if not, you may consider building off (https://kargarisaac.medium.com/building-and-optimizing-multi-agent-rag-systems-with-dspy-and-gepa-2b88b5838ce2) by @kargarisaac.

- Add GenerateImprovedToolDescriptionFromFeedback signature documentation
- Include tool-aware metric example showing trajectory access
- Document tool prefix annotation in feedback
- Note component_selector applies to both signatures and tools
- Fix 'fundamentally' language per reviewer feedback
- Separate Pass 1 (predictor examples) and Pass 2 (tool aggregation)
- Clarify Generated Outputs includes full trajectory for ReAct
- Fix feedback annotation format to [Tool 'name' from 'predictor_key']
- Add Component Identification & Proposer Routing section
- Explain dual-proposer independence (custom proposer doesn't affect tool proposer)
- Use consistent terminology: 'predictor' and 'signature instructions'
Adds comprehensive test proving GEPA can optimize ReAct modules end-to-end:
- Baseline with minimal tool descriptions achieves 0% accuracy
- After optimization, achieves 100% accuracy
- Tests unified ReAct architecture (react + extract + tools as one module)

Key features:
- Uses stable SHA256 hashing for deterministic fixture replay
- Avoids Python's PYTHONHASHSEED randomization issues
- 189KB fixture with security check passed (no API keys/tokens)
- Verifies all components are optimized (react, extract, tool descriptions)
This test file was for the old architecture where tools were optimized
separately from ReAct modules. With the unified ReAct optimization approach,
this test is replaced by test_gepa_react_optimization.py which tests the
new architecture where ReAct modules (react + extract + tools) are optimized
as a single unified component.
@Ju-usc commented Oct 24, 2025

Still working on the nested-agent test, documentation refinements, and the tutorial.

Regenerates fixture to match commit 3418b59 which changed how
tool arg descriptions are optimized. Reduces LM calls from 26→22
by improving the optimization process efficiency.
@Ju-usc force-pushed the feature/tool-description-optimization branch from f849f10 to 8e63c62 on October 25, 2025 at 09:05
Ju-usc added 10 commits October 25, 2025 16:37
- Replace repr()-based hashing with json.dumps(sort_keys=True)
- Fixes CI failures caused by Python version differences (3.12.9 vs 3.12.11)
- repr() formatting can differ between Python micro versions
- JSON spec is standardized and stable across all versions
- Regenerate fixture with new hashing approach
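The stable-hashing idea behind this fix can be sketched generically (a minimal illustration of the pattern, not the PR's exact code):

```python
import hashlib
import json

def stable_key(obj) -> str:
    """Version-stable digest: json.dumps with sort_keys yields canonical text,
    unlike repr(), whose formatting can differ across Python micro versions."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Key order no longer matters, so fixture lookups replay deterministically.
stable_key({"b": 1, "a": 2}) == stable_key({"a": 2, "b": 1})  # True
```

Serializing to canonical JSON before hashing is a common trick for cache keys and record/replay fixtures that must survive interpreter upgrades.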
…omponents

- Rename parameter to better reflect that we optimize all ReAct components
- Components include: react instructions, extract instructions, tool descriptions, and tool argument descriptions
- Update all code references, tests, and documentation
- No functional changes, pure rename for clarity
- Clarify that specialized optimization applies only to dspy.ReAct modules
- Explain ReAct module structure (react predictor, extract predictor, tools)
- List all 4 optimizable components with clear descriptions
- Specify react instruction always optimized, others optional based on failures
- Simplify language: 'contradict' vs 'work together' instead of complex terms
- Add link to ReAct documentation for deeper dive
…ptimization prompt

- Rename section: 'Tool-Specific Reflection Prompt' → 'ReAct Optimization Prompt'
- Replace GenerateImprovedToolDescriptionFromFeedback (doesn't exist) with GenerateImprovedReActDescriptionsFromFeedback (actual implementation)
- Show that prompt receives ALL components (react, extract, tools) and optimizes jointly
- Update metric example: tool_feedback_metric → react_metric for clarity
- Remove outdated notes about tool-specific prefixes and component_selector behavior
- Clarify that tool descriptions/args are added dynamically via signature.append()
…for ReAct

- Clarify custom proposer receives ALL components (regular + ReAct)
- Add realistic signature with ReAct failure patterns and component types
- Use exact naming from implementation: examples_with_feedback, component_reflective_data, propose_instruction
- Show _format_examples() helper matching real markdown formatting
- Remove regular component handling to keep example focused on ReAct
- Test code example validates successfully
- Fix contradiction: optimize_react_components must be True (not irrelevant)

docs(gepa): clarify custom proposer behavior in routing section

Change 'overrides the default routing' to 'receives all components and handles the optimization logic' to avoid confusion with optimize_react_components which still controls discovery/serialization

docs(gepa): remove discouraging recommendation from custom proposer section

Users reading this section want to learn how to implement custom proposers for ReAct - don't discourage them from doing so
@Ju-usc force-pushed the feature/tool-description-optimization branch from e587a0b to 776ab9b on October 26, 2025 at 03:08

## ReAct Component Optimization

### What is optimize_react_components?
Collaborator:

Does dspy.Tool not work with any components other than dspy.ReAct?

Collaborator:

I suppose what you have built is truly general. We should be easily able to extend this to issues like #8962 (not suggesting we need to do it right here in this PR, but just trying to future proof!)

Author:

I do agree that the right direction is to make the system extendable to other component types beyond dspy.ReAct and to handle cases like #8962.

GEPA should automatically detect all modules in a program and optimize them based on how each module is structured for example:

  • a ReAct module → predictor, extractor, and tools
  • other modules → their own configurable predictors or tool sets.

Conceptually, we can reduce everything to two fundamental abstractions that cover most applications:

  1. Instruction — directive text that tells the model what to do (system prompt)
    example: signature docstrings (signature.instruction)

  2. Description — descriptive text that tells the model what something is (data model metadata)
    example: signature field descriptions, tool descriptions, argument descriptions

Do instruction and description seem like the right foundational abstractions for generalizing optimization across modules (beyond dspy.ReAct)?
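One hypothetical way to model those two abstractions in code (a sketch of the idea, not existing dspy code; all names below are illustrative):

```python
from dataclasses import dataclass

@dataclass
class OptimizableText:
    path: str   # where the text lives, e.g. "self.react.tools.search"
    kind: str   # "instruction" (tells the model what to do) or
                # "description" (tells the model what something is)
    text: str

components = [
    OptimizableText("self.react", "instruction", "Answer using the tools."),
    OptimizableText("self.react.tools.search", "description", "Searches the web."),
    OptimizableText("self.react.tools.search.query", "description", "The search query."),
]

# A generic optimizer could route on kind alone, with no ReAct-specific logic.
descriptions = [c.path for c in components if c.kind == "description"]
```

Under this framing, signature docstrings, field descriptions, tool descriptions, and argument descriptions all collapse into two uniform cases.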

desc="Improved Extract module instruction",
default=""
)
# Note: Tool descriptions and arg descriptions are added dynamically via signature.append()
Collaborator:

Has your experience been that generating improved version of all components (2 prompts + N tool descriptions) at once has been good?

Author:

I did some experiments; here's a gist with the baseline/optimized prompts:
https://gist.github.com/Ju-usc/0298732be01123486c831d8f210abd36

I used a nested ReAct setup (lead agent → subagent → web search) with 10 BrowseComp examples. GEPA updated both modules and the tool descriptions together, and the optimized prompts look pretty interesting, although performance didn't improve much due to the simple setup.

Would love to hear your thoughts or feedback!

…itization

- Fix ReAct module lookup to handle top-level modules correctly
  Previously failed to match 'self' path for top-level ReAct instances

- Remove tool name sanitization entirely
  Tool names are now used as-is in dynamic signatures
  Removed _sanitize_name() function and all calls to it
  Simplifies code and avoids surprising behavior

- Skip failing test_gepa_react_optimization
  Hash-based fixtures are fragile across Python versions

- Add debug logging to trace processing for troubleshooting
- Replace all magic string 'react_module' with REACT_MODULE_PREFIX constant
- Unify path normalization pattern across gepa.py and gepa_utils.py
- Rename 'prefix' to 'normalized_path' for clarity
- Simplify module lookup by using consistent normalization
- Remove awkward OR clause in ReAct module matching logic

This makes the codebase more maintainable with a single source of truth
for the module prefix and consistent naming throughout.
- Add 3 comprehensive detection tests: single ReAct, mixed workflow (2 ReAct + ChainOfThought), orchestrator with 2 workers
- Tests validate full path preservation (bug fix validation)
- Uses monkey patching to capture base_program from gepa.optimize
- Helper functions for DRY: setup spy, create optimizer, assert detection
- Validates all ReAct components: react, extract, tools, tool metadata
Detection tests (3):
- test_single_react_module_detection: top-level ReAct module
- test_multi_react_workflow_detection: mixed ReAct + ChainOfThought (bug fix validation)
- test_nested_react_orchestrator_worker_detection: orchestrator with 2 workers as tools

Reconstruction tests (3):
- test_build_program_single_react: single ReAct module
- test_build_program_multi_react_workflow: mixed workflow with ReAct + non-ReAct
- test_build_program_orchestrator_with_workers: complex nested structure

Helper functions (12):
- setup_spy_for_base_program: captures base_program from gepa.optimize
- simple_metric_for_detection/reconstruction: test metrics
- create_gepa_optimizer_for_detection: creates optimizer
- assert_react_module_detected/updated: validates ReAct modules
- assert_regular_module_detected/updated: validates non-ReAct modules
- mock_optimized_react_module: mocks optimized candidate
- create_*_program: 3 reusable program builders

Validates:
- Full path preservation (bug fix)
- All 4 ReAct components (react, extract, tools, arg_desc)
- Non-ReAct module handling
- Deepcopy verification (original unchanged)
- Both detection and reconstruction phases
…alidation

Adds 2 new tests validating make_reflective_dataset captures complete trajectories:
- test_make_reflective_dataset_single_react: Single ReAct module
- test_make_reflective_dataset_orchestrator_with_workers: Multi-agent system (3 modules)

New helpers:
- simple_feedback: Reusable feedback function (consolidates 5 duplicates)
- assert_reflective_example_has_trajectory: Validates trajectory completeness

Tests validate:
- Complete trajectory capture (all iterations with thoughts/tools/observations)
- No duplicate/missing iterations
- Full path preservation in multi-agent systems
- Each module's trajectory captured separately

Improvements:
- Clean up docstrings and remove redundant comments
- Fix whitespace linter warnings (9 auto-fixed)
- Reduce from 1054 to 975 lines

All 8 tests passing (6 detection/reconstruction + 2 new reflective dataset)
@Ju-usc force-pushed the feature/tool-description-optimization branch from 6ea156e to a50552a on October 28, 2025 at 04:52
- Update assert_react_module_updated to check tool.args['param']['description']
- Add arg_desc to test cases for comprehensive validation
- Expose bug: GEPA updates arg_desc but not tool.args (what renders in prompts)
tool.arg_desc is only used during Tool.__init__; updating it after creation
has no effect on prompts. Only tool.args is rendered, so GEPA must update
args for optimized descriptions to appear in prompts.

Fixes the bug where reflection LM improves tool parameter descriptions but
they don't show in actual prompts because arg_desc changes weren't propagated
to the args schema.
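The fix can be illustrated with a plain-dict stand-in for a tool's rendered args schema (actual `dspy.Tool` internals may differ; the helper name is hypothetical):

```python
# Stand-in for a Tool's rendered args schema after __init__. The key point:
# arg_desc is only consulted during Tool.__init__, so optimized descriptions
# must be written into args, which is what actually renders into prompts.
tool_args = {"query": {"type": "string", "description": "Search query"}}

def apply_optimized_arg_descs(args, optimized):
    """Propagate optimized parameter descriptions into the rendered args schema."""
    for param, desc in optimized.items():
        if param in args:
            args[param]["description"] = desc

apply_optimized_arg_descs(tool_args, {"query": "Specific full-text web search query."})
```

Updating `arg_desc` alone would leave this schema, and therefore the prompt the agent sees, unchanged.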

Development

Successfully merging this pull request may close these issues.

[Feature] Allow GEPA to update tool descriptions and tool error responses

2 participants