Merged
Changes from 35 commits
Commits
116 commits
6412a5d
feat(gepa): add tool description optimization for multi-agent systems
Ju-usc Oct 10, 2025
cf0be4f
style: fix ruff formatting (trailing whitespace)
Ju-usc Oct 10, 2025
aa53fe2
style: apply ruff formatting fixes
Ju-usc Oct 10, 2025
045c6cf
feat(gepa): implement tool-specific proposer for tool descriptions
Ju-usc Oct 10, 2025
c4f2041
docs(gepa): clean up multi-agent example code
Ju-usc Oct 10, 2025
260ca80
refactor(gepa): simplify tool reflective dataset with ReAct context r…
Ju-usc Oct 11, 2025
04f7e3d
fix(gepa): unify custom proposer routing for tools
Ju-usc Oct 12, 2025
f92e184
docs(gepa): clarify tool reflection prompt
Ju-usc Oct 12, 2025
7178869
test: streamline GEPA tool optimization tests
Ju-usc Oct 12, 2025
e34703b
fix(gepa): streamline tool proposer formatting
Ju-usc Oct 12, 2025
3f05311
test(gepa): drop legacy dummy tool fixture
Ju-usc Oct 12, 2025
4df9ce5
docs(gepa): add tool-specific reflection prompt and metric example
Ju-usc Oct 12, 2025
4296ccf
docs(gepa): fix implementation details with accurate code flow
Ju-usc Oct 13, 2025
ea1204a
docs(gepa): remove backward compatibility note
Ju-usc Oct 13, 2025
48d5cd6
docs(gepa): improve usage examples with optimization visualization
Ju-usc Oct 13, 2025
548d9b6
docs(gepa): add design rationale comments for tool context sharing
Ju-usc Oct 13, 2025
e61d0a1
docs(gepa): add tool optimization links to overview and parameter docs
Ju-usc Oct 13, 2025
5c95412
docs(gepa): refine tool optimization scenarios and remove implementat…
Ju-usc Oct 13, 2025
19d7717
docs(gepa): clarify future work section in code comments
Ju-usc Oct 13, 2025
9ce5fe4
refactor(gepa): unify ReAct optimization as single module
Ju-usc Oct 24, 2025
91331d0
test(gepa): add end-to-end ReAct module optimization test
Ju-usc Oct 24, 2025
3418b59
fix(gepa): enable arg description optimization for ReAct tools
Ju-usc Oct 24, 2025
b26d39a
chore: remove legacy test_gepa_tool_optimization.py
Ju-usc Oct 24, 2025
2791b5c
fix: restore accidentally removed score mismatch warning
Ju-usc Oct 24, 2025
8e63c62
test: update fixture after arg description optimization fix
Ju-usc Oct 25, 2025
7a9d2f3
fix(test): use JSON-based hashing for cross-version fixture stability
Ju-usc Oct 25, 2025
cd0de57
refactor(gepa): rename optimize_tool_descriptions to optimize_react_c…
Ju-usc Oct 26, 2025
67bb739
docs(gepa): improve 'What is optimize_react_components?' section
Ju-usc Oct 26, 2025
b3026a7
docs(gepa): replace outdated tool-specific prompt with actual ReAct o…
Ju-usc Oct 26, 2025
4e107aa
docs(gepa): simplify 'How It Works' section with accurate routing beh…
Ju-usc Oct 26, 2025
78547e7
docs(gepa): remove outdated Implementation Details section
Ju-usc Oct 26, 2025
7fa829b
docs(gepa): replace theoretical scenarios with real user pain points
Ju-usc Oct 26, 2025
da0e7bc
docs(gepa): fix usage examples reference to match updated scenarios
Ju-usc Oct 26, 2025
e51158d
docs(gepa): update inspect section to show all 4 ReAct components wit…
Ju-usc Oct 26, 2025
776ab9b
docs(gepa): rewrite Section 8 with accurate custom proposer behavior …
Ju-usc Oct 26, 2025
ec6bb7b
fix(gepa): fix top-level ReAct module lookup and remove tool name san…
Ju-usc Oct 27, 2025
b6cc67b
refactor(gepa): unify ReAct module key handling and use constant
Ju-usc Oct 28, 2025
1206f38
test(gepa): add ReAct module detection tests for nested structures
Ju-usc Oct 28, 2025
333cbbf
test(gepa): add comprehensive ReAct detection and reconstruction tests
Ju-usc Oct 28, 2025
a50552a
test(gepa): add reflective dataset tests for multi-agent trajectory v…
Ju-usc Oct 28, 2025
965b157
test(gepa): verify tool arg descriptions propagate to args schema
Ju-usc Oct 29, 2025
5ddc6d3
fix(gepa): propagate arg_desc updates to tool.args for prompt rendering
Ju-usc Oct 29, 2025
2269de5
test(gepa): remove fixture-based test and unused dependencies
Ju-usc Oct 29, 2025
17456f0
test(gepa): remove unused fixture file
Ju-usc Oct 29, 2025
c884c18
style: fix ruff linting issues (import formatting, whitespace, bare e…
Ju-usc Oct 31, 2025
82dee25
refactor(test): rename setup_spy_for_base_program to setup_capture_fo…
Ju-usc Oct 31, 2025
ca84b9d
docs(gepa): clarify why Tool.func uses placeholder lambda in proposer
Ju-usc Oct 31, 2025
2eb8986
refactor(gepa): make all ReAct components optional with None default …
Ju-usc Oct 31, 2025
9f37ac1
docs(gepa): clarify 'LM' as 'reflection LM' in comments for precision
Ju-usc Oct 31, 2025
bd4cdac
refactor(gepa): refine reflection prompt to guide concise, focused Re…
Ju-usc Oct 31, 2025
0ad4077
docs(gepa): revise ReAct metric example to be general and extensible
Ju-usc Oct 31, 2025
ef5563e
docs(gepa): replace custom proposer example with reference to ReActMo…
Ju-usc Oct 31, 2025
1b10b65
docs(gepa): make custom proposer section more approachable and clear
Ju-usc Oct 31, 2025
675a0cd
docs(gepa): update ReAct reflection prompt to match current implement…
Ju-usc Nov 1, 2025
4a4d209
feat(gepa): warn when ReAct modules detected but optimization disabled
Ju-usc Nov 3, 2025
d84842f
test(gepa): fix DummyLM configuration and remove exception swallowing
Ju-usc Nov 9, 2025
bb28f5f
test(gepa): add failing tests for generic tool optimization
Ju-usc Nov 9, 2025
a590e46
refactor(gepa): rename optimize_react_components to enable_tool_optim…
Ju-usc Nov 9, 2025
6aceaf5
refactor(gepa): extract nested function to private method
Ju-usc Nov 9, 2025
7a5bf05
feat(gepa): detect tool-using predictors via type checking
Ju-usc Nov 9, 2025
12b01ed
test(gepa): update ReAct tests for predictor-name-based keys
Ju-usc Nov 10, 2025
265896c
test(gepa): use explicit predictor keys in tool optimization tests
Ju-usc Nov 10, 2025
fe19dac
feat(gepa): extract tools from runtime traces
Ju-usc Nov 10, 2025
38dd7cb
feat(gepa): detect tool-using predictors at compile time
Ju-usc Nov 10, 2025
7f05a73
refactor(gepa): use predictor identity for ReAct detection
Ju-usc Nov 10, 2025
0a6016d
test(gepa): refactor ReAct tests to use dynamic predictor names
Ju-usc Nov 10, 2025
a635768
refactor(gepa): generalize proposer to support both ReAct and tool mo…
Ju-usc Nov 10, 2025
e35603a
refactor(gepa): eliminate create-delete pattern in base_program build
Ju-usc Nov 10, 2025
ecb3726
refactor(gepa): eliminate ReAct coupling in build_program
Ju-usc Nov 11, 2025
d3693c9
refactor(gepa): apply code cleanup principles consistently
Ju-usc Nov 11, 2025
a086646
refactor(gepa): unify config extraction patterns
Ju-usc Nov 11, 2025
0cecb75
refactor(gepa): remove verbose logs and consolidate comments
Ju-usc Nov 11, 2025
9592c50
docs(gepa): clarify ReAct trace workaround with TODO
Ju-usc Nov 12, 2025
76d7af5
test(gepa): remove deprecated ReAct-specific tests and refactor tool …
Ju-usc Nov 13, 2025
ac66e05
feat(gepa): add assertion for ReAct two-predictor design
Ju-usc Nov 13, 2025
3ec4ada
test(gepa): add DSPy ReAct design docs and improve test consistency
Ju-usc Nov 13, 2025
b679ba2
fix(test): remove trailing whitespace and extra blank lines
Ju-usc Nov 13, 2025
02aa151
refactor(gepa): clarify tool proposer output field descriptions
Ju-usc Nov 14, 2025
d37e433
Merge branch 'main' into feature/tool-description-optimization
Ju-usc Nov 14, 2025
d8b7c66
refactor(gepa): treat args as canonical for tool arg descriptions
Ju-usc Nov 14, 2025
f62a68e
refactor(gepa): tolerate missing arg descriptions when applying tool …
Ju-usc Nov 14, 2025
e031409
refactor(gepa): use args as sole source of tool arg descriptions
Ju-usc Nov 14, 2025
a133545
test(gepa): drop arg_desc expectations from tool optimization tests
Ju-usc Nov 14, 2025
b1e4f3d
refactor(gepa): refine reflection prompts for tool optimization
Ju-usc Nov 19, 2025
7f81e88
refactor(gepa): improve tool extraction robustness and observability
Ju-usc Nov 19, 2025
f267ccc
refactor(gepa): simplify initialization logic
Ju-usc Nov 19, 2025
28ceb70
refactor(gepa): remove ReAct trace workaround
Ju-usc Nov 19, 2025
d8275ef
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
deeb010
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
4bcc714
chore: restore .gitignore to match main
Ju-usc Nov 19, 2025
4b872d7
docs(gepa): document tool optimization flag in overview
Ju-usc Nov 19, 2025
5129586
docs(gepa): clarify enable_tool_optimization and custom proposers
Ju-usc Nov 19, 2025
ebe4221
docs(gepa): update tool module optimization prompt to match actual code
Ju-usc Nov 20, 2025
2133b0b
docs(gepa): update How Tool Optimization Works section
Ju-usc Nov 20, 2025
9c05b6a
docs(gepa): update When to Use Tool Optimization section
Ju-usc Nov 20, 2025
ec9241b
docs(gepa): update custom proposers section for tool optimization
Ju-usc Nov 20, 2025
46d8f5e
docs(gepa): update usage examples with correct tool patterns and inte…
Ju-usc Nov 20, 2025
5d33fc6
docs(gepa): remove redundant metrics section
Ju-usc Nov 20, 2025
b564029
refactor(gepa): use absolute import for ToolModuleProposer
Ju-usc Nov 20, 2025
13209f5
docs(gepa): update tool optimization doc link
Ju-usc Nov 20, 2025
09990a6
docs(gepa): replace eval() example with get_weather tool
Ju-usc Nov 29, 2025
33fc771
fix(gepa): change ReAct detection log from warning to info
Ju-usc Dec 2, 2025
fa72fc0
refactor(gepa): extract _propose_component_texts as private method
Ju-usc Dec 2, 2025
2a15e56
refactor(gepa): TODO out generic tool module optimization, keep ReAct…
Ju-usc Dec 2, 2025
59f23e5
refactor(gepa): remove generic tool module detection, keep ReAct only
Ju-usc Dec 2, 2025
68d7021
refactor(gepa): improve naming and extract tool update methods
Ju-usc Dec 2, 2025
d99ba1d
refactor(gepa): remove unused TOOL_MODULE_PREFIX and rename to tool_c…
Ju-usc Dec 2, 2025
3fd9a0a
refactor(gepa): rename ToolModuleProposer to ToolProposer
Ju-usc Dec 2, 2025
7d64e7a
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
4b3ee18
refactor(gepa): unify prefix to TOOL_MODULE_PREFIX for all tool-using…
Ju-usc Dec 2, 2025
3a5fb7f
docs(gepa): remove CustomAgent example, keep ReAct only
Ju-usc Dec 2, 2025
0e75d8c
docs(gepa): update enable_tool_optimization docstring for ReAct-only …
Ju-usc Dec 2, 2025
734fbdf
test(gepa): remove generic tool tests, keep ReAct-only tests
Ju-usc Dec 2, 2025
1fb15ba
refactor(gepa): use local ToolProposer variable, update docs for ReAc…
Ju-usc Dec 2, 2025
da2f6d0
docs(gepa): update tool optimization docs for ReAct-only support
Ju-usc Dec 2, 2025
a942246
some fixes
chenmoneygithub Dec 5, 2025
406 changes: 406 additions & 0 deletions docs/docs/api/optimizers/GEPA/GEPA_Advanced.md

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions docs/docs/api/optimizers/GEPA/overview.md
@@ -117,6 +117,12 @@ Practical Recipe for GEPA-Friendly Feedback:
- **Multi-Objective Tasks** (e.g., PUPA): Decompose aggregate scores to reveal contributions from each objective, highlighting tradeoffs (e.g., quality vs. privacy).
- **Stacked Pipelines** (e.g., code generation: parse → compile → run → profile → evaluate): Expose stage-specific failures; natural-language traces often suffice for LLM self-correction.

## ReAct Component Optimization

GEPA can optimize ReAct modules holistically. When `optimize_react_components=True`, GEPA jointly optimizes all four components of ReAct modules: react instructions, extract instructions, tool descriptions, and tool argument descriptions. This helps agents make better decisions by learning from execution traces how all components work together.

For details on how ReAct optimization works, when to use it, and usage examples, see [ReAct Component Optimization](GEPA_Advanced.md#react-component-optimization) in the Advanced Features guide.
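
To make the four components named above concrete, the following is a minimal sketch of one ReAct module's optimizable state serialized as a single JSON string, mirroring the shape built in the `gepa.py` change in this PR. The `ToolSpec` dataclass, the `search` tool, and the instruction strings are hypothetical stand-ins for illustration, not DSPy objects.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a tool; this is not dspy.Tool."""

    desc: str
    args: dict = field(default_factory=dict)
    arg_desc: dict = field(default_factory=dict)


def build_react_config(react_instr: str, extract_instr: str, tools: dict) -> str:
    """Serialize the four optimizable components into one JSON string."""
    config = {
        "react": react_instr,      # component 1: react instructions
        "extract": extract_instr,  # component 2: extract instructions
        "tools": {
            name: {
                "desc": tool.desc,                # component 3: tool description
                "args": tool.args,                # argument schema, kept for context
                "arg_desc": tool.arg_desc or {},  # component 4: argument descriptions
            }
            for name, tool in tools.items()
            if name != "finish"  # the finish tool is skipped, as in the diff
        },
    }
    return json.dumps(config, indent=2)


tools = {
    "search": ToolSpec(
        desc="Search the web.",
        args={"query": {"type": "string"}},
        arg_desc={"query": "Search terms"},
    ),
    "finish": ToolSpec(desc="Signal that the task is done."),
}
payload = build_react_config(
    "Think step by step and pick a tool.",
    "Extract the final answer.",
    tools,
)
parsed = json.loads(payload)
```

Packing everything into one string lets a reflection step see the instructions and the tool descriptions side by side, which is what "jointly" refers to in the paragraph above.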

## Custom Instruction Proposal

For advanced customization of GEPA's instruction proposal mechanism, including custom instruction proposers and component selectors, see [Advanced Features](GEPA_Advanced.md).
55 changes: 54 additions & 1 deletion dspy/teleprompt/gepa/gepa.py
@@ -1,4 +1,5 @@
import inspect
import json
import logging
import random
from dataclasses import dataclass
@@ -9,6 +10,7 @@
from gepa.proposer.reflective_mutation.base import ReflectionComponentSelector

from dspy.clients.lm import LM
from dspy.predict.react import ReAct
from dspy.primitives import Example, Module, Prediction
from dspy.teleprompt.gepa.gepa_utils import DspyAdapter, DSPyTrace, PredictorFeedbackFn, ScoreWithFeedback
from dspy.teleprompt.teleprompt import Teleprompter
@@ -273,6 +275,11 @@ def metric(
warn_on_score_mismatch: GEPA (currently) expects the metric to return the same module-level score when
called with and without the pred_name. This flag (defaults to True) determines whether a warning is
raised if a mismatch in module-level and predictor-level score is detected.
optimize_react_components: Whether to optimize ReAct module components including react
instructions, extract instructions, tool descriptions, and tool argument descriptions.
When enabled, GEPA jointly optimizes all four components of ReAct modules. See the
[ReAct Component Optimization guide](https://dspy.ai/api/optimizers/GEPA/GEPA_Advanced/#react-component-optimization)
for details on when to use this feature and how it works. Default is False.
seed: The random seed to use for reproducibility. Default is 0.
gepa_kwargs: (Optional) provide additional kwargs to be passed to [gepa.optimize](https://github.com/gepa-ai/gepa/blob/main/src/gepa/api.py) method

@@ -328,6 +335,7 @@ def __init__(
wandb_init_kwargs: dict[str, Any] | None = None,
track_best_outputs: bool = False,
warn_on_score_mismatch: bool = True,
optimize_react_components: bool = False,
use_mlflow: bool = False,
# Reproducibility
seed: int | None = 0,
@@ -390,6 +398,7 @@ def __init__(
self.wandb_api_key = wandb_api_key
self.wandb_init_kwargs = wandb_init_kwargs
self.warn_on_score_mismatch = warn_on_score_mismatch
self.optimize_react_components = optimize_react_components
self.use_mlflow = use_mlflow

if track_best_outputs:
@@ -518,11 +527,55 @@ def feedback_fn(
rng=rng,
reflection_lm=self.reflection_lm,
custom_instruction_proposer=self.custom_instruction_proposer,
- warn_on_score_mismatch=self.warn_on_score_mismatch
+ warn_on_score_mismatch=self.warn_on_score_mismatch,
+ optimize_react_components=self.optimize_react_components,
Collaborator commented:

Instead of engineering this towards dspy.ReAct, I recommend covering tool calling in general: optimize_react_components => optimize_tools. ReAct is just one way to build a tool-calling agent, users commonly customize it, and we may create other tool-calling agent architectures in the near future.
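
One possible reading of this suggestion, sketched under assumptions: rather than checking `isinstance(module, ReAct)`, detection could duck-type on a `tools` mapping so that customized agents are also picked up. The `find_tool_modules` helper and the `Fake*` classes below are hypothetical and are not part of DSPy or of this PR.

```python
from typing import Any, Iterable


def find_tool_modules(named_modules: Iterable[tuple[str, Any]]) -> dict[str, dict]:
    """Return {module_path: tools} for every module exposing a non-empty `tools` dict."""
    found = {}
    for path, module in named_modules:
        tools = getattr(module, "tools", None)
        # Assumption: a tool-using module stores its tools in a dict attribute.
        if isinstance(tools, dict) and tools:
            found[path] = tools
    return found


class FakeReAct:
    """Hypothetical stand-in for a ReAct-style agent."""

    def __init__(self):
        self.tools = {"search": "Search the web."}


class FakeCustomAgent:
    """Hypothetical user-defined agent that also carries tools."""

    def __init__(self):
        self.tools = {"get_weather": "Look up the weather for a city."}


class FakePredict:
    """Hypothetical module without tools; it should be ignored."""


modules = [
    ("self.react", FakeReAct()),
    ("self.custom", FakeCustomAgent()),
    ("self.summarize", FakePredict()),
]
detected = find_tool_modules(modules)
```

Attribute-based detection is looser than a type check, so a real implementation would need to decide how to handle modules that expose `tools` in another form.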

)

# Instantiate GEPA with the simpler adapter-based API
base_program = {name: pred.signature.instructions for name, pred in student.named_predictors()}

if self.optimize_react_components:
    for module_path, module in student.named_sub_modules():
        # Only process ReAct modules
        if not isinstance(module, ReAct):
            continue
        prefix = module_path.removeprefix("self.") if module_path != "self" else ""

        # Get first predictor name as module identifier
        for pred_name, _ in module.named_predictors():
            comp_name = pred_name if not prefix else f"{prefix}.{pred_name}"
            module_key = f"react_module:{comp_name.split('.')[0]}" if prefix else "react_module"

            # Build JSON config with tool args for reflection
            config = {
                "react": module.react.signature.instructions,
                "extract": module.extract.predict.signature.instructions,
                "tools": {
                    tool_name: {
                        "desc": tool.desc,
                        "args": tool.args,
                        "arg_desc": tool.arg_desc or {}
                    }
                    for tool_name, tool in module.tools.items()
                    if tool_name != "finish"
                }
            }

            # Replace predictor keys with module key and extract key to prevent duplicates
            base_program.pop(comp_name, None)
            extract_key = f"{prefix}.extract.predict" if prefix else "extract.predict"
            base_program.pop(extract_key, None)
            base_program[module_key] = json.dumps(config, indent=2)
            break

# Log base_program keys for debugging
logger.info(f"Initialized base_program with {len(base_program)} components:")
for key in sorted(base_program.keys()):
    if key.startswith("react_module"):
        logger.info(f" {key}: <ReAct module JSON config>")
Collaborator commented:

I would get rid of these logs. They are internal implementation details, so users won't be able to follow them.

Contributor Author replied:

I agree, I'll address this!

    else:
        logger.info(f" {key}: <instruction>")

gepa_result: GEPAResult = optimize(
seed_candidate=base_program,
trainset=trainset,
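
The key handling in the hunk above can be traced with plain strings. The sketch below copies the string logic from the diff into two hypothetical helpers (`react_module_keys`, `collapse_react_predictors`) and assumes that module paths start with `self` and that the first predictor of a ReAct module is named `react`. It is an illustration of the logic as written in this revision, not the final implementation.

```python
import json


def react_module_keys(module_path: str, first_pred_name: str) -> tuple[str, str, str]:
    """Return (module_key, react_key, extract_key) using the hunk's string logic."""
    prefix = module_path.removeprefix("self.") if module_path != "self" else ""
    comp_name = first_pred_name if not prefix else f"{prefix}.{first_pred_name}"
    # Only the first path segment ends up in the module key.
    module_key = f"react_module:{comp_name.split('.')[0]}" if prefix else "react_module"
    extract_key = f"{prefix}.extract.predict" if prefix else "extract.predict"
    return module_key, comp_name, extract_key


def collapse_react_predictors(base_program: dict, module_path: str, config: dict) -> dict:
    """Replace a ReAct module's two predictor entries with one JSON entry."""
    module_key, react_key, extract_key = react_module_keys(module_path, "react")
    updated = dict(base_program)
    updated.pop(react_key, None)
    updated.pop(extract_key, None)
    updated[module_key] = json.dumps(config, indent=2)
    return updated


base_program = {
    "agent.react": "Pick a tool.",
    "agent.extract.predict": "Extract the answer.",
    "summarize.predict": "Summarize the answer.",
}
collapsed = collapse_react_predictors(
    base_program,
    "self.agent",
    {"react": "Pick a tool.", "extract": "Extract the answer.", "tools": {}},
)
top_level = react_module_keys("self", "react")
planner_key = react_module_keys("self.workflow.planner", "react")[0]
executor_key = react_module_keys("self.workflow.executor", "react")[0]
```

Because only the first path segment is kept, `planner_key` and `executor_key` are identical here, so two ReAct modules nested under the same attribute would share one `base_program` entry under this revision's logic.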