112 commits
b8cdbf0
decentralization + tests
cvt8 Jul 11, 2025
7553b26
adding notes from meeting
cvt8 Jul 16, 2025
6486d90
decentralization actualization
cvt8 Jul 16, 2025
3da5424
update tests
cvt8 Jul 16, 2025
5cd6c24
Fix function renaming in Tool.to_dict
cvt8 Jul 17, 2025
b9aad79
Merge pull request #1 from cvt8/codex/find-and-fix-a-bug-in-codebase
cvt8 Jul 17, 2025
04a15f7
docs: fix Hugging Face capitalization
cvt8 Jul 17, 2025
011e6c6
Replace HF_API_KEY env var with HF_TOKEN
cvt8 Jul 17, 2025
1e68969
docs: fix Hugging Face capitalization
cvt8 Jul 17, 2025
d6fb349
Merge pull request #3 from cvt8/codex/replace-hf_api_key-with-hf_token
cvt8 Jul 17, 2025
725f757
Merge pull request #4 from cvt8/w9azij-codex/update-hugging-face-key-…
cvt8 Jul 17, 2025
5fe9bb0
Add execution timeout to LocalPythonExecutor
cvt8 Jul 17, 2025
cadb6f4
Merge pull request #5 from cvt8/codex/verify-iteration-limit-in-codea…
cvt8 Jul 17, 2025
31dc574
Merge pull request #2 from cvt8/codex/update-hugging-face-key-capital…
cvt8 Jul 17, 2025
975a5bf
Revert decentralization logic from agents
cvt8 Jul 17, 2025
0e31135
Update README for communication tools
cvt8 Jul 17, 2025
7aec585
some corrections
cvt8 Jul 17, 2025
10d939f
Add message queue tools
cvt8 Jul 17, 2025
a3ca939
add decentralization tests folder
cvt8 Jul 17, 2025
789cf70
Add messaging tools and integrate
cvt8 Jul 17, 2025
29c08ac
identation correction
cvt8 Jul 17, 2025
a34aa56
Merge pull request #7 from cvt8/codex/implement-decentralization-in-t…
cvt8 Jul 17, 2025
b70784d
adding langfuse
cvt8 Jul 17, 2025
4b939a0
update scores
cvt8 Jul 17, 2025
43f15b1
adding test tools
cvt8 Jul 17, 2025
76d9dc3
update gitignore
cvt8 Jul 17, 2025
13ea63c
Fix function renaming in Tool.to_dict
cvt8 Jul 17, 2025
7f23c52
Replace HF_API_KEY env var with HF_TOKEN
cvt8 Jul 17, 2025
8589fd2
docs: fix Hugging Face capitalization
cvt8 Jul 17, 2025
a077101
Add execution timeout to LocalPythonExecutor
cvt8 Jul 17, 2025
e476ed5
add decentralization tests folder
cvt8 Jul 17, 2025
ed5d57a
identation correction
cvt8 Jul 17, 2025
7e41bf8
Add messaging tools and integrate
cvt8 Jul 17, 2025
ed71a66
adding langfuse
cvt8 Jul 17, 2025
35cb15c
update scores
cvt8 Jul 17, 2025
772ad80
adding test tools
cvt8 Jul 17, 2025
4c0a7a6
update agents
cvt8 Jul 17, 2025
b3033c8
Merge branch 'main' into codex/implement-decentralization-feature-in-…
cvt8 Jul 17, 2025
15870a7
Merge pull request #9 from cvt8/codex/implement-decentralization-feat…
cvt8 Jul 17, 2025
4d76a5b
adding langfuse
cvt8 Jul 17, 2025
f3a7fb8
test_logging
cvt8 Jul 17, 2025
c316b8f
update installation.md
cvt8 Jul 17, 2025
8c0b91c
deleted tests-decentralized
cvt8 Jul 17, 2025
538ebd2
actualize gitignore
cvt8 Jul 17, 2025
4cb8457
Merge branch 'main' into codex/implement-decentralization-feature-in-…
cvt8 Jul 17, 2025
43c3675
update repo organization
cvt8 Jul 18, 2025
e4b8b4a
adding span.end()
cvt8 Jul 18, 2025
0b466e0
Update tests to match new agent API
cvt8 Jul 18, 2025
cc9dafe
Merge branch 'main' into codex/update-tests-for-compatibility-with-cu…
cvt8 Jul 18, 2025
dd780fa
Merge pull request #10 from cvt8/codex/update-tests-for-compatibility…
cvt8 Jul 18, 2025
4b2f80f
Add _finalize_step callback invocation and defaults
cvt8 Jul 18, 2025
4f7c44f
Add _finalize_step callback invocation and defaults
cvt8 Jul 18, 2025
d636a6d
Merge pull request #11 from cvt8/codex/implement-finalize_step-method…
cvt8 Jul 18, 2025
a133724
Improve Langfuse tracing
cvt8 Jul 18, 2025
c3405fd
Add final answer validation
cvt8 Jul 18, 2025
eac8a40
Handle final answer interrupt
cvt8 Jul 18, 2025
e9b6b71
Merge branch 'main' into codex/review-lanfuse-logging-implementation
cvt8 Jul 18, 2025
c49eab9
Merge pull request #13 from cvt8/codex/review-lanfuse-logging-impleme…
cvt8 Jul 18, 2025
3a12718
Merge branch 'main' into codex/add-final-answer-validation-checks
cvt8 Jul 18, 2025
2756ac9
Merge pull request #14 from cvt8/codex/add-final-answer-validation-ch…
cvt8 Jul 18, 2025
cbda555
Merge branch 'main' into codex/update-_process_tool_call-and-run-loop
cvt8 Jul 18, 2025
cb3bbb1
Merge pull request #15 from cvt8/codex/update-_process_tool_call-and-…
cvt8 Jul 18, 2025
bb1d08b
Merge branch 'main' into syft7m-codex/implement-finalize_step-method-…
cvt8 Jul 18, 2025
07fcd77
Merge pull request #12 from cvt8/syft7m-codex/implement-finalize_step…
cvt8 Jul 18, 2025
b10f392
update documentation
cvt8 Jul 18, 2025
8d3582c
add test logging and some debugging
cvt8 Jul 18, 2025
233c117
Code debugging to run gaia benchmark on open deep research, debuging …
cvt8 Jul 24, 2025
4034cc3
Merge branch 'main' of https://github.com/huggingface/smolagents
cvt8 Jul 24, 2025
28bf657
improving errors handling
cvt8 Jul 24, 2025
f079ff7
# GAIA Benchmark Tracing Improvements
cvt8 Jul 25, 2025
a78f9ad
# Smolagents Benchmark Debugging Summary
cvt8 Jul 25, 2025
9caac98
make style corrections
cvt8 Jul 25, 2025
e381dab
Merge branch 'huggingface:main' into cvt8/Benchmarking_corrections
cvt8 Jul 25, 2025
ff5f96d
logs availibility
cvt8 Jul 25, 2025
dd49490
correction prompts.
cvt8 Jul 28, 2025
404da42
Merge branch 'huggingface:main' into cvt8/Benchmarking_corrections
cvt8 Aug 2, 2025
9451a5b
Merge branch 'cvt8/Benchmarking_corrections' of https://github.com/cv…
cvt8 Aug 5, 2025
0797059
Merge branch 'huggingface:main' into cvt8/Benchmarking_corrections
cvt8 Aug 12, 2025
b6bf1f4
Merge branch 'main' into cvt8/Benchmarking_corrections
cvt8 Aug 12, 2025
a33564f
Merge pull request #18 from cvt8/cvt8/Benchmarking_corrections
cvt8 Aug 12, 2025
80bcda3
update codebase
cvt8 Aug 18, 2025
bd09658
Merge branch 'huggingface:main' into main
cvt8 Aug 18, 2025
4bc2012
update tests and ids
cvt8 Aug 18, 2025
294ab7a
adding scripts
cvt8 Aug 18, 2025
6818b69
Merge branch 'huggingface:main' into main
cvt8 Aug 20, 2025
3caa80d
decentralized agents 0 shot.
cvt8 Aug 20, 2025
84a8128
tests update
cvt8 Aug 20, 2025
6d0f04b
running zero_shot !
cvt8 Aug 20, 2025
cf085eb
updating prompts
cvt8 Aug 21, 2025
d9d3b34
updating prompts + alternative concensus
cvt8 Aug 21, 2025
9644863
well working decentralized agent !
cvt8 Aug 22, 2025
d37585e
running tests
cvt8 Aug 22, 2025
1cfb019
runner corrections + langfuse logging.
cvt8 Aug 22, 2025
74677a4
update codebase and centralized agents.
cvt8 Aug 25, 2025
ad1d2c2
Merge branch 'huggingface:main' into main
cvt8 Aug 25, 2025
dcd4c9b
Dentraluzed tools, simpler and better code
cvt8 Aug 27, 2025
f5b2ad3
Answer format correction
cvt8 Aug 29, 2025
07d3652
Update communication and avoiding some errors.
cvt8 Sep 1, 2025
62f17cf
Code style corrections
cvt8 Sep 1, 2025
1af042e
Merge branch 'main' into main
cvt8 Sep 1, 2025
7b3044e
adding decentraliozed_agents
cvt8 Sep 1, 2025
b031ef9
centralized_agent comparison.
cvt8 Sep 1, 2025
65ab324
style corrections
cvt8 Sep 1, 2025
28162a2
improving prompt
cvt8 Sep 1, 2025
40ec034
Transform chat messages into a nice HTML
cvt8 Sep 12, 2025
8a3687c
Uploading outputs
cvt8 Sep 13, 2025
beadd38
Merge pull request #19 from huggingface/main
cvt8 Sep 12, 2025
bb2db70
delete outputs
cvt8 Oct 30, 2025
b7a66ea
Merge branch 'main' into decentralized_for_hf
cvt8 Oct 30, 2025
3b8174e
formatting improvement
cvt8 Oct 30, 2025
480353a
delete outputs
cvt8 Oct 30, 2025
9771815
delete unused file and adding a readme
cvt8 Oct 30, 2025
20 changes: 17 additions & 3 deletions .gitignore
@@ -2,10 +2,21 @@
logs
tmp
wandb
make_test_log.xml
runs/
runs_old/
runs_v0/
output/

#Test gaia
wb/
pdb5wb7.ent
downloads_folder/
model_performance_comparison.png
langfuse_test.py

# Data
data
outputs
data/

# Apple
@@ -148,8 +159,11 @@ interpreter_workspace/
# Archive
archive/
savedir/
output/
#output/
tool_output/

# Gradio runtime
.gradio/
.gradio/

#Other cache
.ruff_cache/
2 changes: 1 addition & 1 deletion Makefile
@@ -14,4 +14,4 @@ style:

# Run smolagents tests
test:
pytest ./tests/
pytest ./tests/ --junitxml=make_test_log.xml
197 changes: 104 additions & 93 deletions README.md

Large diffs are not rendered by default.

186 changes: 186 additions & 0 deletions examples/decentralized_smolagents_benchmark/README.md
@@ -0,0 +1,186 @@
# Decentralized smolagents Benchmark

This folder contains a decentralized multi-agent system implementation for benchmarking against the smolagents benchmark dataset. The system coordinates multiple specialized agents working collaboratively to solve complex problems.

## Overview

The decentralized approach distributes problem-solving across multiple specialized agents that communicate and coordinate through a message-passing system with consensus mechanisms. This contrasts with the centralized approach where a single agent has access to all tools.

### Architecture

The system consists of:

- **4 Specialized Agents**:
  - **CodeAgent**: Handles code execution and computational tasks
  - **WebSearchAgent**: Performs web searches and information retrieval
  - **DeepResearchAgent**: Conducts in-depth research using web browsing
  - **DocumentReaderAgent**: Reads and analyzes various document formats

- **Message Store**: Central communication hub for agent coordination
- **Consensus Protocol**: Voting mechanism for final answer agreement
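
As a rough illustration of the message-passing idea, the sketch below shows an in-memory store that agents could post to and read from. It is not the code in `scripts/message_store.py`: the `Message` and `InMemoryMessageStore` names, the `"all"` broadcast convention, and the method names are all hypothetical and chosen only for this example.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str  # an agent name, or "all" for a broadcast (assumed convention)
    content: str


@dataclass
class InMemoryMessageStore:
    """Hypothetical stand-in for a shared message store."""

    _messages: List[Message] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def post(self, message: Message) -> None:
        # The lock keeps appends safe when agents run in separate threads.
        with self._lock:
            self._messages.append(message)

    def inbox(self, agent: str) -> List[Message]:
        """Return messages sent to `agent` or broadcast, excluding its own."""
        with self._lock:
            return [
                m
                for m in self._messages
                if m.sender != agent and m.recipient in (agent, "all")
            ]


store = InMemoryMessageStore()
store.post(Message("WebSearchAgent", "all", "Found a candidate source."))
store.post(Message("CodeAgent", "WebSearchAgent", "Please share the URL."))
print([m.content for m in store.inbox("CodeAgent")])
```

A single shared store keeps agents decoupled: each agent only needs to know how to post and how to read its inbox, not how the other agents are implemented.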

## Files

- **`decentralized_agent.py`**: Main entry point for running a single question through the decentralized agent team
- **`run.py`**: Benchmark runner that evaluates the decentralized system across the entire benchmark dataset
- **`run_centralized.py`**: Comparison implementation using a centralized agent approach
- **`requirements.txt`**: Python dependencies required for the project
- **`scripts/`**: Supporting modules for agents, tools, communication, and utilities

### Key Scripts

- `scripts/agents.py`: Agent definitions and team coordination logic
- `scripts/message_store.py`: Message-passing infrastructure for agent communication
- `scripts/consensus_protocol.py`: Voting mechanism for reaching consensus on answers
- `scripts/decentralized_tools.py`: Custom tools for decentralized agent communication
- `scripts/text_web_browser.py`: Text-based web browsing tools
- `scripts/text_inspector_tool.py`: Document reading and analysis tools
- `scripts/visual_qa.py`: Visual question answering capabilities
- `scripts/html_renderer.py`: HTML visualization of agent runs
- `scripts/convert_messages_to_html.py`: Convert message logs to HTML format
- `scripts/gaia_scorer.py`: Scoring utilities for GAIA benchmark format

## Installation

1. Install the required dependencies:
```bash
pip install -r requirements.txt
```

2. Set up your environment variables in a `.env` file:
```bash
# API Keys
OPENAI_API_KEY=your_openai_key # Or the API key for whichever model provider you use
ANTHROPIC_API_KEY=your_anthropic_key # Or the API key for whichever model provider you use
SERPAPI_API_KEY=your_serpapi_key # For web search functionality
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key # Optional: for tracing
LANGFUSE_SECRET_KEY=your_langfuse_secret_key # Optional: for tracing
LANGFUSE_HOST=your_langfuse_host # Optional: for tracing
```
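
Before a long run, it can help to check that the expected variables are actually set. The following is a minimal sketch, not part of the project: `missing_keys` is a hypothetical helper, the grouping into search and tracing keys is an assumption based on the comments in the example above, and it assumes the variables are already present in the process environment (for example, after the `.env` file has been loaded).

```python
import os
from typing import Iterable, List, Mapping

# Grouping is an assumption based on the comments in the .env example.
SEARCH_KEYS = ("SERPAPI_API_KEY",)
TRACING_KEYS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")


def missing_keys(env: Mapping[str, str], keys: Iterable[str]) -> List[str]:
    """Return the names in `keys` that are unset or blank in `env`."""
    return [key for key in keys if not env.get(key, "").strip()]


if __name__ == "__main__":
    groups = (("web search", SEARCH_KEYS), ("tracing (optional)", TRACING_KEYS))
    for label, keys in groups:
        missing = missing_keys(os.environ, keys)
        if missing:
            print(f"{label}: missing {', '.join(missing)}")
```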

## Usage

### Running a Single Question

Use `decentralized_agent.py` to run a single question through the decentralized team:

```bash
# gpt-4o and openai are examples; substitute any supported model and provider
python decentralized_agent.py \
  --model-type LiteLLMModel \
  --model-id gpt-4o \
  --provider openai \
  "What is half the speed of a leopard?"
```

**Arguments:**
- `--model-type`: Model type to use (e.g., `LiteLLMModel`)
- `--model-id`: Specific model identifier (e.g., `gpt-4o`, `claude-3-5-sonnet-20241022`)
- `--provider`: Model provider (e.g., `openai`, `anthropic`, `hf-inference`)
- `question`: The question to answer (positional argument)

**Output:**
- Creates a `runs/{run_id}/` directory with:
  - `run.log`: JSON-formatted execution logs
  - Agent interaction traces and message history
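
To inspect a run afterwards, the log can be parsed with the standard library. The sketch below assumes `run.log` holds one JSON object per line (JSON Lines); that layout is an assumption, since the exact format is not specified here, and `parse_json_lines` and `read_run_log` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def parse_json_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse one JSON object per line, skipping blank or non-JSON lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray plain-text lines in the log
        if isinstance(record, dict):
            records.append(record)
    return records


def read_run_log(run_dir: str) -> List[Dict[str, Any]]:
    """Read `run.log` from a run directory such as runs/<run_id>/."""
    with (Path(run_dir) / "run.log").open(encoding="utf-8") as handle:
        return parse_json_lines(handle)
```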

### Running the Full Benchmark

Use `run.py` to evaluate across the entire benchmark dataset:

```bash
# gpt-4o and openai are examples; substitute any supported model and provider
python run.py \
  --model-type LiteLLMModel \
  --model-id gpt-4o \
  --provider openai \
  --parallel-workers 4
```

**Arguments:**
- `--date`: Date string for the evaluation (default: current date)
- `--eval-dataset`: Dataset to evaluate on (default: `smolagents/benchmark-v1`)
- `--model-type`: Model type to use
- `--model-id`: Specific model identifier
- `--provider`: Model provider
- `--parallel-workers`: Number of concurrent benchmark runs (default: 4)
- `--num-examples`: Limit examples per task for testing (optional)
- `--push-answers-to-hub`: Push results to the Hugging Face Hub
- `--answers-dataset`: Dataset name for answers (default: `smolagents/answers`)

**Output:**
- `output/results_{date}_{model_id}.csv`: Benchmark results
- `output/answers_{date}_{model_id}.json`: Generated answers
- Individual run directories under `runs/`
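
The results file can be located and loaded with the standard library. This is a sketch under stated assumptions: `results_path` and `load_results` are hypothetical helpers that only mirror the naming pattern listed above, the date format shown in the usage line is a guess, and no particular column names are assumed.

```python
import csv
from pathlib import Path
from typing import Dict, List


def results_path(date: str, model_id: str, output_dir: str = "output") -> Path:
    """Build the results path following the documented naming pattern."""
    return Path(output_dir) / f"results_{date}_{model_id}.csv"


def load_results(path: Path) -> List[Dict[str, str]]:
    """Load the CSV as a list of row dictionaries keyed by the header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Example path only; the date format here is an assumption.
print(results_path("2025-10-30", "gpt-4o").as_posix())
```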

### Running the Centralized Baseline

For comparison, run the centralized agent:

```bash
# gpt-4o and openai are examples; substitute any supported model and provider
python run_centralized.py \
  --model-type LiteLLMModel \
  --model-id gpt-4o \
  --provider openai \
  --parallel-workers 4
```

Uses the same arguments as `run.py`.

## Features

### Decentralized Coordination

- **Message-Based Communication**: Agents communicate through a shared message store
- **Consensus Protocol**: Multiple agents must agree on the final answer through voting
- **Specialized Roles**: Each agent has specific capabilities and responsibilities
- **Parallel Execution**: Agents can work concurrently on different aspects of the problem
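
One simple way to picture voting on a final answer is a majority rule over the agents' proposals. The sketch below illustrates that idea only; it is not the protocol in `scripts/consensus_protocol.py`, and the `majority_answer` name, the light normalization, and the `quorum` threshold are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Optional


def majority_answer(votes: Dict[str, str], quorum: float = 0.5) -> Optional[str]:
    """Return the answer backed by more than `quorum` of the voters, else None."""
    if not votes:
        return None
    # Normalize lightly so "58" and " 58 " count as the same answer.
    tally = Counter(answer.strip().lower() for answer in votes.values())
    answer, count = tally.most_common(1)[0]
    return answer if count / len(votes) > quorum else None


votes = {
    "CodeAgent": "58",
    "WebSearchAgent": "58",
    "DeepResearchAgent": "60",
    "DocumentReaderAgent": "58 ",
}
print(majority_answer(votes))
```

Requiring a strict majority means a two-against-two split yields no answer, which a real protocol would need to resolve, for instance with another round of discussion.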

### Monitoring & Observability

- **Langfuse Integration**: Optional tracing and monitoring of agent interactions
- **JSON Logging**: Structured logs for debugging and analysis
- **HTML Visualization**: Convert message logs to interactive HTML reports
- **Run Tracking**: Unique run IDs for tracking individual executions

### Tool Capabilities

The agents have access to various tools:
- Python code execution
- Google search
- Web browsing (text-based)
- Document reading (PDF, DOCX, PPTX, etc.)
- Visual question answering
- File downloads
- Archive searching

## Project Structure

```
decentralized_smolagents_benchmark/
├── decentralized_agent.py # Single question entry point
├── run.py # Benchmark runner (decentralized)
├── run_centralized.py # Benchmark runner (centralized baseline)
├── requirements.txt # Dependencies
├── scripts/ # Supporting modules
│ ├── agents.py # Agent definitions
│ ├── message_store.py # Communication infrastructure
│ ├── consensus_protocol.py # Voting mechanism
│ ├── decentralized_tools.py # Communication tools
│ ├── text_web_browser.py # Web browsing tools
│ ├── text_inspector_tool.py # Document tools
│ ├── visual_qa.py # Visual QA
│ ├── html_renderer.py # HTML visualization
│ └── ... # Other utilities
├── runs/ # Individual run outputs (created at runtime)
└── output/ # Benchmark results (created at runtime)
```

## Contributing

When contributing to this project, please follow the guidelines in the root-level `AGENTS.md`:
- Follow OOP principles
- Be Pythonic: follow Python best practices and idiomatic patterns
- Write unit tests for new functionality

## License

This project is part of the smolagents repository. Please refer to the root LICENSE file for licensing information.