An AI-powered agent that automatically discovers, downloads, and organizes web content based on your criteria. Built with browser-use and LangChain.
- Flexible Content Discovery: Configure the agent to search any website for any type of content
- Smart Filtering: Uses GPT-4 to evaluate and select content based on your criteria
- Template System: Pre-built templates for common use cases (research papers, news articles, etc.)
- Visual History: Creates GIF recordings of browsing sessions for transparency
- Customizable Output: Organize and format downloaded content your way
- Python 3.11 or higher
- OpenAI API key
- Git (for cloning)
- Clone the repository:
git clone https://github.com/chronometer/web-content-agent.git
cd web-content-agent
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
playwright install
- Configure environment:
cp .env.example .env
# Edit .env and add your OpenAI API key
- Using templates:
python src/content_agent.py --template research --config config/config.yaml
- Custom task:
python src/content_agent.py --task "Find articles about AI" --urls "news.com,blog.com"
agent:
name: "Web Content Agent"
model: "gpt-4o"
timeout: 3600
output:
directory: "downloads"
naming_pattern: "{date}_{title}"
Templates are YAML files in config/templates/
that define specific search patterns:
- Research Papers (
research.yaml
):
task: """
Search for research papers about:
{topics}
From: {urls}
"""
parameters:
topics:
- "machine learning"
- "AI"
urls:
- arxiv.org
- scholar.google.com
- News Articles (
news.yaml
):
task: """
Find news articles about:
{topics}
Published between:
{date_from} and {date_to}
"""
parameters:
date_from: "2024-01-01"
date_to: "2024-12-31"
The agent produces:
- Downloaded content in
downloads/
- Summary reports with metadata
- Visual browsing history (agent_history.gif)
- Create a new YAML file in
config/templates/
:
name: "Custom Template"
description: "Your template description"
task: """
Your task description with
{parameter_placeholders}
"""
parameters:
your_parameter:
type: string
description: "Parameter description"
default: "Default value"
- Use the template:
python src/content_agent.py --template your_template
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature
) - Commit your changes (
git commit -m 'Add some AmazingFeature'
) - Push to the branch (
git push origin feature/AmazingFeature
) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- browser-use for the browser automation framework
- LangChain for the LLM integration
- All contributors and users of this project
This tool is for research purposes only. Please respect websites' terms of service and robots.txt when using this tool. Some websites may require authentication or have specific terms for automated access.