Web Content Agent 🤖

An AI-powered agent that automatically discovers, downloads, and organizes web content based on your criteria. Built with browser-use and LangChain.

🌟 Features

Flexible Content Discovery: Configure the agent to search any website for any type of content
Smart Filtering: Uses an OpenAI GPT-4-class model (gpt-4o in the sample configuration) to evaluate and select content based on your criteria
Template System: Pre-built templates for common use cases (research papers, news articles, etc.)
Visual History: Creates GIF recordings of browsing sessions for transparency
Customizable Output: Organize and format downloaded content your way

🚀 Quick Start

Prerequisites

Python 3.11 or higher
OpenAI API key
Git (for cloning)

Installation

Clone the repository:

git clone https://github.com/chronometer/web-content-agent.git
cd web-content-agent

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt
playwright install

Configure environment:

cp .env.example .env
# Edit .env and add your OpenAI API key
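A missing or empty API key is easiest to catch before the agent starts browsing. The following is a minimal sketch of such a check, not the project's actual loader: it assumes the key is stored under the conventional name OPENAI_API_KEY, and parse_env and require_key are hypothetical helpers written for illustration.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(values: dict[str, str], name: str = "OPENAI_API_KEY") -> str:
    """Return the key, or fail early with a readable message (name is assumed)."""
    key = values.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set; edit .env and add your key")
    return key


sample = "# local settings\nOPENAI_API_KEY=your-key-here\n"
print(require_key(parse_env(sample)))
```

In practice the text would come from reading the .env file in the repository root; a dedicated loader such as python-dotenv handles more edge cases than this sketch.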

Usage

Using templates:

python src/content_agent.py --template research --config config/config.yaml

Custom task:

python src/content_agent.py --task "Find articles about AI" --urls "news.com,blog.com"
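The two invocations above suggest a small command-line interface. As a rough illustration of how the flags shown could be parsed, here is a sketch using argparse; the defaults, the help texts, and the split_urls helper are assumptions, and the real src/content_agent.py may be organized differently.

```python
import argparse


def split_urls(value: str) -> list[str]:
    """Turn "news.com, blog.com" into ["news.com", "blog.com"]."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="content_agent.py")
    parser.add_argument("--template", help="template name, e.g. research")
    parser.add_argument("--task", help="free-form task description")
    # Default path is an assumption based on the usage example.
    parser.add_argument("--config", default="config/config.yaml")
    parser.add_argument("--urls", type=split_urls, default=[],
                        help="comma-separated list of sites to search")
    return parser


args = build_parser().parse_args(
    ["--task", "Find articles about AI", "--urls", "news.com,blog.com"]
)
print(args.task, args.urls)
```

Passing a converter through type= keeps the comma splitting in one place, so the rest of the program only ever sees a list of URLs.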

🎯 Configuration

Main Configuration (config/config.yaml)

agent:
  name: "Web Content Agent"
  model: "gpt-4o"
  timeout: 3600

output:
  directory: "downloads"
  naming_pattern: "{date}_{title}"
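The naming_pattern value uses brace placeholders, which map directly onto Python's str.format. The sketch below shows one way such a pattern could be expanded into a file name; the ISO date format and the title sanitizing rule are assumptions, and build_filename is a hypothetical helper rather than part of the project.

```python
import re
from datetime import date


def build_filename(pattern: str, title: str, day: date) -> str:
    """Fill {date} and {title}, keeping only filesystem-friendly characters."""
    # Collapse every run of other characters into a single underscore.
    safe_title = re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_")
    return pattern.format(date=day.isoformat(), title=safe_title)


print(build_filename("{date}_{title}", "Attention Is All You Need", date(2025, 1, 25)))
# prints 2025-01-25_Attention_Is_All_You_Need
```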

Task Templates

Templates are YAML files in config/templates/ that define specific search patterns:

Research Papers (research.yaml):

task: |
  Search for research papers about:
  {topics}
  From: {urls}
parameters:
  topics:
    - "machine learning"
    - "AI"
  urls:
    - arxiv.org
    - scholar.google.com

News Articles (news.yaml):

task: |
  Find news articles about:
  {topics}
  Published between:
  {date_from} and {date_to}
parameters:
  date_from: "2024-01-01"
  date_to: "2024-12-31"
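Both templates pair a task text containing placeholders with a parameters mapping. A plausible way to combine the two is sketched below, assuming that list values are joined with commas before substitution; render_task is a hypothetical name and the project's own rendering may differ.

```python
def render_task(task: str, parameters: dict) -> str:
    """Substitute {placeholders} in the task text with parameter values."""
    values = {}
    for name, value in parameters.items():
        if isinstance(value, list):
            # Assumption: lists are shown to the model as comma-separated text.
            values[name] = ", ".join(str(item) for item in value)
        else:
            values[name] = str(value)
    return task.format(**values)


task = "Search for research papers about:\n{topics}\nFrom: {urls}\n"
parameters = {
    "topics": ["machine learning", "AI"],
    "urls": ["arxiv.org", "scholar.google.com"],
}
print(render_task(task, parameters))
```

A placeholder with no matching parameter raises KeyError here, which surfaces template mistakes early.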

📝 Output

The agent produces:

Downloaded content in downloads/
Summary reports with metadata
Visual browsing history (agent_history.gif)

🛠️ Creating Custom Templates

Create a new YAML file in config/templates/:

name: "Custom Template"
description: "Your template description"
task: |
  Your task description with
  {parameter_placeholders}
parameters:
  your_parameter:
    type: string
    description: "Parameter description"
    default: "Default value"

Use the template:

python src/content_agent.py --template your_template
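In the custom template format, each parameter carries a type, a description, and a default. The sketch below shows how user-supplied values might be merged with those defaults before the task text is filled in; resolve_parameters is a hypothetical helper, and the rule that unknown names are rejected is an assumption.

```python
def resolve_parameters(spec: dict, overrides: dict) -> dict:
    """Merge user overrides with the defaults declared in a template."""
    unknown = set(overrides) - set(spec)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, meta in spec.items():
        if name in overrides:
            resolved[name] = overrides[name]
        elif "default" in meta:
            resolved[name] = meta["default"]
        else:
            raise ValueError(f"missing value for parameter {name!r}")
    return resolved


spec = {
    "your_parameter": {
        "type": "string",
        "description": "Parameter description",
        "default": "Default value",
    }
}
print(resolve_parameters(spec, {}))
print(resolve_parameters(spec, {"your_parameter": "custom"}))
```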

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

browser-use for the browser automation framework
LangChain for the LLM integration
All contributors and users of this project

⚠️ Disclaimer

This tool is for research purposes only. Please respect websites' terms of service and robots.txt when using this tool. Some websites may require authentication or have specific terms for automated access.
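Respecting robots.txt can be checked programmatically before a URL is fetched. Here is a minimal sketch using the standard library's urllib.robotparser; the user agent string "web-content-agent" and the is_allowed helper are assumptions for illustration, and nothing here implies the agent performs this check itself.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "web-content-agent") -> bool:
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = """User-agent: *
Disallow: /private/
"""
print(is_allowed(rules, "https://example.com/private/report.pdf"))  # prints False
print(is_allowed(rules, "https://example.com/papers/1.pdf"))        # prints True
```

Against a live site, RobotFileParser can also download the file itself via set_url() followed by read().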

