ScrapAgent – Autonomous Research Agent

Demo-ready project that searches the web, clusters content with embeddings, and writes an executive brief with sources and trend scores.

Quickstart

Python 3.10+ recommended
python -m venv .venv && source .venv/bin/activate (Windows: .venv\Scripts\activate)
pip install -r requirements.txt
Copy .env.sample to .env and add your API keys:
- OPENAI_API_KEY (Azure/OpenAI)
- HF_API_KEY (Hugging Face embeddings)
- GOOGLE_API_KEY and GOOGLE_CSE_ID (for Google Custom Search)
streamlit run app.py
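
Before launching the app, it can help to confirm that the keys listed above are actually visible to the process. The following is a minimal sketch, not part of the project: the helper name missing_keys is hypothetical, and it only inspects environment variables that are already loaded (it does not read the .env file itself, so export the values or load them first).

```python
import os

# Variable names taken from the Quickstart list above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HF_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def missing_keys(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to try without touching real settings.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not (env.get(key) or "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Run it from the activated virtual environment; any name it prints still needs a value in .env.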

How it works

Search: Google Custom Search API via WebSearchTool
Scrape: Optional content fetch from URLs via ScrapeUrlsTool (initiated by the model)
Embeddings: Azure OpenAI text-embedding-3-small
Clustering: HDBSCAN over normalized vectors
Summaries: GPT-5-MINI per cluster; final brief with GPT-5
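
The clustering step above can be sketched as follows. This is an illustration under stated assumptions, not the code in agent/loop.py: the function names are hypothetical, and the cluster helper assumes scikit-learn 1.3+ (the project may use a different HDBSCAN implementation). Normalizing to unit length matters because, for unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so a Euclidean-metric clusterer effectively groups items by cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster(vectors, min_cluster_size=3):
    """Hypothetical helper: label embeddings with scikit-learn's HDBSCAN.

    Assumes scikit-learn >= 1.3 is installed. A label of -1 marks an item
    that HDBSCAN treats as noise rather than a cluster member.
    """
    from sklearn.cluster import HDBSCAN  # imported lazily; optional dependency

    unit = [l2_normalize(v) for v in vectors]
    return HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(unit).tolist()
```

A smaller min_cluster_size tends to produce more, smaller clusters, which is why the setting is worth tuning for a 10 to 20 item demo.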

Tips

Keep topics specific: e.g., "electric bikes this week" vs. "bikes"
For a reliable demo, run with 10–20 items; adjust min_cluster_size in agent/loop.py
Extend sources by adding RSS or site-specific scrapers to tools/
Expect clustering results to vary slightly between search backends (DDGS vs. Google CSE), because each returns a different set of results

Safety & Respect

Use responsibly: comply with each site’s Terms of Service and robots.txt. For production, add caching and rate limiting.
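
As one way to act on the advice above, here is a minimal sketch of a robots.txt check and a simple rate limiter using only the Python standard library. The names allowed_by_robots and MinIntervalLimiter and the "ScrapAgentDemo" user agent are hypothetical and are not part of the project's tools/. Caching is not covered here.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="ScrapAgentDemo"):
    """Check a URL against robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class MinIntervalLimiter:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_s
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

In a scraper, one would call wait() before each request and skip any URL for which allowed_by_robots returns False.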

