🕸️ Agent Dash — Instagram Multi-Crawler Data System

A production-grade Instagram data crawling platform that runs multiple concurrent crawlers behind proxy-based IP rotation to collect profiles, posts, reels, and comments at scale. Output is validated, deduplicated JSON suitable for data sharing.

Features

Multi-Crawler Pool — configurable concurrent workers (1–10)
Proxy IP Rotation — round-robin assignment, health tracking, auto-failover
Anti-Detection — randomized delays, User-Agent rotation, session cycling
Pydantic Validation — all data strictly typed and validated
JSON Data Store — organized per-user directories with deduplication
Batch Jobs — define targets in YAML/JSON, crawl them all in parallel
Rich CLI — beautiful terminal output with tables and status panels
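
To make the proxy rotation feature concrete, here is a minimal sketch of round-robin assignment with failure tracking and failover. The names `ProxyPool`, `acquire`, `report`, and the `max_failures` threshold are hypothetical and chosen for illustration; they are not taken from the Agent Dash source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProxyState:
    url: str
    failures: int = 0  # consecutive failures seen for this proxy


class ProxyPool:
    """Hypothetical round-robin pool that skips unhealthy proxies."""

    def __init__(self, urls: List[str], max_failures: int = 3):
        self._proxies = [ProxyState(u) for u in urls]
        self._max_failures = max_failures  # assumed health threshold
        self._next = 0

    def acquire(self) -> Optional[str]:
        """Return the next healthy proxy URL, or None (direct mode)."""
        # Visit each proxy at most once, starting from the rotation cursor.
        for _ in range(len(self._proxies)):
            proxy = self._proxies[self._next]
            self._next = (self._next + 1) % len(self._proxies)
            if proxy.failures < self._max_failures:
                return proxy.url
        return None

    def report(self, url: str, ok: bool) -> None:
        """Record a request outcome; a success resets the failure count."""
        for proxy in self._proxies:
            if proxy.url == url:
                proxy.failures = 0 if ok else proxy.failures + 1
```

A worker would call `acquire()` before each request and `report()` afterwards, so a proxy that keeps failing drops out of the rotation until it succeeds again.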

Quick Start

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Crawl a single account
python main.py crawl virat.kohli --limit 5 --delay 5

# Batch crawl from jobs file
python main.py batch --jobs jobs.yaml --workers 3

# Check data status
python main.py status

# Export latest data for a target
python main.py export virat.kohli

# Check proxy health
python main.py proxies --proxies proxies.txt

Commands

| Command | Description |
| --- | --- |
| `crawl` | Crawl a single Instagram account |
| `batch` | Batch crawl multiple accounts from a job file |
| `status` | Show status of all crawled data |
| `export` | Print latest JSON data for a target |
| `proxies` | Check proxy list health and availability |

Proxy Setup

Add your proxies to proxies.txt (one per line):

http://proxy1.example.com:8080
socks5://user:password@proxy2.example.com:1080
https://proxy3.example.com:3128

If no proxy file is provided, the system runs in direct mode and all requests originate from your own IP.
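
As an illustration of the file format above, the following sketch reads a proxy list and falls back to an empty list (direct mode) when the file is missing. The function names, the set of accepted schemes, and the skipping of `#` comment lines are assumptions made for this example, not documented behavior of the tool.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Schemes taken from the sample entries above; the real tool may accept others.
ALLOWED_SCHEMES = {"http", "https", "socks5"}


def parse_proxy_lines(lines):
    """Keep lines that look like scheme://[user:pass@]host:port."""
    proxies = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks a comment
            continue
        try:
            parts = urlsplit(line)
            valid = (
                parts.scheme in ALLOWED_SCHEMES
                and parts.hostname is not None
                and parts.port is not None
            )
        except ValueError:  # raised for a malformed or out-of-range port
            valid = False
        if valid:
            proxies.append(line)
    return proxies


def load_proxies(path="proxies.txt"):
    """Return the proxy list, or [] to signal direct mode."""
    file = Path(path)
    if not file.is_file():
        return []
    return parse_proxy_lines(file.read_text(encoding="utf-8").splitlines())
```

An empty result here corresponds to direct mode, where no proxy is assigned to any worker.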

Job File Format (`jobs.yaml`)

targets:
  - username: virat.kohli
    data_types: [profile, posts, reels]
    posts_limit: 20

  - username: cristiano
    data_types: [profile, posts]
    posts_limit: 50
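
The shape of a job file can be checked before any crawling starts. The sketch below validates an already-parsed job document (for example, the dictionary produced by a YAML or JSON loader). It uses standard-library dataclasses as a stand-in for the Pydantic models the project mentions; the class name `Target`, the default values, and the inclusion of `comments` as a data type are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed set of data types; the examples above show profile, posts, and reels.
KNOWN_DATA_TYPES = {"profile", "posts", "reels", "comments"}


@dataclass
class Target:
    username: str
    data_types: List[str] = field(default_factory=lambda: ["profile"])
    posts_limit: int = 20  # assumed default

    def __post_init__(self):
        if not isinstance(self.username, str) or not self.username:
            raise ValueError("username must be a non-empty string")
        unknown = set(self.data_types) - KNOWN_DATA_TYPES
        if unknown:
            raise ValueError(f"unknown data types: {sorted(unknown)}")
        if not isinstance(self.posts_limit, int) or self.posts_limit < 1:
            raise ValueError("posts_limit must be a positive integer")


def parse_jobs(doc: dict) -> List[Target]:
    """Turn a parsed job document into validated Target objects."""
    return [Target(**entry) for entry in doc.get("targets", [])]
```

Failing fast on a bad entry avoids starting a batch that would only break partway through.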

Output Structure

output/
├── index.json                 # Master index of all crawls
├── virat.kohli/
│   ├── profile_20260219.json
│   ├── posts_20260219.json
│   ├── reels_20260219.json
│   └── latest.json            # Full combined result
└── cristiano/
    └── ...
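
The layout above can be reproduced with a few lines of path handling. This is a sketch only: the helper names are hypothetical, and deduplicating by an `id` field is an assumption about how items are keyed, not a description of the actual store.

```python
import json
from datetime import date
from pathlib import Path


def snapshot_path(root, username, data_type, day):
    """Build <root>/<username>/<data_type>_<YYYYMMDD>.json."""
    return Path(root) / username / f"{data_type}_{day:%Y%m%d}.json"


def dedupe(items, key="id"):
    """Keep the first occurrence of each item, judged by `key`."""
    seen = set()
    unique = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            unique.append(item)
    return unique


def write_snapshot(root, username, data_type, items, day):
    """Write deduplicated items to the per-user directory."""
    path = snapshot_path(root, username, data_type, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dedupe(items), indent=2), encoding="utf-8")
    return path
```

For example, `snapshot_path("output", "virat.kohli", "posts", date(2026, 2, 19))` yields the `posts_20260219.json` path shown in the tree.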

Configuration

All settings are configurable via environment variables (prefix AGENTDASH_):

| Variable | Default | Description |
| --- | --- | --- |
| `AGENTDASH_MAX_WORKERS` | `3` | Concurrent crawler workers |
| `AGENTDASH_DELAY_MIN` | `3.0` | Min delay between requests (s) |
| `AGENTDASH_DELAY_MAX` | `8.0` | Max delay between requests (s) |
| `AGENTDASH_PROXY_FILE` | `proxies.txt` | Proxy list file |
| `AGENTDASH_MAX_RETRIES` | `3` | Retries per failed request |
| `AGENTDASH_SESSION_ROTATE_AFTER` | `15` | Rotate session after N requests |
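
One way such prefixed variables could be read is sketched below. The defaults mirror the table above; the `Settings` class, the `load_settings` function, and the range checks are hypothetical and only illustrate the idea (the 1 to 10 worker bound comes from the Features list).

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    max_workers: int = 3
    delay_min: float = 3.0
    delay_max: float = 8.0
    proxy_file: str = "proxies.txt"
    max_retries: int = 3
    session_rotate_after: int = 15


def load_settings(env=None, prefix="AGENTDASH_"):
    """Overlay prefixed environment variables on top of the defaults."""
    env = os.environ if env is None else env
    overrides = {}
    for f in fields(Settings):
        raw = env.get(prefix + f.name.upper())
        if raw is not None:
            # Convert the string using the type of the field's default value.
            overrides[f.name] = type(f.default)(raw)
    settings = Settings(**overrides)
    if not 1 <= settings.max_workers <= 10:
        raise ValueError("max_workers must be between 1 and 10")
    if settings.delay_min > settings.delay_max:
        raise ValueError("delay_min must not exceed delay_max")
    return settings
```

Passing a plain dictionary as `env` makes the loader easy to exercise without touching the real process environment.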

⚠️ Disclaimer

This tool accesses publicly available data only. Use responsibly and respect Instagram's rate limits and Terms of Service.
