🕸️ Agent Dash — Instagram Multi-Crawler Data System

A production-grade Instagram data crawling platform that uses multiple concurrent crawlers with proxy-based IP rotation to collect profiles, posts, reels, and comments at scale — outputting validated, deduplicated JSON for data-sharing.

Features

- Multi-Crawler Pool: configurable concurrent workers (1–10)
- Proxy IP Rotation: round-robin assignment, health tracking, auto-failover
- Anti-Detection: randomized delays, User-Agent rotation, session cycling
- Pydantic Validation: all data strictly typed and validated
- JSON Data Store: organized per-user directories with deduplication
- Batch Jobs: define targets in YAML/JSON and crawl them all in parallel
- Rich CLI: terminal output with tables and status panels
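The multi-crawler pool and batch features can be pictured as a bounded worker pool. The sketch below is a minimal illustration, not the project's implementation: it assumes a hypothetical `crawl_target` function (a stub here) and uses the standard library's `ThreadPoolExecutor`, enforcing the 1–10 worker range from the feature list.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List

MIN_WORKERS, MAX_WORKERS = 1, 10  # bounds taken from the feature list


def crawl_target(username: str) -> dict:
    """Hypothetical stand-in for a real crawler; returns a placeholder result."""
    return {"username": username, "status": "ok"}


def run_batch(usernames: List[str], workers: int = 3) -> Dict[str, dict]:
    """Crawl every username with at most `workers` concurrent threads."""
    if not MIN_WORKERS <= workers <= MAX_WORKERS:
        raise ValueError(f"workers must be between {MIN_WORKERS} and {MAX_WORKERS}")
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the username it was submitted for.
        futures = {pool.submit(crawl_target, name): name for name in usernames}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failed target should not abort the batch
                results[name] = {"username": name, "status": "error", "error": str(exc)}
    return results


if __name__ == "__main__":
    print(run_batch(["virat.kohli", "cristiano"], workers=2))
```

Catching exceptions per future keeps a single failing account from discarding the results of the others, which matters when a batch covers many targets.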

Quick Start

```bash
# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Crawl a single account
python main.py crawl virat.kohli --limit 5 --delay 5

# Batch crawl from jobs file
python main.py batch --jobs jobs.yaml --workers 3

# Check data status
python main.py status

# Export latest data for a target
python main.py export virat.kohli

# Check proxy health
python main.py proxies --proxies proxies.txt
```

Commands

| Command | Description |
|---------|-------------|
| `crawl` | Crawl a single Instagram account |
| `batch` | Batch crawl multiple accounts from a job file |
| `status` | Show status of all crawled data |
| `export` | Print latest JSON data for a target |
| `proxies` | Check proxy list health and availability |
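The five commands map naturally onto `argparse` subparsers. The sketch below is a hypothetical outline of how `main.py` could declare them, using only the flags that appear in the Quick Start (`--limit`, `--delay`, `--jobs`, `--workers`, `--proxies`); the default values are assumptions, and the real entry point may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout mirroring the commands table above."""
    parser = argparse.ArgumentParser(prog="main.py", description="Agent Dash crawler CLI")
    sub = parser.add_subparsers(dest="command", required=True)

    crawl = sub.add_parser("crawl", help="Crawl a single Instagram account")
    crawl.add_argument("username")
    crawl.add_argument("--limit", type=int, default=10)     # assumed default
    crawl.add_argument("--delay", type=float, default=5.0)  # assumed default, seconds

    batch = sub.add_parser("batch", help="Batch crawl multiple accounts from a job file")
    batch.add_argument("--jobs", required=True)
    batch.add_argument("--workers", type=int, default=3)

    sub.add_parser("status", help="Show status of all crawled data")

    export = sub.add_parser("export", help="Print latest JSON data for a target")
    export.add_argument("username")

    proxies = sub.add_parser("proxies", help="Check proxy list health and availability")
    proxies.add_argument("--proxies", default="proxies.txt")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example runs without real CLI input.
    args = build_parser().parse_args(["crawl", "virat.kohli", "--limit", "5", "--delay", "5"])
    print(args.command, args.username, args.limit, args.delay)  # crawl virat.kohli 5 5.0
```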

Proxy Setup

Add your proxies to `proxies.txt`, one per line:

```text
http://proxy1.example.com:8080
socks5://user:password@proxy2.example.com:1080
https://proxy3.example.com:3128
```

If no proxy file is provided, the system runs in direct mode (your own IP).
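A proxy file in this format can be loaded and sanity-checked with the standard library alone. The sketch below is an assumption about how such parsing might work, not the project's code: it accepts the three schemes shown above, skips blank lines and `#` comments (comment support is an assumption), requires an explicit port, and returns an empty list when the file is missing, which corresponds to direct mode. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "socks5"}  # schemes used in the example file


def parse_proxy_line(line: str) -> Optional[str]:
    """Return the proxy URL if it looks valid, else None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = urlsplit(line)
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return None
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname or port is None:
        return None
    return line


def load_proxies(path: str = "proxies.txt") -> List[str]:
    """Load valid proxies; an empty list means direct mode (no proxy)."""
    file = Path(path)
    if not file.is_file():
        return []
    proxies = []
    for raw in file.read_text(encoding="utf-8").splitlines():
        proxy = parse_proxy_line(raw)
        if proxy is not None:
            proxies.append(proxy)
    return proxies
```

Rejecting malformed lines up front means a typo in `proxies.txt` is dropped at load time rather than surfacing later as a connection failure.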

Job File Format (`jobs.yaml`)

```yaml
targets:
  - username: virat.kohli
    data_types: [profile, posts, reels]
    posts_limit: 20

  - username: cristiano
    data_types: [profile, posts]
    posts_limit: 50
```
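Since the feature list says jobs can be defined in YAML or JSON, the same targets can be written as JSON and validated before any crawling starts. The sketch below uses a standard-library dataclass as a stand-in for the project's Pydantic models (unlike Pydantic, it does not coerce types). The field names come from the example above; the defaults, the `Target` and `load_jobs` names, and the allowed data types (which add `comments` from the project description) are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import List

VALID_DATA_TYPES = {"profile", "posts", "reels", "comments"}  # assumed allowed set


@dataclass
class Target:
    username: str
    data_types: List[str] = field(default_factory=lambda: ["profile", "posts", "reels"])  # assumed default
    posts_limit: int = 20  # assumed default

    def __post_init__(self) -> None:
        # Reject bad entries early instead of failing mid-crawl.
        if not self.username.strip():
            raise ValueError("username must not be empty")
        unknown = set(self.data_types) - VALID_DATA_TYPES
        if unknown:
            raise ValueError(f"unknown data types: {sorted(unknown)}")
        if not isinstance(self.posts_limit, int) or self.posts_limit < 1:
            raise ValueError("posts_limit must be a positive integer")


def load_jobs(text: str) -> List[Target]:
    """Parse a JSON job document shaped like the YAML example above."""
    return [Target(**entry) for entry in json.loads(text)["targets"]]


JOBS_JSON = """
{
  "targets": [
    {"username": "virat.kohli", "data_types": ["profile", "posts", "reels"], "posts_limit": 20},
    {"username": "cristiano", "data_types": ["profile", "posts"], "posts_limit": 50}
  ]
}
"""

if __name__ == "__main__":
    for target in load_jobs(JOBS_JSON):
        print(target)
```

With PyYAML installed, `yaml.safe_load` on `jobs.yaml` yields a dictionary of the same shape, so its `targets` entries could be passed to `Target(**entry)` in the same way.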

Output Structure

```text
output/
├── index.json                 # Master index of all crawls
├── virat.kohli/
│   ├── profile_20260219.json
│   ├── posts_20260219.json
│   ├── reels_20260219.json
│   └── latest.json            # Full combined result
└── cristiano/
    └── ...
```
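The layout above can be produced by a small writer. This is a minimal sketch under several assumptions not confirmed by the project: list-valued data types hold records with a unique `id` field used for deduplication, dates follow the `YYYYMMDD` pattern seen in the filenames, and `index.json` maps each username to its latest crawl date. The `dedupe` and `save_crawl` names are hypothetical.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Dict, List


def dedupe(records: List[dict], key: str = "id") -> List[dict]:
    """Keep the first record for each key value, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def save_crawl(root: Path, username: str, result: Dict[str, Any], day: date) -> Path:
    """Write one dated file per data type plus latest.json, then update index.json."""
    user_dir = root / username
    user_dir.mkdir(parents=True, exist_ok=True)
    stamp = day.strftime("%Y%m%d")
    combined = {}
    for data_type, payload in result.items():
        # Lists of records (posts, reels) are deduplicated; a profile object is stored as-is.
        clean = dedupe(payload) if isinstance(payload, list) else payload
        combined[data_type] = clean
        out = user_dir / f"{data_type}_{stamp}.json"
        out.write_text(json.dumps(clean, indent=2), encoding="utf-8")
    latest = user_dir / "latest.json"
    latest.write_text(json.dumps(combined, indent=2), encoding="utf-8")

    index_path = root / "index.json"
    index = json.loads(index_path.read_text(encoding="utf-8")) if index_path.exists() else {}
    index[username] = stamp
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return latest
```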

Configuration

All settings are configurable via environment variables (prefix `AGENTDASH_`):

| Variable | Default | Description |
|----------|---------|-------------|
| `AGENTDASH_MAX_WORKERS` | `3` | Concurrent crawler workers |
| `AGENTDASH_DELAY_MIN` | `3.0` | Min delay between requests (s) |
| `AGENTDASH_DELAY_MAX` | `8.0` | Max delay between requests (s) |
| `AGENTDASH_PROXY_FILE` | `proxies.txt` | Proxy list file |
| `AGENTDASH_MAX_RETRIES` | `3` | Retries per failed request |
| `AGENTDASH_SESSION_ROTATE_AFTER` | `15` | Rotate session after N requests |
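These variables can be read with a plain loader. The sketch below is an assumption rather than the project's settings module (the feature list suggests Pydantic, which offers its own settings support). It applies the documented defaults, converts types, and rejects a worker count outside 1–10 or a minimum delay larger than the maximum. The `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

PREFIX = "AGENTDASH_"


@dataclass(frozen=True)
class Settings:
    max_workers: int = 3
    delay_min: float = 3.0
    delay_max: float = 8.0
    proxy_file: str = "proxies.txt"
    max_retries: int = 3
    session_rotate_after: int = 15


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from AGENTDASH_* variables, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = Settings()

    def get(name: str, cast: Callable):
        # e.g. "max_workers" -> "AGENTDASH_MAX_WORKERS"
        raw = env.get(PREFIX + name.upper())
        return getattr(defaults, name) if raw is None else cast(raw)

    settings = Settings(
        max_workers=get("max_workers", int),
        delay_min=get("delay_min", float),
        delay_max=get("delay_max", float),
        proxy_file=get("proxy_file", str),
        max_retries=get("max_retries", int),
        session_rotate_after=get("session_rotate_after", int),
    )
    if not 1 <= settings.max_workers <= 10:
        raise ValueError("AGENTDASH_MAX_WORKERS must be between 1 and 10")
    if settings.delay_min > settings.delay_max:
        raise ValueError("AGENTDASH_DELAY_MIN must not exceed AGENTDASH_DELAY_MAX")
    return settings
```

Passing an explicit mapping, as in `load_settings({"AGENTDASH_MAX_WORKERS": "5"})`, keeps the loader testable without touching the real environment.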

⚠️ Disclaimer

This tool accesses publicly available data only. Use responsibly and respect Instagram's rate limits and Terms of Service.
