A production-grade Instagram data crawling platform that uses multiple concurrent crawlers with proxy-based IP rotation to collect profiles, posts, reels, and comments at scale, outputting validated, deduplicated JSON for data sharing.
- Multi-Crawler Pool: configurable concurrent workers (1 to 10)
- Proxy IP Rotation: round-robin assignment, health tracking, auto-failover
- Anti-Detection: randomized delays, User-Agent rotation, session cycling
- Pydantic Validation: all data strictly typed and validated
- JSON Data Store: organized per-user directories with deduplication
- Batch Jobs: define targets in YAML/JSON, crawl them all in parallel
- Rich CLI: beautiful terminal output with tables and status panels
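
The proxy rotation feature listed above (round-robin assignment, health tracking, auto-failover) can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `ProxyPool` class, its method names, and the `max_failures` threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProxyPool:
    """Hypothetical round-robin pool that skips proxies with too many failures."""

    proxies: list[str]
    max_failures: int = 3  # assumed threshold, not taken from the project
    _failures: dict[str, int] = field(default_factory=dict)
    _cursor: int = 0

    def next_proxy(self) -> Optional[str]:
        """Return the next healthy proxy in rotation, or None if there is none."""
        # Visit each proxy at most once per call so the loop always terminates.
        for _ in range(len(self.proxies)):
            proxy = self.proxies[self._cursor % len(self.proxies)]
            self._cursor += 1
            if self._failures.get(proxy, 0) < self.max_failures:
                return proxy
        return None  # every proxy is unhealthy, or the list is empty

    def report(self, proxy: str, ok: bool) -> None:
        """Record a request outcome; a success resets the failure count."""
        if ok:
            self._failures[proxy] = 0
        else:
            self._failures[proxy] = self._failures.get(proxy, 0) + 1
```

A caller would ask for `next_proxy()` before each request and call `report()` afterwards. A `None` result could be treated as a signal to fall back to direct mode or to pause the crawl.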
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Crawl a single account
python main.py crawl virat.kohli --limit 5 --delay 5
# Batch crawl from jobs file
python main.py batch --jobs jobs.yaml --workers 3
# Check data status
python main.py status
# Export latest data for a target
python main.py export virat.kohli
# Check proxy health
python main.py proxies --proxies proxies.txt

| Command | Description |
|---|---|
| `crawl` | Crawl a single Instagram account |
| `batch` | Batch crawl multiple accounts from a job file |
| `status` | Show status of all crawled data |
| `export` | Print latest JSON data for a target |
| `proxies` | Check proxy list health and availability |

Add your proxies to proxies.txt (one per line):
http://proxy1.example.com:8080
socks5://user:password@proxy2.example.com:1080
https://proxy3.example.com:3128
If no proxy file is provided, the system runs in direct mode (your own IP).
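
As a rough sketch of how a proxy list in this format might be read, the snippet below keeps only well-formed URLs and returns an empty list when no file is available, which a caller could treat as direct mode. The function names, the set of accepted schemes (taken from the three example lines), and the skipping of `#` comment lines are assumptions, not documented behaviour of this tool.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit

# Schemes that appear in the example lines; the real tool may accept others.
SUPPORTED_SCHEMES = {"http", "https", "socks5"}


def parse_proxy_lines(lines: list[str]) -> list[str]:
    """Keep well-formed proxy URLs; skip blanks, comments, and bad entries."""
    proxies = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # comment handling is an assumption
            continue
        try:
            parts = urlsplit(line)
            port = parts.port  # raises ValueError for a non-numeric port
        except ValueError:
            continue
        if parts.scheme in SUPPORTED_SCHEMES and parts.hostname and port:
            proxies.append(line)
    return proxies


def load_proxies(path: Optional[str]) -> list[str]:
    """Return proxies from the file, or an empty list meaning direct mode."""
    if not path or not Path(path).is_file():
        return []
    text = Path(path).read_text(encoding="utf-8")
    return parse_proxy_lines(text.splitlines())
```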
targets:
  - username: virat.kohli
    data_types: [profile, posts, reels]
    posts_limit: 20
  - username: cristiano
    data_types: [profile, posts]
    posts_limit: 50

output/
├── index.json                  # Master index of all crawls
├── virat.kohli/
│   ├── profile_20260219.json
│   ├── posts_20260219.json
│   ├── reels_20260219.json
│   └── latest.json             # Full combined result
└── cristiano/
    └── ...
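
The layout above suggests file names of the form `<data_type>_<YYYYMMDD>.json` inside a per-user directory. The sketch below shows one way such paths could be built, along with a simple order-preserving deduplication step. Both helpers are hypothetical, and the assumption that every record carries an `id` field is ours; the tool's real record schema and deduplication key are not documented here.

```python
from datetime import date
from pathlib import Path


def snapshot_path(root: Path, username: str, data_type: str, day: date) -> Path:
    """Build a path such as output/virat.kohli/posts_20260219.json."""
    return root / username / f"{data_type}_{day:%Y%m%d}.json"


def dedupe_by_id(items: list[dict], key: str = "id") -> list[dict]:
    """Keep the first occurrence of each key value, preserving input order.

    Assumes every record has the given key (hypothetical field name).
    """
    seen = set()
    unique = []
    for item in items:
        if item[key] in seen:
            continue
        seen.add(item[key])
        unique.append(item)
    return unique
```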
All settings are configurable via environment variables (prefix AGENTDASH_):
| Variable | Default | Description |
|---|---|---|
| `AGENTDASH_MAX_WORKERS` | `3` | Concurrent crawler workers |
| `AGENTDASH_DELAY_MIN` | `3.0` | Min delay between requests (s) |
| `AGENTDASH_DELAY_MAX` | `8.0` | Max delay between requests (s) |
| `AGENTDASH_PROXY_FILE` | `proxies.txt` | Proxy list file |
| `AGENTDASH_MAX_RETRIES` | `3` | Retries per failed request |
| `AGENTDASH_SESSION_ROTATE_AFTER` | `15` | Rotate session after N requests |

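
To illustrate how these variables and their defaults fit together, here is a minimal sketch that reads them from the environment and picks a randomized delay between the configured bounds. It uses a standard-library dataclass as a stand-in; the project itself may load settings differently (for example through Pydantic), and the `Settings`, `load_settings`, and `next_delay` names are hypothetical.

```python
import os
import random
from dataclasses import dataclass
from typing import Mapping

PREFIX = "AGENTDASH_"


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the table above.
    max_workers: int = 3
    delay_min: float = 3.0
    delay_max: float = 8.0
    proxy_file: str = "proxies.txt"
    max_retries: int = 3
    session_rotate_after: int = 15


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read AGENTDASH_* variables, falling back to the documented defaults."""
    values = {}
    for name, default in vars(Settings()).items():
        raw = env.get(PREFIX + name.upper())
        # Cast each override to the type of its default (int, float, or str).
        values[name] = type(default)(raw) if raw is not None else default
    return Settings(**values)


def next_delay(settings: Settings) -> float:
    """Pick a random pause in seconds between delay_min and delay_max."""
    return random.uniform(settings.delay_min, settings.delay_max)
```

For example, running with `AGENTDASH_MAX_WORKERS=5` in the environment would yield `load_settings().max_workers == 5` while leaving the other fields at their defaults.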
This tool accesses publicly available data only. Use responsibly and respect Instagram's rate limits and Terms of Service.