An MCP server that gives AI assistants web crawling superpowers. Wraps crawl4ai (Playwright/Chromium) and exposes it as MCP tools — crawl pages, extract structured data, batch-crawl sitemaps, and manage browser sessions, all through the Model Context Protocol.
Works with any MCP-compatible client: Claude Code, Claude Desktop, Cursor, Windsurf, OpenAI Agents SDK, and others.
AI coding assistants can't browse the web natively. This MCP server gives them a full Chromium-based crawler — handling JS-rendered pages, authenticated sessions, batch crawls, structured extraction, and sitemap ingestion. No API key required for basic crawling.
| Tool | Description |
|---|---|
ping |
Health check — confirms server and browser are running |
crawl_url |
Crawl a URL and return clean markdown. Supports JS rendering, custom headers/cookies, CSS scoping, and cache control |
crawl_many |
Crawl multiple URLs concurrently with configurable parallelism, politeness delays, and optional disk persistence |
deep_crawl |
BFS site crawl — follows links with configurable depth and page limits, politeness delays, and optional disk storage |
crawl_sitemap |
Crawl all URLs from an XML sitemap (supports gzip and sitemap indexes, politeness delays, optional disk persistence) |
extract_structured |
LLM-powered structured JSON extraction with a user-defined schema |
extract_css |
CSS-selector-based structured extraction — deterministic, no LLM required |
create_session |
Create a persistent browser session (preserves cookies and state) |
list_sessions |
List all active browser sessions |
destroy_session |
Destroy a named browser session |
list_profiles |
List available crawl profiles and their settings |
check_update |
Check if a newer version of crawl4ai is available on PyPI |
- Python 3.12+
- uv — Python package manager
# Clone the repository
git clone https://github.com/potterdigital/crawl4ai-mcp.git
cd crawl4ai-mcp
# Install dependencies (uv manages the virtualenv automatically)
uv sync
# Install Playwright browser (required by crawl4ai — downloads Chromium)
uv run crawl4ai-setup
# Verify the installation
uv run crawl4ai-doctor# Replace /path/to/crawl4ai-mcp with your actual clone path
claude mcp add-json --scope user crawl4ai '{
"type": "stdio",
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/crawl4ai-mcp",
"python",
"-m",
"crawl4ai_mcp.server"
]
}'Verify with claude mcp list — crawl4ai should appear.
Add the following to your client's MCP server configuration:
{
"crawl4ai": {
"type": "stdio",
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/crawl4ai-mcp",
"python",
"-m",
"crawl4ai_mcp.server"
]
}
}Replace /path/to/crawl4ai-mcp with the absolute path to your clone. The --directory flag is required — without it, uv run looks for the virtualenv in the client's working directory.
"Crawl https://docs.python.org/3/library/asyncio.html and summarize the key concepts"
"Fetch https://example.com with a custom Authorization header"
"Crawl this page using the js_heavy profile and wait for #content to appear"
"Extract all product names and prices from this page as JSON using CSS selectors"
"Use the LLM extractor to pull a list of API endpoints from this documentation page"
"Crawl all pages in this sitemap and summarize each one"
"Deep crawl this docs site to depth 2 and find all pages mentioning authentication"
"Crawl this list of URLs with a 1-second delay between requests to be respectful"
"Batch crawl these URLs and save all results to /tmp/crawl_results as markdown files"
"Create a browser session, log into this site, then crawl the dashboard page"
All batch tools (crawl_many, deep_crawl, crawl_sitemap) support two optional parameters:
-
delay(default: 0): Politeness delay in seconds between requests. Use this to respect target servers and avoid overwhelming them. Fordeep_crawl, this becomesdelay_before_return_htmlin crawl4ai. Forcrawl_manyandcrawl_sitemap, this wires a RateLimiter into request dispatch. -
output_dir(default: None): Directory to write per-page.mdfiles and amanifest.jsoninstead of returning content inline. Useful for large batch crawls. When set, the tool returns a metadata summary with file paths instead of full page content.
Example:
# Crawl with politeness delay
crawl_many(urls=[...], delay=1.5)
# Crawl and save to disk
crawl_sitemap(sitemap_url="...", output_dir="/tmp/results")
# Both
deep_crawl(url="...", delay=0.5, output_dir="/tmp/crawl")Four built-in profiles control crawler behavior. Use the profile parameter on any crawl tool, or call list_profiles to see all options.
| Profile | Use Case | Key Settings |
|---|---|---|
default |
General-purpose crawling | domcontentloaded wait, 60s timeout |
fast |
Static pages, quick fetches | domcontentloaded wait, 15s timeout, low word threshold |
js_heavy |
SPAs, lazy-loaded content | networkidle wait, 90s timeout, full-page scroll, overlay removal |
stealth |
Anti-bot protected sites | networkidle wait, 90s timeout, simulated user behavior, navigator masking |
Profiles are YAML files in src/crawl4ai_mcp/profiles/. You can add custom profiles there.
Merge order: default ← named profile ← per-call overrides
# Run the server directly (for debugging)
uv run python -m crawl4ai_mcp.server
# See server logs (stderr) while discarding MCP frames (stdout)
uv run python -m crawl4ai_mcp.server 2>&1 1>/dev/null
# Lint (catches print() calls that would corrupt stdio transport)
uv run ruff check src/
# Run tests
uv run pytest
# Diagnose crawl4ai / Playwright health
uv run crawl4ai-doctorTools don't appear in your MCP client
Check that the --directory path in the registration command matches the actual project location. uv run without --directory looks for the virtualenv in the client's working directory, not this project.
Server disconnects immediately
Any output to stdout (from a print() call or verbose=True in crawl4ai config) corrupts the MCP stdio transport. Check stderr for the actual error:
uv run python -m crawl4ai_mcp.server 2>&1 1>/dev/nullChromium fails to start / "Playwright Chromium binary is missing or stale"
Most often happens after uv sync upgrades Playwright to a new version — the cached Chromium under ~/Library/Caches/ms-playwright/ (macOS) or ~/.cache/ms-playwright/ (Linux) goes stale. The server runs a preflight check at startup and exits with a clear message in your MCP client's log; the fix is:
uv run crawl4ai-setupRun uv run crawl4ai-doctor for a deeper diagnostic.
extract_structured returns an error about missing API key
The LLM extraction tool requires a provider and corresponding API key (e.g., OPENAI_API_KEY). The extract_css tool is a free alternative that doesn't require an LLM.
See CONTRIBUTING.md for development setup and guidelines.
This project is licensed under the Apache License 2.0 — see LICENSE for details.
This project uses crawl4ai, which is also licensed under Apache 2.0.
- crawl4ai by @unclecode — the crawling engine that powers this server
- Model Context Protocol — the protocol that makes this possible
- Built with Claude Code
Created by Potter Digital