crawl4ai-mcp

An MCP server that gives AI assistants web crawling superpowers. Wraps crawl4ai (Playwright/Chromium) and exposes it as MCP tools — crawl pages, extract structured data, batch-crawl sitemaps, and manage browser sessions, all through the Model Context Protocol.

Works with any MCP-compatible client: Claude Code, Claude Desktop, Cursor, Windsurf, OpenAI Agents SDK, and others.

Why

AI coding assistants can't browse the web natively. This MCP server gives them a full Chromium-based crawler — handling JS-rendered pages, authenticated sessions, batch crawls, structured extraction, and sitemap ingestion. No API key required for basic crawling.

Available Tools

Tool	Description
`ping`	Health check — confirms server and browser are running
`crawl_url`	Crawl a URL and return clean markdown. Supports JS rendering, custom headers/cookies, CSS scoping, and cache control
`crawl_many`	Crawl multiple URLs concurrently with configurable parallelism, politeness delays, and optional disk persistence
`deep_crawl`	BFS site crawl — follows links with configurable depth and page limits, politeness delays, and optional disk storage
`crawl_sitemap`	Crawl all URLs from an XML sitemap (supports gzip and sitemap indexes, politeness delays, optional disk persistence)
`extract_structured`	LLM-powered structured JSON extraction with a user-defined schema
`extract_css`	CSS-selector-based structured extraction — deterministic, no LLM required
`create_session`	Create a persistent browser session (preserves cookies and state)
`list_sessions`	List all active browser sessions
`destroy_session`	Destroy a named browser session
`list_profiles`	List available crawl profiles and their settings
`check_update`	Check if a newer version of crawl4ai is available on PyPI

Prerequisites

Python 3.12+
uv — Python package manager

Installation

# Clone the repository
git clone https://github.com/potterdigital/crawl4ai-mcp.git
cd crawl4ai-mcp

# Install dependencies (uv manages the virtualenv automatically)
uv sync

# Install Playwright browser (required by crawl4ai — downloads Chromium)
uv run crawl4ai-setup

# Verify the installation
uv run crawl4ai-doctor

Register with Your MCP Client

Claude Code

# Replace /path/to/crawl4ai-mcp with your actual clone path
claude mcp add-json --scope user crawl4ai '{
  "type": "stdio",
  "command": "uv",
  "args": [
    "run",
    "--directory",
    "/path/to/crawl4ai-mcp",
    "python",
    "-m",
    "crawl4ai_mcp.server"
  ]
}'

Verify with claude mcp list — crawl4ai should appear.

Other MCP Clients

Add the following to your client's MCP server configuration:

{
  "crawl4ai": {
    "type": "stdio",
    "command": "uv",
    "args": [
      "run",
      "--directory",
      "/path/to/crawl4ai-mcp",
      "python",
      "-m",
      "crawl4ai_mcp.server"
    ]
  }
}

Replace /path/to/crawl4ai-mcp with the absolute path to your clone. The --directory flag is required — without it, uv run looks for the virtualenv in the client's working directory.

Usage Examples

Basic Crawling

"Crawl https://docs.python.org/3/library/asyncio.html and summarize the key concepts"

"Fetch https://example.com with a custom Authorization header"

JS-Heavy Pages

"Crawl this page using the js_heavy profile and wait for #content to appear"

Structured Extraction

"Extract all product names and prices from this page as JSON using CSS selectors"

"Use the LLM extractor to pull a list of API endpoints from this documentation page"

Batch Crawling

"Crawl all pages in this sitemap and summarize each one"

"Deep crawl this docs site to depth 2 and find all pages mentioning authentication"

"Crawl this list of URLs with a 1-second delay between requests to be respectful"

"Batch crawl these URLs and save all results to /tmp/crawl_results as markdown files"

Sessions

"Create a browser session, log into this site, then crawl the dashboard page"

Batch Crawling Options

All batch tools (crawl_many, deep_crawl, crawl_sitemap) support two optional parameters:

delay (default: 0): Politeness delay in seconds between requests. Use this to respect target servers and avoid overwhelming them. For deep_crawl, this becomes delay_before_return_html in crawl4ai. For crawl_many and crawl_sitemap, this wires a RateLimiter into request dispatch.
output_dir (default: None): Directory to write per-page .md files and a manifest.json instead of returning content inline. Useful for large batch crawls. When set, the tool returns a metadata summary with file paths instead of full page content.

Example:

# Crawl with politeness delay
crawl_many(urls=[...], delay=1.5)

# Crawl and save to disk
crawl_sitemap(sitemap_url="...", output_dir="/tmp/results")

# Both
deep_crawl(url="...", delay=0.5, output_dir="/tmp/crawl")

Profiles

Four built-in profiles control crawler behavior. Use the profile parameter on any crawl tool, or call list_profiles to see all options.

Profile	Use Case	Key Settings
`default`	General-purpose crawling	`domcontentloaded` wait, 60s timeout
`fast`	Static pages, quick fetches	`domcontentloaded` wait, 15s timeout, low word threshold
`js_heavy`	SPAs, lazy-loaded content	`networkidle` wait, 90s timeout, full-page scroll, overlay removal
`stealth`	Anti-bot protected sites	`networkidle` wait, 90s timeout, simulated user behavior, navigator masking

Profiles are YAML files in src/crawl4ai_mcp/profiles/. You can add custom profiles there.

Merge order: default ← named profile ← per-call overrides

Development

# Run the server directly (for debugging)
uv run python -m crawl4ai_mcp.server

# See server logs (stderr) while discarding MCP frames (stdout)
uv run python -m crawl4ai_mcp.server 2>&1 1>/dev/null

# Lint (catches print() calls that would corrupt stdio transport)
uv run ruff check src/

# Run tests
uv run pytest

# Diagnose crawl4ai / Playwright health
uv run crawl4ai-doctor

Troubleshooting

Tools don't appear in your MCP client Check that the --directory path in the registration command matches the actual project location. uv run without --directory looks for the virtualenv in the client's working directory, not this project.

Server disconnects immediately Any output to stdout (from a print() call or verbose=True in crawl4ai config) corrupts the MCP stdio transport. Check stderr for the actual error:

uv run python -m crawl4ai_mcp.server 2>&1 1>/dev/null

Chromium fails to start / "Playwright Chromium binary is missing or stale" Most often happens after uv sync upgrades Playwright to a new version — the cached Chromium under ~/Library/Caches/ms-playwright/ (macOS) or ~/.cache/ms-playwright/ (Linux) goes stale. The server runs a preflight check at startup and exits with a clear message in your MCP client's log; the fix is:

uv run crawl4ai-setup

Run uv run crawl4ai-doctor for a deeper diagnostic.

extract_structured returns an error about missing API key The LLM extraction tool requires a provider and corresponding API key (e.g., OPENAI_API_KEY). The extract_css tool is a free alternative that doesn't require an LLM.

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

This project is licensed under the Apache License 2.0 — see LICENSE for details.

This project uses crawl4ai, which is also licensed under Apache 2.0.

Acknowledgments

crawl4ai by @unclecode — the crawling engine that powers this server
Model Context Protocol — the protocol that makes this possible
Built with Claude Code

Created by Potter Digital

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
.github		.github
scripts		scripts
src/crawl4ai_mcp		src/crawl4ai_mcp
tests		tests
.gitignore		.gitignore
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
context7.json		context7.json
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

crawl4ai-mcp

Why

Available Tools

Prerequisites

Installation

Register with Your MCP Client

Claude Code

Other MCP Clients

Usage Examples

Basic Crawling

JS-Heavy Pages

Structured Extraction

Batch Crawling

Sessions

Batch Crawling Options

Profiles

Development

Troubleshooting

Contributing

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

crawl4ai-mcp

Why

Available Tools

Prerequisites

Installation

Register with Your MCP Client

Claude Code

Other MCP Clients

Usage Examples

Basic Crawling

JS-Heavy Pages

Structured Extraction

Batch Crawling

Sessions

Batch Crawling Options

Profiles

Development

Troubleshooting

Contributing

License

Acknowledgments

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages