The official Python SDK for Scrapeless AI - End-to-End Data Infrastructure for AI Developers & Enterprises.
- Features
- Installation
- Quick Start
- Usage Examples
- API Reference
- Examples
- License
- Support
- About Scrapeless
- Browser: Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
- Universal Scraping API: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
- Crawl: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
- Scraping API: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
- Deep SerpApi: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
- Proxies: Geo-targeted proxy network with 195+ countries. Optimize requests for better success rates and regional data access.
- Actor: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
- Storage Solutions: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.
Install the SDK using pip:
```shell
pip install scrapeless
```
Log in to the Scrapeless Dashboard and get your API key:
```python
from scrapeless import Scrapeless

client = Scrapeless({
    'api_key': 'your-api-key'  # Get your API key from https://scrapeless.com
})
```
You can also configure the SDK using environment variables:
```shell
# Required
SCRAPELESS_API_KEY=your-api-key

# Optional - Custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
```
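As a rough illustration of the usual precedence (an explicit argument wins over the environment variable), consider the sketch below. The `resolve_api_key` helper is hypothetical and not part of the SDK; it only demonstrates the fallback pattern:

```python
import os

# Hypothetical helper illustrating config precedence: an explicit
# value wins; otherwise fall back to SCRAPELESS_API_KEY from the
# environment. Not part of the scrapeless SDK itself.
def resolve_api_key(explicit_key=None):
    if explicit_key:
        return explicit_key
    key = os.environ.get('SCRAPELESS_API_KEY')
    if not key:
        raise ValueError('No API key: pass api_key or set SCRAPELESS_API_KEY')
    return key

os.environ['SCRAPELESS_API_KEY'] = 'env-key'
print(resolve_api_key())            # falls back to the environment variable
print(resolve_api_key('explicit'))  # explicit argument takes precedence
```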
Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:
```python
from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser
import pyppeteer

client = Scrapeless()

async def example():
    # Create a browser session
    config = ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US',
        session_recording=True
    )
    session = client.browser.create(config).__dict__
    browser_ws_endpoint = session['browser_ws_endpoint']
    print('Browser WebSocket endpoint created:', browser_ws_endpoint)

    # Connect to the browser using pyppeteer
    browser = await pyppeteer.connect({'browserWSEndpoint': browser_ws_endpoint})

    # Open a new page and navigate to the website
    page = await browser.newPage()
    await page.goto('https://www.scrapeless.com')
```
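A coroutine like `example()` above still needs an event loop to run. The minimal, standard-library pattern is `asyncio.run(...)`; the snippet below demonstrates it with a stand-in coroutine (`greet` is illustrative only):

```python
import asyncio

# Generic pattern for driving an async entry point such as example().
# greet() stands in for the real coroutine here.
async def greet(name):
    await asyncio.sleep(0)  # yield control, as real I/O awaits would
    return f'hello {name}'

result = asyncio.run(greet('scrapeless'))
print(result)  # hello scrapeless
```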
Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
```python
from scrapeless import Scrapeless

client = Scrapeless()

result = client.scraping_crawl.scrape_url("https://example.com")
print(result)
```
Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()

request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)

result = client.scraping.scrape(request=request)
print(result)
```
Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()

request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)

result = client.deepserp.scrape(request=request)
print(result)
```
Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:
```python
from scrapeless import Scrapeless
from scrapeless.types import IRunActorData, IActorRunOptions

client = Scrapeless()

data = IRunActorData(
    input={'url': 'https://example.com'},
    run_options=IActorRunOptions(
        CPU=2,
        memory=2048,
        timeout=600,
    )
)

run = client.actor.run(
    actor_id='your_actor_id',
    data=data
)
print('Actor run result:', run)
```
The SDK raises `ScrapelessError` for API-related errors:
```python
from scrapeless import Scrapeless, ScrapelessError

client = Scrapeless()

try:
    result = client.scraping.scrape({'url': 'invalid-url'})
except ScrapelessError as error:
    print(f"Scrapeless API error: {error}")
    if hasattr(error, 'status_code'):
        print(f"Status code: {error.status_code}")
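For transient failures (timeouts, rate limits), a retry wrapper with exponential backoff is a common pattern around calls like `client.scraping.scrape`. The sketch below is generic and not part of the SDK; in real code you would pass `ScrapelessError` as the retriable exception type and inspect `error.status_code` before retrying:

```python
import time

# Generic retry-with-backoff wrapper for transient errors.
# Not part of the scrapeless SDK; shown here as a usage pattern.
def with_retries(call, retries=3, base_delay=0.5, retriable=(Exception,)):
    """Invoke call(), retrying on retriable errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return call()
        except retriable:
            if attempt == retries - 1:
                raise  # out of attempts, re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Demonstration with a stand-in callable that fails twice, then succeeds.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError('transient failure')
    return 'ok'

print(with_retries(flaky, base_delay=0.01))  # ok
```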
```python
from scrapeless.types import ScrapelessConfig

config = ScrapelessConfig(
    api_key='',                 # Your API key
    timeout=30000,              # Request timeout in milliseconds (default: 30000)
    base_api_url='',            # Base API URL
    actor_api_url='',           # Actor service URL
    storage_api_url='',         # Storage service URL
    browser_api_url='',         # Browser service URL
    scraping_crawl_api_url=''   # Crawl service URL
)
```
The SDK provides the following services through the main client:
- `client.browser` - Browser automation with Playwright/Pyppeteer support, anti-detection tools (fingerprinting, CAPTCHA solving), and extensible workflows.
- `client.universal` - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export.
- `client.scraping_crawl` - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links).
- `client.scraping` - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews.
- `client.deepserp` - Search engine results extraction.
- `client.proxies` - Proxy management.
- `client.actor` - Scalable workflow automation with built-in scheduling and resource management.
- `client.storage` - Data storage solutions.
Check out the examples directory for comprehensive usage examples:
- Browser
- Playwright Integration
- Pyppeteer Integration
- Scraping API
- Actor
- Storage Usage
- Proxies
- Deep SerpApi
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: https://docs.scrapeless.com
- Community: Join our Discord
- Issues: GitHub Issues
- Email: [email protected]
Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:
- High-performance web scraping infrastructure
- Global proxy network
- Browser automation capabilities
- Enterprise-grade reliability and support
Visit scrapeless.com to learn more and get started.
Made with ❤️ by the Scrapeless team