The official Python SDK for Scrapeless AI - End-to-End Data Infrastructure for AI Developers & Enterprises.
- Features
- Installation
- Quick Start
- Usage Examples
- API Reference
- Examples
- License
- Support
- About Scrapeless
- Browser: Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
- Universal Scraping API: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
- Crawl: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
- Scraping API: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
- Deep SerpApi: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
- Proxies: Geo-targeted proxy network with 195+ countries. Optimize requests for better success rates and regional data access.
- Actor: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
- Storage Solutions: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.
Install the SDK using pip:
```shell
pip install scrapeless
```
Log in to the Scrapeless Dashboard and get your API key:
```python
from scrapeless import Scrapeless

client = Scrapeless({
    'api_key': 'your-api-key'  # Get your API key from https://scrapeless.com
})
```
You can also configure the SDK using environment variables:
```shell
# Required
SCRAPELESS_API_KEY=your-api-key

# Optional - Custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
```
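As a rough illustration of the usual precedence (an explicit argument wins over the environment variable), consider the sketch below. The `resolve_api_key` helper is hypothetical and not part of the SDK; it only demonstrates the fallback pattern:

```python
import os

# Hypothetical helper illustrating config precedence: an explicit
# value wins; otherwise fall back to SCRAPELESS_API_KEY from the
# environment. Not part of the scrapeless SDK itself.
def resolve_api_key(explicit_key=None):
    if explicit_key:
        return explicit_key
    key = os.environ.get('SCRAPELESS_API_KEY')
    if not key:
        raise ValueError('No API key: pass api_key or set SCRAPELESS_API_KEY')
    return key

os.environ['SCRAPELESS_API_KEY'] = 'env-key'
print(resolve_api_key())            # falls back to the environment variable
print(resolve_api_key('explicit'))  # explicit argument takes precedence
```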
Advanced browser session management supporting Playwright and Pyppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:
```python
from scrapeless import Scrapeless
from scrapeless.types import ICreateBrowser
import pyppeteer

client = Scrapeless()

async def example():
    # Create a browser session
    config = ICreateBrowser(
        session_name='sdk_test',
        session_ttl=180,
        proxy_country='US',
        session_recording=True
    )
    session = client.browser.create(config).__dict__
    browser_ws_endpoint = session['browser_ws_endpoint']
    print('Browser WebSocket endpoint created:', browser_ws_endpoint)

    # Connect to the browser using pyppeteer
    browser = await pyppeteer.connect({'browserWSEndpoint': browser_ws_endpoint})

    # Open a new page and navigate to the website
    page = await browser.newPage()
    await page.goto('https://www.scrapeless.com')
```
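A coroutine like `example()` above still needs an event loop to run. The minimal, standard-library pattern is `asyncio.run(...)`; the snippet below demonstrates it with a stand-in coroutine (`greet` is illustrative only):

```python
import asyncio

# Generic pattern for driving an async entry point such as example().
# greet() stands in for the real coroutine here.
async def greet(name):
    await asyncio.sleep(0)  # yield control, as real I/O awaits would
    return f'hello {name}'

result = asyncio.run(greet('scrapeless'))
print(result)  # hello scrapeless
```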
Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
```python
from scrapeless import Scrapeless

client = Scrapeless()

result = client.scraping_crawl.scrape_url("https://example.com")
print(result)
```
Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()

request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)

result = client.scraping.scrape(request=request)
print(result)
```
Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:
```python
from scrapeless import Scrapeless
from scrapeless.types import ScrapingTaskRequest

client = Scrapeless()

request = ScrapingTaskRequest(
    actor='scraper.google.search',
    input={'q': 'nike site:www.nike.com'}
)

result = client.deepserp.scrape(request=request)
print(result)
```
Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:
```python
from scrapeless import Scrapeless
from scrapeless.types import IRunActorData, IActorRunOptions

client = Scrapeless()

data = IRunActorData(
    input={'url': 'https://example.com'},
    run_options=IActorRunOptions(
        CPU=2,
        memory=2048,
        timeout=600,
    )
)

run = client.actor.run(
    actor_id='your_actor_id',
    data=data
)
print('Actor run result:', run)
```
The SDK raises `ScrapelessError` for API-related errors:
```python
from scrapeless import Scrapeless, ScrapelessError

client = Scrapeless()

try:
    result = client.scraping.scrape({'url': 'invalid-url'})
except ScrapelessError as error:
    print(f"Scrapeless API error: {error}")
    if hasattr(error, 'status_code'):
        print(f"Status code: {error.status_code}")
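For transient failures (timeouts, rate limits), a retry wrapper with exponential backoff is a common pattern around calls like `client.scraping.scrape`. The sketch below is generic and not part of the SDK; in real code you would pass `ScrapelessError` as the retriable exception type and inspect `error.status_code` before retrying:

```python
import time

# Generic retry-with-backoff wrapper for transient errors.
# Not part of the scrapeless SDK; shown here as a usage pattern.
def with_retries(call, retries=3, base_delay=0.5, retriable=(Exception,)):
    """Invoke call(), retrying on retriable errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return call()
        except retriable:
            if attempt == retries - 1:
                raise  # out of attempts, re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Demonstration with a stand-in callable that fails twice, then succeeds.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError('transient failure')
    return 'ok'

print(with_retries(flaky, base_delay=0.01))  # ok
```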
```python
from scrapeless.types import ScrapelessConfig

config = ScrapelessConfig(
    api_key='',                 # Your API key
    timeout=30000,              # Request timeout in milliseconds (default: 30000)
    base_api_url='',            # Base API URL
    actor_api_url='',           # Actor service URL
    storage_api_url='',         # Storage service URL
    browser_api_url='',         # Browser service URL
    scraping_crawl_api_url=''   # Crawl service URL
)
```
The SDK provides the following services through the main client:
- `client.browser` - Browser automation with Playwright/Pyppeteer support, anti-detection tools (fingerprinting, CAPTCHA solving), and extensible workflows.
- `client.universal` - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export.
- `client.scraping_crawl` - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links).
- `client.scraping` - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews.
- `client.deepserp` - Search engine results extraction.
- `client.proxies` - Proxy management.
- `client.actor` - Scalable workflow automation with built-in scheduling and resource management.
- `client.storage` - Data storage solutions.
Check out the examples directory for comprehensive usage examples:
- Browser
- Playwright Integration
- Pyppeteer Integration
- Scraping API
- Actor
- Storage Usage
- Proxies
- Deep SerpApi
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: https://docs.scrapeless.com
- Community: Join our Discord
- Issues: GitHub Issues
- Email: [email protected]
Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:
- High-performance web scraping infrastructure
- Global proxy network
- Browser automation capabilities
- Enterprise-grade reliability and support
Visit scrapeless.com to learn more and get started.
Made with ❤️ by the Scrapeless team