An interactive CLI tool for browser automation built on the official browser-use library. It lets you control your browser with natural language commands and adds configurable LLM providers, system behaviors, logging, and browser settings.
Now aligned with browser-use v0.2.2 - the latest official release!
Multiple LLM Provider Support:
- OpenAI GPT-4o (default)
- Anthropic Claude 3.5 Sonnet (20241022)
- Azure OpenAI Services
- Google Gemini (via browser-use)
- DeepSeek (via browser-use)
- And more through browser-use's LLM integrations
Configurable System Behaviors:
- Default mode for standard automation
- Safety First mode with enhanced security
- Data Collection mode for comprehensive gathering
- Research mode for systematic exploration
- Wikipedia First mode for research tasks
Advanced Logging and Recording:
- Automatic conversation logging
- Session recordings (when configured)
- Comprehensive task execution logs
- Structured data storage
- Debug-level logging support
Modern Browser Integration:
- Uses browser-use's optimized browser handling
- Vision support for visual understanding
- Configurable browser settings
- Support for headless and headed modes
- Cloud browser provider compatibility
Enhanced User Experience:
- Interactive CLI with clear feedback
- Structured output formats (JSON, etc.)
- Error handling and recovery
- Graceful shutdown handling
- Cross-platform compatibility
For basic browser automation, you can use the simple example:
import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from browser_use import Agent

# Load OPENAI_API_KEY and other settings from .env
load_dotenv()


async def main():
    llm = ChatOpenAI(model="gpt-4o")
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=llm,
    )
    # Run the agent until the task completes, then print its result
    result = await agent.run()
    print(result)


asyncio.run(main())
For an enhanced interactive experience with multiple features:
python main.py
The tool supports structured output formats using Pydantic models. Currently available formats:
from pydantic import BaseModel

class Post(BaseModel):
    post_title: str
    post_url: str
    num_comments: int
    hours_since_post: int
Enable structured output by setting the OUTPUT_FORMAT environment variable:
# Use structured posts format
OUTPUT_FORMAT=posts
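A minimal sketch of how an `OUTPUT_FORMAT` value could be resolved to an output model. The registry `OUTPUT_FORMATS` and the function `resolve_output_format` are hypothetical names for illustration, not part of this tool or of browser-use, and a stdlib dataclass stands in for the Pydantic `Post` model so the snippet runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Post:
    """Stand-in for the Pydantic Post model (same fields, stdlib only)."""
    post_title: str
    post_url: str
    num_comments: int
    hours_since_post: int


# Hypothetical registry: OUTPUT_FORMAT value -> model class.
OUTPUT_FORMATS = {"posts": Post}


def resolve_output_format(env: Mapping[str, str]) -> Optional[type]:
    """Return the model class for OUTPUT_FORMAT, or None for plain text."""
    name = env.get("OUTPUT_FORMAT", "").strip().lower()
    if not name:
        return None  # empty or unset means plain-text output
    if name not in OUTPUT_FORMATS:
        raise ValueError(f"Unknown OUTPUT_FORMAT: {name!r}")
    return OUTPUT_FORMATS[name]
```

Raising on an unknown value, rather than silently falling back to text, is a design choice in this sketch: a typo in `.env` then fails at startup instead of producing unstructured output.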
- Python 3.11 or higher (required by browser-use)
- API keys for your chosen LLM provider:
  - OpenAI API Key (for GPT-4o - default)
  - Anthropic API Key (for Claude 3.5 Sonnet)
  - Azure OpenAI credentials (for Azure OpenAI)
  - Google API Key (for Gemini)
  - DeepSeek API Key (for DeepSeek models)
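The provider-to-credential requirement above could be checked before startup with something like the sketch below. Only `OPENAI_API_KEY` is named in this README; every other variable name in `REQUIRED_KEYS` is an assumption, and `missing_keys` is a hypothetical helper, so check the names against your provider's documentation.

```python
from typing import Mapping

# Assumed environment variable names; only OPENAI_API_KEY appears in this README.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY"],
    "google": ["GOOGLE_API_KEY"],
    "deepseek": ["DEEPSEEK_API_KEY"],
}


def missing_keys(provider: str, env: Mapping[str, str]) -> list[str]:
    """Return the credential variables that are unset or empty for a provider."""
    try:
        required = REQUIRED_KEYS[provider.lower()]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider!r}") from None
    return [name for name in required if not env.get(name)]
```

Calling `missing_keys("openai", os.environ)` returns an empty list when the key is set, which makes it easy to print a clear message before any browser is launched.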
git clone https://github.com/PierrunoYT/browser-use-script
cd browser-use-script
Using pip:
pip install -r requirements.txt
Using uv (recommended):
uv pip install -r requirements.txt
playwright install chromium --with-deps --no-shell
# Copy the example environment file
cp .env.example .env # On macOS/Linux
copy .env.example .env # On Windows
# Required: Add your API key
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Configure LLM provider
LLM_PROVIDER=openai # Options: openai, anthropic, azure, google, deepseek
# Optional: Configure system behavior
SYSTEM_PROMPT=default # Options: default, safety, collection, research, wiki
# Optional: Browser settings
BROWSER_HEADLESS=false
USE_VISION=true
# Optional: Telemetry
ANONYMIZED_TELEMETRY=true
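As an illustration of how these variables might be read, the sketch below turns environment-style key/value pairs into a typed settings object. The `Settings` class, `parse_bool`, and `load_settings` are hypothetical names, and the fallback defaults simply mirror the example values above; the tool's actual parsing may differ.

```python
from dataclasses import dataclass
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    system_prompt: str
    headless: bool
    use_vision: bool


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from environment-style key/value pairs."""
    return Settings(
        llm_provider=env.get("LLM_PROVIDER", "openai").lower(),
        system_prompt=env.get("SYSTEM_PROMPT", "default").lower(),
        # Assumed defaults, taken from the example .env values above.
        headless=parse_bool(env.get("BROWSER_HEADLESS", "false")),
        use_vision=parse_bool(env.get("USE_VISION", "true")),
    )
```

After `load_dotenv()`, passing `os.environ` to `load_settings` would yield the same values the `.env` file declares.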
Start the enhanced CLI for interactive browser automation:
python main.py
The tool will display your current configuration:
Welcome to Browser Use CLI!
Using LLM Provider: OPENAI
System Prompt: DEFAULT
Enter your tasks and watch the browser automation in action.
Press Ctrl+C to exit.
For basic automation, use the simple example:
python simple_example.py
Enter your tasks in natural language. Here are some examples:
- Web Research: "Search for the latest AI news and summarize the top 3 articles"
- Information Gathering: "Go to Wikipedia and find information about quantum computing"
- Comparison Tasks: "Compare the pricing of OpenAI GPT-4 and Anthropic Claude"
- Data Collection: "Visit Hacker News and get the top 5 posts with their titles and URLs"
Default mode:
- Standard browser automation behavior
- Balanced between functionality and safety

Safety First mode:
- Enhanced security and privacy features
- Requires confirmation for form submissions
- Respects robots.txt and terms of service
- Prevents automated logins without permission
- Avoids suspicious or untrusted links

Data Collection mode:
- Focused on comprehensive data gathering
- Automatic search result saving
- Screenshot capture of relevant content
- Organized data storage with timestamps
- Detailed URL documentation
The tool automatically creates and organizes various outputs:
- logs/conversation_*.json: Detailed conversation history
- logs/results/*.json: Structured search results
- logs/screenshots/*.png: Element screenshots
- logs/recordings/: Browser session recordings
- logs/traces/: Debug trace files
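The `conversation_*.json` pattern suggests one file per session. A minimal sketch of generating such a name is shown here; the timestamp format and the helper name `conversation_log_path` are assumptions, since the README does not specify what replaces the `*`.

```python
from datetime import datetime
from pathlib import Path


def conversation_log_path(base_dir: Path, now: datetime) -> Path:
    """Build a path of the form <base_dir>/conversation_<timestamp>.json."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    return base_dir / f"conversation_{stamp}.json"
```

Passing the current time in as an argument, instead of calling `datetime.now()` inside the function, keeps the helper easy to test.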
Here are some example tasks you can try:
- "Go to Reddit, search for 'browser-use' and return the first post's title"
- "Search for flights on kayak.com from New York to London"
- "Go to Google Docs and create a new document titled 'Meeting Notes'"
- "Visit GitHub and star the browser-use repository"
- Modern API: Updated to use the latest browser-use API patterns
- Simplified Architecture: Removed complex custom controller logic in favor of browser-use's built-in capabilities
- Better Performance: Leverages browser-use's optimized browser handling
- Enhanced Compatibility: Full compatibility with the official browser-use ecosystem
If you're upgrading from an older version of this script:
- Dependencies: The script now uses the official browser-use package instead of custom implementations
- Configuration: Environment variables remain largely the same for backward compatibility
- Custom Functions: Complex custom functions have been simplified to align with browser-use patterns
- API Changes: The core Agent API is now simpler and more reliable
- browser-use >= 0.2.2 (official browser automation library)
- langchain-openai >= 0.3.11 (OpenAI LLM integration)
- langchain-anthropic >= 0.3.3 (Anthropic Claude integration)
- langchain-core >= 0.3.49 (LangChain core functionality)
- playwright >= 1.52.0 (browser automation engine)
- python-dotenv >= 1.0.1 (environment variable management)
- pydantic >= 2.10.4 (data validation and serialization)
- rich >= 14.0.0 (enhanced CLI formatting)
- click >= 8.1.8 (CLI framework)
- sentence-transformers >= 4.0.2 (for memory features)
Contributions are welcome! Feel free to open issues for bugs or feature requests.
This project is licensed under the MIT License - see the LICENSE file for details.
The default configuration launches a new browser instance with customizable settings:
# .env configuration
BROWSER_HEADLESS=false
BROWSER_VIEWPORT_WIDTH=1280
BROWSER_VIEWPORT_HEIGHT=1100
Connect to your real Chrome browser with existing profiles and logged-in sessions:
# .env configuration (set only the line that matches your operating system)
CHROME_INSTANCE_PATH=C:\Program Files\Google\Chrome\Application\chrome.exe # Windows
CHROME_INSTANCE_PATH=/Applications/Google Chrome.app/Contents/MacOS/Google Chrome # macOS
CHROME_INSTANCE_PATH=/usr/bin/google-chrome # Linux
Connect to cloud-based browser services for enhanced reliability:
# .env configuration
# WebSocket connection (wss)
BROWSER_WSS_URL=wss://your-provider.com/browser
# Chrome DevTools Protocol (CDP)
BROWSER_CDP_URL=http://your-cdp-provider.com
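With three ways to obtain a browser (a new instance, a local Chrome binary, or a remote endpoint), some precedence has to apply when more than one variable is set. The sketch below shows one possible order; the precedence itself and the function name `select_browser_mode` are assumptions, not documented behavior of this tool.

```python
from typing import Mapping, Tuple


def select_browser_mode(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (mode, target); the precedence order here is an assumption."""
    for mode, key in (
        ("wss", "BROWSER_WSS_URL"),
        ("cdp", "BROWSER_CDP_URL"),
        ("local_chrome", "CHROME_INSTANCE_PATH"),
    ):
        value = env.get(key, "").strip()
        if value:
            return mode, value
    # Nothing configured: launch a new browser instance.
    return "launch", ""
```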
Fine-tune browser behavior with these settings:
# .env configuration
# Page Load Settings
MIN_PAGE_LOAD_TIME=0.5
NETWORK_IDLE_TIME=1.0
MAX_PAGE_LOAD_TIME=5.0
# Security Settings
BROWSER_DISABLE_SECURITY=true
IGNORE_HTTPS_ERRORS=true
JAVASCRIPT_ENABLED=true
# Display Settings
HIGHLIGHT_ELEMENTS=true
VIEWPORT_EXPANSION=500
BROWSER_LOCALE=en-US
# URL Restrictions
ALLOWED_DOMAINS=["example.com","another-domain.com"]
# Debug and Recording
SAVE_RECORDING_PATH=logs/recordings
TRACE_PATH=logs/traces
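Because `ALLOWED_DOMAINS` is written as a JSON array, it can be parsed with the standard library. The sketch below parses the value and checks a URL against it; the matching rule (exact host or any subdomain) is an assumption and may not match how browser-use enforces the restriction, and both function names are hypothetical.

```python
import json
from urllib.parse import urlparse


def parse_allowed_domains(raw: str) -> list[str]:
    """Parse the JSON array from ALLOWED_DOMAINS; empty input means no restriction."""
    if not raw.strip():
        return []
    domains = json.loads(raw)
    if not isinstance(domains, list) or not all(isinstance(d, str) for d in domains):
        raise ValueError("ALLOWED_DOMAINS must be a JSON array of strings")
    return [d.lower() for d in domains]


def is_allowed(url: str, allowed: list[str]) -> bool:
    """Allow exact host matches and subdomains (an assumed matching rule)."""
    if not allowed:
        return True
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Comparing against `"." + d` rather than a bare suffix prevents a host such as `notexample.com` from matching `example.com`.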
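# Headed browser with security checks disabled
BROWSER_HEADLESS=false
BROWSER_DISABLE_SECURITY=true
USE_VISION=true

# Headless browser restricted to trusted domains
BROWSER_HEADLESS=true
BROWSER_DISABLE_SECURITY=false
USE_VISION=true
ALLOWED_DOMAINS=["trusted-domain.com"]

# Local Chrome with a persistent context
CHROME_INSTANCE_PATH=/path/to/chrome
USE_PERSISTENT_CONTEXT=true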
In addition to the basic configuration, you can customize:
# Exclude specific functions
EXCLUDED_ACTIONS=[] # JSON array of action IDs
# Output format
OUTPUT_FORMAT= # Options: posts, or leave empty for text
# Enable debug logging for model thoughts
LOG_LEVEL=DEBUG
# Save browser recordings
SAVE_RECORDING_PATH=logs/recordings
TRACE_PATH=logs/traces
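A value such as `LOG_LEVEL=DEBUG` has to be mapped onto Python's logging levels at some point. The following is a small sketch of that mapping using only the standard `logging` module; `resolve_log_level` is a hypothetical helper, and falling back to `INFO` for unrecognized values is an assumption.

```python
import logging


def resolve_log_level(name: str, default: int = logging.INFO) -> int:
    """Map a LOG_LEVEL string such as "DEBUG" to a logging constant."""
    level = getattr(logging, name.strip().upper(), None)
    # Only accept integer level constants; ignore other module attributes.
    return level if isinstance(level, int) else default


if __name__ == "__main__":
    import os

    logging.basicConfig(level=resolve_log_level(os.environ.get("LOG_LEVEL", "INFO")))
    logging.getLogger(__name__).debug("Debug logging is enabled")
```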
The tool organizes outputs in the following structure:
logs/
├── browser_use.log        # Main log file
├── conversation_*.json    # Conversation history
├── results/               # Structured search results
├── screenshots/           # Element screenshots
├── content/               # Extracted page content
├── tables/                # CSV table data
├── downloads/             # Downloaded files
├── recordings/            # Browser session recordings
└── traces/                # Debug trace files
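If you want this layout to exist before the first run, it could be created up front along the lines of the sketch below. The helper `ensure_log_dirs` and the constant `LOG_SUBDIRS` are hypothetical; the tool may already create these directories itself on demand.

```python
from pathlib import Path

# Subdirectories taken from the structure listed above.
LOG_SUBDIRS = (
    "results", "screenshots", "content", "tables",
    "downloads", "recordings", "traces",
)


def ensure_log_dirs(root: Path) -> list[Path]:
    """Create the log directory layout under root and return the directory paths."""
    paths = [root / name for name in LOG_SUBDIRS]
    for path in paths:
        # exist_ok makes repeated calls safe.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```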