This solution implements a conversational AI agent that controls a web browser to send emails through Gmail's web interface. Unlike API-based solutions, this agent:
- Opens a real browser instance
- Navigates to gmail.com
- Interacts with UI elements directly
- Captures screenshots at each step
- Embeds visual feedback in the chat interface
graph TD
A[User Interface] <--> B[Backend: FastAPI]
B <--> C[Conversation Manager]
C <--> D[Browser Automation Engine]
C <--> E[AI Content Generator]
D --> F[Playwright]
E --> G[OpenRouter API]
- ✅ NO APIs USED: Solution uses browser automation only
- ✅ Real Browser Control: Playwright controls Chromium
- ✅ Visual Feedback: Screenshots embedded in chat
- ✅ Natural Language Processing: Understands user requests
- ✅ Extensible Architecture: Separated layers for easy modification
- User requests email sending via natural language
- Agent collects necessary information
- Agent opens browser and navigates to Gmail
- Step-by-step interaction with Gmail UI
- Screenshots captured and displayed in chat
- Email sent confirmation
- Intent extraction from conversational inputs
- Contextual question generation
- Memory management for conversation flow
- Playwright for browser control
- Robust element selectors
- Error handling for dynamic content
- Screenshot capture at each step
- Headless/headful mode support
- FastAPI backend
- WebSocket for real-time updates
- Base64 image embedding
- Responsive chat UI
- OpenRouter API integration
- Dynamic email content generation
- Context-aware subject lines
- Professional tone adaptation
- Install dependencies:
pip install -r requirements.txt
playwright install chromium- Configure environment variables:
Create
.envfile with:
OPENROUTER_API_KEY=your_api_key
- Run the application:
uvicorn main:app --reload- Access the UI:
Open
http://localhost:8000in your browser
- Email sent to: reportinsurebuzz@gmail.com
- Subject: "AI Agent Task - Rana Talukdar"
- Sent via Gmail web interface (no APIs used)
- Browser Automation: Playwright
- Backend Framework: FastAPI
- Frontend: HTML/CSS/JavaScript
- AI Integration: OpenRouter API
- Conversation Management: Custom state machine
-
Dynamic Element Handling:
- Implemented robust selectors with fallbacks
- Added explicit waits for element visibility
-
Screenshot Integration:
- Base64 encoding for inline display
- Compression to reduce payload size
-
Session Management:
- Isolated browser contexts per session
- Proper resource cleanup
-
Python compatability with Playwright
- Had to shift between multiple versions of python in order to find the right version.
- Finally python python-3.10.11 was the right fit
- Multi-website support
- Voice command integration (to be added)
- Cross-browser compatibility
- Plugin system for new actions
This solution demonstrates true browser automation - no email APIs were used in accordance with assignment requirements.


