Conversational Browser Control Agent

Project Overview

This solution implements a conversational AI agent that controls a web browser to send emails through Gmail's web interface. Unlike API-based solutions, this agent:

Opens a real browser instance
Navigates to gmail.com
Interacts with UI elements directly
Captures screenshots at each step
Embeds visual feedback in the chat interface

Architecture Diagram

graph TD
    A[User Interface] <--> B[Backend: FastAPI]
    B <--> C[Conversation Manager]
    C <--> D[Browser Automation Engine]
    C <--> E[AI Content Generator]
    D --> F[Playwright]
    E --> G[OpenRouter API]

Critical Requirements

✅ NO APIs USED: Solution uses browser automation only
✅ Real Browser Control: Playwright controls Chromium
✅ Visual Feedback: Screenshots embedded in chat
✅ Natural Language Processing: Understands user requests
✅ Extensible Architecture: Separated layers for easy modification

User Journey

User requests email sending via natural language
Agent collects necessary information
Agent opens browser and navigates to Gmail
Step-by-step interaction with Gmail UI
Screenshots captured and displayed in chat
Email sent confirmation

Technical Implementation

Natural Language Understanding

Intent extraction from conversational inputs
Contextual question generation
Memory management for conversation flow

Browser Automation Engine

Playwright for browser control
Robust element selectors
Error handling for dynamic content
Screenshot capture at each step
Headless/headful mode support

Conversational Interface

FastAPI backend
WebSocket for real-time updates
Base64 image embedding
Responsive chat UI

AI-Powered Content Generation

OpenRouter API integration
Dynamic email content generation
Context-aware subject lines
Professional tone adaptation

Setup Instructions

Install dependencies:

pip install -r requirements.txt
playwright install chromium

Configure environment variables: Create .env file with:

OPENROUTER_API_KEY=your_api_key

Run the application:

uvicorn main:app --reload

Access the UI: Open http://localhost:8000 in your browser

Screenshots

Proof of Functionality

Email sent to: reportinsurebuzz@gmail.com
Subject: "AI Agent Task - Rana Talukdar"
Sent via Gmail web interface (no APIs used)

Technology Stack

Browser Automation: Playwright
Backend Framework: FastAPI
Frontend: HTML/CSS/JavaScript
AI Integration: OpenRouter API
Conversation Management: Custom state machine

Challenges and Solutions

Dynamic Element Handling:
- Implemented robust selectors with fallbacks
- Added explicit waits for element visibility
Screenshot Integration:
- Base64 encoding for inline display
- Compression to reduce payload size
Session Management:
- Isolated browser contexts per session
- Proper resource cleanup
Python compatability with Playwright
- Had to shift between multiple versions of python in order to find the right version.
- Finally python python-3.10.11 was the right fit

Future Extensions

Multi-website support
Voice command integration (to be added)
Cross-browser compatibility
Plugin system for new actions

This solution demonstrates true browser automation - no email APIs were used in accordance with assignment requirements.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.venv		.venv
__pycache__		__pycache__
backend		backend
.env		.env
Demo_working video.mp4		Demo_working video.mp4
README.md		README.md
gmail_preview.png		gmail_preview.png
main.py		main.py
playwright_test.py		playwright_test.py
requirements.txt		requirements.txt
screenshot.png		screenshot.png
sent_confirmation.png		sent_confirmation.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Conversational Browser Control Agent

Project Overview

Architecture Diagram

Critical Requirements

User Journey

Technical Implementation

Natural Language Understanding

Browser Automation Engine

Conversational Interface

AI-Powered Content Generation

Setup Instructions

Screenshots

Proof of Functionality

Technology Stack

Challenges and Solutions

Future Extensions

Made with HaRd WoRk by Rana Talukdar

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Conversational Browser Control Agent

Project Overview

Architecture Diagram

Critical Requirements

User Journey

Technical Implementation

Natural Language Understanding

Browser Automation Engine

Conversational Interface

AI-Powered Content Generation

Setup Instructions

Screenshots

Proof of Functionality

Technology Stack

Challenges and Solutions

Future Extensions

Made with HaRd WoRk by Rana Talukdar

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages