Skip to content

Auth0r-C0dez/MailVoyager

Repository files navigation

Conversational Browser Control Agent

Project Overview

This solution implements a conversational AI agent that controls a web browser to send emails through Gmail's web interface. Unlike API-based solutions, this agent:

  • Opens a real browser instance
  • Navigates to gmail.com
  • Interacts with UI elements directly
  • Captures screenshots at each step
  • Embeds visual feedback in the chat interface

Architecture Diagram

graph TD
    A[User Interface] <--> B[Backend: FastAPI]
    B <--> C[Conversation Manager]
    C <--> D[Browser Automation Engine]
    C <--> E[AI Content Generator]
    D --> F[Playwright]
    E --> G[OpenRouter API]
Loading

Critical Requirements

  • NO APIs USED: Solution uses browser automation only
  • Real Browser Control: Playwright controls Chromium
  • Visual Feedback: Screenshots embedded in chat
  • Natural Language Processing: Understands user requests
  • Extensible Architecture: Separated layers for easy modification

User Journey

  1. User requests email sending via natural language
  2. Agent collects necessary information
  3. Agent opens browser and navigates to Gmail
  4. Step-by-step interaction with Gmail UI
  5. Screenshots captured and displayed in chat
  6. Email sent confirmation

Technical Implementation

Natural Language Understanding

  • Intent extraction from conversational inputs
  • Contextual question generation
  • Memory management for conversation flow

Browser Automation Engine

  • Playwright for browser control
  • Robust element selectors
  • Error handling for dynamic content
  • Screenshot capture at each step
  • Headless/headful mode support

Conversational Interface

  • FastAPI backend
  • WebSocket for real-time updates
  • Base64 image embedding
  • Responsive chat UI

AI-Powered Content Generation

  • OpenRouter API integration
  • Dynamic email content generation
  • Context-aware subject lines
  • Professional tone adaptation

Setup Instructions

  1. Install dependencies:
pip install -r requirements.txt
playwright install chromium
  1. Configure environment variables: Create .env file with:
OPENROUTER_API_KEY=your_api_key
  1. Run the application:
uvicorn main:app --reload
  1. Access the UI: Open http://localhost:8000 in your browser

Screenshots

Login Step Compose Email Email Sent

Proof of Functionality

Technology Stack

  • Browser Automation: Playwright
  • Backend Framework: FastAPI
  • Frontend: HTML/CSS/JavaScript
  • AI Integration: OpenRouter API
  • Conversation Management: Custom state machine

Challenges and Solutions

  1. Dynamic Element Handling:

    • Implemented robust selectors with fallbacks
    • Added explicit waits for element visibility
  2. Screenshot Integration:

    • Base64 encoding for inline display
    • Compression to reduce payload size
  3. Session Management:

    • Isolated browser contexts per session
    • Proper resource cleanup
  4. Python compatability with Playwright

    • Had to shift between multiple versions of python in order to find the right version.
    • Finally python python-3.10.11 was the right fit

Future Extensions

  • Multi-website support
  • Voice command integration (to be added)
  • Cross-browser compatibility
  • Plugin system for new actions

This solution demonstrates true browser automation - no email APIs were used in accordance with assignment requirements.

Made with HaRd WoRk by Rana Talukdar

About

MailVoyager is a conversational AI agent that controls a web browser to perform tasks like sending emails through Gmail's UI. It demonstrates true browser automation by interacting with web elements like a human, capturing screenshots on completion, and providing visual feedback within a chat interface.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors