https://github.com/auth0r-c0dez/mailvoyager
MailVoyager is a conversational AI agent that controls a web browser to perform tasks like sending emails through Gmail's UI. It demonstrates true browser automation by interacting with web elements like a human, capturing screenshots on completion, and providing visual feedback within a chat interface.
https://github.com/auth0r-c0dez/mailvoyager
aiagent fastapi llm python
Last synced: 3 months ago
JSON representation
MailVoyager is a conversational AI agent that controls a web browser to perform tasks like sending emails through Gmail's UI. It demonstrates true browser automation by interacting with web elements like a human, capturing screenshots on completion, and providing visual feedback within a chat interface.
- Host: GitHub
- URL: https://github.com/auth0r-c0dez/mailvoyager
- Owner: Auth0r-C0dez
- Created: 2025-07-17T21:03:53.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-07-17T21:42:33.000Z (12 months ago)
- Last Synced: 2025-07-18T01:27:08.255Z (12 months ago)
- Topics: aiagent, fastapi, llm, python
- Language: Python
- Homepage:
- Size: 42.9 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Conversational Browser Control Agent
## Project Overview
This solution implements a conversational AI agent that controls a web browser to send emails through Gmail's web interface. Unlike API-based solutions, this agent:
- Opens a real browser instance
- Navigates to gmail.com
- Interacts with UI elements directly
- Captures screenshots at each step
- Embeds visual feedback in the chat interface
## Architecture Diagram
```mermaid
graph TD
A[User Interface] <--> B[Backend: FastAPI]
B <--> C[Conversation Manager]
C <--> D[Browser Automation Engine]
C <--> E[AI Content Generator]
D --> F[Playwright]
E --> G[OpenRouter API]
```
## Critical Requirements
- ✅ **NO APIs USED**: Solution uses browser automation only
- ✅ **Real Browser Control**: Playwright controls Chromium
- ✅ **Visual Feedback**: Screenshots embedded in chat
- ✅ **Natural Language Processing**: Understands user requests
- ✅ **Extensible Architecture**: Separated layers for easy modification
## User Journey
1. User requests email sending via natural language
2. Agent collects necessary information
3. Agent opens browser and navigates to Gmail
4. Step-by-step interaction with Gmail UI
5. Screenshots captured and displayed in chat
6. Email sent confirmation
## Technical Implementation
### Natural Language Understanding
- Intent extraction from conversational inputs
- Contextual question generation
- Memory management for conversation flow
### Browser Automation Engine
- Playwright for browser control
- Robust element selectors
- Error handling for dynamic content
- Screenshot capture at each step
- Headless/headful mode support
### Conversational Interface
- FastAPI backend
- WebSocket for real-time updates
- Base64 image embedding
- Responsive chat UI
### AI-Powered Content Generation
- OpenRouter API integration
- Dynamic email content generation
- Context-aware subject lines
- Professional tone adaptation
## Setup Instructions
1. **Install dependencies**:
```bash
pip install -r requirements.txt
playwright install chromium
```
2. **Configure environment variables**:
Create `.env` file with:
```
OPENROUTER_API_KEY=your_api_key
```
3. **Run the application**:
```bash
uvicorn main:app --reload
```
4. **Access the UI**:
Open `http://localhost:8000` in your browser
## Screenshots



## Proof of Functionality
- Email sent to: reportinsurebuzz@gmail.com
- Subject: "AI Agent Task - Rana Talukdar"
- Sent via Gmail web interface (no APIs used)
## Technology Stack
- **Browser Automation**: Playwright
- **Backend Framework**: FastAPI
- **Frontend**: HTML/CSS/JavaScript
- **AI Integration**: OpenRouter API
- **Conversation Management**: Custom state machine
## Challenges and Solutions
1. **Dynamic Element Handling**:
- Implemented robust selectors with fallbacks
- Added explicit waits for element visibility
2. **Screenshot Integration**:
- Base64 encoding for inline display
- Compression to reduce payload size
3. **Session Management**:
- Isolated browser contexts per session
- Proper resource cleanup
4. **Python compatability with Playwright**
- Had to shift between multiple versions of python in order to find the right version.
- Finally python python-3.10.11 was the right fit
## Future Extensions
- Multi-website support
- Voice command integration (to be added)
- Cross-browser compatibility
- Plugin system for new actions
---
*This solution demonstrates true browser automation - no email APIs were used in accordance with assignment requirements.*
## Made with HaRd WoRk by Rana Talukdar