https://github.com/ibtisamafzal/voyance
AI visual web research agent — natural language → Gemini vision navigates live sites → spoken briefing + comparison table. UI Navigator @ Gemini Live Agent Challenge.
- Host: GitHub
- URL: https://github.com/ibtisamafzal/voyance
- Owner: ibtisamafzal
- Created: 2026-03-01T02:47:21.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-09T20:58:17.000Z (3 months ago)
- Last Synced: 2026-03-10T01:55:38.507Z (3 months ago)
- Topics: ai-agent, cloud-run, elevenlabs, fastapi, firecrawl, gemini, google-cloud, hackathon, perplexity, playwright, react, ui-navigator, vite, web-research
- Language: TypeScript
- Homepage: https://voyance-beta.vercel.app
- Size: 15.4 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Voyance
**AI-powered visual web research agent** — speak a task, watch it navigate live sites with Gemini vision, get a spoken briefing and a comparison report.
**Track:** [UI Navigator](https://geminiliveagentchallenge.devpost.com/) · Visual UI understanding & interaction
- **Live Demo:** See Voyance research, verify, and narrate in a real end-to-end flow.
- **Dev.to Blog:** Read the architecture and implementation decisions behind Voyance.
---
## Table of contents
- [Voyance](#voyance)
  - [Table of contents](#table-of-contents)
  - [What it does](#what-it-does)
    - [Features](#features)
    - [Screenshots](#screenshots)
    - [Hackathon alignment](#hackathon-alignment)
    - [Google Cloud Deployment](#google-cloud-deployment)
  - [Quick start](#quick-start)
    - [Prerequisites](#prerequisites)
    - [1. Clone and install](#1-clone-and-install)
    - [2. Backend](#2-backend)
    - [3. Frontend](#3-frontend)
    - [4. Run a research task](#4-run-a-research-task)
  - [Tech stack](#tech-stack)
  - [Architecture](#architecture)
  - [Voyance mind map](#voyance-mind-map)
  - [Environment variables](#environment-variables)
  - [Deployment](#deployment)
  - [Project structure](#project-structure)
  - [Community \& write-ups](#community--write-ups)
  - [Contact](#contact)
  - [License](#license)
---
## What it does
Voyance turns **natural language** into **competitive intelligence** in minutes:
| Step | Description |
| ---- | ----------- |
| **1. You say** | What you need — e.g. *"Compare pricing for the top 5 CRM tools"* |
| **2. The agent** | Plans, visits 3–5 live websites, and “reads” pages with **Gemini multimodal vision** (screenshots only — no DOM scraping) |
| **3. You get** | A sortable comparison table, CSV/HTML export, and **Vera** (ElevenLabs) reading the briefing aloud |
No DOM hacks, no site-specific APIs. Works across site redesigns. Backend is configured for deployment on **Google Cloud Run**.
### Features
- **Natural language input** — Describe your research task in plain English (e.g. compare pricing, features, or reviews).
- **Multi-site research** — Agent visits 3–5 live websites per task with no DOM scraping or site-specific APIs.
- **Gemini vision** — Screenshot-based page understanding; works across redesigns and any site.
- **Comparison table** — Sortable results with company, segment, pricing, and key details.
- **Export** — Download results as CSV or HTML.
- **Spoken briefing (Vera)** — ElevenLabs TTS reads the summary aloud.
- **Interrupt + replan** — During a run, you can submit a redirect instruction (text/voice); the agent queues it and replans on the next loop iteration.
- **Fact verification** — Perplexity-backed claim checks where relevant.
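The interrupt-and-replan behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_with_redirects`, `replan`, and `visit` are hypothetical names, and the only detail taken from the README is that a redirect is queued and picked up on the next loop iteration.

```python
import queue
from typing import Callable, List


def run_with_redirects(
    plan: List[str],
    redirects: "queue.Queue[str]",
    replan: Callable[[List[str], str], List[str]],
    visit: Callable[[str], None],
) -> List[str]:
    """Visit planned targets, replanning whenever a redirect is queued.

    All names here are hypothetical; the real agent loop may differ.
    """
    visited: List[str] = []
    remaining = list(plan)
    while remaining:
        # Check for a redirect only at the top of an iteration, so a step
        # that is already running is never interrupted midway.
        try:
            instruction = redirects.get_nowait()
        except queue.Empty:
            instruction = None
        if instruction is not None:
            remaining = replan(remaining, instruction)
            continue
        target = remaining.pop(0)
        visit(target)
        visited.append(target)
    return visited
```

Checking the queue once per iteration keeps the loop simple: a redirect never cancels work in flight, it only changes what happens next.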
### Screenshots
**Hero** — Enter your research query and start the agent.

**Output** — Comparison table, CSV/HTML export, and *Listen to Vera*.

### Hackathon alignment
| Requirement | Voyance |
| ----------- | ------- |
| **Gemini model** | Gemini 2.0 Flash (planning, screenshot analysis, synthesis) |
| **Google GenAI SDK / ADK** | **Google GenAI SDK** (`google-generativeai`): Gemini for planning, vision, synthesis. Custom agent loop (plan → navigate → extract → verify), not the ADK library. |
| **Google Cloud service** | Backend deployment target is **Google Cloud Run** (`infra/cloudbuild.yaml`, `infra/main.tf`) |
| **UI Navigator** | Screenshots analyzed by Gemini vision; agent outputs navigation and extraction actions |
*Third-party: ElevenLabs (Vera TTS), Firecrawl (extraction), Perplexity (fact verification).*
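As a rough picture of the custom loop named in the table (plan → navigate → extract → verify), the sketch below wires four injected callables together. Everything in it is an assumption made for illustration: the `Finding` type, the `research` function, and the callable signatures are hypothetical and are not taken from `backend/app/agent.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One row of research output (hypothetical shape)."""
    url: str
    data: Dict[str, str]
    verified: bool = False


def research(
    task: str,
    plan: Callable[[str], List[str]],
    navigate: Callable[[str], bytes],
    extract: Callable[[bytes], Dict[str, str]],
    verify: Callable[[Dict[str, str]], bool],
) -> List[Finding]:
    """Run plan -> navigate -> extract -> verify once per planned URL."""
    findings: List[Finding] = []
    for url in plan(task):
        screenshot = navigate(url)   # e.g. a screenshot as PNG bytes
        data = extract(screenshot)   # e.g. a vision model reading the image
        findings.append(Finding(url, data, verify(data)))
    return findings
```

Passing the stages in as callables keeps the loop testable without a browser or any API key.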
### Google Cloud Deployment
- Live backend URL: [voyance-backend-712979751443.us-central1.run.app](https://voyance-backend-712979751443.us-central1.run.app)
- Judge artifact: `Google-Cloud-Logs-Voyance.png` (Cloud Run logs screenshot)

---
## Quick start
### Prerequisites
- **Node.js** 18+
- **Python** 3.10+
- **API keys:** [Google AI Studio](https://aistudio.google.com/) (Gemini), [ElevenLabs](https://elevenlabs.io/), [Firecrawl](https://firecrawl.dev/), [Perplexity](https://www.perplexity.ai/) — see `backend/.env.example`
### 1. Clone and install
```bash
git clone https://github.com/ibtisamafzal/voyance.git
cd voyance
npm install
```
### 2. Backend
```bash
cd backend
pip install -r requirements.txt
playwright install chromium
cp .env.example .env
# Edit .env with your API keys
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
| Service | URL |
| ------- | --- |
| Backend | http://localhost:8000 |
| API docs | http://localhost:8000/docs |
### 3. Frontend
From the **repo root** (new terminal):
```bash
npm run dev
```
Frontend: open the local URL that Vite prints in the terminal.
### 4. Run a research task
1. Enter a query in the hero (e.g. *"Compare pricing for top 5 CRM tools"*).
2. Click **Research** — the agent plans, navigates, extracts, and verifies.
3. In the Output section: sort the table, export **CSV** or **HTML**, and click **Listen to Vera** for the spoken briefing.
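The sort-then-export step can be illustrated with the standard library alone. This is a sketch under assumptions: `to_csv` is a hypothetical helper, the column names in the usage example are made up, and every row is assumed to share the keys of the first row.

```python
import csv
import io
from typing import Dict, List


def to_csv(rows: List[Dict[str, str]], sort_by: str) -> str:
    """Sort rows by one column and serialise them as CSV text.

    Assumes every row has the same keys as the first row.
    """
    if not rows:
        return ""
    # Values are compared as strings, so "$100" sorts before "$20".
    ordered = sorted(rows, key=lambda row: row.get(sort_by, ""))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


rows = [
    {"company": "B", "pricing": "$20"},
    {"company": "A", "pricing": "$10"},
]
print(to_csv(rows, sort_by="company"))
```

Sorting on a price column would need the values parsed into numbers first; the string comparison here is only adequate for text columns.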
---
## Tech stack
| Layer | Technology |
| ----- | ---------- |
| **AI & vision** | Gemini 2.0 Flash |
| **Browser** | Playwright (headless Chromium), screenshot-based only |
| **Extraction** | Firecrawl API → Gemini vision fallback |
| **Verification** | Perplexity API |
| **Voice** | ElevenLabs TTS (Vera) |
| **Backend** | FastAPI, WebSockets; **Google Cloud Run** deployment target |
| **Frontend** | React, Vite, Tailwind |
| **Infra** | Docker, Cloud Build, Terraform (`infra/`) |
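The "Firecrawl API → Gemini vision fallback" row describes a try-fast-then-fall-back pattern. The sketch below shows one way that could look. The README does not say what triggers the fallback, so treating an exception or an empty result as the trigger is an assumption, and `extract_with_fallback`, `fast_extract`, and `vision_extract` are hypothetical names.

```python
from typing import Callable, Dict, Optional


def extract_with_fallback(
    url: str,
    fast_extract: Callable[[str], Optional[Dict[str, str]]],
    vision_extract: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Try the fast extractor first; fall back to vision if it fails."""
    try:
        result = fast_extract(url)
    except Exception:
        # Assumption: any failure of the fast path triggers the fallback.
        result = None
    if result:
        return result
    # An empty or missing result is also treated as a miss.
    return vision_extract(url)
```

Keeping the slower vision path as the fallback means it only runs for pages where the fast path returns nothing usable.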
---
## Architecture
User and frontend → backend (Cloud Run target) → Gemini, Playwright, Firecrawl, Perplexity, ElevenLabs.
[Architecture diagram](https://github.com/ibtisamafzal/voyance/blob/main/Architecture%20diagram.png)
---
## Voyance mind map
> **From idea to implementation at a glance.**
>
> This mind map captures the core of Voyance for the Gemini Live Agent Challenge — from the problem and solution, through key features and technical stack, to user personas and submission requirements.

---
## Environment variables
Copy `backend/.env.example` to `backend/.env` and set:
| Variable | Purpose |
| -------- | ------- |
| `GEMINI_API_KEY` | Google AI Studio |
| `ELEVENLABS_API_KEY` | Vera TTS |
| `FIRECRAWL_API_KEY` | Fast extraction |
| `PERPLEXITY_API_KEY` | Fact verification |
| `GOOGLE_CLOUD_PROJECT` | Optional (Firestore); in-memory fallback if unset |
| `CONTACT_EMAIL` | Contact form recipient email (set in server env) |
| `CONTACT_EMAIL_APP_PASSWORD` | Gmail App Password for SMTP contact form sending |
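A startup check along the following lines would catch a missing key before the first request. It is a sketch, not the backend's actual settings code: `load_settings` and the returned dictionary shape are hypothetical, and treating all four API keys as required is an assumption. The in-memory fallback when `GOOGLE_CLOUD_PROJECT` is unset follows the table above.

```python
import os
from typing import Dict, Mapping

# Assumption: these four keys are all mandatory for a full run.
REQUIRED = (
    "GEMINI_API_KEY",
    "ELEVENLABS_API_KEY",
    "FIRECRAWL_API_KEY",
    "PERPLEXITY_API_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Validate required keys and pick a storage backend."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    project = env.get("GOOGLE_CLOUD_PROJECT") or None
    return {
        "keys": {name: env[name] for name in REQUIRED},
        "project": project,
        # Firestore only when a project is configured, else in-memory.
        "storage": "firestore" if project else "memory",
    }
```

Accepting any mapping instead of reading `os.environ` directly makes the check easy to exercise in tests.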
---
## Deployment
- **Backend:** Google Cloud Run. Deploy with `infra/cloudbuild.yaml` from the repo root:
  ```bash
  gcloud builds submit --config=infra/cloudbuild.yaml .
  ```
  Default: 1 GiB memory, 1 CPU (increase to 2 GiB if needed for Playwright).
- **Frontend:** Host on Vercel or any static host; set `VITE_API_URL` to your Cloud Run URL (no trailing slash).
**Troubleshooting:**
- Stuck on "Connecting…" → set `VITE_API_URL` on your host.
- WebSocket 403 → ensure no trailing slash in `VITE_API_URL`.
- OOM → increase memory in `cloudbuild.yaml`.
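The trailing-slash advice is easiest to see by building the WebSocket URL by hand. A plausible cause, though not one the README confirms, is that joining a base URL ending in `/` with a path starting with `/` yields a double slash that the server rejects. The sketch below normalises the base first. The frontend itself is TypeScript, so this Python version only illustrates the idea, and `websocket_url` and the `/ws/research` path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url(api_url: str, path: str = "/ws/research") -> str:
    """Derive a ws:// or wss:// URL from an HTTP(S) base URL.

    The default path is a made-up example, not the real endpoint.
    """
    parts = urlsplit(api_url.rstrip("/"))
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Strip any trailing slash from the base path before appending.
    return urlunsplit((scheme, parts.netloc, parts.path.rstrip("/") + path, "", ""))


# A naive join keeps the extra slash:
print("https://example.run.app/" + "/ws/research")
# The normalised version does not:
print(websocket_url("https://example.run.app/"))
```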
---
## Project structure
```text
├── src/app/                  # React frontend
│   ├── components/           # HeroSection, ResearchOutputSection, Navbar, etc.
│   └── context/              # ResearchContext (shared state)
├── backend/                  # FastAPI backend
│   ├── app/
│   │   ├── agent.py          # Research loop (plan → navigate → extract → verify)
│   │   ├── routers/          # Research, voice, health, sessions
│   │   └── services/         # Gemini, Firecrawl, Perplexity, Playwright, ElevenLabs
│   └── main.py
└── infra/                    # GCP automation
    ├── cloudbuild.yaml       # Build & deploy to Cloud Run
    └── main.tf               # Terraform
```
---
## Community & write-ups
- **Deep-dive blog**: [How We Built Voyance (DEV.to)](https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h)
- **Reddit build log**: [How we built Voyance — an AI agent that researches the web by “seeing” it](https://www.reddit.com/user/IbtisamAfzal/comments/1rhtivl/how_we_built_voyance_an_ai_agent_that_researches/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)
- **Hackathon submission**: [Gemini Live Agent Challenge — UI Navigator track](https://geminiliveagentchallenge.devpost.com/)
- **Source code**: [Voyance on GitHub](https://github.com/ibtisamafzal/voyance)
- **GDG profile**: [g.dev/IbtisamAfzal](https://g.dev/IbtisamAfzal)
## Contact
| | |
| --- | --- |
| **Contact** | Use the in-app contact form (`/contact`) |
| **LinkedIn** | [linkedin.com/in/ibtisamafzal](https://linkedin.com/in/ibtisamafzal/) |
**Blog:** [How We Built Voyance (DEV)](https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h) · **Hackathon:** [Gemini Live Agent Challenge](https://geminiliveagentchallenge.devpost.com/) (see Devpost for current schedule)
---
## License
MIT