https://github.com/ibtisamafzal/voyance

AI visual web research agent — natural language → Gemini vision navigates live sites → spoken briefing + comparison table. UI Navigator @ Gemini Live Agent Challenge.
ai-agent cloud-run elevenlabs fastapi firecrawl gemini google-cloud hackathon perplexity playwright react ui-navigator vite web-research

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/ibtisamafzal/voyance
Owner: ibtisamafzal
Created: 2026-03-01T02:47:21.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-03-09T20:58:17.000Z (5 months ago)
Last Synced: 2026-03-10T01:55:38.507Z (5 months ago)
Topics: ai-agent, cloud-run, elevenlabs, fastapi, firecrawl, gemini, google-cloud, hackathon, perplexity, playwright, react, ui-navigator, vite, web-research
Language: TypeScript
Homepage: https://voyance-beta.vercel.app
Size: 15.4 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Voyance

**AI-powered visual web research agent** — speak a task, watch it navigate live sites with Gemini vision, get a spoken briefing and a comparison report.

[![Gemini Live Agent Challenge 2026](https://img.shields.io/badge/Gemini%20Live%20Agent%20Challenge-2026-4285F4?style=flat&logo=google)](https://geminiliveagentchallenge.devpost.com/)  

**Track:** [UI Navigator](https://geminiliveagentchallenge.devpost.com/) · Visual UI understanding & interaction

  

    

      
- **Live Demo:** See Voyance research, verify, and narrate in a real end-to-end flow.

- **Dev.to Blog:** [Read the architecture and implementation decisions behind Voyance](https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h).

    

  

---

## Table of contents

- [Voyance](#voyance)

  - [Table of contents](#table-of-contents)

  - [What it does](#what-it-does)

    - [Features](#features)

    - [Screenshots](#screenshots)

    - [Hackathon alignment](#hackathon-alignment)

    - [Google Cloud Deployment](#google-cloud-deployment)

  - [Quick start](#quick-start)

    - [Prerequisites](#prerequisites)

    - [1. Clone and install](#1-clone-and-install)

    - [2. Backend](#2-backend)

    - [3. Frontend](#3-frontend)

    - [4. Run a research task](#4-run-a-research-task)

  - [Tech stack](#tech-stack)

  - [Architecture](#architecture)

  - [Voyance mind map](#voyance-mind-map)

  - [Environment variables](#environment-variables)

  - [Deployment](#deployment)

  - [Project structure](#project-structure)

  - [Community \& write-ups](#community--write-ups)

  - [Contact](#contact)

  - [License](#license)

---

## What it does

Voyance turns **natural language** into **competitive intelligence** in minutes:

| Step | Description |
| ---- | ----------- |
| **1. You say** | What you need, e.g. *"Compare pricing for the top 5 CRM tools"* |
| **2. The agent** | Plans, visits 3–5 live websites, and “reads” pages with **Gemini multimodal vision** (screenshots only, no DOM scraping) |
| **3. You get** | A sortable comparison table, CSV/HTML export, and **Vera** (ElevenLabs) reading the briefing aloud |

Voyance uses no DOM selectors and no site-specific APIs, so it keeps working when a site is redesigned. The backend is configured for deployment on **Google Cloud Run**.

### Features

- **Natural language input** — Describe your research task in plain English (e.g. compare pricing, features, or reviews).

- **Multi-site research** — Agent visits 3–5 live websites per task with no DOM scraping or site-specific APIs.

- **Gemini vision** — Screenshot-based page understanding; works across redesigns and any site.

- **Comparison table** — Sortable results with company, segment, pricing, and key details.

- **Export** — Download results as CSV or HTML.

- **Spoken briefing (Vera)** — ElevenLabs TTS reads the summary aloud.

- **Interrupt + replan** — During a run, you can submit a redirect instruction (text/voice); the agent queues it and replans on the next loop iteration.

- **Fact verification** — Perplexity-backed claim checks where relevant.
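
The interrupt + replan behaviour described in the list above can be pictured with a small queue-driven loop. This is a minimal sketch under stated assumptions, not the code in `backend/app/agent.py`: `run_with_interrupts`, `plan`, `visit`, and `redirects` are hypothetical names, and the planner and browser are replaced by plain callables so the example runs on its own.

```python
from collections import deque
from typing import Callable, Deque, List


def run_with_interrupts(
    initial_task: str,
    plan: Callable[[str], List[str]],   # hypothetical planner: task -> ordered steps
    visit: Callable[[str], None],       # hypothetical action: perform one step
    redirects: Deque[str],              # redirect instructions queued by the user
) -> List[str]:
    """Run planned steps, replanning whenever a redirect has been queued."""
    steps = deque(plan(initial_task))
    visited: List[str] = []
    while steps:
        # Check the queue at the top of every iteration, so a redirect
        # takes effect on the next loop pass rather than mid-step.
        if redirects:
            new_task = redirects.popleft()
            steps = deque(plan(new_task))  # discard the old plan
            continue
        step = steps.popleft()
        visit(step)
        visited.append(step)
    return visited
```

In an async FastAPI backend the same idea would more likely use an `asyncio.Queue` fed by the WebSocket handler; the plain `deque` here only keeps the sketch short.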

### Screenshots

**Hero** — Enter your research query and start the agent.

![Hero section](public/Hero-Section.png)

**Output** — Comparison table, CSV/HTML export, and *Listen to Vera*.

![Output section](public/Output-Section.png)

### Hackathon alignment

| Requirement | Voyance |
| ----------- | ------- |
| **Gemini model** | Gemini 2.0 Flash (planning, screenshot analysis, synthesis) |
| **Google GenAI SDK / ADK** | **Google GenAI SDK** (`google-generativeai`): Gemini for planning, vision, synthesis. Custom agent loop (plan → navigate → extract → verify), not the ADK library. |
| **Google Cloud service** | Backend deployment target is **Google Cloud Run** (`infra/cloudbuild.yaml`, `infra/main.tf`) |
| **UI Navigator** | Screenshots analyzed by Gemini vision; agent outputs navigation and extraction actions |

*Third-party: ElevenLabs (Vera TTS), Firecrawl (extraction), Perplexity (fact verification).*
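
The custom loop named in the table (plan → navigate → extract → verify) can be sketched as follows. This is an illustration only, not the project's implementation: `run_research`, `Finding`, and the four injected callables are hypothetical, and they stand in for the real Gemini, Playwright, Firecrawl, and Perplexity services so that the example runs standalone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    url: str
    data: Dict[str, str]
    verified: bool


def run_research(
    query: str,
    plan: Callable[[str], List[str]],            # query -> candidate URLs
    navigate: Callable[[str], bytes],            # URL -> screenshot bytes
    extract: Callable[[bytes], Dict[str, str]],  # screenshot -> structured fields
    verify: Callable[[Dict[str, str]], bool],    # fields -> passed fact check?
    max_sites: int = 5,
) -> List[Finding]:
    """One pass of plan -> navigate -> extract -> verify over at most max_sites URLs."""
    findings: List[Finding] = []
    for url in plan(query)[:max_sites]:
        screenshot = navigate(url)
        data = extract(screenshot)
        findings.append(Finding(url=url, data=data, verified=verify(data)))
    return findings


if __name__ == "__main__":
    # Stub services; a real run would call the external APIs instead.
    demo = run_research(
        "Compare pricing for the top 5 CRM tools",
        plan=lambda q: ["https://example.com/a", "https://example.com/b"],
        navigate=lambda url: url.encode(),
        extract=lambda shot: {"source": shot.decode(), "pricing": "unknown"},
        verify=lambda data: bool(data.get("pricing")),
    )
    for finding in demo:
        print(finding.url, finding.verified)
```

Passing the services in as callables is a design choice for the sketch: it keeps the loop testable without API keys.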

### Google Cloud Deployment

- Live backend URL: [voyance-backend-712979751443.us-central1.run.app](https://voyance-backend-712979751443.us-central1.run.app)

- Judge artifact: `Google-Cloud-Logs-Voyance.png` (Cloud Run logs screenshot)

![Google Cloud Run logs proof](Google-Cloud-Logs-Voyance.png)

---

## Quick start

### Prerequisites

- **Node.js** 18+

- **Python** 3.10+

- **API keys:** [Google AI Studio](https://aistudio.google.com/) (Gemini), [ElevenLabs](https://elevenlabs.io/), [Firecrawl](https://firecrawl.dev/), [Perplexity](https://www.perplexity.ai/) — see `backend/.env.example`

### 1. Clone and install

```bash
git clone https://github.com/ibtisamafzal/voyance.git
cd voyance
npm install
```

### 2. Backend

```bash
cd backend
pip install -r requirements.txt
playwright install chromium
cp .env.example .env
# Edit .env with your API keys
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

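| Service | URL |
| ------- | --- |
| Backend | `http://localhost:8000` (the port passed to `uvicorn` above) |
| API docs | `http://localhost:8000/docs` (FastAPI's default Swagger UI path) |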

### 3. Frontend

From the **repo root** (new terminal):

```bash
npm run dev
```

Frontend: open the local URL that `npm run dev` prints (Vite's default is `http://localhost:5173`).

### 4. Run a research task

1. Enter a query in the hero (e.g. *"Compare pricing for top 5 CRM tools"*).

2. Click **Research** — the agent plans, navigates, extracts, and verifies.

3. In the Output section: sort the table, export **CSV** or **HTML**, and click **Listen to Vera** for the spoken briefing.
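
The sort and CSV export in step 3 amount to very little logic. The sketch below shows one way to do it with the Python standard library; `sort_rows` and `to_csv` are hypothetical helpers, and the column names are illustrative rather than the project's actual schema.

```python
import csv
import io
from typing import Dict, List


def sort_rows(rows: List[Dict[str, object]], key: str, descending: bool = False) -> List[Dict[str, object]]:
    """Return a new list of rows ordered by one column."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


def to_csv(rows: List[Dict[str, object]], columns: List[str]) -> str:
    """Serialize rows to CSV text, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop keys that are not exported
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `to_csv(sort_rows(rows, "company"), ["company", "segment", "pricing"])` would produce an alphabetically ordered export.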

---

## Tech stack

| Layer | Technology |
| ----- | ---------- |
| **AI & vision** | Gemini 2.0 Flash |
| **Browser** | Playwright (headless Chromium), screenshot-based only |
| **Extraction** | Firecrawl API → Gemini vision fallback |
| **Verification** | Perplexity API |
| **Voice** | ElevenLabs TTS (Vera) |
| **Backend** | FastAPI, WebSockets; **Google Cloud Run** deployment target |
| **Frontend** | React, Vite, Tailwind |
| **Infra** | Docker, Cloud Build, Terraform (`infra/`) |
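
The "Firecrawl API → Gemini vision fallback" row describes a try-primary-then-fallback pattern. A minimal sketch of that pattern, assuming both extractors can be modelled as functions from a URL to a dict of fields; `extract_with_fallback` and the `extracted_by` field are hypothetical and not taken from the repository.

```python
from typing import Callable, Dict, Optional

# An extractor takes a URL and returns fields, or None/{} when it finds nothing.
Extractor = Callable[[str], Optional[Dict[str, str]]]


def extract_with_fallback(url: str, primary: Extractor, fallback: Extractor) -> Dict[str, str]:
    """Try the fast extractor first; use the slower one if it fails or returns nothing."""
    try:
        result = primary(url)
    except Exception:
        # Treat any primary failure (timeout, quota, parse error) as "no data".
        result = None
    if result:
        return {**result, "extracted_by": "primary"}

    result = fallback(url)
    if not result:
        raise ValueError(f"no data extracted from {url}")
    return {**result, "extracted_by": "fallback"}
```

Recording which path produced the data makes it easier to see later how often the fallback is actually needed.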

---

## Architecture

User and frontend → backend (Cloud Run target) → Gemini, Playwright, Firecrawl, Perplexity, ElevenLabs.

[![Voyance architecture](https://github.com/ibtisamafzal/voyance/blob/main/Architecture%20diagram.png)](https://github.com/ibtisamafzal/voyance/blob/main/Architecture%20diagram.png)

---

## Voyance mind map

> **From idea to implementation at a glance.**

>

> This mind map captures the core of Voyance for the Gemini Live Agent Challenge — from the problem and solution, through key features and technical stack, to user personas and submission requirements.

![Voyance mind map for Gemini Live Agent Challenge](public/Voyance-mind-map.png)

---

## Environment variables

Copy `backend/.env.example` to `backend/.env` and set:

| Variable | Purpose |
| -------- | ------- |
| `GEMINI_API_KEY` | Google AI Studio |
| `ELEVENLABS_API_KEY` | Vera TTS |
| `FIRECRAWL_API_KEY` | Fast extraction |
| `PERPLEXITY_API_KEY` | Fact verification |
| `GOOGLE_CLOUD_PROJECT` | Optional (Firestore); in-memory fallback if unset |
| `CONTACT_EMAIL` | Contact form recipient email (set in server env) |
| `CONTACT_EMAIL_APP_PASSWORD` | Gmail App Password for SMTP contact form sending |
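
The table implies a simple startup check: API keys must be present, while `GOOGLE_CLOUD_PROJECT` only switches session storage between Firestore and memory. The sketch below illustrates that logic. `Settings`, `load_settings`, and `session_store` are hypothetical names, and treating all four API keys as required is an assumption; the backend may tolerate missing keys.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these four keys are all mandatory at startup.
REQUIRED = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY", "FIRECRAWL_API_KEY", "PERPLEXITY_API_KEY")


@dataclass(frozen=True)
class Settings:
    api_keys: Mapping[str, str]
    google_cloud_project: Optional[str]

    @property
    def session_store(self) -> str:
        # Firestore only when a project is configured; otherwise keep sessions in memory.
        return "firestore" if self.google_cloud_project else "memory"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment and fail fast on missing keys."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        api_keys={name: env[name] for name in REQUIRED},
        google_cloud_project=env.get("GOOGLE_CLOUD_PROJECT") or None,
    )
```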

---

## Deployment

- **Backend:** Google Cloud Run. Deploy with `infra/cloudbuild.yaml` from repo root:

  ```bash
  gcloud builds submit --config=infra/cloudbuild.yaml .
  ```

  Default: 1 GiB memory, 1 CPU (increase to 2 GiB if needed for Playwright).

- **Frontend:** Host on Vercel or any static host; set `VITE_API_URL` to your Cloud Run URL (no trailing slash).

**Troubleshooting:**

- Stuck on "Connecting…": set `VITE_API_URL` on your host.

- WebSocket 403: make sure `VITE_API_URL` has no trailing slash.

- OOM: increase memory in `infra/cloudbuild.yaml`.
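
The trailing-slash advice is easier to follow with the URL handling spelled out. The sketch below assumes the WebSocket URL is built by appending a path to `VITE_API_URL`, in which case a trailing slash would produce a double-slash path; that cause is a guess, not something this README states. The frontend itself is TypeScript, so this Python version only illustrates the string handling, and both `websocket_url` and the `/ws/research` path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url(api_url: str, path: str = "/ws/research") -> str:
    """Derive a ws(s):// URL from an http(s):// API base, tolerating a trailing slash."""
    parts = urlsplit(api_url.strip())
    scheme = {"https": "wss", "http": "ws"}[parts.scheme]
    # Strip the base's trailing slash and the path's leading slash, then join with one.
    full_path = parts.path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((scheme, parts.netloc, full_path, "", ""))
```

With this normalization, `https://example.run.app` and `https://example.run.app/` both map to `wss://example.run.app/ws/research`.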

---

## Project structure

```text
├── src/app/              # React frontend
│   ├── components/       # HeroSection, ResearchOutputSection, Navbar, etc.
│   └── context/          # ResearchContext (shared state)
├── backend/              # FastAPI backend
│   ├── app/
│   │   ├── agent.py      # Research loop (plan → navigate → extract → verify)
│   │   ├── routers/      # Research, voice, health, sessions
│   │   └── services/     # Gemini, Firecrawl, Perplexity, Playwright, ElevenLabs
│   └── main.py
└── infra/                # GCP automation
    ├── cloudbuild.yaml   # Build & deploy to Cloud Run
    └── main.tf           # Terraform
```

---

## Community & write-ups

- **Deep-dive blog**: [How We Built Voyance (DEV.to)](https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h)

- **Reddit build log**: [How we built Voyance — an AI agent that researches the web by “seeing” it](https://www.reddit.com/user/IbtisamAfzal/comments/1rhtivl/how_we_built_voyance_an_ai_agent_that_researches/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)

- **Hackathon submission**: [Gemini Live Agent Challenge — UI Navigator track](https://geminiliveagentchallenge.devpost.com/)

- **Source code**: [Voyance on GitHub](https://github.com/ibtisamafzal/voyance)

- **GDG profile**: [g.dev/IbtisamAfzal](https://g.dev/IbtisamAfzal)

## Contact

| | |
| --- | --- |
| **Contact** | Use the in-app contact form (`/contact`) |
| **LinkedIn** | [linkedin.com/in/ibtisamafzal](https://linkedin.com/in/ibtisamafzal/) |

**Blog:** [How We Built Voyance (DEV)](https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h) · **Hackathon:** [Gemini Live Agent Challenge](https://geminiliveagentchallenge.devpost.com/) (see Devpost for current schedule)

---

## License

MIT
