{"id":48573350,"url":"https://github.com/ibtisamafzal/voyance","last_synced_at":"2026-04-08T15:34:35.393Z","repository":{"id":341313381,"uuid":"1169667310","full_name":"ibtisamafzal/voyance","owner":"ibtisamafzal","description":"AI visual web research agent — natural language → Gemini vision navigates live sites → spoken briefing + comparison table. UI Navigator @ Gemini Live Agent Challenge.","archived":false,"fork":false,"pushed_at":"2026-03-09T20:58:17.000Z","size":16134,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-10T01:55:38.507Z","etag":null,"topics":["ai-agent","cloud-run","elevenlabs","fastapi","firecrawl","gemini","google-cloud","hackathon","perplexity","playwright","react","ui-navigator","vite","web-research"],"latest_commit_sha":null,"homepage":"https://voyance-beta.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ibtisamafzal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-01T02:47:21.000Z","updated_at":"2026-03-09T20:58:21.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ibtisamafzal/voyance","commit_stats":null,"previous_names":["ibtisamafzal/voyance"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ibtisamafzal/voyance","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibtisamafzal%2Fvoyance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibtisamafzal%2Fvoyance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibtisamafzal%2Fvoyance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibtisamafzal%2Fvoyance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ibtisamafzal","download_url":"https://codeload.github.com/ibtisamafzal/voyance/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibtisamafzal%2Fvoyance/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31562690,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","cloud-run","elevenlabs","fastapi","firecrawl","gemini","google-cloud","hackathon","perplexity","playwright","react","ui-navigator","vite","web-research"],"created_at":"2026-04-08T15:34:34.763Z","updated_at":"2026-04-08T15:34:35.362Z","avatar_url":"https://github.com/ibtisamafzal.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Voyance\n\n**AI-powered visual web research agent** — speak a task, watch it navigate live sites with Gemini vision, get a spoken briefing and a comparison report.\n\n[![Gemini Live Agent Challenge 2026](https://img.shields.io/badge/Gemini%20Live%20Agent%20Challenge-2026-4285F4?style=flat\u0026logo=google)](https://geminiliveagentchallenge.devpost.com/)  \n**Track:** [UI Navigator](https://geminiliveagentchallenge.devpost.com/) · Visual UI understanding \u0026 interaction\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd width=\"50%\" valign=\"top\"\u003e\n      \u003ch3\u003eLive Demo\u003c/h3\u003e\n      \u003cp\u003eSee Voyance research, verify, and narrate in a real end-to-end flow.\u003c/p\u003e\n      \u003cp\u003e\u003ca href=\"https://voyance-beta.vercel.app/\"\u003e\u003cstrong\u003eOpen Demo\u003c/strong\u003e\u003c/a\u003e\u003c/p\u003e\n    \u003c/td\u003e\n    \u003ctd width=\"50%\" valign=\"top\"\u003e\n      \u003ch3\u003eDev.to Blog\u003c/h3\u003e\n      \u003cp\u003eRead the architecture and implementation decisions behind Voyance.\u003c/p\u003e\n      \u003cp\u003e\u003ca href=\"https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h\"\u003e\u003cstrong\u003eRead Blog Post\u003c/strong\u003e\u003c/a\u003e\u003c/p\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n---\n\n## Table of contents\n\n- [Voyance](#voyance)\n  - [Table of contents](#table-of-contents)\n  - [What it does](#what-it-does)\n    - [Features](#features)\n    - [Screenshots](#screenshots)\n    - [Hackathon alignment](#hackathon-alignment)\n    - [Google Cloud Deployment](#google-cloud-deployment)\n  - [Quick start](#quick-start)\n    - [Prerequisites](#prerequisites)\n    - [1. Clone and install](#1-clone-and-install)\n    - [2. Backend](#2-backend)\n    - [3. Frontend](#3-frontend)\n    - [4. Run a research task](#4-run-a-research-task)\n  - [Tech stack](#tech-stack)\n  - [Architecture](#architecture)\n  - [Voyance mind map](#voyance-mind-map)\n  - [Environment variables](#environment-variables)\n  - [Deployment](#deployment)\n  - [Project structure](#project-structure)\n  - [Community \\\u0026 write-ups](#community--write-ups)\n  - [Contact](#contact)\n  - [License](#license)\n\n---\n\n## What it does\n\nVoyance turns **natural language** into **competitive intelligence** in minutes:\n\n| Step | Description |\n| ---- | ----------- |\n| **1. You say** | What you need — e.g. *\"Compare pricing for the top 5 CRM tools\"* |\n| **2. The agent** | Plans, visits 3–5 live websites, and “reads” pages with **Gemini multimodal vision** (screenshots only — no DOM scraping) |\n| **3. You get** | A sortable comparison table, CSV/HTML export, and **Vera** (ElevenLabs) reading the briefing aloud |\n\nNo DOM hacks, no site-specific APIs. Works across site redesigns. Backend is configured for deployment on **Google Cloud Run**.\n\n### Features\n\n- **Natural language input** — Describe your research task in plain English (e.g. compare pricing, features, or reviews).\n- **Multi-site research** — Agent visits 3–5 live websites per task with no DOM scraping or site-specific APIs.\n- **Gemini vision** — Screenshot-based page understanding; works across redesigns and any site.\n- **Comparison table** — Sortable results with company, segment, pricing, and key details.\n- **Export** — Download results as CSV or HTML.\n- **Spoken briefing (Vera)** — ElevenLabs TTS reads the summary aloud.\n- **Interrupt + replan** — During a run, you can submit a redirect instruction (text/voice); the agent queues it and replans on the next loop iteration.\n- **Fact verification** — Perplexity-backed claim checks where relevant.\n\n### Screenshots\n\n**Hero** — Enter your research query and start the agent.\n\n![Hero section](public/Hero-Section.png)\n\n**Output** — Comparison table, CSV/HTML export, and *Listen to Vera*.\n\n![Output section](public/Output-Section.png)\n\n### Hackathon alignment\n\n| Requirement | Voyance |\n| ----------- | ------- |\n| **Gemini model** | Gemini 2.0 Flash (planning, screenshot analysis, synthesis) |\n| **Google GenAI SDK / ADK** | **Google GenAI SDK** (`google-generativeai`): Gemini for planning, vision, synthesis. Custom agent loop (plan → navigate → extract → verify), not the ADK library. |\n| **Google Cloud service** | Backend deployment target is **Google Cloud Run** (`infra/cloudbuild.yaml`, `infra/main.tf`) |\n| **UI Navigator** | Screenshots analyzed by Gemini vision; agent outputs navigation and extraction actions |\n\n*Third-party: ElevenLabs (Vera TTS), Firecrawl (extraction), Perplexity (fact verification).*\n\n### Google Cloud Deployment\n\n- Live backend URL: [voyance-backend-712979751443.us-central1.run.app](https://voyance-backend-712979751443.us-central1.run.app)\n- Judge artifact: `Google-Cloud-Logs-Voyance.png` (Cloud Run logs screenshot)\n\n![Google Cloud Run logs proof](Google-Cloud-Logs-Voyance.png)\n\n---\n\n## Quick start\n\n### Prerequisites\n\n- **Node.js** 18+\n- **Python** 3.10+\n- **API keys:** [Google AI Studio](https://aistudio.google.com/) (Gemini), [ElevenLabs](https://elevenlabs.io/), [Firecrawl](https://firecrawl.dev/), [Perplexity](https://www.perplexity.ai/) — see `backend/.env.example`\n\n### 1. Clone and install\n\n```bash\ngit clone https://github.com/ibtisamafzal/voyance.git\ncd voyance\nnpm install\n```\n\n### 2. Backend\n\n```bash\ncd backend\npip install -r requirements.txt\nplaywright install chromium\ncp .env.example .env\n# Edit .env with your API keys\nuvicorn main:app --host 0.0.0.0 --port 8000 --reload\n```\n\n| Service | URL |\n| ------- | --- |\n| Backend | \u003chttp://localhost:8000\u003e |\n| API docs | \u003chttp://localhost:8000/api/docs\u003e |\n\n### 3. Frontend\n\nFrom the **repo root** (new terminal):\n\n```bash\nnpm run dev\n```\n\nFrontend: **\u003chttp://localhost:5173\u003e**\n\n### 4. Run a research task\n\n1. Enter a query in the hero (e.g. *\"Compare pricing for top 5 CRM tools\"*).\n2. Click **Research** — the agent plans, navigates, extracts, and verifies.\n3. In the Output section: sort the table, export **CSV** or **HTML**, and click **Listen to Vera** for the spoken briefing.\n\n---\n\n## Tech stack\n\n| Layer | Technology |\n| ----- | ---------- |\n| **AI \u0026 vision** | Gemini 2.0 Flash |\n| **Browser** | Playwright (headless Chromium), screenshot-based only |\n| **Extraction** | Firecrawl API → Gemini vision fallback |\n| **Verification** | Perplexity API |\n| **Voice** | ElevenLabs TTS (Vera) |\n| **Backend** | FastAPI, WebSockets; **Google Cloud Run** deployment target |\n| **Frontend** | React, Vite, Tailwind |\n| **Infra** | Docker, Cloud Build, Terraform (`infra/`) |\n\n---\n\n## Architecture\n\nUser and frontend → backend (Cloud Run target) → Gemini, Playwright, Firecrawl, Perplexity, ElevenLabs.\n\n[![Voyance architecture](https://github.com/ibtisamafzal/voyance/blob/main/Architecture%20diagram.png)](https://github.com/ibtisamafzal/voyance/blob/main/Architecture%20diagram.png)\n\n---\n\n## Voyance mind map\n\n\u003e **From idea to implementation at a glance.**\n\u003e\n\u003e This mind map captures the core of Voyance for the Gemini Live Agent Challenge — from the problem and solution, through key features and technical stack, to user personas and submission requirements.\n\n![Voyance mind map for Gemini Live Agent Challenge](public/Voyance-mind-map.png)\n\n---\n\n## Environment variables\n\nCopy `backend/.env.example` to `backend/.env` and set:\n\n| Variable | Purpose |\n| -------- | ------- |\n| `GEMINI_API_KEY` | Google AI Studio |\n| `ELEVENLABS_API_KEY` | Vera TTS |\n| `FIRECRAWL_API_KEY` | Fast extraction |\n| `PERPLEXITY_API_KEY` | Fact verification |\n| `GOOGLE_CLOUD_PROJECT` | Optional (Firestore); in-memory fallback if unset |\n| `CONTACT_EMAIL` | Contact form recipient email (set in server env) |\n| `CONTACT_EMAIL_APP_PASSWORD` | Gmail App Password for SMTP contact form sending |\n\n---\n\n## Deployment\n\n- **Backend:** Google Cloud Run. Deploy with `infra/cloudbuild.yaml` from repo root:\n\n  ```bash\n  gcloud builds submit --config=infra/cloudbuild.yaml .\n  ```\n\n  Default: 1 GiB memory, 1 CPU (increase to 2 GiB if needed for Playwright).\n- **Frontend:** Host on Vercel or any static host; set `VITE_API_URL` to your Cloud Run URL (no trailing slash).\n\n**Troubleshooting:** Stuck on \"Connecting…\" → set `VITE_API_URL` on your host. WebSocket 403 → ensure no trailing slash in `VITE_API_URL`. OOM → increase memory in `cloudbuild.yaml`.\n\n---\n\n## Project structure\n\n```text\n├── src/app/              # React frontend\n│   ├── components/       # HeroSection, ResearchOutputSection, Navbar, etc.\n│   └── context/          # ResearchContext (shared state)\n├── backend/              # FastAPI backend\n│   ├── app/\n│   │   ├── agent.py      # Research loop (plan → navigate → extract → verify)\n│   │   ├── routers/      # Research, voice, health, sessions\n│   │   └── services/     # Gemini, Firecrawl, Perplexity, Playwright, ElevenLabs\n│   └── main.py\n└── infra/                # GCP automation\n    ├── cloudbuild.yaml   # Build \u0026 deploy to Cloud Run\n    └── main.tf           # Terraform\n```\n\n---\n\n## Community \u0026 write-ups\n\n- **Deep-dive blog**: [How We Built Voyance (DEV.to)](https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h)\n- **Reddit build log**: [How we built Voyance — an AI agent that researches the web by “seeing” it](https://www.reddit.com/user/IbtisamAfzal/comments/1rhtivl/how_we_built_voyance_an_ai_agent_that_researches/?utm_source=share\u0026utm_medium=web3x\u0026utm_name=web3xcss\u0026utm_term=1\u0026utm_content=share_button)\n- **Hackathon submission**: [Gemini Live Agent Challenge — UI Navigator track](https://geminiliveagentchallenge.devpost.com/)\n- **Source code**: [Voyance on GitHub](https://github.com/ibtisamafzal/voyance)\n- **GDG profile**: [g.dev/IbtisamAfzal](https://g.dev/IbtisamAfzal)\n\n## Contact\n\n| | |\n| --- | --- |\n| **Contact** | Use the in-app contact form (`/contact`) |\n| **LinkedIn** | [linkedin.com/in/ibtisamafzal](https://linkedin.com/in/ibtisamafzal/) |\n\n**Blog:** [How We Built Voyance (DEV)](https://dev.to/ibtisamafzal/how-we-built-voyance-an-ai-agent-that-researches-the-web-by-seeing-it-214h) · **Hackathon:** [Gemini Live Agent Challenge](https://geminiliveagentchallenge.devpost.com/) (see Devpost for current schedule)\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibtisamafzal%2Fvoyance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fibtisamafzal%2Fvoyance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibtisamafzal%2Fvoyance/lists"}