{"id":26210355,"url":"https://github.com/aneeshpatne/curiosity","last_synced_at":"2026-05-10T05:51:35.853Z","repository":{"id":278899217,"uuid":"928989074","full_name":"aneeshpatne/Curiosity","owner":"aneeshpatne","description":"Curiosity: Search Agent – Multi-agent system using LLMs (GPT, Gemini) with DuckDuckGo, Playwright, and LangChain for web search, scraping, and detailed summaries with follow-ups.","archived":false,"fork":false,"pushed_at":"2025-03-02T07:23:14.000Z","size":3731,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-02T07:26:02.188Z","etag":null,"topics":["ai","fastapi","nextjs","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aneeshpatne.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-07T15:54:24.000Z","updated_at":"2025-03-02T07:23:16.000Z","dependencies_parsed_at":"2025-02-22T12:26:18.483Z","dependency_job_id":"ae26e34a-872f-4f6c-8977-1586a358f3ed","html_url":"https://github.com/aneeshpatne/Curiosity","commit_stats":null,"previous_names":["aneeshpatne/curiosity"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshpatne%2FCuriosity","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshpatne%2FCuriosity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshpatne%2FCuriosity/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/aneeshpatne%2FCuriosity/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aneeshpatne","download_url":"https://codeload.github.com/aneeshpatne/Curiosity/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243176623,"owners_count":20248698,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","fastapi","nextjs","webscraping"],"created_at":"2025-03-12T07:28:59.276Z","updated_at":"2026-05-10T05:51:35.827Z","avatar_url":"https://github.com/aneeshpatne.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# 🔍 Curiosity\n\n### AI-Powered Search \u0026 News Intelligence Platform\n\n![Curiosity Bot](Frontend/curiosity/public/assets/bot.png)\n\n[![Next.js](https://img.shields.io/badge/Next.js-15.1.7-black?style=flat\u0026logo=next.js)](https://nextjs.org/)\n[![React](https://img.shields.io/badge/React-19.0-61DAFB?style=flat\u0026logo=react)](https://react.dev/)\n[![Python](https://img.shields.io/badge/Python-3.9+-3776AB?style=flat\u0026logo=python)](https://www.python.org/)\n[![FastAPI](https://img.shields.io/badge/FastAPI-Latest-009688?style=flat\u0026logo=fastapi)](https://fastapi.tiangolo.com/)\n[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n\n**An intelligent search agent that combines real-time web scraping, LLM-powered analysis, and automated news digests to deliver comprehensive, cited answers to your queries.**\n\n[Features](#-features) • 
[Installation](#-installation) • [Usage](#-usage) • [How It Works](#-how-it-works)\n\n\u003c/div\u003e\n\n---\n\n## 📋 Table of Contents\n\n- [Overview](#-overview)\n- [Features](#-features)\n- [Technology Stack](#-technology-stack)\n- [Installation](#-installation)\n- [Usage](#-usage)\n- [Project Structure](#-project-structure)\n- [How It Works](#-how-it-works)\n- [Components](#-components)\n- [Configuration](#-configuration)\n- [Contributing](#-contributing)\n- [License](#-license)\n\n---\n\n## 🌟 Overview\n\n**Curiosity** is a cutting-edge AI-powered search platform that revolutionizes how you gather and process information. Unlike traditional search engines that provide links, Curiosity scrapes, analyzes, and synthesizes content from multiple sources to deliver comprehensive, citation-backed answers in real-time.\n\nThe platform features two main components:\n\n1. **🔎 Curiosity Search** - An interactive chat interface with multiple search modes\n2. **📰 Curiosity Newsletter** - An automated daily news digest delivered to your inbox\n\n---\n\n## ✨ Features\n\n### 🔍 Curiosity Search\n\n#### Multiple Search Modes\n\n- **Normal Search** - Quick searches analyzing 7 sources with standard depth\n- **Pro Search** - Enhanced search examining 25 sources for comprehensive results\n- **Deep Search** - Recursive multi-level search that:\n  - Explores follow-up questions automatically\n  - Synthesizes information from 100+ sources\n  - Provides in-depth analysis from multiple perspectives\n\n#### Intelligent Features\n\n- **🔄 Real-time Updates** - Live status indicators showing search, scraping, and analysis progress\n- **📚 Source Citations** - Every claim is backed by numbered citations linking to original sources\n- **🎯 Smart Follow-ups** - AI-generated follow-up questions to explore topics deeper\n- **💬 Conversational Memory** - Maintains context across multiple queries\n- **⚡ Live Source Display** - See sources as they're discovered with favicon previews\n- **📱 Responsive 
UI** - Modern, dark-mode interface built with shadcn/ui\n\n### 📰 Curiosity Newsletter\n\n#### Automated News Intelligence\n\n- **🌍 Global News Coverage** - Automatically fetches top stories from multiple sources\n- **🤖 AI Summarization** - Condenses 20+ articles into structured, readable summaries\n- **📧 Email Delivery** - Beautiful HTML-formatted newsletters sent daily\n- **🔄 Deep Analysis** - Uses recursive search to provide context and depth\n- **⏰ Scheduled Execution** - Automated via cron jobs for daily delivery\n- **🎨 Rich Formatting** - Professionally styled email templates with responsive design\n\n---\n\n## 🛠 Technology Stack\n\n### Frontend\n\n| Technology                                       | Version | Purpose                               |\n| ------------------------------------------------ | ------- | ------------------------------------- |\n| [Next.js](https://nextjs.org/)                   | 15.1.7  | React framework with App Router       |\n| [React](https://react.dev/)                      | 19.0    | UI library                            |\n| [Socket.io Client](https://socket.io/)           | 4.8.1   | Real-time bidirectional communication |\n| [Tailwind CSS](https://tailwindcss.com/)         | 3.4.1   | Utility-first CSS framework           |\n| [shadcn/ui](https://ui.shadcn.com/)              | Latest  | High-quality UI components            |\n| [Marked](https://marked.js.org/)                 | 15.0.7  | Markdown parser and renderer          |\n| [DOMPurify](https://github.com/cure53/DOMPurify) | 3.2.4   | XSS sanitizer for HTML                |\n| [Lucide React](https://lucide.dev/)              | 0.475.0 | Icon library                          |\n\n### Backend\n\n| Technology                                                       | Purpose                     |\n| ---------------------------------------------------------------- | --------------------------- |\n| [Python](https://www.python.org/)                                | Core 
backend language       |\n| [FastAPI](https://fastapi.tiangolo.com/)                         | Modern async web framework  |\n| [Socket.io](https://socket.io/)                                  | Real-time server            |\n| [Playwright](https://playwright.dev/)                            | Headless browser automation |\n| [DuckDuckGo Search](https://github.com/deedy5/duckduckgo_search) | Privacy-focused search API  |\n| [LangChain](https://python.langchain.com/)                       | LLM orchestration framework |\n| [Pydantic](https://docs.pydantic.dev/)                           | Data validation             |\n\n### AI Models\n\n- **OpenAI GPT-4o-mini** - Fast summarization and agent reasoning\n- **OpenAI o1-mini** - Deep reasoning for complex queries\n- **Google Gemini 2.0 Flash** - High-speed content analysis\n- **Meta LLaMA 3.3** (via OpenRouter) - Alternative model support\n\n---\n\n## 🚀 Installation\n\n### Prerequisites\n\n- **Node.js** 18+ and npm/yarn\n- **Python** 3.9+\n- **OpenAI API Key**\n- **Google Gemini API Key** (optional)\n- **OpenRouter API Key** (optional)\n\n### Step 1: Clone the Repository\n\n```bash\ngit clone https://github.com/aneeshpatne/Curiosity.git\ncd Curiosity\n```\n\n### Step 2: Backend Setup\n\n#### Install Python Dependencies\n\n```bash\n# Install required packages\npip install fastapi uvicorn socketio python-socketio playwright pydantic\npip install duckduckgo-search langchain langchain-openai langchain-google-genai\npip install python-dotenv markdown\n\n# Install Playwright browsers\nplaywright install chromium\n```\n\n#### Configure Environment Variables\n\nCreate a `.env` file in the root directory:\n\n```env\n# Required\nOPENAI_API_KEY=your_openai_api_key_here\n\n# Optional (for alternative models)\nGEMINI_API_KEY=your_gemini_api_key_here\nOPEN_ROUTER_KEY=your_openrouter_key_here\n\n# For Newsletter 
(Optional)\nSMTP_SERVER=smtp.gmail.com\nSMTP_PORT=587\nEMAIL_SENDER=your_email@gmail.com\nEMAIL_PASSWORD=your_app_password\nEMAIL_RECEIVER=recipient@email.com\n```\n\n#### Start the Backend Server\n\n```bash\n# From the Search directory\ncd Search\npython search-agent.py\n\n# Server will start on http://localhost:4000\n```\n\n### Step 3: Frontend Setup\n\n```bash\ncd Frontend/curiosity\n\n# Install dependencies\nnpm install\n\n# Start development server\nnpm run dev\n\n# Frontend will start on http://localhost:3000\n```\n\n### Step 4: Newsletter Setup (Optional)\n\n```bash\ncd News\n\n# Make the shell script executable\nchmod +x run_news_agent.sh\n\n# Run manually\npython news-agent.py\n\n# Or set up a cron job for daily execution\ncrontab -e\n# Add: 0 8 * * * /path/to/Curiosity/News/run_news_agent.sh\n```\n\n---\n\n## 📖 Usage\n\n### Starting the Application\n\n1. **Start Backend**:\n\n```bash\ncd Search\npython search-agent.py\n```\n\n2. **Start Frontend**:\n\n```bash\ncd Frontend/curiosity\nnpm run dev\n```\n\n3. **Access the Application**:\n   - Open your browser to `http://localhost:3000`\n\n### Using Different Search Modes\n\n#### Normal Search\n\n```\n1. Select \"Normal Search\" from dropdown\n2. Enter your query: \"What is quantum computing?\"\n3. Get results from ~7 sources with citations\n```\n\n#### Pro Search\n\n```\n1. Select \"Pro Search\" from dropdown\n2. Enter your query: \"Latest developments in AI research\"\n3. Get comprehensive results from ~25 sources\n```\n\n#### Deep Search\n\n```\n1. Select \"Deep Search\" from dropdown\n2. Enter complex query: \"Impact of climate change on global economy\"\n3. 
System will:\n   - Search initial query\n   - Generate 20 follow-up questions\n   - Recursively search each follow-up\n   - Synthesize 100+ sources into comprehensive answer\n```\n\n### Newsletter Usage\n\n```bash\n# Manual execution\npython News/news-agent.py\n\n# Automated daily execution (8 AM)\n# Add to crontab:\n0 8 * * * /path/to/Curiosity/News/run_news_agent.sh\n```\n\n---\n\n## 📂 Project Structure\n\n```\nCuriosity/\n├── Frontend/\n│   └── curiosity/\n│       ├── src/\n│       │   ├── app/\n│       │   │   ├── layout.js          # Root layout\n│       │   │   ├── page.js            # Home page\n│       │   │   └── globals.css        # Global styles\n│       │   ├── components/\n│       │   │   ├── chat.jsx           # Main chat interface\n│       │   │   └── ui/                # shadcn/ui components\n│       │   │       ├── button.jsx\n│       │   │       ├── input.jsx\n│       │   │       └── select.jsx\n│       │   └── lib/\n│       │       └── utils.js           # Utility functions\n│       ├── public/\n│       │   └── assets/                # Static assets\n│       ├── package.json\n│       ├── next.config.mjs\n│       ├── tailwind.config.mjs\n│       └── components.json\n│\n├── Search/\n│   ├── search-agent.py                # Main search agent with FastAPI server\n│   ├── deep-search.py                 # Standalone deep search implementation\n│   ├── combined_sources.txt           # Debug output (generated)\n│   └── Deprecated/                    # Legacy implementations\n│       ├── search.py\n│       ├── search-new.py\n│       ├── search_local.py\n│       └── test.py\n│\n├── News/\n│   ├── news-agent.py                  # Automated news summarization\n│   ├── run_news_agent.sh              # Shell script for cron execution\n│   ├── deepSearch.py                  # News-specific deep search (deprecated)\n│   ├── example.py\n│   ├── simple.py\n│   └── test.py\n│\n├── README.md\n└── .env                               # Environment variables (create 
this)\n```\n\n---\n\n## 🔬 How It Works\n\n### Search Flow\n\n```mermaid\nsequenceDiagram\n    participant User\n    participant Frontend\n    participant Backend\n    participant Scraper\n    participant LLM\n\n    User-\u003e\u003eFrontend: Enter query\n    Frontend-\u003e\u003eBackend: Send via WebSocket\n    Backend-\u003e\u003eBackend: Emit \"waiting\" status\n\n    Backend-\u003e\u003eDuckDuckGo: Search query\n    DuckDuckGo--\u003e\u003eBackend: Return URLs\n    Backend-\u003e\u003eFrontend: Emit sources\n    Backend-\u003e\u003eBackend: Emit \"scraping\" status\n\n    par Parallel Scraping\n        Backend-\u003e\u003eScraper: Scrape URL 1\n        Backend-\u003e\u003eScraper: Scrape URL 2\n        Backend-\u003e\u003eScraper: Scrape URL N\n    end\n\n    Scraper--\u003e\u003eBackend: Return content\n    Backend-\u003e\u003eBackend: Emit \"thinking\" status\n    Backend-\u003e\u003eLLM: Summarize with citations\n    LLM--\u003e\u003eBackend: Return summary + follow-ups\n    Backend-\u003e\u003eFrontend: Emit final response\n    Frontend-\u003e\u003eUser: Display with citations\n```\n\n### Deep Search Flow\n\n```mermaid\ngraph TD\n    A[User Query] --\u003e B[Initial Search]\n    B --\u003e C[Scrape 5 URLs]\n    C --\u003e D[Summarize]\n    D --\u003e E[Generate 20 Follow-ups]\n\n    E --\u003e F1[Follow-up 1]\n    E --\u003e F2[Follow-up 2]\n    E --\u003e F20[Follow-up 20]\n\n    F1 --\u003e G1[Scrape 5 URLs]\n    F2 --\u003e G2[Scrape 5 URLs]\n    F20 --\u003e G20[Scrape 5 URLs]\n\n    G1 --\u003e H1[Summarize]\n    G2 --\u003e H2[Summarize]\n    G20 --\u003e H20[Summarize]\n\n    H1 --\u003e I[Combine All Summaries]\n    H2 --\u003e I\n    H20 --\u003e I\n\n    I --\u003e J[Final LLM Synthesis]\n    J --\u003e K[Comprehensive Answer]\n```\n\n### Component Details\n\n#### 1. 
Web Scraping\n\n```python\n# Concurrent scraping with semaphore control\nasync def scrape_page(context, url: str) -\u003e str:\n    async with semaphore:  # Limit to 7 concurrent requests\n        page = await context.new_page()\n        # Block images, stylesheets, fonts for speed\n        await page.route(\"**/*\", block_requests)\n        await page.goto(url, wait_until='domcontentloaded')\n        # Extract text content from semantic elements\n        text_blocks = await page.locator(\"body p, h1, h2, h3\").all_text_contents()\n        return cleaned_text[:5000]  # First 5KB of content\n```\n\n#### 2. LLM Summarization\n\n```python\n# Structured output with citations and follow-ups\nclass SummaryFormat(BaseModel):\n    content: str  # Markdown summary with [1] [2] citations\n    moreQtn: list[str]  # 5-20 follow-up questions\n\n# Chain: Prompt → LLM → Parser → Retry on Error\nchain = prompt | llm | StrOutputParser()\nretry_parser = RetryWithErrorOutputParser(parser=parser, max_retries=3)\n```\n\n#### 3. 
Real-time Communication\n\n```javascript\n// Frontend emits query\nsocket.emit(\"message\", { id, text: query, searchType });\n\n// Backend emits updates\nawait sio.emit(\"status\", { id, status: \"searching\" });\nawait sio.emit(\"sources\", { id, sources: urls });\nawait sio.emit(\"message\", { id, text: summary, status: \"finished\" });\n```\n\n---\n\n## 🧩 Components\n\n### Backend Components\n\n#### `search-agent.py`\n\nThe main FastAPI server that orchestrates the entire search process:\n\n- **FastAPI Server** - Handles HTTP and WebSocket connections\n- **Socket.io Integration** - Real-time bidirectional communication\n- **Search Orchestration** - Manages search, scrape, summarize pipeline\n- **LLM Chain Management** - Coordinates multiple LLM calls with retry logic\n- **Memory Management** - Maintains conversation context\n- **Deep Search Engine** - Recursive multi-level search implementation\n\nKey Functions:\n\n- `follow_up()` - Main query handler with search type routing\n- `deep_search()` - Recursive search with depth control\n- `scrape_contents()` - Parallel web scraping\n- `summarize()` - LLM-powered summarization with citations\n- `generate_final_summary()` - Deep search synthesis\n\n#### `deep-search.py`\n\nStandalone implementation of deep search for testing and development:\n\n- Source tracking with global citation counter\n- Recursive question exploration\n- Citation preservation across levels\n- Final synthesis from all sources\n\n#### `news-agent.py`\n\nAutomated news aggregation and email delivery:\n\n- Global news search\n- Recursive deep search for context\n- HTML email generation with styling\n- SMTP email delivery\n- Browser preview for testing\n\n### Frontend Components\n\n#### `chat.jsx`\n\nMain chat interface with real-time updates:\n\n- **Message Management** - State handling for sent/received messages\n- **Socket.io Integration** - Event listeners for status, sources, messages\n- **Search Type Selection** - Dropdown for Normal/Pro/Deep 
modes\n- **Real-time Status** - Loading indicators and progress updates\n- **Source Display** - Live URL cards with favicons\n- **Markdown Rendering** - Safe HTML rendering with DOMPurify\n- **Citation Linking** - Interactive superscript citations\n- **Follow-up Questions** - Clickable suggestions\n\nComponents:\n\n- `Chat` - Main container component\n- `SentMessage` - User query display\n- `ReceivedMessage` - AI response with sources and citations\n- `MarkdownRenderer` - Safe markdown to HTML conversion\n- `Citation` - Interactive citation superscripts\n- `Sources` - URL preview cards\n- `FollowUp` - Follow-up question suggestions\n\n---\n\n## ⚙ Configuration\n\n### LLM Model Selection\n\nEdit the model configuration in `search-agent.py`:\n\n```python\n# For faster, cheaper responses\nagent_llm = ChatOpenAI(model='gpt-5-mini', api_key=SecretStr(api_key))\nsummary_llm = ChatOpenAI(model='gpt-5-mini', api_key=SecretStr(api_key))\n\n# For higher quality, deeper reasoning\ndeep_search_llm = ChatOpenAI(model='gpt-5', api_key=SecretStr(api_key))\n\n# For alternative providers\nsummary_llm = ChatOpenAI(\n    base_url='https://openrouter.ai/api/v1',\n    model='meta-llama/llama-3.3-70b-instruct:nitro',\n    api_key=SecretStr(openRouterKey)\n)\n```\n\n### Search Parameters\n\nCustomize search depth and source count:\n\n```python\n# Number of concurrent scraping tasks\nsemaphore = asyncio.Semaphore(7)  # Adjust based on system resources\n\n# Search result counts\nnormal_search_results = 7\npro_search_results = 25\ndeep_search_results = 5  # Per query level\n\n# Deep search recursion depth\ndeep_search_depth = 2  # Levels of follow-up questions\n\n# Number of follow-up questions\nfollow_up_questions = 20  # For deep search\n```\n\n### Frontend Configuration\n\nEdit Socket.io connection in `chat.jsx`:\n\n```javascript\n// Change backend URL\nconst socket = io(\"http://localhost:4000\");\n\n// For production\nconst socket = 
io(process.env.NEXT_PUBLIC_BACKEND_URL);\n```\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n### ⭐ Star this repository if you find it helpful!\n\n**Made with ❤️ and curiosity**\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faneeshpatne%2Fcuriosity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faneeshpatne%2Fcuriosity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faneeshpatne%2Fcuriosity/lists"}