{"id":43173209,"url":"https://github.com/pollinations/search.pollinations","last_synced_at":"2026-02-01T02:37:31.501Z","repository":{"id":297381637,"uuid":"996583978","full_name":"pollinations/search.pollinations","owner":"pollinations","description":"Research Based Project on Crawlers and Rankers with a LLM Supported Native Search Engine ","archived":false,"fork":false,"pushed_at":"2026-01-14T10:03:55.000Z","size":47464,"stargazers_count":11,"open_issues_count":2,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-14T11:24:26.682Z","etag":null,"topics":["cloud-computing","docker","langchain","langgraph","python"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pollinations.png","metadata":{"files":{"readme":"README.md","changelog":"news_fetch/.env.example","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-05T06:50:05.000Z","updated_at":"2026-01-14T10:04:00.000Z","dependencies_parsed_at":"2025-07-11T17:20:38.899Z","dependency_job_id":"17787911-9a18-4764-8f7b-9bfeb5a91639","html_url":"https://github.com/pollinations/search.pollinations","commit_stats":null,"previous_names":["circuit-overtime/elixpo-search-agent","circuit-overtime/search.pollinations","pollinations/search.pollinations"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pollinations/search.pollinations","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollinations%2
Fsearch.pollinations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollinations%2Fsearch.pollinations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollinations%2Fsearch.pollinations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollinations%2Fsearch.pollinations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pollinations","download_url":"https://codeload.github.com/pollinations/search.pollinations/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pollinations%2Fsearch.pollinations/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28965430,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T02:14:24.993Z","status":"ssl_error","status_checked_at":"2026-02-01T02:13:55.706Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-computing","docker","langchain","langgraph","python"],"created_at":"2026-02-01T02:37:30.856Z","updated_at":"2026-02-01T02:37:31.492Z","avatar_url":"https://github.com/pollinations.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Elixpo Search Agent\n\n![Elixpo Logo](https://github.com/user-attachments/assets/98fb5606-2466-49cc-836b-bc4cf088e283)\n\nA Python-based web search and 
synthesis API that processes user queries, performs web and YouTube searches, scrapes content, and generates detailed Markdown answers with sources and images. Built for extensibility, robust error handling, and efficient information retrieval using modern async APIs and concurrency.\n\n**NEW: Now features an IPC-based embedding model server for optimized GPU resource usage and better scalability!**\n\n---\n\n## GPU Memory Footprint\n\n### Before (Legacy):\n```\nApp Worker 1 → Local Embedding Model (GPU Memory: ~2GB)\nApp Worker 2 → Local Embedding Model (GPU Memory: ~2GB)\nApp Worker 3 → Local Embedding Model (GPU Memory: ~2GB)\nTotal GPU Usage: ~6GB\n```\n\n### After (IPC):\n```\nApp Worker 1 ──┐\nApp Worker 2 ──┤→ IPC → Embedding Server (GPU Memory: ~2GB)\nApp Worker 3 ──┘\nTotal GPU Usage: ~2GB (67% reduction!)\n```\n\n## Architecture Overview\n\nThe system uses an Inter-Process Communication (IPC) architecture with browser automation and agent pooling to optimize resource usage and enable horizontal scaling:\n\n```mermaid\ngraph TB\n  subgraph \"Client Layer\"\n    A1[App Worker 1\u003cbr/\u003ePort: 5000\u003cbr/\u003e⚡ Async Queue]\n    A2[App Worker 2\u003cbr/\u003ePort: 5001\u003cbr/\u003e⚡ Async Queue]\n    A3[App Worker N\u003cbr/\u003ePort: 500X\u003cbr/\u003e⚡ Async Queue]\n  end\n  \n  subgraph \"IPC Communication Layer\"\n    IPC[IPC Manager\u003cbr/\u003eBaseManager\u003cbr/\u003ePort: 5002]\n  end\n  \n  subgraph \"Model Server Layer\"\n    ES[Embedding Server\u003cbr/\u003e🔥 GPU Optimized]\n    SAP[Search Agent Pool\u003cbr/\u003e🌐 Browser Automation]\n    PM[Port Manager\u003cbr/\u003e🔌 Port: 9000-9999]\n  end\n  \n  subgraph \"Embedding Services\"\n    ES --\u003e EM[SentenceTransformer\u003cbr/\u003eall-MiniLM-L6-v2\u003cbr/\u003e💾 ThreadPoolExecutor]\n    ES --\u003e CS[Cosine Similarity\u003cbr/\u003e🎯 Top-K Matching]\n  end\n  \n  subgraph \"Search Agents\"\n    SAP --\u003e YTA[Yahoo Text Agents\u003cbr/\u003e🔍 Max 20 tabs/agent]\n    SAP --\u003e YIA[Yahoo 
Image Agents\u003cbr/\u003e🖼️ Max 20 tabs/agent]\n    YTA --\u003e P1[Playwright Instance 1\u003cbr/\u003ePort: 9XXX]\n    YTA --\u003e P2[Playwright Instance 2\u003cbr/\u003ePort: 9XXX]\n    YIA --\u003e P3[Playwright Instance 3\u003cbr/\u003ePort: 9XXX]\n    YIA --\u003e P4[Playwright Instance 4\u003cbr/\u003ePort: 9XXX]\n  end\n  \n  subgraph \"External Services\"\n    YS[Yahoo Search Results]\n    YI[Yahoo Image Search]\n    WEB[Web Scraping]\n    YT[YouTube Transcripts\u003cbr/\u003e📹 Rate Limited: 20/min]\n    LLM[Pollinations LLM API\u003cbr/\u003e🤖 AI Synthesis]\n  end\n  \n  subgraph \"Request Processing\"\n    RQ[Request Queue\u003cbr/\u003e📦 Max: 100]\n    PS[Processing Semaphore\u003cbr/\u003e🚦 Max: 15 concurrent]\n    AR[Active Requests\u003cbr/\u003e📊 Tracking \u0026 Stats]\n  end\n  \n  A1 -.-\u003e|TCP:5002\u003cbr/\u003eauthkey| IPC\n  A2 -.-\u003e|TCP:5002\u003cbr/\u003eauthkey| IPC\n  A3 -.-\u003e|TCP:5002\u003cbr/\u003eauthkey| IPC\n  \n  A1 --\u003e RQ\n  A2 --\u003e RQ\n  A3 --\u003e RQ\n  RQ --\u003e PS\n  PS --\u003e AR\n  \n  IPC \u003c--\u003e ES\n  IPC \u003c--\u003e SAP\n  SAP \u003c--\u003e PM\n  \n  P1 --\u003e YS\n  P2 --\u003e YS\n  P3 --\u003e YI\n  P4 --\u003e YI\n  \n  A1 --\u003e WEB\n  A2 --\u003e WEB\n  A3 --\u003e WEB\n  \n  A1 --\u003e YT\n  A2 --\u003e YT\n  A3 --\u003e YT\n  \n  A1 --\u003e LLM\n  A2 --\u003e LLM\n  A3 --\u003e LLM\n  \n  classDef serverNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px\n  classDef workerNode fill:#f3e5f5,stroke:#4a148c,stroke-width:2px\n  classDef modelNode fill:#fff3e0,stroke:#e65100,stroke-width:3px\n  classDef externalNode fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px\n  classDef browserNode fill:#fce4ec,stroke:#880e4f,stroke-width:2px\n  classDef queueNode fill:#f1f8e9,stroke:#33691e,stroke-width:2px\n  \n  class ES,EM,CS modelNode\n  class A1,A2,A3 workerNode\n  class IPC serverNode\n  class YS,YI,WEB,YT,LLM externalNode\n  class SAP,YTA,YIA,P1,P2,P3,P4,PM browserNode\n  class RQ,PS,AR 
queueNode\n```\n\n### Key Architectural Components:\n\n1. **🔄 Request Processing Pipeline**\n   - Async request queue (max 100 pending)\n   - Processing semaphore (max 15 concurrent)\n   - Active request tracking with statistics\n\n2. **🌐 Browser Automation Pool**\n   - Pre-warmed Playwright agents for immediate use\n   - Automatic agent rotation after 20 tabs\n   - Dynamic port allocation (9000-9999 range)\n   - Separate pools for text and image search\n\n3. **🧠 IPC Embedding System**\n   - Single GPU instance with ThreadPoolExecutor\n   - Thread-safe operations with semaphore control\n   - Cosine similarity for semantic matching\n\n4. **📊 Performance Monitoring**\n   - Real-time request statistics\n   - Agent pool status tracking\n   - Port usage monitoring\n   - Health check endpoints\n\n\n### Key Benefits of IPC Architecture:\n\n1. **🎯 Single GPU Instance**: Only one embedding model loads on GPU, reducing memory usage\n2. **⚡ Concurrent Processing**: Multiple app workers can use embeddings simultaneously\n3. **🔄 Load Balancing**: Requests are queued and processed efficiently\n4. **💰 Cost Optimization**: Significantly reduced GPU memory requirements\n5. **📈 Horizontal Scaling**: Easy to add more app workers without additional GPU load\n6. **🛡️ Fault Isolation**: Embedding server failures don't crash app workers\n7. **🔧 Hot Reloading**: Can restart app workers without reloading heavy embedding model\n\n---\n\n## Features\n\n### 1. **Advanced Search \u0026 Synthesis**\n- Accepts user queries and processes them using web search, YouTube transcript analysis, and AI-powered synthesis.\n- Produces comprehensive Markdown responses with inline citations and images.\n- Handles complex, multi-step queries with iterative tool use.\n\n### 2. **Web Search \u0026 Scraping**\n- Scrapes main text and images from selected URLs (after evaluating snippets).\n- Avoids scraping irrelevant or search result pages.\n\n### 3. 
**YouTube Integration**\n- Extracts metadata and transcripts from YouTube videos.\n- Presents transcripts as clean, readable text.\n\n### 4. **AI-Powered Reasoning**\n- Uses Pollinations API for LLM-based planning and synthesis.\n- Iteratively calls tools (web search, scraping, YouTube, timezone) as needed.\n- Gathers evidence from multiple sources before answering.\n\n### 5. **REST API (Quart)**\n- Exposes `/search` (JSON) and `/search/sse` (Server-Sent Events) endpoints.\n- Supports both GET and POST requests, including OpenAI-compatible message format.\n- CORS enabled for web front-ends.\n\n### 6. **Concurrency \u0026 Performance**\n- Uses async and thread pools for parallel web scraping and YouTube processing.\n- Handles multiple requests efficiently.\n\n---\n\n## File Structure\n\n- **`app.py`**  \n  Main Quart API server. Handles `/search`, `/search/sse`, and OpenAI-compatible `/v1/chat/completions` endpoints. Manages async event streams and JSON responses.\n\n- **`searchPipeline.py`**  \n  Core pipeline logic. Orchestrates tool calls (web search, scraping, YouTube, timezone), interacts with Pollinations LLM API, and formats Markdown answers with sources and images.\n\n### 🆕 IPC Embedding System:\n- **`modelServer.py`**  \n  The new IPC-based embedding server that runs on port 5002. Handles SentenceTransformer model, FAISS indexing, and web search with embeddings.\n\n- **`embeddingClient.py`**  \n  Client module for connecting to the embedding server. Provides thread-safe access with automatic reconnection.\n\n- **`textEmbedModel.py`**  \n  Updated legacy module with backward compatibility. 
Automatically switches between IPC and local models based on configuration.\n\n- **`start_embedding_server.py`**  \n  Startup script for launching the embedding server with proper monitoring and graceful shutdown.\n\n- **`test_embedding_ipc.py`**  \n  Test suite for validating IPC connection and embedding functionality.\n\n### Other modules:  \n- `clean_query.py`, `search.py`, `scrape.py`, `getYoutubeDetails.py`, `tools.py`, `getTimeZone.py`: Tool implementations for query cleaning, web search, scraping, YouTube, and timezone handling.\n- `.env`: Environment variables for API tokens and model config.\n- `requirements.txt`: Python dependencies.\n- `Dockerfile`, `docker-compose.yml`: Containerization and deployment.\n\n---\n\n## Usage\n\n### Prerequisites\n\n- Python 3.12\n- Install dependencies:\n  ```bash\n  pip install -r requirements.txt\n  ```\n- Set up `.env` with required API tokens.\n\n### 🚀 Running with IPC Embedding Server (Recommended)\n\n#### 1. Start the Embedding Server\n```bash\n# Terminal 1: Start the embedding server\ncd search/PRODUCTION\npython start_embedding_server.py\n```\n\nThe embedding server will start on port 5002 and load the SentenceTransformer model onto the available GPU.\n\n#### 2. Test the IPC Connection\n```bash\n# Terminal 2: Test the embedding server\npython test_embedding_ipc.py\n```\n\n#### 3. 
Start App Workers\n```bash\n# Terminal 3: Start first app worker\ncd src\npython app.py\n\n# Terminal 4: Start additional workers on different ports\nPORT=5001 python app.py\nPORT=5003 python app.py  # 5002 is taken by the embedding server\n```\n\n### 📊 Monitoring\n\n- **Embedding Server**: Monitor GPU usage and active operations through logs\n- **App Workers**: Each worker connects independently to the embedding server\n- **Health Check**: Use the test script to verify IPC connectivity\n\n### 🔧 Configuration\n\nSet environment variables:\n```bash\n# Enable/disable IPC embedding (default: true)\nexport USE_IPC_EMBEDDING=true\n\n# Embedding server configuration\nexport EMBEDDING_SERVER_HOST=localhost\nexport EMBEDDING_SERVER_PORT=5002\n```\n\n### 🔄 Fallback Mode\n\nIf the embedding server is unavailable, the system automatically falls back to local embedding models, ensuring service continuity.\n\n### Running Locally (Legacy Mode)\n\n```bash\n# Disable IPC and use local models\nexport USE_IPC_EMBEDDING=false\npython app.py\n```\n- API available at `http://127.0.0.1:5000/search`\n\n### Example API Queries\n\n#### Simple POST (JSON)\n```bash\ncurl -X POST http://localhost:5000/search \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"What are the latest trends in AI research? 
Summarize this YouTube video https://www.youtube.com/watch?v=dQw4w9WgXcQ\"}'\n```\n\n#### OpenAI-Compatible POST\n```bash\ncurl -X POST http://localhost:5000/search \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Tell me about the history of the internet.\"}\n    ]\n  }'\n```\n\n#### SSE Streaming\n```bash\ncurl -N -X POST http://localhost:5000/search/sse \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"weather in London tomorrow\"}'\n```\n\n---\n\n## API Endpoints\n\n- **`/search`**  \n  - `POST`/`GET`  \n  - Accepts `{\"query\": \"...\"}`\n  - Also supports OpenAI-style `{\"messages\": [...]}`\n\n- **`/search/sse`**  \n  - `POST`  \n  - Streams results as Server-Sent Events (SSE)\n\n- **`/v1/chat/completions`**  \n  - OpenAI-compatible chat completions endpoint\n\n---\n\n## Configuration\n\n### Environment Variables\n\nSet environment variables in `.env`:\n```bash\n# Pollinations API\nTOKEN=your_pollinations_token\nMODEL=your_model_name\nREFERRER=your_referrer\n\n# IPC Embedding Configuration\nUSE_IPC_EMBEDDING=true\nEMBEDDING_SERVER_HOST=localhost\nEMBEDDING_SERVER_PORT=5002\n\n# Worker Configuration  \nPORT=5000\nMAX_CONCURRENT_OPERATIONS=3\n```\n\n### Scaling Configuration\n\n- **Embedding Server**: Adjust `MAX_CONCURRENT_OPERATIONS` in `modelServer.py`\n- **App Workers**: Set different `PORT` values for multiple workers\n- **Memory Management**: Configure batch sizes and GPU memory fractions as needed\n\n---\n\n## Performance Optimizations\n\n### GPU Memory Management\n- Single embedding model instance shared across all workers\n- Automatic GPU memory cleanup after operations\n- Configurable batch sizes for large document processing\n\n### Concurrency Controls\n- Semaphore-based operation limiting\n- Thread-safe GPU operations\n- Automatic retry logic with exponential backoff\n\n### Caching \u0026 Efficiency\n- LRU cache for frequently accessed embeddings\n- Connection 
pooling for web requests\n- Async processing for I/O operations\n\n---\n\n## Health Check Endpoints\n\n- **`/health`** - App worker health status\n- **`/embedding/health`** - Embedding server connectivity status\n- **`/embedding/stats`** - Active operations and performance metrics\n\n---\n\n## Deployment\n\n### Docker Deployment\n```bash\n# Build and run with docker-compose\ndocker-compose up --build\n\n# Scale app workers\ndocker-compose up --scale search-app=3\n```\n\n### Kubernetes Deployment\n```yaml\n# Example scaling configuration (pod templates omitted for brevity)\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: search-embedding-server\nspec:\n  replicas: 1  # Single embedding server\n  selector:\n    matchLabels:\n      app: embedding-server\n---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: search-app-workers\nspec:\n  replicas: 5  # Multiple app workers\n  selector:\n    matchLabels:\n      app: search-app\n```\n\n---\n\n## Troubleshooting\n\n### Common Issues\n\n1. **Embedding Server Connection Failed**\n   ```bash\n   # Check if server is running\n   netstat -tulpn | grep 5002\n   \n   # Test connection\n   python test_embedding_ipc.py\n   ```\n\n2. **GPU Out of Memory**\n   ```bash\n   # Reduce batch size in modelServer.py\n   # Lower MAX_CONCURRENT_OPERATIONS\n   # Check GPU memory: nvidia-smi\n   ```\n\n3. 
**High Latency**\n   ```bash\n   # Monitor active operations\n   # Scale up app workers if needed\n   # Check network latency between workers and embedding server\n   ```\n\n### Logs and Monitoring\n- Embedding server logs: Check `modelServer.py` output\n- App worker logs: Check individual `app.py` instances  \n- System metrics: Monitor GPU usage, memory, and CPU\n- Connection health: Use test scripts regularly\n\n---\n\n## Migration Guide\n\n### From Legacy to IPC System\n\n1. **Backup Current Setup**\n2. **Install New Dependencies**: `pip install loguru`\n3. **Start Embedding Server**: `python start_embedding_server.py`\n4. **Test Connection**: `python test_embedding_ipc.py`\n5. **Update Environment**: Set `USE_IPC_EMBEDDING=true`\n6. **Restart App Workers**: They will automatically use IPC\n7. **Monitor Performance**: Check logs and resource usage\n\n### Rollback Plan\nSet `USE_IPC_EMBEDDING=false` to return to local embedding models.\n\n---\n\n## Quick Start 🚀\n\n### Option 1: Automated Service Manager (Recommended)\n\n#### Linux/macOS:\n```bash\ncd search/PRODUCTION\npython service_manager.py --workers 3 --port 5000\n```\n\n#### Windows:\n```powershell\ncd search/PRODUCTION\n.\\start_services.ps1 -Workers 3 -BasePort 5000\n```\n\n### Option 2: Manual Setup\n\n1. **Start Embedding Server**:\n   ```bash\n   cd search/PRODUCTION\n   python start_embedding_server.py\n   ```\n\n2. **Test Connection**:\n   ```bash\n   python test_embedding_ipc.py\n   ```\n\n3. 
**Start App Workers**:\n   ```bash\n   cd src\n   PORT=5000 python app.py \u0026\n   PORT=5001 python app.py \u0026\n   PORT=5003 python app.py \u0026  # skip 5002; the embedding server uses it\n   ```\n\n### Access Points\n- **Search API**: `http://localhost:5000/search`\n- **Health Check**: `http://localhost:5000/health`\n- **Embedding Health**: `http://localhost:5000/embedding/health`\n- **Embedding Stats**: `http://localhost:5000/embedding/stats`\n\n---\n\n## Limitations\n\n- Relies on Pollinations API for LLM responses (subject to their rate limits).\n- Requires internet connectivity for search and scraping.\n- YouTube transcript extraction depends on third-party services.\n- **NEW**: Embedding server requires sufficient GPU memory for optimal performance.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpollinations%2Fsearch.pollinations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpollinations%2Fsearch.pollinations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpollinations%2Fsearch.pollinations/lists"}