{"id":37071782,"url":"https://github.com/caesar0301/wizsearch","last_synced_at":"2026-01-14T08:25:50.348Z","repository":{"id":319073089,"uuid":"1077462229","full_name":"caesar0301/wizsearch","owner":"caesar0301","description":"A unified Python library for searching across multiple search engines with a consistent interface.","archived":false,"fork":false,"pushed_at":"2025-10-16T14:50:25.000Z","size":64,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-28T21:20:15.618Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/caesar0301.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-16T09:30:11.000Z","updated_at":"2025-10-16T14:50:29.000Z","dependencies_parsed_at":"2025-10-18T01:41:39.711Z","dependency_job_id":null,"html_url":"https://github.com/caesar0301/wizsearch","commit_stats":null,"previous_names":["caesar0301/wizsearch"],"tags_count":1,"template":false,"template_full_name":"mirasurf/pytemplate","purl":"pkg:github/caesar0301/wizsearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caesar0301%2Fwizsearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caesar0301%2Fwizsearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caesar0301%2Fwizsearch/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/caesar0301%2Fwizsearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/caesar0301","download_url":"https://codeload.github.com/caesar0301/wizsearch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caesar0301%2Fwizsearch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:16:59.381Z","status":"ssl_error","status_checked_at":"2026-01-14T08:13:45.490Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-14T08:25:49.710Z","updated_at":"2026-01-14T08:25:50.341Z","avatar_url":"https://github.com/caesar0301.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WizSearch\n\n[![CI](https://github.com/caesar0301/wizsearch/actions/workflows/ci.yml/badge.svg)](https://github.com/caesar0301/wizsearch/actions/workflows/ci.yml)\n[![PyPI version](https://img.shields.io/pypi/v/wizsearch.svg)](https://pypi.org/project/wizsearch/)\n\nA unified Python library for searching across multiple search engines with a consistent interface. 
WizSearch enables concurrent multi-engine searches with intelligent result merging, page crawling capabilities, and optional semantic search integration.\n\n## Features\n\n- **Multiple Search Engines**: Baidu, Bing, Brave, DuckDuckGo, Google, Google AI, SearxNG, Tavily, WeChat (via the Sogou engine)\n- **Unified Interface**: Single API for all search engines with a consistent `SearchResult` format\n- **Multi-Engine Aggregation**: Concurrent searches across multiple engines with round-robin result merging\n- **Page Crawling**: Built-in web page content extraction using Crawl4AI\n- **Semantic Search**: Optional vector-based semantic search with local storage and web fallback\n- **Async/Await Support**: Full asynchronous API for high performance\n\n## Installation\n\n```bash\n# Basic installation\npip install wizsearch\n\n# With development dependencies (quoted so shells such as zsh do not expand the brackets)\npip install \"wizsearch[dev]\"\n```\n\n## Quick Start\n\n### Basic Single Engine Search\n\n```python\nimport asyncio\nfrom wizsearch import DuckDuckGoSearch, DuckDuckGoSearchConfig\n\nasync def search_example():\n    # Initialize a single search engine\n    config = DuckDuckGoSearchConfig(max_results=5)\n    searcher = DuckDuckGoSearch(config=config)\n\n    # Perform search\n    results = await searcher.search(\"Python async programming\")\n\n    # Access results\n    print(f\"Query: {results.query}\")\n    print(f\"Found {len(results.sources)} results\\n\")\n\n    for source in results.sources:\n        print(f\"Title: {source.title}\")\n        print(f\"URL: {source.url}\")\n        print(f\"Content: {source.content[:100]}...\")\n        print()\n\nasyncio.run(search_example())\n```\n\n### Multi-Engine Search with WizSearch\n\nWizSearch automatically discovers and runs searches across multiple engines concurrently, then merges results using a round-robin approach to maintain diversity.\n\n```python\nimport asyncio\nfrom wizsearch import WizSearch, WizSearchConfig\n\nasync def multi_engine_search():\n    # Auto-enable all 
available engines\n    wizsearch = WizSearch()\n\n    # Or configure specific engines\n    config = WizSearchConfig(\n        enabled_engines=[\"duckduckgo\", \"tavily\", \"brave\"],\n        max_results_per_engine=10,\n        timeout=30,\n        fail_silently=True  # Continue even if some engines fail\n    )\n    wizsearch = WizSearch(config=config)\n\n    # Search across all enabled engines\n    results = await wizsearch.search(\"machine learning tutorials\")\n\n    print(f\"Total unique results: {len(results.sources)}\")\n    print(f\"Response time: {results.response_time:.2f}s\")\n\n    for i, source in enumerate(results.sources[:5], 1):\n        print(f\"{i}. {source.title}\")\n        print(f\"   {source.url}\\n\")\n\nasyncio.run(multi_engine_search())\n```\n\n## Detailed Usage\n\n### Available Search Engines\n\nWizSearch supports the following search engines, each with its own configuration:\n\n| Engine | Class Name | API Key Required | Notes |\n|--------|-----------|-----------------|-------|\n| DuckDuckGo | `DuckDuckGoSearch` | No | Free, no rate limits |\n| Tavily | `TavilySearch` | Yes | AI-optimized search, requires `TAVILY_API_KEY` |\n| Google AI | `GoogleAISearch` | Yes | Requires `GOOGLE_API_KEY` |\n| SearxNG | `SearxNGSearch` | No | Self-hosted metasearch engine |\n| Baidu | `BaiduSearch` | No | Chinese search engine (via tarzi) |\n| WeChat | `WeChatSearch` | No | WeChat article search (via tarzi) |\n| Brave | `BraveSearch` | No | Browser-based scraping (via tarzi) |\n| Bing | `BingSearch` | No | Browser-based scraping, anti-bot protection (via tarzi) |\n| Google | `GoogleSearch` | No | Browser-based scraping, anti-bot protection (via tarzi) |\n\n### Engine-Specific Examples\n\n#### DuckDuckGo Search\n\n```python\nfrom wizsearch import DuckDuckGoSearch, DuckDuckGoSearchConfig\n\nconfig = DuckDuckGoSearchConfig(\n    max_results=10,\n    region=\"us-en\",  # Region setting\n    safesearch=\"moderate\",  # \"on\", \"moderate\", or \"off\"\n    
timelimit=\"m\",  # Time limit: \"d\" (day), \"w\" (week), \"m\" (month), \"y\" (year)\n    backend=\"auto\"\n)\nsearcher = DuckDuckGoSearch(config=config)\nresults = await searcher.search(\"climate change\")\n```\n\n#### Tavily Search (Advanced Features)\n\n```python\nfrom wizsearch import TavilySearch, TavilySearchConfig\nimport os\n\n# Set API key\nos.environ[\"TAVILY_API_KEY\"] = \"your-api-key\"\n\nconfig = TavilySearchConfig(\n    max_results=5,\n    search_depth=\"advanced\",  # \"basic\" or \"advanced\"\n    include_domains=[\"arxiv.org\", \"scholar.google.com\"],\n    exclude_domains=[\"youtube.com\"],\n    include_answer=True,  # Get AI-generated answer\n    include_images=True\n)\n\nsearcher = TavilySearch(config=config)\nresults = await searcher.search(\n    query=\"quantum computing breakthroughs\",\n    search_depth=\"advanced\",\n    include_domains=[\"nature.com\", \"science.org\"]\n)\n\n# Access AI-generated answer\nif results.answer:\n    print(f\"Answer: {results.answer}\")\n\n# Access images\nfor image_url in results.images:\n    print(f\"Image: {image_url}\")\n```\n\n#### Google AI Search\n\n```python\nfrom wizsearch import GoogleAISearch\nimport os\n\n# Set API key (or set GOOGLE_API_KEY environment variable)\nos.environ[\"GOOGLE_API_KEY\"] = \"your-google-api-key\"\n\nsearcher = GoogleAISearch()\nresults = await searcher.search(\n    query=\"neural network architectures\",\n    num_results=5\n)\n\n# Image search\nimage_results = await searcher.search(\n    query=\"data visualization examples\",\n    search_type=\"image\",\n    num_results=10\n)\n```\n\n### WizSearch Configuration\n\n```python\nfrom wizsearch import WizSearch, WizSearchConfig\n\n# Get all available engines\navailable = WizSearch.get_available_engines()\nprint(f\"Available engines: {available}\")\n\n# Custom configuration\nconfig = WizSearchConfig(\n    enabled_engines=[\"duckduckgo\", \"tavily\", \"brave\"],\n    max_results_per_engine=10,  # Results per engine\n    
timeout=30,  # Timeout in seconds\n    fail_silently=True  # Don't raise if some engines fail\n)\n\nwizsearch = WizSearch(config=config)\n\n# Check enabled engines\nprint(f\"Enabled: {wizsearch.get_enabled_engines()}\")\n\n# Get configuration\nprint(wizsearch.get_config())\n\n# Perform search\nresults = await wizsearch.search(\"Python best practices\")\n```\n\n### Page Crawling\n\nExtract full page content from search results using the Crawl4AI-powered page crawler:\n\n```python\nfrom wizsearch import PageCrawler\n\ncrawler = PageCrawler(\n    url=\"https://example.com/article\",\n    content_format=\"markdown\",  # \"markdown\", \"html\", or \"text\"\n    external_links=False,\n    adaptive_crawl=False,\n    depth=1,\n    word_count_threshold=5,\n    user_agent=\"Mozilla/5.0...\",\n    wait_for=None,  # CSS selector to wait for\n    screenshot=False,\n    bypass_cache=False,\n    only_text=True\n)\n\n# Crawl the page\ncontent = await crawler.crawl()\nprint(content)\n```\n\n### Semantic Search (Advanced | Preview)\n\nCombine web search with local vector storage for enhanced semantic search capabilities. 
The semantic search interface is synchronous.\n\n```python\nfrom wizsearch.semsearch import SemanticSearch, SemanticSearchConfig\nfrom wizsearch import TavilySearch\n\n# Configure semantic search\nconfig = SemanticSearchConfig(\n    vector_store_provider=\"weaviate\",  # or \"pgvector\"\n    collection_name=\"DocumentChunks\",\n    embedding_model=\"nomic-embed-text:latest\",\n    local_search_limit=10,\n    web_search_limit=5,\n    fallback_threshold=5,  # Min local results before web search\n    enable_caching=True,\n    cache_ttl_hours=24,\n    auto_store_web_results=True  # Automatically store web results\n)\n\n# Initialize with Tavily as web search engine\nweb_search = TavilySearch()\nsemantic_search = SemanticSearch(\n    web_search_engine=web_search,\n    config=config\n)\n\n# Connect to vector store\nsemantic_search.connect()\n\n# Perform semantic search\n# First searches local vector store, falls back to web if needed\nresult = semantic_search.search(\n    query=\"machine learning best practices\",\n    limit=10,\n    force_web_search=False\n)\n\nprint(f\"Total results: {result.total_results}\")\nprint(f\"Local: {result.local_results}, Web: {result.web_results}\")\nprint(f\"Search time: {result.search_time:.2f}s\")\n\n# Access chunks with scores\nfor chunk, score in result.chunks[:5]:\n    print(f\"\\n[{score:.3f}] {chunk.source_title}\")\n    print(f\"Content: {chunk.content[:200]}...\")\n\n# Manually store documents\nsemantic_search.store_document(\n    content=\"Your document content here...\",\n    source_url=\"https://example.com\",\n    source_title=\"Example Document\",\n    metadata={\"category\": \"tutorial\"}\n)\n\n# Get statistics\nstats = semantic_search.get_stats()\nprint(stats)\n```\n\n### Working with Search Results\n\nAll search engines return a consistent `SearchResult` object:\n\n```python\n# SearchResult structure\nresults = await searcher.search(\"query\")\n\n# Basic attributes\nprint(results.query)           # Original 
query\nprint(results.answer)          # AI-generated answer (if available)\nprint(results.images)          # List of image URLs\nprint(results.response_time)   # Response time in seconds\nprint(results.raw_response)    # Raw API response\n\n# Source items\nfor source in results.sources:\n    print(source.url)          # URL\n    print(source.title)        # Title\n    print(source.content)      # Extracted content/snippet\n    print(source.score)        # Relevance score (if available)\n    print(source.raw_content)  # Raw content\n```\n\n### Custom Engine Registration\n\nRegister your own custom search engine:\n\n```python\nfrom wizsearch import WizSearch, WizSearchConfig, BaseSearch, SearchResult, SourceItem\nfrom pydantic import BaseModel\n\nclass CustomSearchConfig(BaseModel):\n    max_results: int = 10\n    api_key: str = \"\"\n\nclass CustomSearch(BaseSearch):\n    def __init__(self, config: CustomSearchConfig):\n        self.config = config\n\n    async def search(self, query: str, **kwargs) -\u003e SearchResult:\n        # Implement your search logic\n        # Example: return mock results\n        sources = [\n            SourceItem(\n                url=\"https://example.com\",\n                title=\"Example Result\",\n                content=\"This is example content\",\n                score=0.95\n            )\n        ]\n        return SearchResult(\n            query=query,\n            sources=sources,\n            answer=None\n        )\n\n# Register the engine\nWizSearch.register_custom_engine(\n    name=\"custom\",\n    engine_class=CustomSearch,\n    config_class=CustomSearchConfig\n)\n\n# Use it with WizSearch\nconfig = WizSearchConfig(enabled_engines=[\"custom\", \"duckduckgo\"])\nwizsearch = WizSearch(config=config)\nresults = await wizsearch.search(\"test query\")\n```\n\n## Examples\n\nCheck the `examples/` directory for comprehensive examples:\n\n- `wizsearch_demo.py` - Multi-engine search demonstrations\n- `tavily_search_demo.py` - 
Tavily-specific features\n- `google_ai_search_demo.py` - Google AI search examples\n- `ddg_search_demo.py` - DuckDuckGo search examples\n- Individual engine demos for each supported search engine\n\nRun examples:\n\n```bash\n# Basic demo\nuv run python examples/wizsearch_demo.py\n\n# Tavily demo (requires API key)\nexport TAVILY_API_KEY=\"your-key\"\nuv run python examples/tavily_search_demo.py\n```\n\n## API Reference\n\n### Core Classes\n\n- **`WizSearch`**: Multi-engine search aggregator\n  - `search(query, **kwargs)`: Perform concurrent search\n  - `get_available_engines()`: List all available engines\n  - `get_enabled_engines()`: List enabled engines\n  - `get_config()`: Get current configuration\n  - `register_custom_engine(name, engine_class, config_class)`: Register custom engine\n\n- **`WizSearchConfig`**: Configuration for WizSearch\n  - `enabled_engines`: List of engine names to enable\n  - `max_results_per_engine`: Max results per engine (1-50)\n  - `timeout`: Request timeout in seconds (1-60)\n  - `fail_silently`: Continue if engines fail (default: True)\n\n- **`BaseSearch`**: Abstract base class for search engines\n  - `search(query, **kwargs)`: Async search method\n\n- **`SearchResult`**: Unified search result format\n  - `query`: Original query string\n  - `answer`: AI-generated answer (optional)\n  - `images`: List of image URLs\n  - `sources`: List of `SourceItem` objects\n  - `response_time`: Response time in seconds\n  - `raw_response`: Raw API response\n\n- **`SourceItem`**: Individual search result\n  - `url`: Result URL\n  - `title`: Result title\n  - `content`: Extracted content/snippet\n  - `score`: Relevance score (optional)\n  - `raw_content`: Raw content (optional)\n\n- **`PageCrawler`**: Web page content crawler\n  - `crawl()`: Async crawl method\n\n- **`SemanticSearch`**: Semantic search with vector storage\n  - `connect()`: Connect to vector store\n  - `search(query, limit, force_web_search, filters)`: Semantic search\n  - 
`store_document(content, source_url, source_title, metadata)`: Store document\n  - `get_stats()`: Get system statistics\n  - `clear_cache()`: Clear query cache\n\n## Environment Variables\n\nSome search engines require API keys set as environment variables:\n\n```bash\n# Tavily (required for TavilySearch)\nexport TAVILY_API_KEY=\"your-tavily-api-key\"\n\n# Google AI (required for GoogleAISearch)\nexport GOOGLE_API_KEY=\"your-google-api-key\"\n```\n\n## Development\n\n```bash\n# Clone repository\ngit clone https://github.com/caesar0301/wizsearch.git\ncd wizsearch\n\n# Install development dependencies\npip install -e \".[dev]\"\n\n# Run tests\nmake test\n\n# Run linting\nmake lint\n\n# Format code\nmake format\n```\n\n## Architecture\n\n```\n┌─────────────────┐\n│   WizSearch     │  Multi-engine orchestrator\n└────────┬────────┘\n         │\n    ┌────┴────┐\n    ▼         ▼\n┌────────┐ ┌────────┐\n│Engine 1│ │Engine 2│  Individual search engines\n└────────┘ └────────┘\n    │         │\n    └────┬────┘\n         ▼\n   ┌──────────┐\n   │  Merger  │  Round-robin result merging\n   └──────────┘\n         │\n         ▼\n  ┌─────────────┐\n  │SearchResult │  Unified result format\n  └─────────────┘\n```\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add some amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. 
Open a Pull Request\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n\n## Links\n\n- **Homepage**: https://github.com/caesar0301/wizsearch\n- **PyPI**: https://pypi.org/project/wizsearch/\n- **Issues**: https://github.com/caesar0301/wizsearch/issues\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaesar0301%2Fwizsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcaesar0301%2Fwizsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaesar0301%2Fwizsearch/lists"}