{"id":48645073,"url":"https://github.com/khamel83/argus","last_synced_at":"2026-04-24T05:02:40.531Z","repository":{"id":348441204,"uuid":"1197772439","full_name":"Khamel83/argus","owner":"Khamel83","description":"Multi-provider web search broker for AI agents with budget-aware routing and 5,000+ free monthly queries.","archived":false,"fork":false,"pushed_at":"2026-04-21T06:05:26.000Z","size":648,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-21T07:30:17.490Z","etag":null,"topics":["ai-agents","brave-search","cli","content-extraction","duckduckgo","fastapi","llm-tools","mcp","mcp-registry","mcp-server","python","search-api","search-broker","searxng","tavily","web-search"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/argus-search/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Khamel83.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-31T21:50:02.000Z","updated_at":"2026-04-21T06:05:30.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Khamel83/argus","commit_stats":null,"previous_names":["khamel83/argus"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/Khamel83/argus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Khamel83%2Fargus","tags_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/Khamel83%2Fargus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Khamel83%2Fargus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Khamel83%2Fargus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Khamel83","download_url":"https://codeload.github.com/Khamel83/argus/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Khamel83%2Fargus/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32209897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T03:15:14.334Z","status":"ssl_error","status_checked_at":"2026-04-24T03:15:11.608Z","response_time":64,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","brave-search","cli","content-extraction","duckduckgo","fastapi","llm-tools","mcp","mcp-registry","mcp-server","python","search-api","search-broker","searxng","tavily","web-search"],"created_at":"2026-04-10T02:17:15.413Z","updated_at":"2026-04-24T05:02:40.525Z","avatar_url":"https://github.com/Khamel83.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Argus\n\n\u003c!-- mcp-name: io.github.Khamel83/argus --\u003e\n\n[![Python 
3.11+](https://img.shields.io/badge/python-3.11%2B-brightgreen)](https://www.python.org/downloads/)\n[![PyPI Version](https://img.shields.io/pypi/v/argus-search)](https://pypi.org/project/argus-search/)\n[![PyPI Downloads](https://img.shields.io/pepy/dt/argus-search)](https://pepy.tech/projects/argus-search)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n[![CI](https://github.com/Khamel83/argus/actions/workflows/ci.yml/badge.svg)](https://github.com/Khamel83/argus/actions/workflows/ci.yml)\n[![MCP Registry](https://img.shields.io/badge/MCP-Registry-blue)](https://registry.modelcontextprotocol.io/servers/io.github.Khamel83/argus)\n[![Docker](https://img.shields.io/badge/ghcr.io-khamel83%2Fargus-blue)](https://github.com/Khamel83/argus/pkgs/container/argus)\n\nMulti-provider web search broker for AI agents. 14 providers, budget-aware routing, content extraction — one API so your agent doesn't need to stitch search results together.\n\n**Features at a glance:**\n\n- **14 providers, one API** — free-first tier routing, budget-exhausted providers skipped automatically\n- **Zero-key start** — `pip install argus-search` gives you DuckDuckGo + Yahoo immediately, no accounts needed\n- **SearXNG self-host = 70+ engines** — Google, Bing, Yahoo, Startpage, Ecosia, Qwant and more via one Docker container\n- **9-step content extraction** — returns full page text with quality gates, not just links\n- **Multi-turn sessions** — pass `session_id` for conversational context across searches\n- **4 search modes** — discovery, research, recovery, grounding\n- **Dead URL recovery** — `/recover-url` with Wayback Machine and archive fallbacks\n- **4 integration paths** — HTTP API, CLI, MCP server, Python SDK\n\n_Built for AI agent builders, RAG pipelines, and ops teams who need reliable search without stitching APIs together._\n\n## Contents\n\n- [Quickstart](#quickstart)\n- [Providers](#providers)\n- [HTTP API](#http-api)\n- [Integration](#integration)\n  - 
[CLI](#cli)\n  - [MCP](#mcp)\n  - [Python](#python)\n- [Content Extraction](#content-extraction)\n- [Architecture](#architecture)\n- [Configuration](#configuration)\n- [FAQ](#faq)\n\n## Quickstart\n\n### Mode 1: Local CLI (zero config)\n\n```bash\npip install argus-search \u0026\u0026 argus search -q \"python web frameworks\"\n```\n\nThat's it. DuckDuckGo handles the search — no accounts, no keys, no containers. You get unlimited free search from your laptop right now. Add API keys whenever you want more providers, or don't.\n\n```bash\nargus extract -u \"https://example.com/article\"       # extract clean text from any URL\n```\n\nWorks on any machine with Python 3.11+ — laptop, Mac Mini, Raspberry Pi, cloud VM. Nothing to host.\n\n**For MCP (Claude Code, Cursor, VS Code):**\n\n```bash\npipx install argus-search[mcp] \u0026\u0026 argus mcp serve\n```\n\nThen add to your MCP config:\n\n```json\n{\"mcpServers\": {\"argus\": {\"command\": \"argus\", \"args\": [\"mcp\", \"serve\"]}}}\n```\n\nOr install from the [MCP Registry](https://registry.modelcontextprotocol.io/servers/io.github.Khamel83/argus):\n\n```json\n{\n  \"mcpServers\": {\n    \"argus\": {\n      \"registryType\": \"pypi\",\n      \"identifier\": \"argus-search\",\n      \"runtimeHint\": \"uvx\"\n    }\n  }\n}\n```\n\nOne command to install, one JSON block to connect. No server to run, no keys to configure.\n\n### Mode 2: Full Stack Server\n\nGot a Raspberry Pi running Pi-hole? A Mac Mini on your desk? An old laptop? 
That's enough to run the full stack — SearXNG (your own private search engine) plus local JS-rendering content extraction.\n\n```bash\ndocker compose up -d    # SearXNG + Argus\n```\n\n| What you have | What you get |\n|--------------|-------------|\n| **Any machine with Python 3.11+** | DuckDuckGo + API providers (no server) |\n| **Raspberry Pi 4 / old laptop** (4GB+) | Everything — SearXNG, all providers, Crawl4AI |\n| **Mac Mini M1+** (8GB+) | Full stack with headroom |\n| **Free cloud VM** (1GB) | SearXNG + search providers (skip Crawl4AI) |\n\nSearXNG takes 512MB of RAM and gives you a private Google-style search engine that nobody can rate-limit, block, or charge for. It runs alongside Pi-hole on hardware millions of people already own.\n\n## Providers\n\n| Provider | Credit type | Free capacity | Setup |\n|----------|------------|---------------|-------|\n| DuckDuckGo | Free (scraped) | Unlimited | None |\n| Yahoo | Free (scraped) | Unlimited | None — fragile, auto-skipped if broken |\n| SearXNG | Free (self-hosted) | Unlimited — 70+ engines¹ | Docker |\n| GitHub | Free (API) | Unlimited | None (token for higher rate limit) |\n| WolframAlpha² | Free (API key) | 2,000 queries/month | [free key](https://developer.wolframalpha.com/) |\n| Brave Search | Monthly recurring | 2,000 queries/month | [dashboard](https://brave.com/search/api/) |\n| Tavily | Monthly recurring | 1,000 queries/month | [signup](https://app.tavily.com/sign-up) |\n| Exa | Monthly recurring | 1,000 queries/month | [signup](https://dashboard.exa.ai/signup) |\n| Linkup | Monthly recurring | 1,000 queries/month | [signup](https://linkup.so) |\n| Serper | One-time signup | 2,500 credits | [signup](https://serper.dev/signup) |\n| Parallel AI | One-time signup | 4,000 credits | [signup](https://parallel.ai) |\n| You.com | One-time signup | $20 credit | [platform](https://you.com/platform) |\n| Valyu | One-time signup | $10 credit | [platform](https://platform.valyu.ai) |\n\n¹ SearXNG aggregates 
Google, Bing, Yahoo, Startpage, Ecosia, Qwant, Wikipedia, and 60+ more — all behind a single self-hosted endpoint. Run `docker compose up -d` on any machine with 512MB of free RAM.\n\n² WolframAlpha returns **computed answers** (math, unit conversions, factual lookups), not web search results. It only activates in `grounding` and `research` modes. Queries it can't compute (general web searches) return empty — no error, no health penalty.\n\n**7,000+ free queries/month** from recurring free-tier providers alone (WolframAlpha 2k + Brave 2k + Tavily 1k + Exa 1k + Linkup 1k). DuckDuckGo, Yahoo, SearXNG, and GitHub have no monthly cap. Routing priority: **Tier 0** (free: SearXNG, DuckDuckGo, Yahoo, GitHub, WolframAlpha) → **Tier 1** (monthly recurring: Brave, Tavily, Exa, Linkup) → **Tier 3** (one-time: Serper, Parallel, You.com, Valyu, SearchAPI). Budget-exhausted providers are skipped automatically.\n\n## HTTP API\n\nAll endpoints prefixed with `/api`. OpenAPI docs at `http://localhost:8000/docs`.\n\n```bash\n# Search\ncurl -X POST http://localhost:8000/api/search \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"python web frameworks\", \"mode\": \"discovery\", \"max_results\": 5}'\n\n# Multi-turn search (conversational refinement)\ncurl -X POST http://localhost:8000/api/search \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"what about async?\", \"session_id\": \"my-session\"}'\n\n# Extract content from a working URL\ncurl -X POST http://localhost:8000/api/extract \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"url\": \"https://example.com/article\"}'\n\n# Recover a dead or moved URL\ncurl -X POST http://localhost:8000/api/recover-url \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"url\": \"https://example.com/old-page\", \"title\": \"Example Article\"}'\n\n# Health \u0026 budgets\ncurl http://localhost:8000/api/health/detail\ncurl http://localhost:8000/api/budgets\n```\n\n#### Search modes\n\n| Mode | Use for | 
Example |\n|------|---------|---------|\n| `discovery` | Related pages, canonical sources | \"Find the official docs for X\" |\n| `research` | Broad exploratory retrieval | \"Latest approaches to Y?\" |\n| `recovery` | Finding moved/dead content | \"This URL is 404\" |\n| `grounding` | Fact-checking with live sources | \"Verify this claim about Z\" |\n\nTier-based routing always applies first. Within each tier, the mode selects provider order.\n\n#### Response format\n\n```json\n{\n  \"query\": \"python web frameworks\",\n  \"mode\": \"discovery\",\n  \"results\": [\n    {\"url\": \"https://fastapi.tiangolo.com\", \"title\": \"FastAPI\", \"snippet\": \"Modern Python web framework\", \"score\": 0.942}\n  ],\n  \"total_results\": 1,\n  \"cached\": false,\n  \"traces\": [\n    {\"provider\": \"duckduckgo\", \"status\": \"success\", \"results_count\": 5, \"latency_ms\": 312}\n  ]\n}\n```\n\nEach result includes `url`, `title`, `snippet`, `domain`, `provider`, and `score`. The `traces` array shows which providers were called and their outcomes.\n\n#### Budgets\n\n```json\n{\n  \"budgets\": {\n    \"brave\": {\"remaining\": 1847, \"monthly_usage\": 153, \"usage_count\": 153, \"exhausted\": false},\n    \"duckduckgo\": {\"remaining\": 0, \"monthly_usage\": 0, \"usage_count\": 42, \"exhausted\": false}\n  },\n  \"token_balances\": {\"jina\": 9833638}\n}\n```\n\nEach provider tracks usage. Tier 1 (monthly) uses a 30-day rolling window; tier 3 (one-time) uses a lifetime counter that never resets. When a provider hits its budget, Argus skips it and moves to the next. Free providers (SearXNG, DuckDuckGo, GitHub) have no limit. 
Set `ARGUS_*_MONTHLY_BUDGET_USD` to enforce custom limits per provider.\n\n## Integration\n\n### CLI\n\n```bash\nargus search -q \"python web framework\"              # zero-config, uses DuckDuckGo\nargus search -q \"python web framework\" --mode research -n 20\nargus search -q \"fastapi\" --session my-session       # multi-turn context\nargus extract -u \"https://example.com/article\"       # extract clean text\nargus extract -u \"https://example.com/article\" -d nytimes.com  # auth extraction\nargus recover-url -u \"https://dead.link\" -t \"Title\"\nargus health                                         # provider status\nargus budgets                                        # budget + token balances\nargus set-balance -s jina -b 9833638                 # track token balance\nargus test-provider -p brave                         # smoke-test a provider\nargus serve                                          # start API server\nargus mcp serve                                      # start MCP server\n```\n\nAll commands support `--json` for structured output.\n\n\u003cdetails\u003e\n\u003csummary\u003eHow sessions work\u003c/summary\u003e\n\nPass `session_id` to any search call. Argus stores each query and extracted URL in a SQLite-backed session. Reusing the same `session_id` gives the broker context from prior queries — follow-up searches are automatically refined using earlier conversation context. Sessions persist across restarts. Omit `session_id` for stateless, one-shot searches.\n\n\u003c/details\u003e\n\n### MCP\n\nAdd to your MCP client config:\n\n```json\n{\n  \"mcpServers\": {\n    \"argus\": {\n      \"command\": \"argus\",\n      \"args\": [\"mcp\", \"serve\"]\n    }\n  }\n}\n```\n\nWorks with **Claude Code**, **Cursor**, **VS Code**, and any MCP-compatible client.\n\n**Option B — Self-hosted server (homelab / always-on machine)**\n\nRun Argus once on a server, connect every client to it over the network. 
No local install needed on client machines.\n\nOn the server (`docker compose up -d` starts both):\n```bash\nargus mcp serve --transport sse --host 0.0.0.0 --port 8001\n```\n\nOn each client machine, add to `~/.claude/claude_desktop_config.json` (or equivalent):\n```json\n{\n  \"mcpServers\": {\n    \"argus\": {\n      \"url\": \"http://\u003cyour-server\u003e:8271/sse\"\n    }\n  }\n}\n```\n\nWith [Tailscale](https://tailscale.com), `\u003cyour-server\u003e` is your machine's Tailscale hostname (e.g. `homelab-ts`). One server, every machine on your network gets search.\n\nAvailable tools: `search_web`, `extract_content`, `recover_url`, `expand_links`, `search_health`, `search_budgets`, `test_provider`, `cookie_health`, `valyu_answer`\n\n### Python\n\n```python\nfrom argus.broker.router import create_broker\nfrom argus.models import SearchQuery, SearchMode\nfrom argus.extraction import extract_url\n\nbroker = create_broker()\n\nresponse = await broker.search(\n    SearchQuery(query=\"python web frameworks\", mode=SearchMode.DISCOVERY, max_results=10)\n)\nfor r in response.results:\n    print(f\"{r.title}: {r.url} (score: {r.score:.3f})\")\n\ncontent = await extract_url(response.results[0].url)\nprint(content.title)\nprint(content.text)\n```\n\n## Content Extraction\n\nArgus tries up to nine methods to extract content from any URL: first local (trafilatura, Crawl4AI, Playwright), then external APIs (Jina, Valyu Contents, Firecrawl, You.com, Wayback, archive.is). Each attempt is quality-checked for garbage output. See [docs/providers.md](docs/providers.md) for the full extractor comparison.\n\n**Extract** gets the full text of a working URL. 
**Recover-URL** finds alternatives when a URL is dead, paywalled, or radically changed: it queries archival sources (Wayback, archive.is) and runs a question-guided extraction loop.\n\n## Architecture\n\n```\nCaller (CLI/HTTP/MCP/Python) → SearchBroker → tier-sorted providers → RRF ranking → response\n                                     ↕ SessionStore (optional)\n                            Extractor (on demand) → 9-step fallback chain with quality gates\n```\n\n| Module | Responsibility |\n|--------|---------------|\n| `argus/broker/` | Tier-based routing, ranking, dedup, caching, health, budgets |\n| `argus/providers/` | Provider adapters (one per search API) |\n| `argus/extraction/` | 9-step URL extraction fallback chain with quality gates |\n| `argus/sessions/` | Multi-turn session store and query refinement |\n| `argus/api/` | FastAPI HTTP endpoints |\n| `argus/cli/` | Click CLI commands |\n| `argus/mcp/` | MCP server for LLM integration |\n| `argus/persistence/` | PostgreSQL query/result storage |\n\nAdd new providers or extractors with a single adapter file. See [CONTRIBUTING.md](CONTRIBUTING.md) for the interface.\n\n### How a Query Works\n\n```\nquery arrives → cache? → build provider queue → execute sequentially → RRF fuse → dedup → respond\n```\n\n1. **Cache check.** `SearchCache` hashes `normalized_query:mode` (SHA256). A cache hit returns immediately; entries expire after 168 hours (7 days) by default.\n\n2. **Provider queue.** `resolve_routing()` takes the mode-specific preference list and stable-sorts by tier: tier 0 (free) first, tier 1 (monthly) next, tier 3 (one-time) last. Example for discovery mode:\n   ```\n   searxng → duckduckgo → yahoo → github → brave → exa → tavily → linkup → serper → parallel → you → valyu\n   ```\n\n3. **Sequential execution with gates.** Each provider is checked in order. 
Three gates must pass before the API call, which is the fourth step:\n   - **Config** — is the provider enabled and configured (API key present)?\n   - **Health** — is the provider outside a cooldown? Five or more consecutive failures trigger a 60-minute cooldown.\n   - **Budget** — for tier 1+: is budget still available? For tier 1 (monthly), pacing checks whether the 7-day usage rate would drain the remaining budget in under a week; days with no usage bank headroom. For tier 3 (one-time), a lifetime counter gates access, and exhaustion is the only check.\n   - **Execute** — the actual HTTP call. Successes reset failure counters; failures increment them.\n\n4. **RRF fusion.** Results from all queried providers are merged using Reciprocal Rank Fusion (`k=60`). Each result's score is the sum of `1/(k + rank)` across every provider that returned it. Results appearing in multiple providers rank higher.\n\n5. **Dedup and truncate.** URLs are normalized (`www.` prefixes, tracking params like `utm_*`, and trailing slashes are stripped) and deduplicated. The merged list is truncated to `max_results` (default 10).\n\n6. **Cache and persist.** The final response is written to the in-memory cache and persisted to PostgreSQL. Provider traces (which providers were called, which were skipped and why) are included in the response for observability.\n\n## Configuration\n\nAll configuration is via environment variables. See `.env.example` for the full list. 
Missing keys degrade gracefully — providers are skipped, not errors.\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `ARGUS_SEARXNG_BASE_URL` | `http://127.0.0.1:8080` | SearXNG endpoint |\n| `ARGUS_BRAVE_API_KEY` | — | Brave Search API key |\n| `ARGUS_SERPER_API_KEY` | — | Serper API key |\n| `ARGUS_TAVILY_API_KEY` | — | Tavily API key |\n| `ARGUS_EXA_API_KEY` | — | Exa API key |\n| `ARGUS_LINKUP_API_KEY` | — | Linkup API key |\n| `ARGUS_PARALLEL_API_KEY` | — | Parallel AI API key |\n| `ARGUS_YOU_API_KEY` | — | You.com API key |\n| `ARGUS_VALYU_API_KEY` | — | Valyu API key (search, contents, answer) |\n| `ARGUS_FIRECRAWL_API_KEY` | — | Firecrawl API key (content extraction) |\n| `ARGUS_GITHUB_API_KEY` | — | GitHub token (higher rate limit) |\n| `ARGUS_*_MONTHLY_BUDGET_USD` | 0 (unlimited) | Query-count budget per provider |\n| `ARGUS_CRAWL4AI_ENABLED` | false | Enable Crawl4AI extraction step |\n| `ARGUS_YOU_CONTENTS_ENABLED` | false | Enable You.com Contents API extraction |\n| `ARGUS_CACHE_TTL_HOURS` | 168 | Result cache TTL |\n\n## FAQ\n\n**How is this different from calling Tavily/Serper directly?**\nArgus calls them for you — plus 13 other providers. You get one ranked, deduplicated result set instead of managing multiple API keys and stitching results together. Free providers are tried first, so you only burn credits when needed.\n\n**Can I run only one provider?**\nYes. Set only the API key for the provider you want. All others are silently skipped. For zero-config, just install and go — DuckDuckGo + Yahoo handle search with no keys.\n\n**Do I need Docker?**\nNo. `pip install argus-search` works immediately on any machine with Python 3.11+. 
Docker is only needed for SearXNG (self-hosted search, aggregates 70+ engines) or Crawl4AI (local JS rendering).\n\n## License\n\nMIT. See [LICENSE](LICENSE) for the full text and [CHANGELOG.md](CHANGELOG.md) for release history.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhamel83%2Fargus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkhamel83%2Fargus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhamel83%2Fargus/lists"}