{"id":48456146,"url":"https://github.com/ypollak2/llm-router","last_synced_at":"2026-05-27T12:01:33.178Z","repository":{"id":347671187,"uuid":"1194834223","full_name":"ypollak2/llm-router","owner":"ypollak2","description":"  Universal LLM router for AI coding tools. Works with Claude Code, Cursor, Codex, Gemini CLI, Copilot and more.        Free-first fallback chain keeps costs 70–85% lower.","archived":false,"fork":false,"pushed_at":"2026-05-25T19:40:57.000Z","size":54294,"stargazers_count":27,"open_issues_count":10,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-05-25T21:27:15.800Z","etag":null,"topics":["ai-routing","anthropic","claude","claude-code","cost-optimization","gemini","litellm","llm","llm-router","mcp-server","model-router","ollama","openai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ypollak2.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":".github/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-28T21:49:28.000Z","updated_at":"2026-05-25T19:41:00.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ypollak2/llm-router","commit_stats":null,"previous_names":["ypollak2/llm-router"],"tags_count":133,"template":false,"template_full_name":null,"purl":"pkg:github/ypollak2/llm-router","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ypollak2%2Fllm-router","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ypollak2%2Fllm-router/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ypollak2%2Fllm-router/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ypollak2%2Fllm-router/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ypollak2","download_url":"https://codeload.github.com/ypollak2/llm-router/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ypollak2%2Fllm-router/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33564850,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-27T02:00:06.184Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-routing","anthropic","claude","claude-code","cost-optimization","gemini","litellm","llm","llm-router","mcp-server","model-router","ollama","openai"],"created_at":"2026-04-06T23:03:15.836Z","updated_at":"2026-05-27T12:01:33.159Z","avatar_url":"https://github.com/ypollak2.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs/readme/hero-dark.svg\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"docs/readme/hero-light.svg\"\u003e\n    \u003cimg src=\"docs/readme/hero-light.svg\" alt=\"llm-router routes AI coding prompts across free, budget, and premium model tiers.\" width=\"100%\"/\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003ellm-router\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eMake Claude Code, Codex, and Gemini CLI use the cheapest model that can still do the job well.\u003c/strong\u003e\u003cbr/\u003e\n  Save 35-80% on routine prompts, protect premium quota, and fall back automatically when providers fail.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/llm-routing/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/llm-routing?style=flat-square\u0026color=4F46E5\" alt=\"PyPI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/llm-routing/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/dm/llm-routing?style=flat-square\u0026color=4F46E5\" alt=\"Downloads\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ypollak2/llm-router/actions\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/ypollak2/llm-router/ci.yml?style=flat-square\u0026label=tests\" alt=\"Tests\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ypollak2/llm-router/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/ypollak2/llm-router?style=flat-square\u0026color=FBBF24\" alt=\"Stars\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/llm-routing/\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.10+-3572A5?style=flat-square\" alt=\"Python\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-10B981?style=flat-square\" alt=\"License\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eLocal-first.\u003c/strong\u003e No hosted proxy. No account required.\n\u003c/p\u003e\n\n---\n\n## Why People Install This\n\nAI coding tools send too many prompts to premium models by default.\n\nThat means:\n\n- You waste paid tokens on simple questions\n- You burn through Claude, Gemini, or OpenAI quota faster than necessary\n- You stop working when one provider is rate-limited or down\n\n`llm-router` sits between your coding tool and your model providers. It classifies each prompt, tries the cheapest capable model first, and falls back automatically when needed.\n\nYou keep the same workflow. The router changes the model choice underneath.\n\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs/readme/why-route-dark.svg\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"docs/readme/why-route-light.svg\"\u003e\n    \u003cimg src=\"docs/readme/why-route-light.svg\" alt=\"Animated benefits panel for llm-router showing cheaper routing, preserved quality, quota protection, and low-config setup.\" width=\"100%\"/\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n---\n\n## What You Get\n\n- Route trivial prompts to free or cheap models first\n- Keep premium models for the prompts that actually need them\n- Fall back across providers automatically\n- Track usage and estimated savings locally\n- Run everything on your own machine\n\n---\n\n## Quick Start\n\n### 1. Install\n\n```bash\npip install llm-routing\nllm-router install\n```\n\n\u003e Package name: `llm-routing` on PyPI. CLI command: `llm-router`.\n\n### 2. Add providers (optional)\n\n```bash\nexport OPENAI_API_KEY=\"sk-...\"      # GPT-4o, o3\nexport GEMINI_API_KEY=\"AIza...\"     # Gemini Flash/Pro (free tier available)\nexport OLLAMA_BASE_URL=\"http://localhost:11434\"  # Local models (free)\n```\n\nWorks with **zero API keys** on Claude Code Pro/Max subscriptions — routing uses MCP tools that call external models only when beneficial.\n\n### 3. Verify\n\n```bash\nllm-router health            # Check provider connectivity\n```\n\nIf you already use Claude Code, Codex, or Gemini CLI, keep your existing workflow and let `llm-router` choose models underneath it.\n\n---\n\n## Example Routing\n\n| Prompt | Routed to |\n|--------|-----------|\n| \"What does this Python error mean?\" | Ollama / Gemini Flash / Codex |\n| \"Refactor this endpoint\" | GPT-4o / Gemini Pro |\n| \"Design a distributed tracing strategy\" | o3 / Claude Opus |\n\nThe exact chain depends on your configured providers, budget profile, and routing policy.\n\n---\n\n## Works With\n\n| Tool | Mode | Typical Savings |\n|------|------|-----------------|\n| **Claude Code** | Full auto-routing via hooks | 60–80% |\n| **Codex CLI** | Full auto-routing via hooks | 60–80% |\n| **Gemini CLI** | Full auto-routing via hooks | 50–70% |\n| **VS Code / Cursor** | Manual MCP tools | 30–50% |\n| **Any MCP client** | Manual MCP tools | Varies |\n\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs/readme/editors-dark.svg\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"docs/readme/editors-light.svg\"\u003e\n    \u003cimg src=\"docs/readme/editors-light.svg\" alt=\"Animated host support cards for Claude Code, Codex CLI, Gemini CLI, Pi, VS Code, Cursor, and any MCP client.\" width=\"100%\"/\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n- **Full auto-routing** means hooks intercept prompts and route automatically with no workflow change.\n- **Manual MCP tools** means routing is available on demand through tools such as `llm_query`.\n\n```bash\nllm-router install                    # Claude Code (default)\nllm-router install --host codex       # Codex CLI\nllm-router install --host gemini-cli  # Gemini CLI\nllm-router install --host vscode      # VS Code\nllm-router install --host cursor      # Cursor\n```\n\nSee [docs/HOST_SUPPORT_MATRIX.md](docs/HOST_SUPPORT_MATRIX.md) for full details on each host.\n\n---\n\n## How It Works\n\n```\nUser prompt\n    │\n    ▼\n┌──────────────────────┐\n│ Complexity Classifier │  ← Heuristic (free, instant) or Ollama/Flash ($0.0001)\n└──────────┬───────────┘\n           │\n           ▼\n┌──────────────────────┐\n│  Free-First Router   │  ← Tries cheapest model first, walks up the chain\n│                      │\n│  Ollama (free)       │\n│  → Codex (prepaid)   │\n│  → Gemini Flash      │\n│  → GPT-4o / Claude   │\n└──────────┬───────────┘\n           │\n           ▼\n┌──────────────────────┐\n│  Guards (parallel)   │  ← Circuit breaker, budget pressure, quality check\n└──────────┬───────────┘\n           │\n           ▼\n      Response + cost logged to local SQLite\n```\n\nClassification is free for many tasks (regex heuristics catch ~70%) or near-free for ambiguous prompts when using local Ollama or Gemini Flash.\n\n---\n\n## What You Can Do\n\n| Use case | How |\n|----------|-----|\n| Route simple questions to free local models | Auto (hooks) or `llm_query` |\n| Protect Claude subscription quota | Budget pressure monitoring + auto-downgrade |\n| Fall back across providers on failure | Automatic chain with circuit breakers |\n| Track token spend and savings | `llm_usage`, `llm_savings`, session-end reports |\n| Enforce routing policy for your team | `LLM_ROUTER_POLICY=aggressive` |\n| Generate images/video/audio | `llm_image`, `llm_video`, `llm_audio` |\n| Run multi-step research pipelines | `llm_orchestrate` with templates |\n| Bulk-edit files with cheap models | `llm_fs_edit_many` |\n\n---\n\n## Providers\n\nRouting chains are built from your configured providers. You only need one.\n\n### Text LLM Providers\n\n| Provider | Models | Cost | Setup |\n|----------|--------|------|-------|\n| **Ollama** | gemma4, qwen3.5, llama3, etc. | Free (local) | `OLLAMA_BASE_URL` |\n| **OpenAI** | GPT-4o, o3, GPT-4o-mini | Paid API | `OPENAI_API_KEY` |\n| **Google** | Gemini Flash, Pro | Free tier + paid | `GEMINI_API_KEY` |\n| **Anthropic** | Claude Sonnet, Opus, Haiku | Paid API or subscription | `ANTHROPIC_API_KEY` or subscription |\n| **xAI** | Grok-3 | Paid API | `XAI_API_KEY` |\n| **DeepSeek** | DeepSeek Chat, Reasoner | Paid API (ultra-cheap) | `DEEPSEEK_API_KEY` |\n| **Mistral** | Mistral Large, Small | Paid API | `MISTRAL_API_KEY` |\n| **Cohere** | Command R+ | Paid API | `COHERE_API_KEY` |\n| **Perplexity** | Sonar Pro (web-grounded) | Paid API | `PERPLEXITY_API_KEY` |\n| **Groq** | Fast inference (Llama, Mixtral) | Free tier | `GROQ_API_KEY` |\n| **Together** | Open-source models | Paid API | `TOGETHER_API_KEY` |\n| **HuggingFace** | Open-source models | Free tier + paid | `HF_TOKEN` |\n| **Codex** | GPT-5.4, o3 (prepaid desktop) | Included with Codex CLI | Auto-detected |\n\n### Media Providers\n\n| Provider | Type | Setup |\n|----------|------|-------|\n| **fal** | Image (Flux), Video (Kling) | `FAL_KEY` |\n| **Stability** | Image (Stable Diffusion 3) | `STABILITY_API_KEY` |\n| **ElevenLabs** | Audio / TTS | `ELEVENLABS_API_KEY` |\n| **Runway** | Video (Gen-3) | `RUNWAY_API_KEY` |\n| **Replicate** | Various open-source models | `REPLICATE_API_TOKEN` |\n\nSee [docs/PROVIDERS.md](docs/PROVIDERS.md) for setup instructions and model recommendations.\n\n---\n\n## Routing Policies\n\nControl how aggressively the router offloads to cheap models.\n\n| Policy | Confidence Threshold | Typical Savings | Best For |\n|--------|:-------------------:|:---------------:|----------|\n| **Aggressive** | 2 | 60–75% | Maximum cost reduction |\n| **Balanced** (default) | 4 | 35–45% | Cost/quality tradeoff |\n| **Conservative** | 6 | 10–15% | Quality over cost |\n\n```bash\nexport LLM_ROUTER_POLICY=aggressive     # Or: balanced, conservative\nexport LLM_ROUTER_ENFORCE=smart          # smart | hard | soft | off\nexport LLM_ROUTER_PROFILE=balanced       # budget | balanced | premium\n```\n\n`LLM_ROUTER_ENFORCE` controls how strictly the auto-route hook blocks direct model use:\n- `smart` — route when confident, pass through when uncertain\n- `hard` — always route, block unrouted tool calls\n- `soft` — suggest routing, never block\n- `off` — disable hook enforcement\n\n---\n\n## MCP Tools (60)\n\nllm-router exposes 60 MCP tools organized by function:\n\n| Category | Tools | Examples |\n|----------|:-----:|---------|\n| Routing \u0026 classification | 7 | `llm_route`, `llm_classify`, `llm_auto`, `llm_stream` |\n| Text generation | 6 | `llm_query`, `llm_code`, `llm_analyze`, `llm_research` |\n| Media generation | 3 | `llm_image`, `llm_video`, `llm_audio` |\n| Pipeline orchestration | 2 | `llm_orchestrate`, `llm_pipeline_templates` |\n| Admin \u0026 monitoring | 20+ | `llm_usage`, `llm_budget`, `llm_health`, `llm_savings` |\n| Filesystem operations | 4 | `llm_fs_find`, `llm_fs_edit_many` |\n| Subscription tracking | 3 | `llm_check_usage`, `llm_refresh_claude_usage` |\n\n**Slim mode** (`LLM_ROUTER_SLIM=routing` or `core`) reduces registered tools to save context tokens in constrained environments.\n\n[Full Tool Reference](docs/TOOLS.md)\n\n---\n\n## Savings: How It Works\n\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs/readme/savings-dark.svg\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"docs/readme/savings-light.svg\"\u003e\n    \u003cimg src=\"docs/readme/savings-light.svg\" alt=\"Animated savings breakdown showing 60-80% typical cost reduction with token distribution across free, budget, and premium tiers.\" width=\"100%\"/\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\nSavings are calculated by comparing actual spend against a baseline of routing every task to Claude Sonnet/Opus.\n\n**Methodology:**\n1. Each routed task logs: model used, tokens consumed, estimated cost\n2. A baseline cost is computed as if the same tokens were processed by the most expensive model in the chain\n3. Savings = `(baseline - actual) / baseline`\n\n**Assumptions and limitations:**\n- Baseline assumes you would have used Opus/Sonnet for everything (worst case)\n- Token estimates use `len(text) / 4` approximation, not exact tokenizer counts\n- Cost data comes from LiteLLM's pricing tables (may lag provider price changes)\n- Savings vary significantly by workload — code-heavy sessions route more to cheap models\n- The router itself adds small overhead (classification costs ~$0.0001 per ambiguous task)\n\n**Observed range:** 35–80% savings depending on policy and task mix. The \"87%\" figure in some docs represents a single-user peak over a specific development period, not a guaranteed outcome.\n\n---\n\n## Trust, Privacy, and Local-First Design\n\nllm-router runs entirely on your machine. There is no hosted proxy, no telemetry, no account required.\n\n| What | Where | Details |\n|------|-------|---------|\n| **Your prompts** | Sent to configured providers | Exactly like using those providers directly |\n| **API keys** | `.env` or `~/.llm-router/config.yaml` | Local files, never transmitted |\n| **Usage logs** | `~/.llm-router/usage.db` | Unencrypted SQLite (filesystem permissions) |\n| **Classification cache** | In-memory | Cleared on process restart |\n| **Hook scripts** | `~/.claude/hooks/` | Local shell scripts, inspectable |\n\n**What we do:**\n- Scrub API keys from structured logs\n- Detect hook deadlocks before installation\n- Store all data locally in `~/.llm-router/`\n- Respect provider rate limits and TOS\n\n**What you should know:**\n- Prompts are sent to whichever provider the router selects — review your provider's privacy policy\n- Usage logs (SQLite) are not encrypted at rest — use full-disk encryption if needed\n- The router cannot prevent model jailbreaks or prompt injection at the provider level\n\nSee [SECURITY.md](SECURITY.md) for responsible disclosure policy and [docs/SECURITY_DESIGN.md](docs/SECURITY_DESIGN.md) for the full threat model.\n\n---\n\n## Configuration\n\nMinimal setup — only configure what you have:\n\n```bash\n# Provider keys (set any combination)\nexport OPENAI_API_KEY=\"sk-proj-...\"\nexport GEMINI_API_KEY=\"AIza...\"\nexport OLLAMA_BASE_URL=\"http://localhost:11434\"\nexport OLLAMA_BUDGET_MODELS=\"gemma4:latest,qwen3.5:latest\"\n\n# Routing behavior\nexport LLM_ROUTER_PROFILE=\"balanced\"       # budget | balanced | premium\nexport LLM_ROUTER_POLICY=\"balanced\"        # aggressive | balanced | conservative\nexport LLM_ROUTER_ENFORCE=\"smart\"          # smart | hard | soft | off\n```\n\nFor teams or environments where `.env` is restricted:\n\n```bash\n# User-level config (no project .env needed)\nmkdir -p ~/.llm-router \u0026\u0026 chmod 700 ~/.llm-router\ncat \u003e ~/.llm-router/config.yaml \u003c\u003c 'EOF'\nopenai_api_key: \"sk-proj-...\"\ngemini_api_key: \"AIza...\"\nollama_base_url: \"http://localhost:11434\"\nllm_router_profile: \"balanced\"\nEOF\nchmod 600 ~/.llm-router/config.yaml\n```\n\n---\n\n## Documentation\n\n| Document | Purpose |\n|----------|---------|\n| [Quick Start (2 min)](docs/QUICKSTART_2MIN.md) | Fastest path to working routing |\n| [Getting Started](docs/GETTING_STARTED.md) | Full setup walkthrough |\n| [Host Support Matrix](docs/HOST_SUPPORT_MATRIX.md) | Per-host feature comparison |\n| [Providers](docs/PROVIDERS.md) | Provider setup and model recommendations |\n| [Tool Reference](docs/TOOLS.md) | All 60 MCP tools with examples |\n| [Architecture](docs/ARCHITECTURE.md) | Internal design and module structure |\n| [Troubleshooting](docs/TROUBLESHOOTING.md) | Common issues and fixes |\n| [Security Design](docs/SECURITY_DESIGN.md) | Threat model and data handling |\n\n---\n\n## Contributing\n\nContributions welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for full guidelines.\n\n```bash\ngit clone https://github.com/ypollak2/llm-router.git\ncd llm-router\nuv sync --extra dev\nuv run pytest tests/ -q         # Run tests (1700+)\nuv run ruff check src/ tests/   # Lint\n```\n\n---\n\n## Package Names\n\n| Name | What it is |\n|------|-----------|\n| `llm-routing` | Current PyPI package (`pip install llm-routing`) |\n| `llm-router` | CLI command and GitHub repo name |\n| `claude-code-llm-router` | Deprecated legacy package (redirects to `llm-routing`) |\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/ypollak2/llm-router/issues\"\u003eIssues\u003c/a\u003e · \u003ca href=\"https://pypi.org/project/llm-routing/\"\u003ePyPI\u003c/a\u003e · \u003ca href=\"CHANGELOG.md\"\u003eChangelog\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003csub\u003eMIT License\u003c/sub\u003e\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fypollak2%2Fllm-router","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fypollak2%2Fllm-router","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fypollak2%2Fllm-router/lists"}