https://github.com/m4stanuj/mast-llm-router
Task-aware MCP LLM fallback router: 13 provider routes, 10 chains, semantic cache, auto fallback, $0/month.
ai-agents claude-code codex cursor fallback-router free-tier gemini groq llm-router local-first m4st mcp openrouter python windsurf
Last synced: 26 days ago
- Host: GitHub
- URL: https://github.com/m4stanuj/mast-llm-router
- Owner: m4stanuj
- License: mit
- Created: 2026-05-29T06:12:09.000Z (26 days ago)
- Default Branch: main
- Last Pushed: 2026-05-29T07:13:01.000Z (26 days ago)
- Last Synced: 2026-05-29T08:25:12.548Z (26 days ago)
- Topics: ai-agents, claude-code, codex, cursor, fallback-router, free-tier, gemini, groq, llm-router, local-first, m4st, mcp, openrouter, python, windsurf
- Language: Python
- Homepage: https://github.com/m4stanuj/mast-llm-router
- Size: 331 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Agents: AGENTS.md
README
# MAST LLM Router: Intelligent LLM Request Distribution Engine
> **Task-aware LLM fallback router: 13 provider routes · 10 chains · 6 fallbacks · $0/month**
> Works with Claude Code, Cursor, Windsurf, Continue.dev, Codex CLI, and any MCP-compatible client.
---
```ascii
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  Every AI pipeline breaks when a provider hits a rate        │
│  limit. This one doesn't.                                    │
│                                                              │
│  ┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐    │
│  │  Task   │──▶│  Chain   │──▶│ Fallback │──▶│ Response │    │
│  │ Detect  │   │  Select  │   │ Loop x6  │   │          │    │
│  └─────────┘   └──────────┘   └──────────┘   └──────────┘    │
│                                                              │
│  One fails → next takes over → resilient by design           │
│  13 provider routes · free-tier APIs · $0/month              │
│  Semantic caching · Auto key detection · MCP native          │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
---
## At a Glance
| Metric | Value |
|--------|-------|
| **Provider Routes** | 13 (Groq, Cerebras, Gemini, DeepSeek, OpenRouter, SambaNova, Together, NVIDIA NIM, Mistral, xAI/Grok, HuggingFace, Kimi K2, Nemotron) |
| **Task Chains** | 10 (speed, reason, code, vision, research, write, agent, pentest, hinglish, vision_reason) |
| **Fallback Depth** | 6 models per chain; auto-failover on 429/503/empty response |
| **Reliability Strategy** | Best-first chain routing with automatic fallback across provider routes |
| **Cache Strategy** | Fuzzy semantic matching at 0.82 threshold with a 500-entry LRU cache |
| **Monthly Cost** | **$0.00** (100% free-tier APIs) |
| **Protocol** | MCP (Model Context Protocol) over stdio + HTTP |
| **Clients** | Claude Code, Cursor, Windsurf, Continue.dev, Codex CLI, Antigravity |
---
## What is this?
`mast-llm-router` is a **task-aware intelligent fallback router** for LLM requests that:
- **Detects** what kind of task you're doing from your prompt keywords
- **Routes** to the optimal model chain for that specific task
- **Falls back** to the next provider if one hits a rate limit, errors out, or returns an empty response
- **Caches** semantically similar prompts to eliminate redundant API calls
- **Detects** API keys automatically from their prefix: just paste and go
- **Costs** exactly nothing: runs entirely on free-tier quotas
Built as part of [M4ST](https://github.com/m4stanuj), a personal AI OS running on an RTX 2060 Super in Bareilly, India.
---
## M4ST Ecosystem
| Repo | Role |
|------|------|
| [MAST](https://github.com/m4stanuj/MAST) | Flagship AI operator stack |
| [mast-llm-router](https://github.com/m4stanuj/mast-llm-router) | This repo: task-aware LLM fallback router |
| [semantic-cache-engine](https://github.com/m4stanuj/semantic-cache-engine) | Standalone semantic cache module |
| [openwork](https://github.com/m4stanuj/openwork) | Universal MCP workspace/config layer |
| [m4stclaw-legacy-archive](https://github.com/m4stanuj/m4stclaw-legacy-archive) | Historical archive and lineage |
---
## Quick Demo
```
User:   "Write a Python script to scrape Hacker News"
Router: Detected task → code
Chain:  kimi-k2 → qwen3-coder → mimo-pro → nvidia-deepseek → deepseek → sambanova
Result: Response from kimi-k2 in 1.2s (cache miss)

User:   "yeh kya hai samjhao"
Router: Detected task → hinglish
Chain:  sarvam → gemini-flash → groq-llama → cerebras → openrouter → mistral
Result: Response in Hindi-English mix

User:   "Explain quantum computing in simple terms"
Router: Detected task → reason
Chain:  deepseek-r1 → nemotron → gemini-pro → openrouter → together → mistral
Result: Response from deepseek-r1 in 3.4s (cached from similar query)
```
---
## Algorithm: How It Works
### Step 1: Task Detection
```
Input: "Write a Python web scraper"
          │
          ▼
┌───────────────────┐
│ Keyword Scan      │
│                   │
│ "Python"  → code  │
│ "scraper" → code  │
│ "write"   → code  │
└─────────┬─────────┘
          │
┌─────────▼─────────┐
│ Chain: CODE       │
│ Confidence 94%    │
└───────────────────┘
```
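The keyword scan above can be sketched roughly as follows. This is an illustrative approximation only: the keyword table, the `speed` default, and the hit-share confidence score are all assumptions, since this README does not show the real keyword lists or the confidence formula used in `llm_fallback.py`.

```python
import re

# Hypothetical keyword table: illustrative only, not the router's real lists.
TASK_KEYWORDS = {
    "code": {"python", "script", "scraper", "function", "bug"},
    "reason": {"explain", "why", "prove", "analyze"},
    "hinglish": {"kya", "hai", "samjhao", "kaise"},
}
DEFAULT_TASK = "speed"  # assumed default chain when no keyword matches


def detect_task(prompt: str) -> tuple[str, float]:
    """Return (task, confidence); confidence is the winning task's share of hits."""
    words = set(re.findall(r"[a-z0-9_]+", prompt.lower()))
    hits = {task: len(words & kws) for task, kws in TASK_KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return DEFAULT_TASK, 0.0
    best = max(hits, key=hits.get)
    return best, hits[best] / total
```

With this table, `detect_task("Write a Python web scraper")` selects the `code` chain because two of its keywords appear and no other task scores a hit.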
### Step 2: Fallback Loop
```
Chain: CODE
    │
┌───▼────────────┐
│ Provider 1     │── 429 Rate Limited ──┐
│ Kimi K2        │                      │
└────────────────┘                      │
                                        ▼
┌────────────────┐              ┌────────────────┐
│ Provider 2     │─ 503 Error ─▶│ Fallback loop  │
│ Qwen3 Coder    │              │ auto-selects   │
└────────────────┘              │ next           │
                                └───────┬────────┘
┌────────────────┐                      │
│ Provider 3     │◀─────────────────────┘
│ Mimo Pro       │── ✅ Success 1.2s
└───────┬────────┘
        │
┌───────▼───────┐
│ Response sent │
│ to MCP Client │
└───────────────┘
```
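A minimal sketch of the fallback loop, assuming each provider is wrapped as a plain callable that either returns text or raises an error. `ProviderError`, `run_chain`, and the three fake providers are hypothetical names for illustration, not the repo's API.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error type; `status` mirrors an HTTP status such as 429 or 503."""

    def __init__(self, status: int):
        super().__init__(f"provider failed with status {status}")
        self.status = status


def run_chain(chain: Sequence[tuple[str, Callable[[str], str]]], prompt: str) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, text) from the first success."""
    errors = []
    for name, call in chain:
        try:
            text = call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc.status}")
            continue  # rate limit or server error: move on to the next provider
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty response")  # blank output also triggers fallback
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Fake providers that reproduce the scenario in the diagram above.
def rate_limited(prompt: str) -> str:
    raise ProviderError(429)


def unavailable(prompt: str) -> str:
    raise ProviderError(503)


def healthy(prompt: str) -> str:
    return "print('hello')"


demo_chain = [("kimi-k2", rate_limited), ("qwen3-coder", unavailable), ("mimo-pro", healthy)]
```

Calling `run_chain(demo_chain, "Write a script")` skips the first two providers and returns the third provider's response together with its name.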
### Step 3: Semantic Cache
```
Prompt ──▶ Embedding ──▶ Fuzzy Match (>0.82) ──▶ Cache Hit? ──▶ Return cached
                                                     │
                                                   Miss ──▶ Call API ──▶ Store
```
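The cache flow can be approximated with the standard library alone. This sketch substitutes `difflib.SequenceMatcher` string similarity for the embedding step shown in the diagram, so it is a stand-in and not the router's actual matcher. The 0.82 threshold and 500-entry limit come from this README; the class and method names are hypothetical.

```python
from collections import OrderedDict
from difflib import SequenceMatcher


class FuzzyLRUCache:
    """Prompt cache with fuzzy lookup and least-recently-used eviction."""

    def __init__(self, threshold: float = 0.82, max_entries: int = 500):
        self.threshold = threshold
        self.max_entries = max_entries
        self._store = OrderedDict()  # normalized prompt -> response

    @staticmethod
    def _normalize(prompt: str) -> str:
        return " ".join(prompt.lower().split())

    def get(self, prompt: str):
        """Return the response cached for the most similar prompt, or None on a miss."""
        key = self._normalize(prompt)
        best_key, best_score = None, 0.0
        for cached in self._store:
            score = SequenceMatcher(None, key, cached).ratio()
            if score > best_score:
                best_key, best_score = cached, score
        if best_key is None or best_score <= self.threshold:
            return None
        self._store.move_to_end(best_key)  # mark as most recently used
        return self._store[best_key]

    def put(self, prompt: str, response: str) -> None:
        key = self._normalize(prompt)
        self._store[key] = response
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
```

Lookup here is a linear scan over all cached prompts, which is acceptable at 500 entries but would need an index for much larger caches.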
---
## Feature Highlights
- Task-aware chain selection for code, reasoning, research, writing, agents, vision, pentest, and Hinglish flows
- Automatic fallback when a provider rate-limits, errors, or returns an empty response
- SMART_KEY detection so mixed API keys can be pasted without manual provider mapping
- Semantic cache for repeated or similar prompts, tuned with a 0.82 fuzzy-match threshold
---
## Feature Matrix
| Feature | Detail |
|---|---|
| **13 provider routes** | Groq, Cerebras, Gemini, OpenRouter, SambaNova, DeepSeek, Together, NVIDIA NIM, Mistral, xAI/Grok, HuggingFace, Kimi K2, Nemotron |
| **10 task chains** | speed, reason, code, vision, research, write, agent, pentest, hinglish, vision_reason |
| **6 models per chain** | Best-first, auto-falls to next on failure |
| **SMART_KEY detection** | Paste any API key; the provider is auto-detected by prefix |
| **Semantic cache** | Fuzzy match at 0.82 threshold, 500 entry LRU |
| **Thread-safe cooldowns** | Per-key 429/auth cooldown, not per-provider |
| **Both transports** | stdio (local) + HTTP (remote) |
| **$0/month** | 100% free-tier APIs |
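
The "per-key, not per-provider" cooldown in the matrix above could look roughly like this. It is a sketch under assumptions: `KeyCooldowns` and its methods are hypothetical names, and the cooldown duration is left to the caller because this README does not state the values the router uses.

```python
import threading
import time


class KeyCooldowns:
    """Track cooldowns per API key so one exhausted key does not block its provider."""

    def __init__(self, clock=time.monotonic):
        self._until = {}  # key -> time at which the key becomes usable again
        self._lock = threading.Lock()
        self._clock = clock  # injectable so the behaviour can be tested without sleeping

    def penalize(self, key: str, seconds: float) -> None:
        """Put `key` on cooldown, for example after a 429 or an auth failure."""
        with self._lock:
            self._until[key] = self._clock() + seconds

    def available(self, keys):
        """Return the keys that are not cooling down, preserving their order."""
        now = self._clock()
        with self._lock:
            return [k for k in keys if self._until.get(k, float("-inf")) <= now]
```

Because the state is keyed by API key, a provider with several keys stays usable as long as at least one of its keys is off cooldown.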
---
## Task Chains
```
speed         → groq → cerebras → gemini-flash → openrouter → sambanova → deepseek
reason        → deepseek-r1 → nemotron → gemini-pro → openrouter → together → mistral
code          → kimi-k2 → qwen3-coder → mimo-pro → nvidia → deepseek → sambanova
vision        → gemini-vision → openrouter-vision → together-vision → ...
research      → perplexity → gemini-pro → deepseek-r1 → openrouter → ...
write         → gemini-pro → mistral → together → openrouter → groq → cerebras
agent         → deepseek-r1 → gemini-pro → openrouter → together → groq → ...
pentest       → nvidia-deepseek → nemotron → deepseek-r1 → glm → mistral → ...
hinglish      → sarvam → gemini-flash → groq → cerebras → openrouter → mistral
vision_reason → gemini-vision → openrouter-vision → together-vision → ...
```
---
## MCP Tools Exposed
| Tool | Description |
|---|---|
| `llm_chat` | Single-turn prompt routed to the best model |
| `llm_chat_multi_turn` | Full conversation history support |
| `llm_detect_task` | Preview which chain will handle your prompt |
| `llm_router_status` | Provider health, key counts, cooldowns |
| `llm_list_providers` | All providers + chains in JSON |
| `llm_cache_control` | Cache stats or clear |
---
## Installation
### 1. Clone
```bash
git clone https://github.com/m4stanuj/mast-llm-router.git
cd mast-llm-router
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
### 3. Configure keys
```bash
cp .env.example .env
# Edit .env and paste your free-tier API keys
```
> **SMART_KEY tip:** Just paste any key into `SMART_KEY_1`, `SMART_KEY_2`, etc.
> The router detects the provider automatically from the key prefix.
### 4. Test it
```bash
python src/server.py --help
```
---
## Client Setup
### Claude Code
Add to `~/.claude/claude_desktop_config.json` (or via `claude mcp add`):
```json
{
  "mcpServers": {
    "mast-router": {
      "command": "python",
      "args": ["/absolute/path/to/mast-llm-router/src/server.py"]
    }
  }
}
```
Or one-liner:
```bash
claude mcp add mast-router python /absolute/path/to/mast-llm-router/src/server.py
```
### Cursor / Windsurf
Settings → MCP → Add Server:
```json
{
  "name": "mast-router",
  "type": "stdio",
  "command": "python",
  "args": ["/absolute/path/to/mast-llm-router/src/server.py"]
}
```
### Continue.dev
In `.continue/config.json`:
```json
{
  "mcpServers": [
    {
      "name": "mast-router",
      "command": "python",
      "args": ["/absolute/path/to/mast-llm-router/src/server.py"]
    }
  ]
}
```
### Codex CLI
```bash
codex --mcp-server "python /absolute/path/to/mast-llm-router/src/server.py"
```
### HTTP Mode (Antigravity, Magnus, remote clients)
```bash
python src/server.py --http --port 8000
```
Then point your client to: `http://localhost:8000/mcp`
---
## Environment Variables
| Variable | Description |
|---|---|
| `SMART_KEY_1` through `SMART_KEY_30` | Auto-detected keys (recommended) |
| `GROQ_API_KEY` | Groq (+ `_1` through `_20` for rotation) |
| `CEREBRAS_API_KEY` | Cerebras |
| `GEMINI_API_KEY` | Google Gemini |
| `OPENROUTER_API_KEY` | OpenRouter |
| `NVIDIA_API_KEY` | NVIDIA NIM |
| `SAMBANOVA_API_KEY` | SambaNova |
| `DEEPSEEK_API_KEY` | DeepSeek |
| `TOGETHER_API_KEY` | Together AI |
| `MISTRAL_API_KEY` | Mistral |
| `GROKAI_API_KEY` | xAI / Grok |
| `HUGGINGFACE_API_KEY` | HuggingFace |
---
## How Key Detection Works
```
gsk_...    → Groq
csk-...    → Cerebras
AIza...    → Gemini
sk-or-...  → OpenRouter
nvapi-...  → NVIDIA NIM
sk-...     → DeepSeek / Together / Mistral (length-based split)
xai-...    → Grok
hf_...     → HuggingFace
```
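The prefix table above maps naturally onto a small lookup. In this sketch the function name and return shape are assumptions, and the length-based split for plain `sk-` keys is deliberately not implemented because this README does not give the lengths; the function returns all three candidates instead of guessing.

```python
# Prefixes taken from the table above; provider labels are illustrative.
PREFIXES = [
    ("gsk_", "groq"),
    ("csk-", "cerebras"),
    ("AIza", "gemini"),
    ("sk-or-", "openrouter"),
    ("nvapi-", "nvidia"),
    ("xai-", "grok"),
    ("hf_", "huggingface"),
]
AMBIGUOUS_SK = ("deepseek", "together", "mistral")


def detect_provider(key: str):
    """Return the candidate providers for `key` (an empty list if the prefix is unknown)."""
    key = key.strip()
    for prefix, provider in PREFIXES:
        if key.startswith(prefix):
            return [provider]
    # The generic "sk-" check runs last, so "sk-or-..." keys were already
    # matched as OpenRouter in the loop above.
    if key.startswith("sk-"):
        return list(AMBIGUOUS_SK)
    return []
```

A caller could resolve the ambiguous `sk-` case by trying each candidate provider once and remembering which one accepts the key.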
---
## Architecture
```
┌─────────────────────────────────────────────────────┐
│                     MCP Client                      │
│  (Claude Code / Cursor / Codex / Antigravity / …)   │
└────────────────────┬────────────────────────────────┘
                     │ stdio / HTTP
┌────────────────────▼────────────────────────────────┐
│                 server.py (FastMCP)                 │
│  llm_chat | multi_turn | detect_task | status …     │
└────────────────────┬────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────┐
│            llm_fallback.py (Core Router)            │
│                                                     │
│  ┌─────────────┐  ┌──────────┐  ┌─────────────────┐ │
│  │ Task Detect │  │  Cache   │  │   Key Manager   │ │
│  │  (keyword)  │  │ (fuzzy)  │  │   (cooldown)    │ │
│  └──────┬──────┘  └──────────┘  └─────────────────┘ │
│         │                                           │
│  ┌──────▼──────────────────────────────────────┐    │
│  │             Task Chain Selector             │    │
│  │ speed/reason/code/vision/pentest/hinglish…  │    │
│  └──────┬──────────────────────────────────────┘    │
│         │                                           │
│  ┌──────▼──────────────────────────────────────┐    │
│  │          Fallback Loop (6 models)           │    │
│  │   Provider 1 → fail → Provider 2 → …        │    │
│  └─────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────┘
```
---
## Free Tier Limits (as of 2026)
| Provider | Free RPM | Free TPD |
|---|---|---|
| Groq | 30 | 14,400 |
| Cerebras | 30 | ~1M |
| Gemini Flash | 15 | 1M |
| NVIDIA NIM | 40 | - |
| SambaNova | 10 | - |
| OpenRouter (free models) | varies | varies |
| DeepSeek | 50 | - |
---
## Project Structure
```
mast-llm-router/
├── src/
│   ├── server.py          # MCP server (FastMCP)
│   └── llm_fallback.py    # Core router logic
├── config/
│   ├── claude_code.json   # Claude Code config
│   └── cursor_windsurf.json
├── .env.example           # Key template
├── .gitignore
├── requirements.txt
└── README.md
```
---
## Part of M4ST Ecosystem
```
M4ST OS
├── llm_fallback.py      ← this repo
├── mcp_servers/         86 MCP tools
├── OpenWork             MCP-based AI workspace
├── CAI                  Pentest agent layer
└── voice / memory / browser automation
```
---
## Project Resources
| Resource | Description |
|----------|-------------|
| [PRESENTATION.md](./PRESENTATION.md) | Full slide deck: algorithm walkthrough, benchmarks, use cases |
| [SOCIAL.md](./SOCIAL.md) | Social media kit: tweets, LinkedIn posts, hashtags, captions |
| [DEMO_STORYBOARD.md](./DEMO_STORYBOARD.md) | GIF/video storyboard: task detection, fallback, cache hit |
| [AGENTS.md](./AGENTS.md) | Guide for AI agents using this MCP server |
| [CHANGELOG.md](./CHANGELOG.md) | Version history and roadmap |
| [mast-llm-router.zip](./mast-llm-router.zip) | Downloadable ZIP archive |
| [GitHub Release](https://github.com/m4stanuj/mast-llm-router/releases) | Latest release with assets |
## Why MAST LLM Router?
```
✅ 13 provider routes → Redundancy across free-tier LLM APIs
✅ 10 task chains     → Optimal model for every use case
✅ 6 fallbacks        → Best-first recovery when providers fail
✅ $0/month           → Free tiers only
✅ SMART_KEY          → Paste any key, auto-detected
✅ Semantic cache     → Repeated/similar prompts can return instantly
✅ MCP native         → Works with every major AI coding tool
✅ Open source        → MIT license, fork and build
```
## Share
```markdown
**Twitter/X:**
I built a $0/month LLM router with 13 provider routes and auto-fallback.
6 models per chain. Semantic cache. MCP native.
github.com/m4stanuj/mast-llm-router
#LLM #AI #OpenSource #MCP #Python
**LinkedIn:**
MAST LLM Router: task-aware fallback router for 13 LLM provider routes.
100% free-tier. Zero config. Full code on GitHub.
https://github.com/m4stanuj/mast-llm-router
```
## License
MIT: use it, fork it, build on it.
---
*Built by [@m4stanuj](https://github.com/m4stanuj) | [LinkedIn](https://linkedin.com/in/mast-anuj) | RTX 2060 Super | Bareilly, India*
*Zero VC money. Zero monthly cost. Full control.*
## Hashtags
```
#LLM #AI #OpenSource #MCP #Python #MachineLearning #DeveloperTools
#AIAgents #LLMRouter #FreeAPI #ArtificialIntelligence #PythonDev
#ModelContextProtocol #LLMFallback #MultiProvider #AIIndex
```