https://github.com/m4stanuj/mast-llm-router

Task-aware MCP LLM fallback router: 13 provider routes, 10 chains, semantic cache, auto fallback, $0/month.

ai-agents claude-code codex cursor fallback-router free-tier gemini groq llm-router local-first m4st mcp openrouter python windsurf

Host: GitHub
URL: https://github.com/m4stanuj/mast-llm-router
Owner: m4stanuj
License: mit
Created: 2026-05-29T06:12:09.000Z (26 days ago)
Default Branch: main
Last Pushed: 2026-05-29T07:13:01.000Z (26 days ago)
Last Synced: 2026-05-29T08:25:12.548Z (26 days ago)
Topics: ai-agents, claude-code, codex, cursor, fallback-router, free-tier, gemini, groq, llm-router, local-first, m4st, mcp, openrouter, python, windsurf
Language: Python
Homepage: https://github.com/m4stanuj/mast-llm-router
Size: 331 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Agents: AGENTS.md

# 🚀 MAST LLM Router — Intelligent LLM Request Distribution Engine

[![CI](https://github.com/m4stanuj/mast-llm-router/actions/workflows/ci.yml/badge.svg)](https://github.com/m4stanuj/mast-llm-router/actions)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/)
[![MCP](https://img.shields.io/badge/MCP-compatible-green)](https://modelcontextprotocol.io)
[![License](https://img.shields.io/badge/license-MIT-brightgreen)](LICENSE)
[![Cost](https://img.shields.io/badge/monthly%20cost-%240-success)](/#)
[![Providers](https://img.shields.io/badge/providers-13-orange)](/#)
[![PRESENTATION](https://img.shields.io/badge/view-Presentation-blueviolet)](PRESENTATION.md)
[![SOCIAL](https://img.shields.io/badge/social-kit-ff69b4)](SOCIAL.md)
[![Download](https://img.shields.io/badge/download-zip-success)](mast-llm-router.zip)
[![Stars](https://img.shields.io/github/stars/m4stanuj/mast-llm-router?style=social)](https://github.com/m4stanuj/mast-llm-router)

> **🏆 Task-aware LLM fallback router: 13 provider routes · 10 chains · 6 models per chain · $0/month**
> Works with Claude Code, Cursor, Windsurf, Continue.dev, Codex CLI, and any MCP-compatible client.

---

```ascii
╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║   Every AI pipeline breaks when a provider hits a rate       ║
║   limit. This one doesn't.                                   ║
║                                                              ║
║   ┌─────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ║
║   │  Task   │──▶│  Chain   │──▶│ Fallback │──▶│ Response │   ║
║   │ Detect  │   │  Select  │   │ Loop x6  │   │          │   ║
║   └─────────┘   └──────────┘   └──────────┘   └──────────┘   ║
║                                                              ║
║   🔄 One fails → Next takes over → resilient by design       ║
║   💰 13 provider routes · free-tier APIs · $0/month          ║
║   🧠 Semantic caching · Auto key detection · MCP native      ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝
```

---

## 📊 At a Glance

| Metric | Value |
|--------|-------|
| **Provider Routes** | 13 (Groq, Cerebras, Gemini, DeepSeek, OpenRouter, SambaNova, Together, NVIDIA NIM, Mistral, xAI/Grok, HuggingFace, Kimi K2, Nemotron) |
| **Task Chains** | 10 (speed, reason, code, vision, research, write, agent, pentest, hinglish, vision_reason) |
| **Fallback Depth** | 6 models per chain — auto-failover on 429/503/empty response |
| **Reliability Strategy** | Best-first chain routing with automatic fallback across provider routes |
| **Cache Strategy** | Fuzzy semantic matching at 0.82 threshold with a 500-entry LRU cache |
| **Monthly Cost** | **$0.00** (100% free-tier APIs) |
| **Protocol** | MCP (Model Context Protocol) — stdio + HTTP |
| **Clients** | Claude Code, Cursor, Windsurf, Continue.dev, Codex CLI, Antigravity |

---

## 🎯 What is this?

`mast-llm-router` is a **task-aware intelligent fallback router** for LLM requests that:

- **Detects** what kind of task you're doing from your prompt keywords
- **Routes** to the optimal model chain for that specific task
- **Falls back** to the next provider when one hits a rate limit, returns an error, or returns an empty response
- **Caches** semantically similar prompts to eliminate redundant API calls
- **Detects** API keys automatically from their prefix — just paste and go
- **Costs** exactly nothing — runs entirely on free-tier quotas

Built as part of [M4ST](https://github.com/m4stanuj) — a personal AI OS running on an RTX 2060 Super in Bareilly, India.

---

## M4ST Ecosystem

| Repo | Role |
|------|------|
| [MAST](https://github.com/m4stanuj/MAST) | Flagship AI operator stack |
| [mast-llm-router](https://github.com/m4stanuj/mast-llm-router) | This repo: task-aware LLM fallback router |
| [semantic-cache-engine](https://github.com/m4stanuj/semantic-cache-engine) | Standalone semantic cache module |
| [openwork](https://github.com/m4stanuj/openwork) | Universal MCP workspace/config layer |
| [m4stclaw-legacy-archive](https://github.com/m4stanuj/m4stclaw-legacy-archive) | Historical archive and lineage |

---

## 🎬 Quick Demo

```
User: "Write a Python script to scrape Hacker News"
Router: Detected task → code
Chain: kimi-k2 → qwen3-coder → mimo-pro → nvidia-deepseek → deepseek → sambanova
Result: Response from kimi-k2 in 1.2s (cache miss)

User: "yeh kya hai samjhao"
Router: Detected task → hinglish
Chain: sarvam → gemini-flash → groq-llama → cerebras → openrouter → mistral
Result: Response in Hindi-English mix

User: "Explain quantum computing in simple terms"
Router: Detected task → reason
Chain: deepseek-r1 → nemotron → gemini-pro → openrouter → together → mistral
Result: Response from deepseek-r1 in 3.4s (cached from similar query)
```

---

## 🧠 Algorithm: How It Works

### Step 1: Task Detection
```
Input: "Write a Python web scraper"
           │
           ▼
┌─────────────────────┐
│ Keyword Scan        │
│                     │
│ "Python"  → 📝 code │
│ "scraper" → 📝 code │
│ "write"   → 📝 code │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│ Chain: CODE         │
│ Confidence 94%      │
└─────────────────────┘
```
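
The keyword scan above can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: the keyword table, the `speed` default, and the way confidence is computed (the winning task's share of all keyword hits) are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical keyword table; the router's real keyword lists are not shown in this README.
TASK_KEYWORDS = {
    "code": {"python", "script", "scraper", "function", "bug", "write"},
    "reason": {"explain", "why", "prove", "compare"},
    "hinglish": {"kya", "hai", "samjhao", "kaise"},
}
DEFAULT_TASK = "speed"  # assumed default when no keyword matches


def detect_task(prompt: str) -> tuple[str, float]:
    """Return (task, confidence); confidence is the winner's share of keyword hits."""
    words = re.findall(r"[a-z0-9_]+", prompt.lower())
    hits = Counter()
    for task, keywords in TASK_KEYWORDS.items():
        hits[task] = sum(1 for word in words if word in keywords)
    task, best = hits.most_common(1)[0]
    if best == 0:
        return DEFAULT_TASK, 0.0
    return task, best / sum(hits.values())


print(detect_task("Write a Python web scraper"))
```

With this table, all three matched words belong to the `code` task, so the prompt from the diagram resolves to the `code` chain.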

### Step 2: Fallback Loop
```
Chain: CODE
    │
┌───▼────────────┐
│ Provider 1     │── 429 Rate Limited ──▶ fallback loop selects next
│ Kimi K2        │
└───┬────────────┘
    │
┌───▼────────────┐
│ Provider 2     │── 503 Error ─────────▶ fallback loop selects next
│ Qwen3 Coder    │
└───┬────────────┘
    │
┌───▼────────────┐
│ Provider 3     │── ✅ Success 1.2s
│ Mimo Pro       │
└───┬────────────┘
    │
┌───▼────────────┐
│ Response sent  │
│ to MCP Client  │
└────────────────┘
```
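
A best-first fallback loop of this shape might look like the sketch below. `ProviderError`, `run_chain`, and the stub providers are hypothetical names for illustration; treating only 429 and 503 as retryable (and re-raising anything else) is an assumption based on the failover conditions listed in this README.

```python
from typing import Callable

RETRYABLE_STATUS = {429, 503}  # assumed set, taken from the failover conditions above


class ProviderError(Exception):
    """Hypothetical error raised by a provider call, carrying an HTTP-like status."""

    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status


def run_chain(prompt: str, chain: list[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    failures = []
    for name, call in chain:
        try:
            text = call(prompt)
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUS:
                raise  # surface errors that a different provider would not fix
            failures.append(f"{name}: {err.status}")
            continue
        if not text or not text.strip():
            failures.append(f"{name}: empty response")
            continue
        return name, text
    raise RuntimeError("all providers failed: " + "; ".join(failures))


# Stub providers standing in for real API calls.
def rate_limited(prompt: str) -> str:
    raise ProviderError(429)


def unavailable(prompt: str) -> str:
    raise ProviderError(503)


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [("kimi-k2", rate_limited), ("qwen3-coder", unavailable), ("mimo-pro", healthy)]
print(run_chain("hello", chain))
```

The first two stubs fail with 429 and 503, so the response comes from the third entry, mirroring the diagram.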

### Step 3: Semantic Cache
```
Prompt ──▶ Embedding ──▶ Fuzzy Match (>0.82) ──▶ Cache Hit? ──▶ Return cached
                                                     │
                                                   Miss ──▶ Call API ──▶ Store
```
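
One way to approximate this flow with only the standard library is shown below. Note the substitution: instead of embeddings, the sketch scores prompts with `difflib.SequenceMatcher`, which is a character-level similarity and not what the diagram describes. The class name `FuzzyLRUCache` is hypothetical; the 0.82 threshold and 500-entry limit come from this README.

```python
from collections import OrderedDict
from difflib import SequenceMatcher
from typing import Optional


class FuzzyLRUCache:
    """LRU cache whose lookups accept prompts that are similar enough to a stored one."""

    def __init__(self, threshold: float = 0.82, max_entries: int = 500):
        self.threshold = threshold
        self.max_entries = max_entries
        self._entries = OrderedDict()  # normalized prompt -> response

    @staticmethod
    def _normalize(prompt: str) -> str:
        return " ".join(prompt.lower().split())

    def get(self, prompt: str) -> Optional[str]:
        key = self._normalize(prompt)
        best_key, best_score = None, 0.0
        for cached in self._entries:
            score = SequenceMatcher(None, key, cached).ratio()
            if score > best_score:
                best_key, best_score = cached, score
        if best_key is None or best_score < self.threshold:
            return None  # cache miss: the caller would call the API and then put()
        self._entries.move_to_end(best_key)  # mark as most recently used
        return self._entries[best_key]

    def put(self, prompt: str, response: str) -> None:
        key = self._normalize(prompt)
        self._entries[key] = response
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry


cache = FuzzyLRUCache()
cache.put("Explain quantum computing in simple terms", "cached answer")
print(cache.get("explain quantum computing in simple terms please"))
```

The linear scan is acceptable at 500 entries; a real embedding-based cache would typically use vector similarity instead.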

---

## ✨ Feature Highlights

- Task-aware chain selection for code, reasoning, research, writing, agents, vision, pentest, and Hinglish flows
- Automatic fallback when a provider rate-limits, errors, or returns an empty response
- SMART_KEY detection so mixed API keys can be pasted without manual provider mapping
- Semantic cache for repeated or similar prompts, tuned with a 0.82 fuzzy-match threshold

---

## Feature Matrix

| Feature | Detail |
|---|---|
| **13 provider routes** | Groq, Cerebras, Gemini, OpenRouter, SambaNova, DeepSeek, Together, NVIDIA NIM, Mistral, xAI/Grok, HuggingFace, Kimi K2, Nemotron |
| **10 task chains** | speed, reason, code, vision, research, write, agent, pentest, hinglish, vision_reason |
| **6 models per chain** | Tried best-first; the next model takes over on failure |
| **SMART_KEY detection** | Paste any API key — provider auto-detected by prefix |
| **Semantic cache** | Fuzzy match at 0.82 threshold, 500-entry LRU |
| **Thread-safe cooldowns** | Per-key 429/auth cooldown, not per-provider |
| **Both transports** | stdio (local) + HTTP (remote) |
| **$0/month** | 100% free-tier APIs |
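
The per-key cooldown row can be illustrated with a small lock-protected table of deadlines. This is a sketch under assumptions: the class name `KeyCooldowns`, the method names, and the cooldown durations are hypothetical, and the injectable clock exists only to make the example testable.

```python
import threading
import time
from typing import Callable, Optional


class KeyCooldowns:
    """Tracks a cooldown deadline per API key rather than per provider."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._until = {}  # key -> time at which the key becomes usable again
        self._lock = threading.Lock()

    def mark(self, key: str, seconds: float) -> None:
        """Put one key on cooldown, e.g. after a 429 or an auth failure."""
        with self._lock:
            self._until[key] = self._clock() + seconds

    def is_available(self, key: str) -> bool:
        with self._lock:
            return self._clock() >= self._until.get(key, float("-inf"))

    def pick(self, keys: list[str]) -> Optional[str]:
        """Return the first key that is not cooling down, or None if all are."""
        with self._lock:
            now = self._clock()
            for key in keys:
                if now >= self._until.get(key, float("-inf")):
                    return key
        return None


cooldowns = KeyCooldowns()
cooldowns.mark("gsk_first", 60)  # hypothetical 60-second cooldown after a 429
print(cooldowns.pick(["gsk_first", "gsk_second"]))
```

Because the cooldown is attached to the key, a second key for the same provider stays usable while the first one waits.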

---

## Task Chains

```
speed         → groq → cerebras → gemini-flash → openrouter → sambanova → deepseek
reason        → deepseek-r1 → nemotron → gemini-pro → openrouter → together → mistral
code          → kimi-k2 → qwen3-coder → mimo-pro → nvidia → deepseek → sambanova
vision        → gemini-vision → openrouter-vision → together-vision → ...
research      → perplexity → gemini-pro → deepseek-r1 → openrouter → ...
write         → gemini-pro → mistral → together → openrouter → groq → cerebras
agent         → deepseek-r1 → gemini-pro → openrouter → together → groq → ...
pentest       → nvidia-deepseek → nemotron → deepseek-r1 → glm → mistral → ...
hinglish      → sarvam → gemini-flash → groq → cerebras → openrouter → mistral
vision_reason → gemini-vision → openrouter-vision → together-vision → ...
```
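
As a data structure, the chains map naturally onto a dictionary of ordered lists. The sketch below reproduces only the chains listed in full above and leaves out the ones that end in `...`; the `chain_for` helper, its `skip` parameter, and the fallback to `speed` for unknown tasks are assumptions for illustration.

```python
# Only chains spelled out in full in this README; truncated ("...") chains are omitted.
TASK_CHAINS = {
    "speed": ["groq", "cerebras", "gemini-flash", "openrouter", "sambanova", "deepseek"],
    "reason": ["deepseek-r1", "nemotron", "gemini-pro", "openrouter", "together", "mistral"],
    "code": ["kimi-k2", "qwen3-coder", "mimo-pro", "nvidia", "deepseek", "sambanova"],
    "write": ["gemini-pro", "mistral", "together", "openrouter", "groq", "cerebras"],
    "hinglish": ["sarvam", "gemini-flash", "groq", "cerebras", "openrouter", "mistral"],
}


def chain_for(task: str, skip: frozenset = frozenset()) -> list[str]:
    """Return the best-first model list for a task, minus any entries to skip."""
    chain = TASK_CHAINS.get(task, TASK_CHAINS["speed"])  # assumed default chain
    return [model for model in chain if model not in skip]


print(chain_for("code"))
print(chain_for("speed", skip=frozenset({"groq"})))
```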

---

## MCP Tools Exposed

| Tool | Description |
|---|---|
| `llm_chat` | Single-turn prompt → best model |
| `llm_chat_multi_turn` | Full conversation history support |
| `llm_detect_task` | Preview which chain will handle your prompt |
| `llm_router_status` | Provider health, key counts, cooldowns |
| `llm_list_providers` | All providers + chains in JSON |
| `llm_cache_control` | Cache stats or clear |
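
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds such a message; the tool names come from the table above, while the `prompt` argument name and the `tool_call_request` helper are assumptions, since the tool schemas are not listed here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids, incremented per message


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "prompt" is an assumed argument name; check the server's tool schema for the real one.
print(tool_call_request("llm_detect_task", {"prompt": "Write a Python web scraper"}))
```

In practice an MCP client library handles this framing (plus the initialization handshake), so the message is shown only to make the protocol concrete.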

---

## Installation

### 1. Clone

```bash
git clone https://github.com/m4stanuj/mast-llm-router.git
cd mast-llm-router
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```

### 3. Configure keys

```bash
cp .env.example .env
# Edit .env — paste your free-tier API keys
```

> **SMART_KEY tip:** Just paste any key into `SMART_KEY_1`, `SMART_KEY_2`, etc.
> The router detects the provider automatically from the key prefix.

### 4. Test it

```bash
python src/server.py --help
```

---

## Client Setup

### Claude Code

Add to your Claude Code MCP configuration (for example a project-level `.mcp.json`), or register the server via `claude mcp add`:

```json
{
  "mcpServers": {
    "mast-router": {
      "command": "python",
      "args": ["/absolute/path/to/mast-llm-router/src/server.py"]
    }
  }
}
```

Or as a one-liner (the `--` separates the server command from the `claude` options):
```bash
claude mcp add mast-router -- python /absolute/path/to/mast-llm-router/src/server.py
```

### Cursor / Windsurf

Settings → MCP → Add Server:

```json
{
  "name": "mast-router",
  "type": "stdio",
  "command": "python",
  "args": ["/absolute/path/to/mast-llm-router/src/server.py"]
}
```

### Continue.dev

In `.continue/config.json`:

```json
{
  "mcpServers": [
    {
      "name": "mast-router",
      "command": "python",
      "args": ["/absolute/path/to/mast-llm-router/src/server.py"]
    }
  ]
}
```

### Codex CLI

```bash
codex --mcp-server "python /absolute/path/to/mast-llm-router/src/server.py"
```

### HTTP Mode (Antigravity, Magnus, remote clients)

```bash
python src/server.py --http --port 8000
```

Then point your client to: `http://localhost:8000/mcp`

---

## Environment Variables

| Variable | Description |
|---|---|
| `SMART_KEY_1` … `SMART_KEY_30` | Auto-detected keys (recommended) |
| `GROQ_API_KEY` | Groq (+ `_1` through `_20` for rotation) |
| `CEREBRAS_API_KEY` | Cerebras |
| `GEMINI_API_KEY` | Google Gemini |
| `OPENROUTER_API_KEY` | OpenRouter |
| `NVIDIA_API_KEY` | NVIDIA NIM |
| `SAMBANOVA_API_KEY` | SambaNova |
| `DEEPSEEK_API_KEY` | DeepSeek |
| `TOGETHER_API_KEY` | Together AI |
| `MISTRAL_API_KEY` | Mistral |
| `GROKAI_API_KEY` | xAI / Grok |
| `HUGGINGFACE_API_KEY` | HuggingFace |

---

## How Key Detection Works

```
gsk_...    → Groq
csk-...    → Cerebras
AIza...    → Gemini
sk-or-...  → OpenRouter
nvapi-...  → NVIDIA NIM
sk-...     → DeepSeek / Together / Mistral (length-based split)
xai-...    → Grok
hf_...     → HuggingFace
```
```
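
The prefix table translates into an ordered lookup, sketched below. The function name and the returned labels are hypothetical. The length-based split for plain `sk-` keys is not specified in this README, so the sketch reports those keys as ambiguous instead of guessing.

```python
from typing import Optional

# Specific prefixes first, so "sk-or-" is matched before the generic "sk-".
KEY_PREFIXES = [
    ("gsk_", "groq"),
    ("csk-", "cerebras"),
    ("AIza", "gemini"),
    ("sk-or-", "openrouter"),
    ("nvapi-", "nvidia"),
    ("xai-", "grok"),
    ("hf_", "huggingface"),
]


def detect_provider(key: str) -> Optional[str]:
    """Map an API key to a provider label based on its prefix."""
    key = key.strip()
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    if key.startswith("sk-"):
        # DeepSeek, Together, and Mistral share this prefix; the real router
        # splits them by length, which is not reproduced here.
        return "deepseek/together/mistral"
    return None


print(detect_provider("sk-or-v1-example"))
print(detect_provider("gsk_example"))
```

Ordering matters: checking `sk-` before `sk-or-` would misclassify every OpenRouter key.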

---

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                     MCP Client                      │
│  (Claude Code / Cursor / Codex / Antigravity / …)   │
└────────────────────┬────────────────────────────────┘
                     │ stdio / HTTP
┌────────────────────▼────────────────────────────────┐
│                 server.py (FastMCP)                 │
│  llm_chat │ multi_turn │ detect_task │ status …     │
└────────────────────┬────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────┐
│            llm_fallback.py (Core Router)            │
│                                                     │
│  ┌─────────────┐  ┌──────────┐ ┌─────────────────┐  │
│  │ Task Detect │  │  Cache   │ │   Key Manager   │  │
│  │  (keyword)  │  │ (fuzzy)  │ │   (cooldown)    │  │
│  └──────┬──────┘  └──────────┘ └─────────────────┘  │
│         │                                           │
│  ┌──────▼────────────────────────────────────────┐  │
│  │              Task Chain Selector              │  │
│  │  speed/reason/code/vision/pentest/hinglish…   │  │
│  └──────┬────────────────────────────────────────┘  │
│         │                                           │
│  ┌──────▼────────────────────────────────────────┐  │
│  │           Fallback Loop (6 models)            │  │
│  │  Provider 1 → fail → Provider 2 → …           │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
```

---

## Free Tier Limits (as of 2026)

| Provider | Free RPM | Free TPD |
|---|---|---|
| Groq | 30 | 14,400 |
| Cerebras | 30 | ~1M |
| Gemini Flash | 15 | 1M |
| NVIDIA NIM | 40 | — |
| SambaNova | 10 | — |
| OpenRouter (free models) | varies | varies |
| DeepSeek | 50 | — |

---

## Project Structure

```
mast-llm-router/
├── src/
│   ├── server.py            # MCP server (FastMCP)
│   └── llm_fallback.py      # Core router logic
├── config/
│   ├── claude_code.json     # Claude Code config
│   └── cursor_windsurf.json
├── .env.example             # Key template
├── .gitignore
├── requirements.txt
└── README.md
```

---

## Part of M4ST Ecosystem

```
M4ST OS
├── llm_fallback.py   ← this repo
├── mcp_servers/      86 MCP tools
├── OpenWork          MCP-based AI workspace
├── CAI               Pentest agent layer
└── voice / memory / browser automation
```

---

## 📂 Project Resources

| Resource | Description |
|----------|-------------|
| [📖 PRESENTATION.md](./PRESENTATION.md) | Full slide deck — algorithm walkthrough, benchmarks, use cases |
| [📱 SOCIAL.md](./SOCIAL.md) | Social media kit — tweets, LinkedIn posts, hashtags, captions |
| [🎬 DEMO_STORYBOARD.md](./DEMO_STORYBOARD.md) | GIF/video storyboard — task detection, fallback, cache hit |
| [🤖 AGENTS.md](./AGENTS.md) | Guide for AI agents using this MCP server |
| [📋 CHANGELOG.md](./CHANGELOG.md) | Version history and roadmap |
| [📦 mast-llm-router.zip](./mast-llm-router.zip) | Downloadable ZIP archive |
| [🔗 GitHub Release](https://github.com/m4stanuj/mast-llm-router/releases) | Latest release with assets |

## 🏆 Why MAST LLM Router?

```
✅ 13 provider routes  → Redundancy across free-tier LLM APIs
✅ 10 task chains      → Optimal model for every use case
✅ 6 models per chain  → Best-first recovery when providers fail
✅ $0/month            → Free tiers only
✅ SMART_KEY           → Paste any key, auto-detected
✅ Semantic cache      → Repeated/similar prompts can return instantly
✅ MCP native          → Works with every major AI coding tool
✅ Open source         → MIT license, fork and build
```

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=m4stanuj/mast-llm-router&type=Date)](https://star-history.com/#m4stanuj/mast-llm-router&Date)

## 📢 Share

```markdown
**Twitter/X:**
🧵 I built a $0/month LLM router with 13 provider routes and auto-fallback.
6 models per chain. Semantic cache. MCP native.
github.com/m4stanuj/mast-llm-router
#LLM #AI #OpenSource #MCP #Python

**LinkedIn:**
🏗️ MAST LLM Router — task-aware fallback router for 13 LLM provider routes.
100% free-tier. Zero config. Full code on GitHub.
https://github.com/m4stanuj/mast-llm-router
```

## License

MIT — use it, fork it, build on it.

---

*Built by [@m4stanuj](https://github.com/m4stanuj) | [LinkedIn](https://linkedin.com/in/mast-anuj) | RTX 2060 Super | Bareilly, India*
*Zero VC money. Zero monthly cost. Full control.*

## 🔖 Hashtags

```
#LLM #AI #OpenSource #MCP #Python #MachineLearning #DeveloperTools
#AIAgents #LLMRouter #FreeAPI #ArtificialIntelligence #PythonDev
#ModelContextProtocol #LLMFallback #MultiProvider #AIIndex
```
