{"id":50324993,"url":"https://github.com/jackccrawford/token-scout","last_synced_at":"2026-05-29T05:04:28.364Z","repository":{"id":349440652,"uuid":"1173185677","full_name":"jackccrawford/token-scout","owner":"jackccrawford","description":"For OpenClaw, Hermes and more. Find free and low-cost inference (LLM models). Use them directly. Provides both a CLI and MCP server that knows which free-tier LLM APIs exist, which ones you have keys for, and which one fits your task. Returns endpoints so can you call models directly. No proxy, no middleware, no latency tax.","archived":false,"fork":false,"pushed_at":"2026-04-05T23:15:39.000Z","size":34,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-06T01:21:36.058Z","etag":null,"topics":["agent","agentic-ai","ai-agents","claude-code","free","free-inference","llm","mcp","mcp-server","model-discovery","model-routing","ollama","openclaw","openrouter","token-optimization"],"latest_commit_sha":null,"homepage":"https://www.agentdoor.ai","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jackccrawford.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-05T05:03:47.000Z","updated_at":"2026-04-06T01:19:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jackccrawford/token-scout","commit_stats":null,"previous_names":["jackccrawford/token-scout"],"tags_count":null
,"template":false,"template_full_name":null,"purl":"pkg:github/jackccrawford/token-scout","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackccrawford%2Ftoken-scout","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackccrawford%2Ftoken-scout/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackccrawford%2Ftoken-scout/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackccrawford%2Ftoken-scout/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jackccrawford","download_url":"https://codeload.github.com/jackccrawford/token-scout/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackccrawford%2Ftoken-scout/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33637486,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","agentic-ai","ai-agents","claude-code","free","free-inference","llm","mcp","mcp-server","model-discovery","model-routing","ollama","openclaw","openrouter","token-optimization"],"created_at":"2026-05-29T05:04:15.247Z","updated_at":"2026-05-29T05:04:28.326Z","avatar_url":"https://github.com/jackccrawford.png","language":"Rust","f
unding_links":[],"categories":[],"sub_categories":[],"readme":"# Token Scout\n\n**Live LLM model discovery for AI agents. Free and cheap inference, routed safely.**\n\nToken Scout discovers LLM models in real time — querying cloud providers and probing local Ollama instances every time you ask. It returns endpoints with compatibility profiles and live pricing so your agent can route tasks to the cheapest viable model without breaking tool calls, clipping context, or corrupting reasoning formats.\n\nBuilt for autonomous AI agents, agentic frameworks, and multi-model workflows. Works with [Claude Code](https://claude.ai/claude-code), [OpenClaw](https://github.com/openclaw), [Hermes](https://github.com/nousresearch), [LangChain](https://github.com/langchain-ai/langchain), [CrewAI](https://github.com/crewai), or any system that needs to pick a model at runtime.\n\n**No proxy. No middleware. No latency tax.** Token Scout tells your agent where to go. The agent calls the model directly.\n\n---\n\n## Why Your Agent Needs This\n\nAI agents that hardcode model IDs are leaving money on the table. Right now there are **28+ free models on OpenRouter alone** — including Qwen3 Coder 480B, Nemotron 120B, and DeepSeek R1. Tomorrow that number will be different.\n\nToken Scout gives your agent:\n- **Live model discovery** — queries OpenRouter, Groq, Cerebras, Mistral, GitHub, Google, and local Ollama instances in real time\n- **Compatibility filtering** — won't route to a model that breaks your tool calls, clips your context, or uses incompatible reasoning tags\n- **Cost control** — set a max cost per 1K tokens; free models only, cheap models, or no limit\n- **Quota tracking** — tracks requests and tokens consumed per provider, filters out exhausted models\n\nOne API call. 
Your agent gets back a ranked list of models that are available right now, compatible with the task, and within budget.\n\n---\n\n## For AI Systems\n\nToken-efficient reference:\n\n| Tool | Args | Purpose |\n|------|------|---------|\n| `token_scout` | `query, prefer=\"\", require={}` | Find models by name, size, provider, capability, or compatibility |\n\n### token_scout()\n\n```\ntoken_scout(query=\"reasoning code\", require={\"reasoning_format\": \"inline_tags\", \"min_context\": 32000})\n-\u003e 33 models: Qwen3 Coder, DeepSeek R1 distills, Qwen3.6 Plus...\n\ntoken_scout(query=\"fast classification\")\n-\u003e Llama 3.1 8B on Groq, Llama 4 Scout on Cerebras...\n\ntoken_scout(query=\"\", prefer=\"context\")\n-\u003e all models ranked by context window size\n\ntoken_scout(query=\"\")\n-\u003e status: providers, model counts, live discovery results\n```\n\n**prefer** options: `quota` (most requests remaining), `speed` (fastest), `context` (largest window), `budget` (Claude budget-aware)\n\n**require** — hard constraints applied before ranking:\n\n| Field | Values | Purpose |\n|-------|--------|---------|\n| `reasoning_format` | `api_separated`, `inline_tags`, `hidden`, `none`, `any` | How the model exposes chain-of-thought |\n| `tool_format` | `anthropic`, `openai_function`, `ollama`, `none`, `any` | Tool/function calling format |\n| `tool_reliability` | `native`, `claimed`, `none`, `any` | Whether tool support actually works |\n| `min_context` | integer (tokens) | Minimum context window |\n| `min_completion` | integer (tokens) | Minimum output token limit |\n| `modality` | `text`, `text+image`, etc. | Required input modality |\n\nReturns: model ID, endpoint, API style, key env var, context window, strengths, pricing, compatibility profile, quota status. 
Everything your agent needs to make the call.\n\n### Cost Gate\n\nSet `TOKEN_SCOUT_MAX_COST` to control maximum cost per 1K tokens (prompt + completion averaged):\n\n- `0` — free models only\n- `0.001` — free + very cheap (default, ~$1/M tokens)\n- `0.01` — includes mid-tier models\n- Unset — defaults to `0.001`\n\n---\n\n## The Problem Token Scout Solves\n\nAgents that route tasks to LLMs face three compatibility walls:\n\n1. **Tool format fragmentation** — Anthropic, OpenAI, and Ollama all handle function calling differently. Routing to the wrong format breaks your agent's tool chain.\n2. **Context window clipping** — sending 200K tokens to a model with 32K context doesn't degrade gracefully. It's catastrophic data loss.\n3. **Reasoning tag corruption** — Claude uses API-separated thinking. DeepSeek R1 and Qwen3 use inline `\u003cthink\u003e` tags. Mixing these mid-workflow corrupts the session.\n\nToken Scout profiles every model for these compatibility dimensions and filters before ranking. 
Your agent can't accidentally route to a model that will break it.\n\n---\n\n## Providers\n\n### Cloud (free tier, no credit card required unless noted)\n\n| Provider | Models | Get a key |\n|----------|--------|-----------|\n| **Groq** | Llama 4 Scout/Maverick, Llama 3.3 70B, Kimi K2, Qwen3 32B, GPT-OSS 120B | [console.groq.com](https://console.groq.com) |\n| **Cerebras** | Llama 3.3 70B, Llama 4 Scout, Qwen3 32B | [cloud.cerebras.ai](https://cloud.cerebras.ai) |\n| **Mistral** | Mistral Small 3.1 24B | [console.mistral.ai](https://console.mistral.ai) |\n| **OpenRouter** | 28+ free, 600+ paid — **live discovery** | [openrouter.ai](https://openrouter.ai) |\n| **GitHub Models** | GPT-4o, DeepSeek R1, Grok 3 Mini | [github.com/marketplace/models](https://github.com/marketplace/models) |\n| **Google AI** | Gemini 2.0 Flash (1M context) | [aistudio.google.com](https://aistudio.google.com) |\n\n### Local (Ollama constellation — auto-discovered)\n\nToken Scout probes your local network for running Ollama instances. Set env vars to point to your machines:\n\n| Env var | Default | Purpose |\n|---------|---------|---------|\n| `OLLAMA_HOST` | `127.0.0.1` | Local Ollama |\n| `MARS_HOST` | — | Additional host |\n| `GALAXY_HOST` | — | GPU inference |\n| `LUNAR_HOST` | — | Light inference |\n| `EXPLORA_HOST` | — | Heavy compute (multi-GPU, nginx load-balanced) |\n\nLocal models are free (electricity only) and have unlimited quota.\n\n### Live Discovery\n\nOpenRouter models are discovered in real time via `GET /api/v1/models`. No API key needed for discovery — free models are browsable immediately. 
Models and pricing change frequently; Token Scout catches them as they appear and disappear.\n\n---\n\n## Quick Start\n\n### Rust CLI (recommended)\n\n```bash\ngit clone https://github.com/jackccrawford/token-scout.git\ncd token-scout\ncargo build --release\n\n# JSON-RPC over stdin/stdout\necho '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"scout\",\"params\":{\"query\":\"reasoning\"}}' | ./target/release/token-scout\n```\n\n### Python MCP Server\n\n```bash\npip install -e .\n\n# Add to Claude Code\nclaude mcp add token-scout -- token-scout\n\n# Test it\ntoken-scout\n```\n\n### Add to Claude Desktop\n\n```json\n{\n  \"mcpServers\": {\n    \"token-scout\": {\n      \"command\": \"token-scout\",\n      \"env\": {\n        \"GROQ_API_KEY\": \"gsk_...\",\n        \"OPENROUTER_API_KEY\": \"sk-or-...\"\n      }\n    }\n  }\n}\n```\n\nSet API keys in your shell profile (`~/.zshrc`, `~/.bashrc`), or pass them in the config.\n\n---\n\n## How It Works\n\nToken Scout discovers models live. Every query reflects what's actually available right now — not what was available when the code was last updated.\n\nThree discovery layers run on first query:\n\n1. **OpenRouter live** — queries the OpenRouter API for all available models with real-time pricing. Free models appear and disappear hourly; Token Scout catches them as they come and go.\n2. **Ollama constellation** — probes your local network for running Ollama instances and inventories their loaded models.\n3. 
**Static fallback** — a curated set of known free-tier providers (Groq, Cerebras, Mistral, GitHub, Google) for when live discovery is unavailable.\n\nAfter discovery, every query:\n- Filters by cost gate (`TOKEN_SCOUT_MAX_COST`)\n- Filters by compatibility requirements (`require`)\n- Filters by quota availability\n- Ranks by relevance and `prefer` strategy\n- Returns everything your agent needs to call the model directly\n\n### Compatibility Profiles\n\nEvery discovered model gets a compatibility profile — inferred from model family, provider metadata, and live API fields:\n\n| Field | What it tells your agent |\n|-------|--------------------------|\n| `reasoning_format` | How thinking is exposed: `api_separated` (Claude, Gemini), `inline_tags` (DeepSeek R1, Qwen3+), `hidden` (OpenAI o-series), `none` |\n| `reasoning_tag` | The actual tag name if inline (e.g. `think`) — so your agent can parse or strip it |\n| `tool_format` | `anthropic`, `openai_function`, `ollama`, `none` |\n| `tool_reliability` | `native` (tested), `claimed` (API says yes), `none` |\n| `max_completion` | Output token limit |\n| `modality` | Input modalities: `text`, `text+image`, etc. |\n\n### Budget Awareness\n\nToken Scout reads `/tmp/claude-usage.json` (from `scripts/scrape-claude-usage.sh`) to track Claude session and weekly usage. 
When budget is tight, Token Scout prioritizes free and local models automatically.\n\n---\n\n## Use Cases\n\n- **Agentic coding assistants** — route sub-tasks (summarize, search, draft) to free models while the main agent stays on a premium model\n- **Multi-model pipelines** — pick the right model for each stage: fast/cheap for classification, reasoning-capable for analysis, deep-context for synthesis\n- **Cost optimization** — stop paying for inference on tasks that free models handle fine\n- **Local-first AI** — discover and use Ollama models on your own hardware before touching cloud APIs\n- **Fleet coordination** — multiple agents share a Token Scout instance, and quota tracking prevents any single agent from exhausting a provider\n\n---\n\n## Contributing\n\nPRs welcome. Especially:\n- New provider integrations (live discovery endpoints)\n- Compatibility profile corrections (tested tool support, reasoning format verification)\n- Ollama host configurations for different network setups\n- Budget integration improvements\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackccrawford%2Ftoken-scout","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjackccrawford%2Ftoken-scout","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackccrawford%2Ftoken-scout/lists"}