{"id":30679429,"url":"https://github.com/vllm-project/semantic-router","last_synced_at":"2026-01-05T07:27:40.989Z","repository":{"id":312430249,"uuid":"1045247072","full_name":"vllm-project/semantic-router","owner":"vllm-project","description":"Intelligent Mixture-of-Models Router for Efficient LLM Inference","archived":false,"fork":false,"pushed_at":"2025-10-05T14:21:22.000Z","size":6737,"stargazers_count":1605,"open_issues_count":94,"forks_count":179,"subscribers_count":24,"default_branch":"main","last_synced_at":"2025-10-05T14:45:45.632Z","etag":null,"topics":["ai-gateway","bert-classification","envoy-ext-proc","envoyproxy","fine-tuning","golang","huggingface-candle","huggingface-transformers","kubernetes","llm-tool-call","mixture-of-models","pii-detection","prompt-engineering","prompt-guard","python","rust","semantic-router","vllm"],"latest_commit_sha":null,"homepage":"https://vllm-semantic-router.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vllm-project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-26T21:49:50.000Z","updated_at":"2025-10-05T14:21:26.000Z","dependencies_parsed_at":"2025-09-28T01:17:07.385Z","dependency_job_id":null,"html_url":"https://github.com/vllm-project/semantic-router","commit_stats":null,"previous_names":["vllm-project/semantic-router"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vllm-project/sema
ntic-router","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vllm-project%2Fsemantic-router","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vllm-project%2Fsemantic-router/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vllm-project%2Fsemantic-router/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vllm-project%2Fsemantic-router/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vllm-project","download_url":"https://codeload.github.com/vllm-project/semantic-router/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vllm-project%2Fsemantic-router/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278470180,"owners_count":25992203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-gateway","bert-classification","envoy-ext-proc","envoyproxy","fine-tuning","golang","huggingface-candle","huggingface-transformers","kubernetes","llm-tool-call","mixture-of-models","pii-detection","prompt-engineering","prompt-guard","python","rust","semantic-router","vllm"],"created_at":"2025-09-01T14:10:42.480Z","updated_at":"2026-01-05T07:27:40.984Z","avatar_url":"https://github.com/vllm-p
roject.png","language":"Go","funding_links":[],"categories":["*Ops for AI","📚 Projects (1974 total)","Inference","Rust","Go","LLM Inference \u0026 Serving Tools"],"sub_categories":["LLMOps","Tools \u0026 Libraries","LLM Router"],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"website/static/img/code.png\" alt=\"vLLM Semantic Router\" width=\"100%\"/\u003e\n\n[![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue)](https://vllm-semantic-router.com)\n[![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Community-yellow)](https://huggingface.co/LLM-Semantic-Router)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![Crates.io](https://img.shields.io/crates/v/candle-semantic-router.svg)](https://crates.io/crates/candle-semantic-router)\n![Test And Build](https://github.com/vllm-project/semantic-router/workflows/Test%20And%20Build/badge.svg)\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/vllm-project/semantic-router)\n\n**📚 [Complete Documentation](https://vllm-semantic-router.com) | 🚀 [Quick Start](https://vllm-semantic-router.com/docs/installation) | 📣 [Blog](https://vllm-semantic-router.com/blog/) | 📖 [Publications](https://vllm-semantic-router.com/publications/)**\n\n\u003c/div\u003e\n\n---\n\n*Latest News* 🔥\n\n- [2025/12/16] Collaboration: [AMD × vLLM Semantic Router: Building the System Intelligence Together](https://blog.vllm.ai/2025/12/16/vllm-sr-amd.html)\n- [2025/12/15] New Blog: [Token-Level Truth: Real-Time Hallucination Detection for Production LLMs](https://blog.vllm.ai/2025/12/14/halugate.html)\n- [2025/11/19] New Blog: [Signal-Decision Driven Architecture: Reshaping Semantic Routing at Scale](https://blog.vllm.ai/2025/11/19/signal-decision.html)\n- [2025/11/03] Our paper [Category-Aware Semantic Caching for Heterogeneous LLM Workloads](https://arxiv.org/abs/2510.26835) published\n- [2025/10/27] New Blog: [Scaling Semantic Routing with Extensible 
LoRA](https://blog.vllm.ai/2025/10/27/semantic-router-modular.html)\n- [2025/10/12] Our paper [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys.\n- [2025/10/08] Collaboration: vLLM Semantic Router with [vLLM Production Stack](https://github.com/vllm-project/production-stack) Team.\n- [2025/09/01] Released the project: [vLLM Semantic Router: Next Phase in LLM inference](https://blog.vllm.ai/2025/09/11/semantic-router.html).\n\n---\n\n## Goals\n\nWe are building the **System Level Intelligence** for Mixture-of-Models (MoM), bringing the **Collective Intelligence** into **LLM systems**, answering the following questions:\n\n1. How to capture the missing signals in request, response and context?\n2. How to combine the signals to make better decisions?\n3. How to collaborate more efficiently between different models?\n4. How to secure the real world and LLM systems from jailbreaks, PII leaks, and hallucinations?\n5. How to collect the valuable signals and build a self-learning system?\n\n![vLLM Semantic Router Banner](./website/static/img/banner.png)\n\n### Where it lives\n\nIt lives between the real world and models:\n\n![level](./website/static/img/level.png)\n\n### Architecture\n\nA quick overview of the current architecture:\n\n![architecture](./website/static/img/architecture.png)\n\n## Quick Start\n\n### Installation\n\n\u003e [!TIP]\n\u003e We recommend that you set up a Python virtual environment to manage dependencies.\n\n```bash\n$ python -m venv vsr\n$ source vsr/bin/activate\n$ pip install vllm-sr\n```\n\nThe installation succeeded if you see the following help message:\n\n```bash\n$ vllm-sr\n\n       _ _     __  __       ____  ____\n__   _| | |_ _|  \\/  |     / ___||  _ \\\n\\ \\ / / | | | | |\\/| |_____\\___ \\| |_) |\n \\ V /| | | |_| | |  |_____|___) |  _ \u003c\n  \\_/ |_|_|\\__,_|_|  |     |____/|_| \\_\\\n\nvLLM Semantic Router - Intelligent routing for vLLM\n\nUsage: vllm-sr [OPTIONS] COMMAND 
[ARGS]...\n\n  vLLM Semantic Router CLI - Intelligent routing and caching for vLLM\n  endpoints.\n\nOptions:\n  --version  Show version and exit.\n  --help     Show this message and exit.\n\nCommands:\n  config  Print generated configuration.\n  init    Initialize vLLM Semantic Router configuration.\n  logs    Show logs from vLLM Semantic Router service.\n  serve   Start vLLM Semantic Router.\n  status  Show status of vLLM Semantic Router services.\n  stop    Stop vLLM Semantic Router.\n```\n\n\u003e [!TIP]\n\u003e You can set the HF_ENDPOINT, HF_TOKEN, and HF_HOME environment variables to configure the Hugging Face endpoint, credentials, and cache directory.\n\n```bash\n# Set environment variables (optional)\nexport HF_ENDPOINT=https://huggingface.co  # Or use mirror: https://hf-mirror.com\nexport HF_TOKEN=your_token_here  # Only for gated models\nexport HF_HOME=/path/to/cache  # Optional: custom cache directory\n\n# Start the service - models download automatically\n# Environment variables are automatically passed to the container\nvllm-sr serve\n```\n\n## Documentation 📖\n\nFor comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:\n\nRead the complete documentation at **[Docs](https://vllm-semantic-router.com/)**\n\nThe documentation includes:\n\n- **[Installation Guide](https://vllm-semantic-router.com/docs/installation/)** - Complete setup instructions\n- **[System Architecture](https://vllm-semantic-router.com/docs/overview/architecture/system-architecture/)** - Technical deep dive\n- **[Model Training](https://vllm-semantic-router.com/docs/training/training-overview/)** - How classification models work\n- **[API Reference](https://vllm-semantic-router.com/docs/api/router/)** - Complete API documentation\n- **[Dashboard](https://vllm-semantic-router.com/docs/overview/dashboard)** - vLLM Semantic Router Dashboard\n\n## Community 👋\n\nFor questions, feedback, or to contribute, please join the `#semantic-router` channel in the vLLM 
Slack.\n\n### Community Meetings 📅\n\nWe host bi-weekly community meetings to sync up with contributors across different time zones:\n\n- **First Tuesday of the month**: 9:00-10:00 AM EST (accommodates US EST, EU, and Asia Pacific contributors)\n  - [Zoom Link](https://us05web.zoom.us/j/84122485631?pwd=BB88v03mMNLVHn60YzVk4PihuqBV9d.1)\n  - [Google Calendar Invite](https://us05web.zoom.us/meeting/tZAsdeuspj4sGdVraOOR4UaXSstrH2jjPYFq/calendar/google/add?meetingMasterEventId=4jjzUKSLSLiBHtIKZpGc3g)\n  - [ics file](https://drive.google.com/file/d/15wO8cg0ZjNxdr8OtGiZyAgkSS8_Wry0J/view?usp=sharing)\n- **Third Tuesday of the month**: 1:00-2:00 PM EST (accommodates US EST and California contributors)\n  - [Zoom Link](https://us06web.zoom.us/j/86871492845?pwd=LcTtXm9gtGu23JeWqXxbnLLCCvbumB.1)\n  - [Google Calendar Invite](https://us05web.zoom.us/meeting/tZIlcOispzkiHtH2dlkWlLym68bEqvuf3MU5/calendar/google/add?meetingMasterEventId=PqWz2vk7TOCszPXqconGAA)\n  - [ics file](https://drive.google.com/file/d/1T54mwYpXXoV9QfR76I56BFBPNbykSsTw/view?usp=sharing)\n- Meeting Recordings: [YouTube](https://www.youtube.com/@vLLMSemanticRouter/videos)\n\nJoin us to discuss the latest developments, share ideas, and collaborate on the project!\n\n## Citation\n\nIf you find Semantic Router helpful in your research or projects, please consider citing it:\n\n```\n@misc{semanticrouter2025,\n  title={vLLM Semantic Router},\n  author={vLLM Semantic Router Team},\n  year={2025},\n  howpublished={\\url{https://github.com/vllm-project/semantic-router}},\n}\n```\n\n## Star History 🔥\n\nWe opened the project on Aug 31, 2025. 
We love open source and collaboration ❤️\n\n[![Star History Chart](https://api.star-history.com/svg?repos=vllm-project/semantic-router\u0026type=Date)](https://www.star-history.com/#vllm-project/semantic-router\u0026Date)\n\n## Sponsors 👋\n\nWe are grateful to our sponsors who support us:\n\n---\n\n[**AMD**](https://www.amd.com) provides us with GPU resources and [ROCm™](https://www.amd.com/en/products/software/rocm.html) Software for training and researching frontier router models, enhancing e2e testing, and building an online models playground.\n\n\u003cdiv align=\"center\"\u003e\n\u003ca href=\"https://www.amd.com\"\u003e\n  \u003cimg src=\"website/static/img/amd-logo.svg\" alt=\"AMD\" width=\"40%\"/\u003e\n\u003c/a\u003e\n\u003c/div\u003e\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvllm-project%2Fsemantic-router","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvllm-project%2Fsemantic-router","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvllm-project%2Fsemantic-router/lists"}