{"id":13439261,"url":"https://github.com/xorbitsai/inference","last_synced_at":"2026-04-25T12:04:13.058Z","repository":{"id":179394468,"uuid":"653496050","full_name":"xorbitsai/inference","owner":"xorbitsai","description":"Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.","archived":false,"fork":false,"pushed_at":"2026-04-21T21:07:27.000Z","size":73159,"stargazers_count":9251,"open_issues_count":51,"forks_count":821,"subscribers_count":61,"default_branch":"main","last_synced_at":"2026-04-21T22:03:25.554Z","etag":null,"topics":["artificial-intelligence","chatglm","deployment","flan-t5","gemma","ggml","glm4","inference","llama","llama3","llamacpp","llm","machine-learning","mistral","openai-api","pytorch","qwen","vllm","whisper","wizardlm"],"latest_commit_sha":null,"homepage":"https://inference.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xorbitsai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-06-14T07:05:04.000Z","updated_at":"2026-04-21T21:07:21.000Z","dependencies_parsed_at":"2023-12-25T05:20:13.153Z","dependency_job_id":"35373adf-f2af-4ce5-8f28-22bfb86bd8c8","html_url":"https://github.com/xorbitsai/inference","commit_stats":{"total_commits":1026,"total_committers":87,"mean_commits":"11.793103448275861","dds":0.846978
5575048733,"last_synced_commit":"0ae27272d2c0bba88c1c264814afbbba14c902dc"},"previous_names":["xorbitsai/inference"],"tags_count":138,"template":false,"template_full_name":null,"purl":"pkg:github/xorbitsai/inference","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xorbitsai%2Finference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xorbitsai%2Finference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xorbitsai%2Finference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xorbitsai%2Finference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xorbitsai","download_url":"https://codeload.github.com/xorbitsai/inference/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xorbitsai%2Finference/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32261128,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T09:15:33.318Z","status":"ssl_error","status_checked_at":"2026-04-25T09:15:31.997Z","response_time":59,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","chatglm","deployment","flan-t5","gemma","ggml","glm4","inference","llama","llama3","llamacpp","llm","machine-learning","mistral","openai-api","pytorch","qwen","vllm","whisper","wizardlm"],"created_at":"2024-07-31T03:01:12.453Z","updated_at":"2026-04-25T12:04:13.024Z","avatar_url":"https://github.com/xorbitsai.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./assets/xorbits-logo.png\" width=\"180px\" alt=\"xorbits\" /\u003e\n\n# Xorbits Inference: Model Serving Made Easy 🤖\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://xinference.io/en\"\u003eXinference Enterprise\u003c/a\u003e ·\n  \u003ca href=\"https://inference.readthedocs.io/en/latest/getting_started/installation.html#installation\"\u003eSelf-hosting\u003c/a\u003e ·\n  \u003ca href=\"https://inference.readthedocs.io/\"\u003eDocumentation\u003c/a\u003e\n\u003c/p\u003e\n\n[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)\n[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)\n[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main\u0026style=for-the-badge\u0026label=GITHUB%20ACTIONS\u0026logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)\n[![Docker 
Pulls](https://img.shields.io/docker/pulls/xprobe/xinference?style=for-the-badge\u0026logo=docker)](https://hub.docker.com/r/xprobe/xinference)\n[![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord\u0026style=for-the-badge\u0026logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5)\n[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x\u0026style=for-the-badge)](https://twitter.com/xorbitsio)\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"./README.md\"\u003e\u003cimg alt=\"README in English\" src=\"https://img.shields.io/badge/English-454545?style=for-the-badge\"\u003e\u003c/a\u003e\n  \u003ca href=\"./README_zh_CN.md\"\u003e\u003cimg alt=\"简体中文版自述文件\" src=\"https://img.shields.io/badge/中文介绍-d9d9d9?style=for-the-badge\"\u003e\u003c/a\u003e\n  \u003ca href=\"./README_ja_JP.md\"\u003e\u003cimg alt=\"日本語のREADME\" src=\"https://img.shields.io/badge/日本語-d9d9d9?style=for-the-badge\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003c/div\u003e\n\u003cbr /\u003e\n\n\nXorbits Inference (Xinference) is a powerful and versatile library designed to serve language, \nspeech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy \nand serve your own or state-of-the-art built-in models using just a single command. 
Whether you are a \nresearcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full \npotential of cutting-edge AI models.\n\n\u003cdiv align=\"center\"\u003e\n\u003ci\u003e\u003ca href=\"https://discord.gg/Xw9tszSkr5\"\u003e👉 Join our Discord community!\u003c/a\u003e\u003c/i\u003e\n\u003c/div\u003e\n\n## 🔥 Hot Topics\n### Framework Enhancements\n- Agent-native Serving: Xinference integrates with [Xagent](https://github.com/xorbitsai/xagent) to enable dynamic planning, tool use, and autonomous multi-step reasoning — moving beyond static pipelines.\n- Auto batch: Multiple concurrent requests are automatically batched, significantly improving throughput: [#4197](https://github.com/xorbitsai/inference/pull/4197)\n- [Xllamacpp](https://github.com/xorbitsai/xllamacpp): a new llama.cpp Python binding, maintained by the Xinference team, that supports continuous batching and is more production-ready: [#2997](https://github.com/xorbitsai/inference/pull/2997)\n- Distributed inference: running models across workers: [#2877](https://github.com/xorbitsai/inference/pull/2877)\n- vLLM enhancement: shared KV cache across multiple replicas: [#2732](https://github.com/xorbitsai/inference/pull/2732)\n### New Models\n- Built-in support for [Gemma-4](https://deepmind.google/models/gemma/gemma-4/): [#4768](https://github.com/xorbitsai/inference/pull/4768)\n- Built-in support for [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS): [#4781](https://github.com/xorbitsai/inference/pull/4781)\n- Built-in support for [Qwen-3.5](https://github.com/QwenLM/Qwen3.5): [#4639](https://github.com/xorbitsai/inference/pull/4639)\n- Built-in support for [GLM-5](https://github.com/zai-org/GLM-5): [#4638](https://github.com/xorbitsai/inference/pull/4638)\n- Built-in support for [MiniMax-M2.5](https://github.com/MiniMax-AI/MiniMax-M2.5): [#4630](https://github.com/xorbitsai/inference/pull/4630)\n- Built-in support for [Kimi-K2.5](https://github.com/MoonshotAI/Kimi-K2.5): 
[#4631](https://github.com/xorbitsai/inference/pull/4631)\n- Built-in support for [FLUX.2-Klein](https://bfl.ai/models/flux-2-klein): [#4596](https://github.com/xorbitsai/inference/pull/4596)\n- Built-in support for [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR): [#4581](https://github.com/xorbitsai/inference/pull/4581)\n### Integrations\n- [Xagent](https://github.com/xorbitsai/xagent): an enterprise agent platform for building and running AI agents with planning, memory, and tool use — not limited to rigid workflows.\n- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.\n- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities and allows workflow orchestration through Flow visualization.\n- [RAGFlow](https://github.com/infiniflow/ragflow): an open-source RAG engine based on deep document understanding.\n- [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB (Max Knowledge Brain) is a powerful and easy-to-use AI assistant that integrates Retrieval-Augmented Generation (RAG) pipelines, supports robust workflows, and provides advanced MCP tool-use capabilities.\n\n\n## Key Features\n🌟 **Model Serving Made Easy**: Simplify the process of serving large language, speech \nrecognition, and multimodal models. You can set up and deploy your models\nfor experimentation and production with a single command.\n\n⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single \ncommand. Inference provides access to state-of-the-art open-source models!\n\n🖥 **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with\n[ggml](https://github.com/ggerganov/ggml). 
Xorbits Inference intelligently utilizes heterogeneous\nhardware, including GPUs and CPUs, to accelerate your model inference tasks.\n\n⚙️ **Flexible API and Interfaces**: Offer multiple interfaces for interacting\nwith your models, supporting OpenAI compatible RESTful API (including Function Calling API), RPC, CLI \nand WebUI for seamless model management and interaction.\n\n🌐 **Distributed Deployment**: Excel in distributed deployment scenarios, \nallowing the seamless distribution of model inference across multiple devices or machines.\n\n🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates\nwith popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).\n\n## Why Xinference\n| Feature                                        | Xinference | FastChat | OpenLLM | RayLLM |\n|------------------------------------------------|------------|----------|---------|--------|\n| OpenAI-Compatible RESTful API                  | ✅ | ✅ | ✅ | ✅ |\n| vLLM Integrations                              | ✅ | ✅ | ✅ | ✅ |\n| More Inference Engines (GGML, TensorRT)        | ✅ | ❌ | ✅ | ✅ |\n| More Platforms (CPU, Metal)                    | ✅ | ✅ | ❌ | ❌ |\n| Multi-node Cluster Deployment                  | ✅ | ❌ | ❌ | ✅ |\n| Image Models (Text-to-Image)                   | ✅ | ✅ | ❌ | ❌ |\n| Text Embedding Models                          | ✅ | ❌ | ❌ | ❌ |\n| Multimodal Models                              | ✅ | ❌ | ❌ | ❌ |\n| Audio Models                                   | ✅ | ❌ | ❌ | ❌ |\n| More OpenAI Functionalities (Function Calling) | ✅ | ❌ | ❌ | ❌ |\n\n## Using Xinference\n\n- **Self-hosting Xinference Community 
Edition\u003cbr /\u003e**\nQuickly get Xinference running in your environment with this [starter guide](#getting-started).\nUse our [documentation](https://inference.readthedocs.io/) for further references and more in-depth instructions.\n\n- **Xinference for enterprises / organizations\u003cbr /\u003e**\nWe provide additional enterprise-centric features. Please [send us an email](mailto:business@xprobe.io?subject=[GitHub]Business%20License%20Inquiry) to discuss enterprise needs. \u003cbr /\u003e\n\n## Staying Ahead\n\nStar Xinference on GitHub and be instantly notified of new releases.\n\n![star-us](assets/stay_ahead.gif)\n\n## Getting Started\n\n* [Docs](https://inference.readthedocs.io/en/latest/index.html)\n* [Built-in Models](https://inference.readthedocs.io/en/latest/models/builtin/index.html)\n* [Custom Models](https://inference.readthedocs.io/en/latest/models/custom.html)\n* [Deployment Docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html)\n* [Examples and Tutorials](https://inference.readthedocs.io/en/latest/examples/index.html)\n\n### Jupyter Notebook\n\nThe most lightweight way to experience Xinference is to try our [Jupyter Notebook on Google Colab](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb).\n\n### Docker \n\nNvidia GPU users can start the Xinference server using the [Xinference Docker Image](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html). 
Prior to executing the installation command, ensure that both [Docker](https://docs.docker.com/get-docker/) and [CUDA](https://developer.nvidia.com/cuda-downloads) are set up on your system.\n\n```bash\ndocker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v \u003c/on/your/host\u003e:/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0\n```\n\n### K8s via helm\n\nEnsure that you have GPU support in your Kubernetes cluster, then install as follows.\n\n```bash\n# add repo\nhelm repo add xinference https://xorbitsai.github.io/xinference-helm-charts\n\n# update indexes and query xinference versions\nhelm repo update xinference\nhelm search repo xinference/xinference --devel --versions\n\n# install xinference\nhelm install xinference xinference/xinference -n xinference --version 0.0.1-v\u003cxinference_release_version\u003e\n```\n\nFor more customized installation methods on K8s, please refer to the [documentation](https://inference.readthedocs.io/en/latest/getting_started/using_kubernetes.html).\n\n### Quick Start\n\nInstall Xinference with pip as follows. (For more options, see the [Installation page](https://inference.readthedocs.io/en/latest/getting_started/installation.html).)\n\n```bash\npip install \"xinference[all]\"\n```\n\nTo start a local instance of Xinference, run the following command:\n\n```bash\n$ xinference-local\n```\n\nOnce Xinference is running, there are multiple ways you can try it: via the web UI, via cURL, via the command line, or via Xinference’s Python client. 
Check out our [docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally) for the guide.\n\n![web UI](assets/screenshot.png)\n\n## Getting involved\n\n| Platform                                                                                        | Purpose                                     |\n|-------------------------------------------------------------------------------------------------|---------------------------------------------|\n| [GitHub Issues](https://github.com/xorbitsai/inference/issues)                                  | Reporting bugs and filing feature requests. |\n| [Discord](https://discord.gg/Xw9tszSkr5) | Collaborating with other Xinference users.  |\n| [Twitter](https://twitter.com/xorbitsio)                                                        | Staying up-to-date on new features.         |\n\n## Citation\n\nIf this work is helpful, please cite it as:\n\n```bibtex\n@inproceedings{lu2024xinference,\n    title = \"Xinference: Making Large Model Serving Easy\",\n    author = \"Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo\",\n    booktitle = \"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations\",\n    month = nov,\n    year = \"2024\",\n    address = \"Miami, Florida, USA\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2024.emnlp-demo.30\",\n    pages = \"291--300\",\n}\n```\n\n## Contributors\n\n\u003ca href=\"https://github.com/xorbitsai/inference/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=xorbitsai/inference\" /\u003e\n\u003c/a\u003e\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=xorbitsai/inference\u0026type=Date)](https://star-history.com/#xorbitsai/inference\u0026Date)","funding_links":[],"categories":["HarmonyOS","Python","Serving","NLP","A01_文本生成_文本对话","🤖 
AI \u0026 Machine Learning","Apps","推理 Inference","Summary","语言资源库","Repos","artificial-intelligence","ML / AI","📋 Contents","Inference","Open-Source Local LLM Projects"],"sub_categories":["Windows Manager","Frameworks/Servers for Serving","大语言对话模型及数据","AI","python","⚡ 3. Inference Engines \u0026 Serving","Inference Engine"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxorbitsai%2Finference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxorbitsai%2Finference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxorbitsai%2Finference/lists"}