https://github.com/ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- Host: GitHub
- URL: https://github.com/ModelTC/lightllm
- Owner: ModelTC
- License: apache-2.0
- Created: 2023-07-22T08:11:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-29T04:42:19.000Z (6 months ago)
- Last Synced: 2024-10-29T10:03:22.518Z (6 months ago)
- Topics: deep-learning, gpt, llama, llm, model-serving, nlp, openai-triton
- Language: Python
- Homepage:
- Size: 2.39 MB
- Stars: 2,553
- Watchers: 23
- Forks: 200
- Open Issues: 67
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - ModelTC/lightllm
- Awesome-LLM-Inference - **LightLLM**
- awesome-production-machine-learning - LightLLM - LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. (Deployment and Serving)
- awesome-ai-repositories - lightllm
- awesome-llm-inference - **LightLLM**
README
---
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.
[English Docs](https://lightllm-en.readthedocs.io/en/latest/) | [δΈζζζ‘£](https://lightllm-cn.readthedocs.io/en/latest/) | [Blogs](https://modeltc.github.io/lightllm-blog/)
## News
- [2025/02] π₯ LightLLM v1.0.0 release, achieving the **fastest DeepSeek-R1** serving performance on a single H200 machine.

## Get started
- [Install LightLLM](https://lightllm-en.readthedocs.io/en/latest/getting_started/installation.html)
- [Quick Start](https://lightllm-en.readthedocs.io/en/latest/getting_started/quickstart.html)
- [LLM Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llama)
- [VLM Service](https://lightllm-en.readthedocs.io/en/latest/models/test.html#llava)

## Performance
Learn more in the release blogs: [v1.0.0 blog](https://www.light-ai.top/lightllm-blog//by%20mtc%20team/2025/02/16/lightllm/).
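Once a server is running (see the Quick Start link above), generation is requested over HTTP. Below is a minimal sketch of building such a request with only the standard library; the `/generate` endpoint, port 8000, and the `inputs`/`parameters` payload shape are assumptions based on the quickstart docs, so verify them against your installed version:

~~~python
import json

# Hypothetical request payload for LightLLM's HTTP API (shape assumed
# from the quickstart docs; verify against your installed version).
payload = {
    "inputs": "What is AI?",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

body = json.dumps(payload)

# To actually send it (requires a running server, so it is left
# commented to keep this sketch self-contained):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
print(body)
~~~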
## FAQ
Please refer to the [FAQ](https://lightllm-en.readthedocs.io/en/latest/faq.html) for more information.
## Projects using lightllm
We welcome any cooperation and contributions. If your project needs lightllm's support, please contact us via email or create a pull request.
1. LazyLLM: The easiest and laziest way to build multi-agent LLM applications.
Once you have installed `lightllm` and `lazyllm`, you can use the following code to build your own chatbot:
~~~python
from lazyllm import TrainableModule, deploy, WebModule
# The model will be downloaded automatically if you have an internet connection
m = TrainableModule('internlm2-chat-7b').deploy_method(deploy.lightllm)
WebModule(m).start().wait()
~~~

Documents: https://lazyllm.readthedocs.io/
## Star History
[](https://star-history.com/#ModelTC/lightllm&Timeline)
## Community
For further information and discussion, [join our discord server](https://discord.gg/WzzfwVSguU). Welcome to be a member and look forward to your contribution!
## License
This repository is released under the [Apache-2.0](LICENSE) license.
## Acknowledgement
We learned a lot from the following projects when developing LightLLM.
- [Faster Transformer](https://github.com/NVIDIA/FasterTransformer)
- [Text Generation Inference](https://github.com/huggingface/text-generation-inference)
- [vLLM](https://github.com/vllm-project/vllm)
- [Flash Attention 1&2](https://github.com/Dao-AILab/flash-attention)
- [OpenAI Triton](https://github.com/openai/triton)