https://github.com/alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
- Host: GitHub
- URL: https://github.com/alibaba/rtp-llm
- Owner: alibaba
- License: apache-2.0
- Created: 2023-12-27T08:22:59.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-10-06T01:27:51.000Z (8 months ago)
- Last Synced: 2025-10-07T11:19:35.494Z (8 months ago)
- Topics: gpt, inference, llama, llm, llm-serving, llmops, model-serving
- Language: C++
- Homepage:
- Size: 431 MB
- Stars: 874
- Watchers: 16
- Forks: 86
- Open Issues: 45
Metadata Files:
- Readme: README.md
- License: LICENSE
- Support: docs/supported_models/embedding_models.md
- Notice: NOTICE
Awesome Lists containing this project:
- awesome-opensource-ai - RTP-LLM (Alibaba) - Alibaba's high-performance LLM inference acceleration engine. Powers production LLM services across Taobao, Tmall, and Alibaba's international AI platform. Supports PagedAttention, FlashAttention, FlashDecoding, INT8/INT4 quantization, and heterogeneous hardware (GPU/ARM CPU/Intel). Apache 2.0 licensed. (3. Inference Engines & Serving)