https://github.com/lightning-ai/litserve
Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.
ai api serving
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/lightning-ai/litserve
- Owner: Lightning-AI
- License: apache-2.0
- Created: 2023-12-12T14:45:03.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-27T10:07:44.000Z (7 months ago)
- Last Synced: 2024-06-27T10:27:29.962Z (7 months ago)
- Topics: ai, api, serving
- Language: Python
- Homepage: https://lightning.ai/docs/litserve
- Size: 348 KB
- Stars: 123
- Watchers: 15
- Forks: 12
- Open Issues: 15
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
README
# Easily serve AI models Lightning fast ⚡
Lightning-fast serving engine for AI models.
Easy. Flexible. Enterprise-scale.

----

**LitServe** is an easy-to-use, flexible serving engine for AI models built on FastAPI. It augments FastAPI with features like batching, streaming, and GPU autoscaling, eliminating the need to rebuild a FastAPI server per model.
LitServe is at least [2x faster](#performance) than plain FastAPI due to AI-specific multi-worker handling.
✅ (2x)+ faster serving ✅ Easy to use ✅ LLMs, non LLMs and more
✅ Bring your own model ✅ PyTorch/JAX/TF/... ✅ Built on FastAPI
✅ GPU autoscaling ✅ Batching, Streaming ✅ Self-host or ⚡️ managed
✅ Compound AI ✅ Integrate with vLLM and more

[![Discord](https://img.shields.io/discord/1077906959069626439?label=Get%20help%20on%20Discord)](https://discord.gg/WajDThKAur)
![cpu-tests](https://github.com/Lightning-AI/litserve/actions/workflows/ci-testing.yml/badge.svg)
[![codecov](https://codecov.io/gh/Lightning-AI/litserve/graph/badge.svg?token=SmzX8mnKlA)](https://codecov.io/gh/Lightning-AI/litserve)
[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
# Quick start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```
### Define a server
This toy example with 2 models (a compound AI system) shows LitServe's flexibility ([see real examples](#examples)):

```python
# server.py
import litserve as ls

# (STEP 1) - DEFINE THE API (compound AI system)
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # setup is called once at startup. Build a compound AI system (1+ models), connect DBs, load data, etc...
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Convert the request payload to model input.
        return request["input"]

    def predict(self, x):
        # Easily build compound systems. Run inference and return the output.
        squared = self.model1(x)
        cubed = self.model2(x)
        output = squared + cubed
        return {"output": output}

    def encode_response(self, output):
        # Convert the model output to a response payload.
        return {"output": output}

# (STEP 2) - START THE SERVER
if __name__ == "__main__":
    # scale with advanced features (batching, GPUs, etc...)
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto", max_batch_size=1)
    server.run(port=8000)
```

Now run the server via the command line:
```bash
python server.py
```
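The flow of one request through the three hooks of `SimpleLitAPI` can be traced by hand before starting the server. The sketch below is a plain-Python stand-in that mirrors the toy class and does not import `litserve`; the names `HookTrace` and `handle`, and the hook order shown, are assumptions made for illustration and say nothing about how LitServe schedules work internally.

```python
# Plain-Python stand-in for the SimpleLitAPI hooks; it does not import litserve.
class HookTrace:
    def setup(self, device):
        # Same toy "models" as in server.py.
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Pull the model input out of the request payload.
        return request["input"]

    def predict(self, x):
        # Combine both toy models into one result.
        return {"output": self.model1(x) + self.model2(x)}

    def encode_response(self, output):
        # Wrap the prediction in a response payload.
        return {"output": output}


def handle(api, request):
    """Chain the hooks: decode_request -> predict -> encode_response (hypothetical helper)."""
    x = api.decode_request(request)
    y = api.predict(x)
    return api.encode_response(y)


api = HookTrace()
api.setup(device="cpu")  # "cpu" is a placeholder; the toy models ignore it
response = handle(api, {"input": 4.0})
print(response)  # {'output': {'output': 80.0}}
```

In this trace the value ends up nested under two `output` keys because both `predict` and `encode_response` wrap it; returning the bare number from `predict` would flatten the payload.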
### Test the server
Run the auto-generated test client:
```bash
python client.py
```

Or use this terminal command:
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```

### LLM serving
LitServe isn’t *just* for LLMs like vLLM or Ollama; it serves any AI model with full control over internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).

```bash
litgpt serve microsoft/phi-2
```

### Summary
- LitAPI lets you easily build complex AI systems with one or more models ([docs](https://lightning.ai/docs/litserve/api-reference/litapi)).
- Use the setup method for one-time tasks like connecting models, DBs, and loading data ([docs](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- LitServer handles optimizations like batching, GPU autoscaling, streaming, etc... ([docs](https://lightning.ai/docs/litserve/api-reference/litserver)).
- Self-host on your own machines or use Lightning Studios for a fully managed deployment ([learn more](#hosting-options)).

[Learn how to make this server 200x faster](https://lightning.ai/docs/litserve/home/speed-up-serving-by-200x).
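The `curl` test from the quick start can also be issued from Python. This is a minimal sketch using only the standard library; it assumes the quick-start server is listening on `127.0.0.1:8000` with the `/predict` route, and the helper names `build_request` and `call_predict` are hypothetical. The block only builds the request, so it runs without a server.

```python
import json
import urllib.request

# Assumed to match the quick-start server: local host, port 8000, /predict route.
PREDICT_URL = "http://127.0.0.1:8000/predict"


def build_request(value, url=PREDICT_URL):
    """Build (but do not send) a JSON POST request carrying {"input": value}."""
    body = json.dumps({"input": value}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(value, url=PREDICT_URL, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(value, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Inspect the request without touching the network.
req = build_request(4.0)
print(req.get_method(), req.full_url, req.data)
```

With `python server.py` running in another terminal, `call_predict(4.0)` sends the same payload as the `curl` command and returns the parsed JSON body.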
# Featured examples
Use LitServe to deploy any model or AI service (compound AI, Gen AI, classic ML, embeddings, LLMs, vision, audio, etc.).
## Examples
- Toy model: Hello world
- LLMs: Llama 3.2, LLM Proxy server, Agent with tool use
- RAG: vLLM RAG (Llama 3.2), RAG API (LlamaIndex)
- NLP: Hugging face, BERT, Text embedding API
- Multimodal: OpenAI Clip, MiniCPM, Phi-3.5 Vision Instruct, Qwen2-VL, Pixtral
- Audio: Whisper, AudioCraft, StableAudio, Noise cancellation (DeepFilterNet)
- Vision: Stable diffusion 2, AuraFlow, Flux, Image Super Resolution (Aura SR), Background Removal, Control Stable Diffusion (ControlNet)
- Speech: Text-speech (XTTS V2), Parler-TTS
- Classical ML: Random forest, XGBoost
- Miscellaneous: Media conversion API (ffmpeg), PyTorch + TensorFlow in one API, LLM proxy server

[Browse 100+ community-built templates](https://lightning.ai/studios?section=serving)
# Features
State-of-the-art features:

✅ [(2x)+ faster than plain FastAPI](#performance)
✅ [Bring your own model](https://lightning.ai/docs/litserve/features/full-control)
✅ [Build compound systems (1+ models)](https://lightning.ai/docs/litserve/home)
✅ [GPU autoscaling](https://lightning.ai/docs/litserve/features/gpu-inference)
✅ [Batching](https://lightning.ai/docs/litserve/features/batching)
✅ [Streaming](https://lightning.ai/docs/litserve/features/streaming)
✅ [Worker autoscaling](https://lightning.ai/docs/litserve/features/autoscaling)
✅ [Self-host on your machines](https://lightning.ai/docs/litserve/features/hosting-methods#host-on-your-own)
✅ [Host fully managed on Lightning AI](https://lightning.ai/docs/litserve/features/hosting-methods#host-on-lightning-studios)
✅ [Serve all models: (LLMs, vision, etc.)](https://lightning.ai/docs/litserve/examples)
✅ [Scale to zero (serverless)](https://lightning.ai/docs/litserve/features/streaming)
✅ [Supports PyTorch, JAX, TF, etc...](https://lightning.ai/docs/litserve/features/full-control)
✅ [OpenAPI compliant](https://www.openapis.org/)
✅ [OpenAI compatibility](https://lightning.ai/docs/litserve/features/open-ai-spec)
✅ [Authentication](https://lightning.ai/docs/litserve/features/authentication)
✅ [Dockerization](https://lightning.ai/docs/litserve/features/dockerization-deployment)

[10+ features...](https://lightning.ai/docs/litserve/features)
**Note:** We prioritize scalable, enterprise-level features over hype.
# Performance
LitServe is designed for AI workloads. Specialized multi-worker handling delivers a minimum **2x speedup over FastAPI**.

Additional features like batching and GPU autoscaling can drive performance well beyond 2x, scaling efficiently to handle more simultaneous requests than FastAPI and TorchServe.
Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/benchmarks) (higher is better).
These results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization, etc.).
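To make the batching claim concrete, here is a conceptual sketch of why batching reduces work: queued inputs are grouped into chunks of at most `max_batch_size`, the model runs once per chunk, and the outputs are returned in request order. It is plain Python and is not LitServe's implementation; `chunk`, `run_batched`, `model_batch` and the call counter are hypothetical names used only for illustration, and the toy model reuses the square-plus-cube example from the quick start.

```python
from typing import Callable, List, Tuple


def chunk(items: List[float], max_batch_size: int) -> List[List[float]]:
    """Split queued inputs into consecutive groups of at most max_batch_size."""
    return [items[i:i + max_batch_size] for i in range(0, len(items), max_batch_size)]


def run_batched(
    queued: List[float],
    model_batch: Callable[[List[float]], List[float]],
    max_batch_size: int,
) -> Tuple[List[float], int]:
    """Run the model once per group and return (outputs in request order, model calls)."""
    outputs: List[float] = []
    calls = 0
    for batch in chunk(queued, max_batch_size):
        outputs.extend(model_batch(batch))  # one model invocation for the whole group
        calls += 1
    return outputs, calls


def model_batch(xs: List[float]) -> List[float]:
    # Toy batched model: square plus cube for every input in the group.
    return [x**2 + x**3 for x in xs]


queued = [1.0, 2.0, 3.0, 4.0, 5.0]
outputs, calls = run_batched(queued, model_batch, max_batch_size=4)
print(outputs, calls)  # [2.0, 12.0, 36.0, 80.0, 150.0] 2
```

Five inputs are served with two model calls instead of five. On a GPU, where a single call over a group typically costs little more than a call on one input, this grouping is what lets throughput grow with batch size.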
***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
# Hosting options
LitServe can be hosted independently on your own machines or fully managed via Lightning Studios.

Self-hosting is ideal for hackers, students, and DIY developers, while fully managed hosting is ideal for enterprise developers needing easy autoscaling, security, release management, observability, and 99.995% uptime.
| Feature | Self Managed | Fully Managed on Studios |
|----------------------------------|-----------------------------------|-------------------------------------|
| Deployment | ✅ Do it yourself deployment | ✅ One-button cloud deploy |
| Load balancing | ❌ | ✅ |
| Autoscaling | ❌ | ✅ |
| Scale to zero | ❌ | ✅ |
| Multi-machine inference | ❌ | ✅ |
| Authentication | ❌ | ✅ |
| Own VPC | ❌ | ✅ |
| AWS, GCP | ❌ | ✅ |
| Use your own cloud commits | ❌ | ✅ |
# Community
LitServe is a [community project accepting contributions](https://lightning.ai/docs/litserve/community) - Let's make the world's most advanced AI inference engine.

💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)