https://github.com/lightning-ai/litserve
Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.
- Host: GitHub
- URL: https://github.com/lightning-ai/litserve
- Owner: Lightning-AI
- License: apache-2.0
- Created: 2023-12-12T14:45:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-27T10:07:44.000Z (about 1 year ago)
- Last Synced: 2024-06-27T10:27:29.962Z (about 1 year ago)
- Topics: ai, api, serving
- Language: Python
- Homepage: https://lightning.ai/docs/litserve
- Size: 348 KB
- Stars: 123
- Watchers: 15
- Forks: 12
- Open Issues: 15
- Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
README
# Easily serve AI models Lightning fast ⚡
Lightning-fast serving engine for AI models.
Easy. Flexible. Enterprise-scale.
**LitServe** is an easy-to-use, flexible serving engine for AI models built on FastAPI. It augments FastAPI with features like batching, streaming, and GPU autoscaling, eliminating the need to rebuild a FastAPI server for every model.
LitServe is at least [2x faster](#performance) than plain FastAPI due to AI-specific multi-worker handling.
✅ (2x)+ faster serving ✅ Easy to use ✅ LLMs, non LLMs and more
✅ Bring your own model ✅ PyTorch/JAX/TF/... ✅ Built on FastAPI
✅ GPU autoscaling ✅ Batching, Streaming ✅ Self-host or ⚡️ managed
✅ Compound AI ✅ Integrate with vLLM, etc ✅ Serverless
[Discord](https://discord.gg/WajDThKAur) · [codecov](https://codecov.io/gh/Lightning-AI/litserve) · [License](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
# Quick start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```
### Define a server
This toy example with 2 models (a compound AI system) shows LitServe's flexibility ([see real examples](#examples)):

```python
# server.py
import litserve as ls


# (STEP 1) - DEFINE THE API (compound AI system)
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # setup is called once at startup. Build a compound AI system (1+ models), connect DBs, load data, etc...
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Convert the request payload to model input.
        return request["input"]

    def predict(self, x):
        # Easily build compound systems. Run inference and return the output.
        squared = self.model1(x)
        cubed = self.model2(x)
        output = squared + cubed
        return output

    def encode_response(self, output):
        # Convert the model output to a response payload.
        return {"output": output}


# (STEP 2) - START THE SERVER
if __name__ == "__main__":
    # scale with advanced features (batching, GPUs, etc...)
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto", max_batch_size=1)
    server.run(port=8000)
```

Now run the server anywhere (local or cloud) via the command line.
```bash
# Deploy to the cloud of your choice via Lightning AI (serverless, autoscaling, etc.)
lightning serve server.py

# Or run locally (self host anywhere)
lightning serve server.py --local
```
Learn more about managed hosting on [Lightning AI](#hosting-options).

You can also run the server manually:
```bash
python server.py
```

### Test the server
Run the auto-generated test client:
```bash
python client.py
```

Or use this terminal command:
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
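In Python, a minimal client equivalent to that curl call might look like the sketch below. This is only an illustrative stand-in for the auto-generated `client.py`, which may differ; the endpoint and payload follow the quick-start server above:

```python
# minimal_client.py - illustrative stand-in for the auto-generated client.py
import requests

# Call the /predict endpoint exposed by the quick-start server
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"input": 4.0},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # expected: {"output": 80.0}  (4**2 + 4**3)
```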

### LLM serving

LitServe isn't *just* for LLMs like vLLM or Ollama; it serves any AI model with full control over internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).

```bash
litgpt serve microsoft/phi-2
```

### Summary
- LitAPI lets you easily build complex AI systems with one or more models ([docs](https://lightning.ai/docs/litserve/api-reference/litapi)).
- Use the setup method for one-time tasks like connecting models, DBs, and loading data ([docs](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- LitServer handles optimizations like batching, GPU autoscaling, streaming, etc. (sketched below; [docs](https://lightning.ai/docs/litserve/api-reference/litserver)).
- Self host on your machines or create a fully managed deployment with Lightning ([learn more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).

[Learn how to make this server 200x faster](https://lightning.ai/docs/litserve/home/speed-up-serving-by-200x).
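As a rough sketch of how those LitServer knobs combine, here is a batched variant of the quick-start server. The parameter names (`max_batch_size`, `batch_timeout`, `workers_per_device`) follow the linked docs, but treat the exact signature and default batching behavior as assumptions and check the version you have installed:

```python
# batched_server.py - illustrative sketch, not an official example
import litserve as ls


class BatchedLitAPI(ls.LitAPI):
    def setup(self, device):
        # One-time setup per worker: load models, connect DBs, etc.
        self.model = lambda xs: [x**2 + x**3 for x in xs]

    def decode_request(self, request):
        # Called per request; returns one model input
        return request["input"]

    def predict(self, batch):
        # With batching enabled, LitServe can hand predict a list of decoded inputs
        return self.model(batch)

    def encode_response(self, output):
        # Called per item after the batch is split back apart
        return {"output": output}


if __name__ == "__main__":
    server = ls.LitServer(
        BatchedLitAPI(),
        accelerator="auto",    # pick CPU/GPU automatically
        max_batch_size=8,      # group up to 8 concurrent requests per call to predict
        batch_timeout=0.05,    # wait at most ~50 ms to fill a batch
        workers_per_device=2,  # run multiple worker processes per device
    )
    server.run(port=8000)
```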
# Featured examples
Use LitServe to deploy any model or AI service: compound AI, Gen AI, classic ML, embeddings, LLMs, vision, audio, and more.
- Toy model: Hello world
- LLMs: Llama 3.2, LLM Proxy server, Agent with tool use
- RAG: vLLM RAG (Llama 3.2), RAG API (LlamaIndex)
- NLP: Hugging Face, BERT, Text embedding API
- Multimodal: OpenAI Clip, MiniCPM, Phi-3.5 Vision Instruct, Qwen2-VL, Pixtral
- Audio: Whisper, AudioCraft, StableAudio, Noise cancellation (DeepFilterNet)
- Vision: Stable diffusion 2, AuraFlow, Flux, Image Super Resolution (Aura SR), Background Removal, Control Stable Diffusion (ControlNet)
- Speech: Text-to-speech (XTTS V2), Parler-TTS
- Classical ML: Random forest, XGBoost
- Miscellaneous: Media conversion API (ffmpeg), PyTorch + TensorFlow in one API, LLM proxy server

[Browse 100+ community-built templates](https://lightning.ai/studios?section=serving)
# Hosting options
Self host LitServe anywhere or deploy to your favorite cloud via [Lightning AI](http://lightning.ai/deploy).

https://github.com/user-attachments/assets/ff83dab9-0c9f-4453-8dcb-fb9526726344
Self-hosting is ideal for hackers, students, and DIY developers, while fully managed hosting suits enterprise developers who need easy autoscaling, security, release management, observability, and 99.995% uptime.
To host on [Lightning AI](https://lightning.ai/deploy), simply run the command below, log in, and choose the cloud of your choice.
```bash
lightning serve server.py
```
| [Feature](https://lightning.ai/docs/litserve/features) | Self Managed | [Fully Managed on Lightning](https://lightning.ai/deploy) |
|----------------------------------------------------------------------|-----------------------------------|------------------------------------|
| Docker-first deployment | ✅ DIY | ✅ One-click deploy |
| Cost | ✅ Free (DIY) | ✅ Generous [free tier](https://lightning.ai/pricing) with pay as you go |
| Full control | ✅ | ✅ |
| Use any engine (vLLM, etc.) | ✅ | ✅ vLLM, Ollama, LitServe, etc. |
| Own VPC | ✅ (manual setup) | ✅ Connect your own VPC |
| [(2x)+ faster than plain FastAPI](#performance) | ✅ | ✅ |
| [Bring your own model](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [Build compound systems (1+ models)](https://lightning.ai/docs/litserve/home) | ✅ | ✅ |
| [GPU autoscaling](https://lightning.ai/docs/litserve/features/gpu-inference) | ✅ | ✅ |
| [Batching](https://lightning.ai/docs/litserve/features/batching) | ✅ | ✅ |
| [Streaming](https://lightning.ai/docs/litserve/features/streaming) | ✅ | ✅ |
| [Worker autoscaling](https://lightning.ai/docs/litserve/features/autoscaling) | ✅ | ✅ |
| [Serve all models: (LLMs, vision, etc.)](https://lightning.ai/docs/litserve/examples) | ✅ | ✅ |
| [Supports PyTorch, JAX, TF, etc...](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [OpenAPI compliant](https://www.openapis.org/) | ✅ | ✅ |
| [OpenAI compatibility](https://lightning.ai/docs/litserve/features/open-ai-spec) | ✅ | ✅ |
| [Authentication](https://lightning.ai/docs/litserve/features/authentication) | ❌ DIY | ✅ Token, password, custom |
| GPUs | ❌ DIY | ✅ 8+ GPU types, H100s from $1.75 |
| Load balancing | ❌ | ✅ Built-in |
| Scale to zero (serverless) | ❌ | ✅ No machine runs when idle |
| Autoscale up on demand | ❌ | ✅ Auto scale up/down |
| Multi-node inference | ❌ | ✅ Distribute across nodes |
| Use AWS/GCP credits | ❌ | ✅ Use existing cloud commits |
| Versioning | ❌ | ✅ Make and roll back releases |
| Enterprise-grade uptime (99.95%) | ❌ | ✅ SLA-backed |
| SOC2 / HIPAA compliance | ❌ | ✅ Certified & secure |
| Observability | ❌ | ✅ Built-in, connect 3rd party tools|
| CI/CD ready | ❌ | ✅ Lightning SDK |
| 24/7 enterprise support | ❌ | ✅ Dedicated support |
| Cost controls & audit logs | ❌ | ✅ Budgets, breakdowns, logs |
| Debug on GPUs | ❌ | ✅ Studio integration |
| [20+ features](https://lightning.ai/docs/litserve/features) | - | - |
# Performance
LitServe is designed for AI workloads. Specialized multi-worker handling delivers a minimum **2x speedup over FastAPI**. Additional features like batching and GPU autoscaling can drive performance well beyond 2x, scaling efficiently to handle more simultaneous requests than FastAPI and TorchServe.
Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/benchmarks) (higher is better).
These results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization etc...).
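For a rough, unofficial feel of what handling simultaneous requests means in practice (this is not the benchmark script linked above, and results depend entirely on your hardware and model), you can hammer the quick-start endpoint with many parallel clients:

```python
# load_test.py - quick concurrency sketch against the quick-start server
import concurrent.futures
import time

import requests

URL = "http://127.0.0.1:8000/predict"

def call(i: int) -> float:
    # Send one request and return the predicted value
    r = requests.post(URL, json={"input": float(i)}, timeout=30)
    r.raise_for_status()
    return r.json()["output"]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(call, range(256)))
elapsed = time.perf_counter() - start
print(f"{len(results)} requests in {elapsed:.2f}s ({len(results) / elapsed:.1f} req/s)")
```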
***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
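To give a feel for what a custom vLLM-like server could look like, here is a minimal streaming sketch in which `predict` and `encode_response` yield chunks instead of returning one payload. The `stream=True` flag and generator-based hooks follow LitServe's streaming docs (linked in the feature table above), but the exact API may vary by version, and the toy "model" is a stand-in, not a real LLM:

```python
# stream_server.py - minimal streaming sketch (toy model, not a real LLM)
import litserve as ls


class StreamingEchoAPI(ls.LitAPI):
    def setup(self, device):
        # Stand-in "LLM": emits the prompt back one word at a time
        self.generate = lambda prompt: (word for word in prompt.split())

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        # Yield tokens as they are produced instead of returning one blob
        for token in self.generate(prompt):
            yield token

    def encode_response(self, token_stream):
        # Each yielded chunk is sent to the client as it becomes available
        for token in token_stream:
            yield {"token": token}


if __name__ == "__main__":
    server = ls.LitServer(StreamingEchoAPI(), stream=True)
    server.run(port=8000)
```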
# Community
LitServe is a [community project accepting contributions](https://lightning.ai/docs/litserve/community). Let's make the world's most advanced AI inference engine.

💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)