https://github.com/lightning-ai/litserve
Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.
- Host: GitHub
- URL: https://github.com/lightning-ai/litserve
- Owner: Lightning-AI
- License: apache-2.0
- Created: 2023-12-12T14:45:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-27T10:07:44.000Z (about 1 year ago)
- Last Synced: 2024-06-27T10:27:29.962Z (about 1 year ago)
- Topics: ai, api, serving
- Language: Python
- Homepage: https://lightning.ai/docs/litserve
- Size: 348 KB
- Stars: 123
- Watchers: 15
- Forks: 12
- Open Issues: 15
- Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
README
# Easily serve AI models Lightning fast ⚡
Lightning-fast serving engine for AI models.
Easy. Flexible. Enterprise-scale.
**LitServe** is an easy-to-use, flexible serving engine for AI models built on FastAPI. It augments FastAPI with features like batching, streaming, and GPU autoscaling, eliminating the need to rebuild a FastAPI server for every model.
LitServe is at least [2x faster](#performance) than plain FastAPI due to AI-specific multi-worker handling.
✅ (2x)+ faster serving ✅ Easy to use ✅ LLMs, non LLMs and more
✅ Bring your own model ✅ PyTorch/JAX/TF/... ✅ Built on FastAPI
✅ GPU autoscaling ✅ Batching, Streaming ✅ Self-host or ⚡️ managed
✅ Compound AI ✅ Integrate with vLLM, etc ✅ Serverless
[Discord](https://discord.gg/WajDThKAur) · [codecov](https://codecov.io/gh/Lightning-AI/litserve) · [License](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
# Quick start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```
### Define a server
This toy example with 2 models (a compound AI system) shows LitServe's flexibility ([see real examples](#examples)):

```python
# server.py
import litserve as ls


# (STEP 1) - DEFINE THE API (compound AI system)
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # setup is called once at startup. Build a compound AI system (1+ models), connect DBs, load data, etc...
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Convert the request payload to model input.
        return request["input"]

    def predict(self, x):
        # Easily build compound systems. Run inference and return the output.
        squared = self.model1(x)
        cubed = self.model2(x)
        output = squared + cubed
        return output

    def encode_response(self, output):
        # Convert the model output to a response payload.
        return {"output": output}


# (STEP 2) - START THE SERVER
if __name__ == "__main__":
    # scale with advanced features (batching, GPUs, etc...)
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto", max_batch_size=1)
    server.run(port=8000)
```

Now run the server anywhere (local or cloud) via the command line.
```bash
# Deploy to the cloud of your choice via Lightning AI (serverless, autoscaling, etc.)
lightning serve server.py

# Or run locally (self host anywhere)
lightning serve server.py --local
```
Learn more about managed hosting on [Lightning AI](#hosting-options).

You can also run the server manually:
```bash
python server.py
```

### Test the server
Run the auto-generated test client:
```bash
python client.py
```

Or use this terminal command:
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
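In Python, a minimal client equivalent to that curl call might look like the sketch below. This is only an illustrative stand-in for the auto-generated `client.py`, which may differ; the endpoint and payload follow the quick-start server above:

```python
# minimal_client.py - illustrative stand-in for the auto-generated client.py
import requests

# Call the /predict endpoint exposed by the quick-start server
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"input": 4.0},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # expected: {"output": 80.0}  (4**2 + 4**3)
```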

### LLM serving

LitServe isn't *just* for LLMs like vLLM or Ollama; it serves any AI model with full control over internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).

```bash
litgpt serve microsoft/phi-2
```

### Summary
- LitAPI lets you easily build complex AI systems with one or more models ([docs](https://lightning.ai/docs/litserve/api-reference/litapi)).
- Use the setup method for one-time tasks like connecting models, DBs, and loading data ([docs](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- LitServer handles optimizations like batching, GPU autoscaling, streaming, etc. (sketched below; [docs](https://lightning.ai/docs/litserve/api-reference/litserver)).
- Self host on your machines or create a fully managed deployment with Lightning ([learn more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).

[Learn how to make this server 200x faster](https://lightning.ai/docs/litserve/home/speed-up-serving-by-200x).
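As a rough sketch of how those LitServer knobs combine, here is a batched variant of the quick-start server. The parameter names (`max_batch_size`, `batch_timeout`, `workers_per_device`) follow the linked docs, but treat the exact signature and default batching behavior as assumptions and check the version you have installed:

```python
# batched_server.py - illustrative sketch, not an official example
import litserve as ls


class BatchedLitAPI(ls.LitAPI):
    def setup(self, device):
        # One-time setup per worker: load models, connect DBs, etc.
        self.model = lambda xs: [x**2 + x**3 for x in xs]

    def decode_request(self, request):
        # Called per request; returns one model input
        return request["input"]

    def predict(self, batch):
        # With batching enabled, LitServe can hand predict a list of decoded inputs
        return self.model(batch)

    def encode_response(self, output):
        # Called per item after the batch is split back apart
        return {"output": output}


if __name__ == "__main__":
    server = ls.LitServer(
        BatchedLitAPI(),
        accelerator="auto",    # pick CPU/GPU automatically
        max_batch_size=8,      # group up to 8 concurrent requests per call to predict
        batch_timeout=0.05,    # wait at most ~50 ms to fill a batch
        workers_per_device=2,  # run multiple worker processes per device
    )
    server.run(port=8000)
```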
# Featured examples
Use LitServe to deploy any model or AI service: compound AI, Gen AI, classic ML, embeddings, LLMs, vision, audio, and more.
- Toy model: Hello world
- LLMs: Llama 3.2, LLM Proxy server, Agent with tool use
- RAG: vLLM RAG (Llama 3.2), RAG API (LlamaIndex)
- NLP: Hugging Face, BERT, Text embedding API
- Multimodal: OpenAI Clip, MiniCPM, Phi-3.5 Vision Instruct, Qwen2-VL, Pixtral
- Audio: Whisper, AudioCraft, StableAudio, Noise cancellation (DeepFilterNet)
- Vision: Stable diffusion 2, AuraFlow, Flux, Image Super Resolution (Aura SR), Background Removal, Control Stable Diffusion (ControlNet)
- Speech: Text-to-speech (XTTS V2), Parler-TTS
- Classical ML: Random forest, XGBoost
- Miscellaneous: Media conversion API (ffmpeg), PyTorch + TensorFlow in one API, LLM proxy server

[Browse 100+ community-built templates](https://lightning.ai/studios?section=serving)
# Hosting options
Self host LitServe anywhere or deploy to your favorite cloud via [Lightning AI](http://lightning.ai/deploy).

https://github.com/user-attachments/assets/ff83dab9-0c9f-4453-8dcb-fb9526726344
Self-hosting is ideal for hackers, students, and DIY developers, while fully managed hosting suits enterprise developers who need easy autoscaling, security, release management, observability, and 99.995% uptime.
To host on [Lightning AI](https://lightning.ai/deploy), simply run the command below, log in, and choose the cloud of your choice.
```bash
lightning serve server.py
```
| [Feature](https://lightning.ai/docs/litserve/features) | Self Managed | [Fully Managed on Lightning](https://lightning.ai/deploy) |
|----------------------------------------------------------------------|-----------------------------------|------------------------------------|
| Docker-first deployment | ✅ DIY | ✅ One-click deploy |
| Cost | ✅ Free (DIY) | ✅ Generous [free tier](https://lightning.ai/pricing) with pay as you go |
| Full control | ✅ | ✅ |
| Use any engine (vLLM, etc.) | ✅ | ✅ vLLM, Ollama, LitServe, etc. |
| Own VPC | ✅ (manual setup) | ✅ Connect your own VPC |
| [(2x)+ faster than plain FastAPI](#performance) | ✅ | ✅ |
| [Bring your own model](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [Build compound systems (1+ models)](https://lightning.ai/docs/litserve/home) | ✅ | ✅ |
| [GPU autoscaling](https://lightning.ai/docs/litserve/features/gpu-inference) | ✅ | ✅ |
| [Batching](https://lightning.ai/docs/litserve/features/batching) | ✅ | ✅ |
| [Streaming](https://lightning.ai/docs/litserve/features/streaming) | ✅ | ✅ |
| [Worker autoscaling](https://lightning.ai/docs/litserve/features/autoscaling) | ✅ | ✅ |
| [Serve all models: (LLMs, vision, etc.)](https://lightning.ai/docs/litserve/examples) | ✅ | ✅ |
| [Supports PyTorch, JAX, TF, etc...](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [OpenAPI compliant](https://www.openapis.org/) | ✅ | ✅ |
| [OpenAI compatibility](https://lightning.ai/docs/litserve/features/open-ai-spec) | ✅ | ✅ |
| [Authentication](https://lightning.ai/docs/litserve/features/authentication) | ❌ DIY | ✅ Token, password, custom |
| GPUs | ❌ DIY | ✅ 8+ GPU types, H100s from $1.75 |
| Load balancing | ❌ | ✅ Built-in |
| Scale to zero (serverless) | ❌ | ✅ No machine runs when idle |
| Autoscale up on demand | ❌ | ✅ Auto scale up/down |
| Multi-node inference | ❌ | ✅ Distribute across nodes |
| Use AWS/GCP credits | ❌ | ✅ Use existing cloud commits |
| Versioning | ❌ | ✅ Make and roll back releases |
| Enterprise-grade uptime (99.95%) | ❌ | ✅ SLA-backed |
| SOC2 / HIPAA compliance | ❌ | ✅ Certified & secure |
| Observability | ❌ | ✅ Built-in, connect 3rd party tools|
| CI/CD ready | ❌ | ✅ Lightning SDK |
| 24/7 enterprise support | ❌ | ✅ Dedicated support |
| Cost controls & audit logs | ❌ | ✅ Budgets, breakdowns, logs |
| Debug on GPUs | ❌ | ✅ Studio integration |
| [20+ features](https://lightning.ai/docs/litserve/features) | - | - |
# Performance
LitServe is designed for AI workloads. Specialized multi-worker handling delivers a minimum **2x speedup over FastAPI**. Additional features like batching and GPU autoscaling can drive performance well beyond 2x, scaling efficiently to handle more simultaneous requests than FastAPI and TorchServe.
Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/benchmarks) (higher is better).
These results are for image and text classification ML tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization etc...).
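For a rough, unofficial feel of what handling simultaneous requests means in practice (this is not the benchmark script linked above, and results depend entirely on your hardware and model), you can hammer the quick-start endpoint with many parallel clients:

```python
# load_test.py - quick concurrency sketch against the quick-start server
import concurrent.futures
import time

import requests

URL = "http://127.0.0.1:8000/predict"

def call(i: int) -> float:
    # Send one request and return the predicted value
    r = requests.post(URL, json={"input": float(i)}, timeout=30)
    r.raise_for_status()
    return r.json()["output"]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(call, range(256)))
elapsed = time.perf_counter() - start
print(f"{len(results)} requests in {elapsed:.2f}s ({len(results) / elapsed:.1f} req/s)")
```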
***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your custom vLLM-like server with LitServe. Optimizations like kv-caching, which can be done with LitServe, are needed to maximize LLM performance.
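To give a feel for what a custom vLLM-like server could look like, here is a minimal streaming sketch in which `predict` and `encode_response` yield chunks instead of returning one payload. The `stream=True` flag and generator-based hooks follow LitServe's streaming docs (linked in the feature table above), but the exact API may vary by version, and the toy "model" is a stand-in, not a real LLM:

```python
# stream_server.py - minimal streaming sketch (toy model, not a real LLM)
import litserve as ls


class StreamingEchoAPI(ls.LitAPI):
    def setup(self, device):
        # Stand-in "LLM": emits the prompt back one word at a time
        self.generate = lambda prompt: (word for word in prompt.split())

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        # Yield tokens as they are produced instead of returning one blob
        for token in self.generate(prompt):
            yield token

    def encode_response(self, token_stream):
        # Each yielded chunk is sent to the client as it becomes available
        for token in token_stream:
            yield {"token": token}


if __name__ == "__main__":
    server = ls.LitServer(StreamingEchoAPI(), stream=True)
    server.run(port=8000)
```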
# Community
LitServe is a [community project accepting contributions](https://lightning.ai/docs/litserve/community). Let's make the world's most advanced AI inference engine.

💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)