Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ajndkr/litserve-init
demo repository to use litserve
- Host: GitHub
- URL: https://github.com/ajndkr/litserve-init
- Owner: ajndkr
- License: MIT
- Created: 2024-08-18T02:26:10.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-18T16:51:42.000Z (6 months ago)
- Last Synced: 2025-02-01T15:44:02.863Z (14 days ago)
- Topics: fastapi, litserve
- Language: Python
- Homepage:
- Size: 678 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# litserve-init
demo repository to use [`litserve`](https://lightning.ai/docs/litserve).
## setup
1. create a virtual environment of your choice. I chose miniconda:
```shell
conda create -n litserve-init python=3.10
```
2. install dependencies:
```shell
pip install -r requirements.txt
```

> Disclaimer: `transformers[torch]` will install the latest `torch` version.
> If not compatible with your GPU, install the correct version of `torch` and
> `torchvision` separately.

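the repository's own server code is the reference; as a rough sketch of what a
minimal LitServe app for this demo could look like, here is an example (the
`microsoft/resnet-50` checkpoint, the base64 request schema, and the file name
`server.py` are assumptions, not necessarily what this repo uses):

```python
# server.py -- hypothetical sketch of a minimal LitServe app, not the repo's exact code
import base64
import io

import litserve as ls
from PIL import Image
from transformers import pipeline


class ImageClassifierAPI(ls.LitAPI):
    def setup(self, device):
        # load a pre-trained ResNet50 image classifier onto the device LitServe assigns
        self.clf = pipeline("image-classification", model="microsoft/resnet-50", device=device)

    def decode_request(self, request):
        # assumption: the client sends {"image": "<base64-encoded image bytes>"}
        img_bytes = base64.b64decode(request["image"])
        return Image.open(io.BytesIO(img_bytes)).convert("RGB")

    def predict(self, image):
        return self.clf(image)

    def encode_response(self, output):
        # return only the top label and its score
        return {"label": output[0]["label"], "score": output[0]["score"]}


if __name__ == "__main__":
    server = ls.LitServer(ImageClassifierAPI(), accelerator="auto")
    server.run(port=8000)
```

the server should then accept POST requests on a `/predict` route (LitServe's
default, as far as I know).
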
## benchmark

below are the results of a simple benchmark comparison between FastAPI and
LitServe.

> **system requirements**: I used a 32-core linux machine with 2 NVIDIA GeForce
> RTX 3080 Ti GPUs.

each experiment sends `n` concurrent requests containing a single image for
classification. The image can be found [here](./cats-image.png).

in the case of the LitServe application, I set `max_batch_size` to 32 and
`batch_timeout` to 0.05. This means that LitServe will batch incoming requests
with a maximum batch size of 32 within a 50-millisecond window before running
inference. The FastAPI application processes all incoming requests separately.

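for reference, this is roughly how the server construction from the sketch
above could be configured for the experiment (parameter names are LitServe's
`LitServer` options as I understand them, so treat this as a sketch rather than
the repo's exact setup):

```python
import litserve as ls

# assumption: ImageClassifierAPI is the sketch from the setup section
server = ls.LitServer(
    ImageClassifierAPI(),
    accelerator="gpu",   # use the available GPUs
    max_batch_size=32,   # group up to 32 concurrent requests into one batch
    batch_timeout=0.05,  # wait at most 50 ms for a batch to fill up
)
server.run(port=8000)
```

note that with batching enabled, `predict` receives a list of decoded images
rather than a single one (as far as I understand, unless the optional `batch()`
and `unbatch()` hooks are defined), so the API methods have to handle lists
accordingly.
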
### total runtime (per experiment)

| concurrent requests | FastAPI | LitServe |
| ------------------- | ------- | -------- |
| 10 | 0.163 s | 0.107 s |
| 50 | 0.720 s | 0.342 s |
| 100 | 1.409 s | 0.638 s |
| 200 | 2.952 s | 1.332 s |

### average response time (per request)
| concurrent requests | FastAPI | LitServe |
| ------------------- | ---------------------- | -------------------- |
| 10 | 81.604 ± 71.672 ms | 72.229 ± 12.598 ms |
| 50 | 366.057 ± 353.857 ms | 180.208 ± 87.027 ms |
| 100 | 725.106 ± 693.911 ms | 353.102 ± 190.356 ms |
| 200 | 1555.120 ± 1342.055 ms | 845.292 ± 287.755 ms |

to run the benchmarks yourself, run the `./fastapi-benchmark.sh` and
`./litserve-benchmark.sh` scripts in your terminal.

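as a rough illustration of what one such experiment does, the following sketch
sends `n` concurrent requests and reports the two metrics above (the port, the
`/predict` route, and the base64 request schema are assumptions matching the
server sketch, not necessarily what the benchmark scripts do):

```python
# hypothetical sketch of the concurrent-request experiment, not the repo's benchmark scripts
import base64
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

N = 100  # number of concurrent requests
URL = "http://localhost:8000/predict"

with open("cats-image.png", "rb") as f:
    PAYLOAD = {"image": base64.b64encode(f.read()).decode()}


def send_request(_):
    # time a single request round-trip
    start = time.perf_counter()
    response = requests.post(URL, json=PAYLOAD, timeout=60)
    response.raise_for_status()
    return time.perf_counter() - start


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N) as pool:
    latencies = list(pool.map(send_request, range(N)))
total_runtime = time.perf_counter() - start

print(f"total runtime: {total_runtime:.3f} s")
print(f"avg response time: {statistics.mean(latencies) * 1e3:.3f} "
      f"± {statistics.stdev(latencies) * 1e3:.3f} ms")
```
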
### conclusion
in terms of both total runtime and average response time, LitServe outperforms
FastAPI (nearly 2x faster). This can be attributed to the fact that LitServe
utilises all available GPUs by default and supports batching of concurrent
requests whereas FastAPI uses only one GPU and does not support batching. There
could be other factors at play as well, but these are the most significant ones
in my opinion.

## takeaways
- since LitServe is built on top of FastAPI, it can be a drop-in replacement for
FastAPI-based machine learning projects
- as I wanted to test the out-of-the-box performance of both frameworks, both
applications are built without optimisation in mind. Even the machine
learning model I used is a simple pre-trained ResNet50. It is thus likely that
performance may vary for more complex applications/use-cases as LitServe is
still a relatively new project
- my favorite features so far: **GPU control** and **batching support**. As
opposed to the FastAPI application, which used only one GPU, LitServe spins up
a worker per available device by default (this can be configured further; see
the sketch after this list).
Batching is especially exciting to me as it drastically improves the
concurrency performance of any machine learning application under heavy load.
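
as a sketch of the kind of control this refers to (again, the parameter names
are `LitServer` options as I understand them and may need checking against the
LitServe docs):

```python
import litserve as ls

# assumption: ImageClassifierAPI is the sketch from the setup section
server = ls.LitServer(
    ImageClassifierAPI(),
    accelerator="gpu",     # or "cpu" / "auto"
    devices=2,             # e.g. both RTX 3080 Ti GPUs; defaults to all available devices
    workers_per_device=1,  # number of model copies served per device
)
```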