https://github.com/wtlow003/modal-llm-serving
Examples of serving LLMs on Modal.
- Host: GitHub
- URL: https://github.com/wtlow003/modal-llm-serving
- Owner: wtlow003
- License: mit
- Created: 2024-06-01T16:16:47.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-06-13T04:39:34.000Z (almost 2 years ago)
- Last Synced: 2025-05-29T18:57:08.140Z (11 months ago)
- Topics: llm, lmdeploy, modal, model-serving, openai, openai-api, sglang, vllm
- Language: Python
- Size: 41 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Modal LLM Serving Examples and Benchmarks
## About
This repo contains a collection of examples for LLM serving on [Modal](https://modal.com/). To allow comparison across serving frameworks, a benchmarking setup heavily referencing [vLLM](https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py) is also [provided](./benchmark/benchmark_server.py).
Currently, the following frameworks have been deployed and tested to be working via Modal [Deployments](https://modal.com/docs/guide/managing-deployments); a minimal sketch of what such a serving script looks like follows the table.
| Framework | GitHub Repo | Modal Script |
|---------------------------------|----------------------------------------------------------|------------------------------------|
| vLLM | https://github.com/vllm-project/vllm | [script](./src/vllm/server.py) |
| Text Generation Inference (TGI) | https://github.com/huggingface/text-generation-inference | [script](./src/tgi/server.py)      |
| LMDeploy | https://github.com/InternLM/lmdeploy | [script](./src/lmdeploy/server.py) |
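Each of these scripts follows the same general shape: build a container image with the framework installed, then expose an ASGI web app from a GPU-backed Modal function. Below is a minimal hypothetical sketch, not the repo's actual `server.py`; the app name, GPU type, and image contents are illustrative assumptions:
```python
import modal

# Hypothetical sketch only -- the repo's real scripts live in src/<framework>/server.py.
# Container image with vLLM preinstalled (framework choice is illustrative).
image = modal.Image.debian_slim(python_version="3.11").pip_install("vllm")

app = modal.App("example-vllm-server")  # `modal deploy` registers the app under this name


@app.function(image=image, gpu="A100")
@modal.asgi_app()
def serve():
    import fastapi

    web_app = fastapi.FastAPI()

    @web_app.get("/health")
    def health() -> dict:
        return {"status": "ok"}

    # A real serving script would mount the framework's OpenAI-compatible
    # ASGI app here (e.g. vLLM's vllm.entrypoints.openai.api_server)
    # instead of this placeholder route.
    return web_app
```
Deploying such a script with `modal deploy` yields a public `*.modal.run` URL for the `serve` function, as in the example output under [Deployment](#deployment).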
## Getting Started
Before deploying the respective examples, set up the environment using the following commands.
This project uses [uv](https://github.com/astral-sh/uv) for dependency management. To install `uv`, please refer to this [guide](https://github.com/astral-sh/uv#getting-started):
```shell
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows.
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# With pip.
pip install uv
# With pipx.
pipx install uv
# With Homebrew.
brew install uv
# With Pacman.
pacman -S uv
```
To install the required dependencies:
```shell
# create a virtual env
uv venv
# install dependencies
uv pip install -r requirements.txt # Install from a requirements.txt file.
```
If you are looking to contribute to the repo, you will also need to install the pre-commit hooks to ensure that your code changes are linted and formatted accordingly:
```shell
pip install pre-commit
pre-commit install && pre-commit install --hook-type commit-msg
```
## Deployment
To deploy on **Modal**, simply use the [CLI](https://modal.com/docs/reference/changelog) and deploy the desired serving framework.
For example, to deploy a vLLM server:
```shell
source .venv/bin/activate
modal deploy src/vllm/server.py
```
Upon successful deployment, you should see output similar to the following in your terminal:
```shell
┌───────────────────
│ 📁 ~/c/modal-llm-serving master [!]
└─❯ modal deploy src/vllm/server.py
✓ Created objects.
├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/template_mistral_7b_instruct.jinja
├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/src/vllm/server.py
├── 🔨 Created download_hf_model.
└── 🔨 Created serve => https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run
✓ App deployed! 🎉
View Deployment:
https://modal.com/xxx/main/apps/deployed/vllm-mistralai--mistral-7b-instruct-v02
```
To access the respective Swagger UI, you can either open the `serve` URL directly or append `/docs` to it, depending on the serving framework.
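Since the servers expose OpenAI-compatible APIs, a deployed endpoint can also be queried with the official `openai` Python client. A hedged example, reusing the placeholder URL from the deployment output above; the model name and the no-auth assumption are illustrative:
```python
from openai import OpenAI

client = OpenAI(
    # Placeholder base URL from the deployment output above; note the /v1 suffix.
    base_url="https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run/v1",
    api_key="EMPTY",  # assumption: the example servers do not enforce API keys
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumption: whatever model you deployed
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```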
## Benchmark
To benchmark the deployed LLM inference servers, invoke the benchmark script as follows:
```shell
python benchmark/benchmark_server.py --backend vllm \
--model "mistralai--mistral-7b-instruct" \
--num-request 1000 \
--request-rate 64 \
--num-benchmark-runs 3 \
--max-input-len 1024 \
--max-output-len 1024 \
--base-url "https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run"
```
> [!IMPORTANT]
>
> **NOTE**: Replace the `--base-url` value with your own deployment URL, as shown upon successful deployment with `modal deploy`.
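For intuition on what the benchmark measures, here is a simplified sketch of the kind of request loop used by benchmarks modeled on vLLM's `benchmark_serving.py` (this is not the repo's implementation; `aiohttp` and all names are assumptions):
```python
import asyncio
import random
import time

import aiohttp  # assumption: an async HTTP client, as used in vLLM's benchmark script


async def send_request(session: aiohttp.ClientSession, url: str, payload: dict) -> float:
    """Send one completion request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    async with session.post(url, json=payload) as resp:
        await resp.read()
    return time.perf_counter() - start


async def run_benchmark(url: str, payloads: list[dict], request_rate: float) -> list[float]:
    """Dispatch requests with Poisson-distributed inter-arrival times."""
    async with aiohttp.ClientSession() as session:
        tasks = []
        for payload in payloads:
            tasks.append(asyncio.create_task(send_request(session, url, payload)))
            # Exponential gaps between arrivals => a Poisson process at `request_rate` req/s.
            await asyncio.sleep(random.expovariate(request_rate))
        return await asyncio.gather(*tasks)


# Example: latencies = asyncio.run(run_benchmark(base_url + "/v1/completions", payloads, 64.0))
```
From the collected per-request latencies, metrics such as throughput, mean/median latency, and percentiles can then be derived across the configured number of benchmark runs.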