{"id":20472008,"url":"https://github.com/wtlow003/modal-llm-serving","last_synced_at":"2026-04-27T11:31:02.125Z","repository":{"id":243105226,"uuid":"809079022","full_name":"wtlow003/modal-llm-serving","owner":"wtlow003","description":"Examples of serving LLM on Modal.","archived":false,"fork":false,"pushed_at":"2024-06-13T04:39:34.000Z","size":42,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-29T18:57:08.140Z","etag":null,"topics":["llm","lmdeploy","modal","model-serving","openai","openai-api","sglang","vllm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wtlow003.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-01T16:16:47.000Z","updated_at":"2024-07-30T09:13:16.000Z","dependencies_parsed_at":"2024-06-06T19:32:56.763Z","dependency_job_id":"4ac52513-bdde-40c7-9301-c079d83c3d4d","html_url":"https://github.com/wtlow003/modal-llm-serving","commit_stats":null,"previous_names":["wtlow003/modal-llm-serving"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/wtlow003/modal-llm-serving","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtlow003%2Fmodal-llm-serving","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtlow003%2Fmodal-llm-serving/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtlow003%2Fmodal-llm-serving/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/wtlow003%2Fmodal-llm-serving/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wtlow003","download_url":"https://codeload.github.com/wtlow003/modal-llm-serving/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtlow003%2Fmodal-llm-serving/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32335295,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","lmdeploy","modal","model-serving","openai","openai-api","sglang","vllm"],"created_at":"2024-11-15T14:17:54.788Z","updated_at":"2026-04-27T11:31:02.105Z","avatar_url":"https://github.com/wtlow003.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://modal.com\"\u003e\n    \u003cimg src=\"https://modal.com/assets/social-image.jpg\" height=\"96\"\u003e\n    \u003ch1 align=\"center\"\u003eModal LLM Serving Examples and Benchmarks\u003c/h1\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/python-3.10-orange\"\n         alt=\"python version\"\u003e\n     \u003cimg src=\"https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json\"\n 
         alt=\"uv\"\u003e\n    \u003cimg src=\"https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v1.json\"\n         alt=\"ruff\"\u003e\n\u003c/p\u003e\n\n## About\n\nThis repo contains a collection of examples for serving LLMs on [Modal](https://modal.com/). To compare the various serving frameworks, a benchmarking setup heavily referencing [vLLM](https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py) is also [provided](./benchmark/benchmark_server.py).\n\nCurrently, the following frameworks have been deployed and verified working via Modal [Deployments](https://modal.com/docs/guide/managing-deployments).\n\n| Framework                       | GitHub Repo                                              | Modal Script                       |\n|---------------------------------|----------------------------------------------------------|------------------------------------|\n| vLLM                            | https://github.com/vllm-project/vllm                     | [script](./src/vllm/server.py)     |\n| Text Generation Inference (TGI) | https://github.com/huggingface/text-generation-inference | [script](./src/tgi/server.py)      |\n| LMDeploy                        | https://github.com/InternLM/lmdeploy                     | [script](./src/lmdeploy/server.py) |\n\n\n## Getting Started\n\nBefore deploying the respective examples, set up the environment using the following commands.\n\nThis project uses [uv](https://github.com/astral-sh/uv) for dependency management. 
To install `uv`, please refer to this [guide](https://github.com/astral-sh/uv#getting-started):\n\n```shell\n# On macOS and Linux.\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\n# On Windows.\npowershell -c \"irm https://astral.sh/uv/install.ps1 | iex\"\n\n# With pip.\npip install uv\n\n# With pipx.\npipx install uv\n\n# With Homebrew.\nbrew install uv\n\n# With Pacman.\npacman -S uv\n```\n\nTo install the required dependencies:\n\n```shell\n# create a virtual env\nuv venv\n\n# install dependencies\nuv pip install -r requirements.txt  # Install from a requirements.txt file.\n```\n\nIf you are looking to contribute to the repo, you will also need to install the pre-commit hooks so that your code changes are linted and formatted accordingly:\n\n```shell\npip install pre-commit\n\npre-commit install \u0026\u0026\npre-commit install --hook-type commit-msg\n```\n\n## Deployment\n\nTo deploy on **Modal**, simply use the [CLI](https://modal.com/docs/reference/changelog) to deploy the desired serving framework.\n\nFor example, to deploy a vLLM server:\n\n```shell\nsource .venv/bin/activate\n\nmodal deploy src/vllm/server.py\n```\n\nUpon successful deployment, you should see the following (similar) information on your terminal:\n\n```shell\n┌───────────────────\n│ 📁 ~/c/modal-llm-serving  master [!]\n└─❯  modal deploy src/vllm/server.py\n✓ Created objects.\n├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/template_mistral_7b_instruct.jinja\n├── 🔨 Created mount /Users/xxx/code/modal-llm-serving/src/vllm/server.py\n├── 🔨 Created download_hf_model.\n└── 🔨 Created serve =\u003e https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run\n✓ App deployed! 
🎉\n\nView Deployment:\nhttps://modal.com/xxx/main/apps/deployed/vllm-mistralai--mistral-7b-instruct-v02\n```\n\nTo access the Swagger UI, you can either open the `serve` URL directly or append `/docs` to it, depending on the serving framework.\n\n## Benchmark\n\nTo run benchmarks against the deployed LLM inference servers, run the benchmark script as follows:\n\n```shell\npython benchmark/benchmark_server.py --backend vllm \\\n    --model \"mistralai--mistral-7b-instruct\" \\\n    --num-request 1000 \\\n    --request-rate 64 \\\n    --num-benchmark-runs 3 \\\n    --max-input-len 1024 \\\n    --max-output-len 1024 \\\n    --base-url \"https://xxx--vllm-mistralai--mistral-7b-instruct-v02-serve.modal.run\"\n```\n\u003e [!IMPORTANT]\n\u003e\n\u003e Replace the `--base-url` value with your own deployment URL, as shown upon successful deployment with `modal deploy`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwtlow003%2Fmodal-llm-serving","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwtlow003%2Fmodal-llm-serving","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwtlow003%2Fmodal-llm-serving/lists"}