https://github.com/xingyaoww/llm-serving-hub
- Host: GitHub
- URL: https://github.com/xingyaoww/llm-serving-hub
- Owner: xingyaoww
- Created: 2023-09-11T22:37:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-22T12:43:39.000Z (over 1 year ago)
- Last Synced: 2025-02-04T20:12:10.588Z (3 months ago)
- Language: Shell
- Size: 7.81 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# README
## Installation
```bash
# Install litellm
conda create -n llm-serve python=3.10
conda activate llm-serve
pip install vllm
pip install 'litellm[proxy]'
```

Also install all dependencies for [TGI](https://github.com/huggingface/text-generation-inference).
If `docker` is available, that is all you need for TGI, since the models are served through containers (see `tgi_serve_model_docker` below).
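Since TGI and vLLM are launched through Docker here, it is worth confirming that containers can see the GPUs before starting any servers. A minimal sketch (the CUDA image tag is only an example):

```bash
# Verify Docker GPU access (requires the NVIDIA Container Toolkit).
# The image tag is just an example; any CUDA base image works.
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```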
## Start LiteLLM Proxy
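The proxy routes requests according to the `config.yml` passed below, which is not shown in this repo's snippet. A minimal hypothetical sketch, assuming a TGI backend is reachable on `localhost:8081` (the model name, port, and endpoint are placeholders):

```bash
# Hypothetical config.yml for the LiteLLM proxy; adjust model_name and api_base
# to match the TGI/vLLM endpoints you actually start below.
cat > config.yml <<'EOF'
model_list:
  - model_name: codellama-34b-instruct
    litellm_params:
      model: huggingface/codellama/CodeLlama-34b-Instruct-hf
      api_base: http://localhost:8081
EOF
```

Then start the proxy: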
```bash
litellm --config config.yml --port 8000
```

## Start TGI
```bash
export CUDA_VISIBLE_DEVICES="2,3"
source source.sh

# MODEL_PATH, MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS (needed for TGI)
tgi_serve_model_docker /data/shared/Llama-2-70b-chat-hf 3968 4096
tgi_serve_model_docker /data/shared/CodeLlama-34b-Instruct-hf 16256 16384

# MODEL_PATH, MAX_TOTAL_TOKENS, CHAT_TEMPLATE (for vLLM)
vllm_serve_model_docker /data/shared/CodeLlama-34b-Instruct-hf/ 16384 chat_templates/llama.jinja
```
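Once the proxy and a backend are running, a quick smoke test against the proxy's OpenAI-compatible endpoint might look like the following; the model name is hypothetical and must match a `model_name` from your `config.yml`:

```bash
# Hypothetical end-to-end check through the LiteLLM proxy on port 8000.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "codellama-34b-instruct",
        "messages": [{"role": "user", "content": "Write a hello-world in Python."}]
      }'
```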