https://github.com/xingyaoww/llm-serving-hub
- Host: GitHub
- URL: https://github.com/xingyaoww/llm-serving-hub
- Owner: xingyaoww
- Created: 2023-09-11T22:37:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-22T12:43:39.000Z (over 1 year ago)
- Last Synced: 2025-02-04T20:12:10.588Z (3 months ago)
- Language: Shell
- Size: 7.81 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# README
## Installation
```bash
# Install litellm
conda create -n llm-serve python=3.10
conda activate llm-serve
pip install vllm
pip install 'litellm[proxy]'
```

Also install all dependencies for [TGI](https://github.com/huggingface/text-generation-inference).
If `docker` is available, that is all you need for TGI, since the models are served through containers (see `tgi_serve_model_docker` below).
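Since TGI and vLLM are launched through Docker here, it is worth confirming that containers can see the GPUs before starting any servers. A minimal sketch (the CUDA image tag is only an example):

```bash
# Verify Docker GPU access (requires the NVIDIA Container Toolkit).
# The image tag is just an example; any CUDA base image works.
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```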
## Start LiteLLM Proxy
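The proxy routes requests according to the `config.yml` passed below, which is not shown in this repo's snippet. A minimal hypothetical sketch, assuming a TGI backend is reachable on `localhost:8081` (the model name, port, and endpoint are placeholders):

```bash
# Hypothetical config.yml for the LiteLLM proxy; adjust model_name and api_base
# to match the TGI/vLLM endpoints you actually start below.
cat > config.yml <<'EOF'
model_list:
  - model_name: codellama-34b-instruct
    litellm_params:
      model: huggingface/codellama/CodeLlama-34b-Instruct-hf
      api_base: http://localhost:8081
EOF
```

Then start the proxy: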
```bash
litellm --config config.yml --port 8000
```

## Start TGI
```bash
export CUDA_VISIBLE_DEVICES="2,3"
source source.sh

# MODEL_PATH, MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS (needed for TGI)
tgi_serve_model_docker /data/shared/Llama-2-70b-chat-hf 3968 4096
tgi_serve_model_docker /data/shared/CodeLlama-34b-Instruct-hf 16256 16384

# MODEL_PATH, MAX_TOTAL_TOKENS, CHAT_TEMPLATE (for vLLM)
vllm_serve_model_docker /data/shared/CodeLlama-34b-Instruct-hf/ 16384 chat_templates/llama.jinja
```
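Once the proxy and a backend are running, a quick smoke test against the proxy's OpenAI-compatible endpoint might look like the following; the model name is hypothetical and must match a `model_name` from your `config.yml`:

```bash
# Hypothetical end-to-end check through the LiteLLM proxy on port 8000.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "codellama-34b-instruct",
        "messages": [{"role": "user", "content": "Write a hello-world in Python."}]
      }'
```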