https://github.com/chmp/lm-proxy
- Host: GitHub
- URL: https://github.com/chmp/lm-proxy
- Owner: chmp
- License: mit
- Created: 2024-07-01T16:39:41.000Z
- Default Branch: main
- Last Pushed: 2024-07-01T17:48:46.000Z
- Last Synced: 2025-02-13T01:35:01.041Z
- Language: Rust
- Size: 31.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: License.md
# `lm-proxy` - (large) language model proxy
A proxy for (large) language models that forwards requests to external model servers. It manages
these servers itself, starting each one on demand and shutting it down again once it is no longer in use.
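The start-on-demand, stop-when-idle behavior can be sketched in Python as below. This is not lm-proxy's own implementation (which is written in Rust): `OnDemandServer`, `handle_request` and `reap_if_idle` are hypothetical names, the `start`/`stop` callables stand in for launching and killing a backend process, and only an idle timeout comparable to the `keep_alive` setting is modelled (there is no separate limit for in-flight requests).

```python
import time
from typing import Callable, Optional


class OnDemandServer:
    """Hypothetical sketch: start a backend on first use, stop it after an idle period."""

    def __init__(
        self,
        start: Callable[[], None],
        stop: Callable[[], None],
        keep_alive: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._start = start
        self._stop = stop
        self.keep_alive = keep_alive  # seconds to stay up after the last request
        self._clock = clock
        self.running = False
        self.last_used: Optional[float] = None

    def handle_request(self) -> None:
        # Spin the backend up lazily on the first request after a shutdown.
        if not self.running:
            self._start()
            self.running = True
        self.last_used = self._clock()

    def reap_if_idle(self) -> bool:
        # Meant to be called periodically; stops the backend once it has been
        # idle for at least keep_alive seconds. Returns True if it was stopped.
        if self.running and self.last_used is not None:
            if self._clock() - self.last_used >= self.keep_alive:
                self._stop()
                self.running = False
                return True
        return False


# Usage with a fake clock so the example runs instantly.
events: list[str] = []
now = [0.0]
server = OnDemandServer(
    start=lambda: events.append("start"),
    stop=lambda: events.append("stop"),
    keep_alive=60,
    clock=lambda: now[0],
)
server.handle_request()  # first request starts the backend
now[0] = 30.0
server.reap_if_idle()  # idle for 30 s: still running
now[0] = 90.0
server.reap_if_idle()  # idle for 90 s: stopped
print(events)  # ['start', 'stop']
```

Injecting the clock keeps the timing logic testable without sleeping; a real proxy would run `reap_if_idle` from a background timer.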
Example configuration, saved as `config.toml`:
```toml
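[proxy]
# port the proxy listens on
port = 8080
# while a model has no running requests, keep it alive for 60 s
keep_alive = 60
# while a model has a running request, keep it alive for 300 s
request_keep_alive = 300

# each [models.<name>] table defines a model; clients select it by <name>
[models.phi3]
# command that starts the backend server; lm-proxy fills in {{ port }}
args = [
    "llama-server",
    "--model",
    "phi-3-mini-4k-instruct-q4.gguf",
    "--port",
    "{{ port }}",
]

[models.gemma2]
args = [
    "llama-server",
    "--model",
    "gemma-2-9b-it-q5_k_m.gguf",
    "--port",
    "{{ port }}",
]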
```
Start the server:
```bash
lm-proxy serve config.toml
```
Query the proxy with the `openai` Python client, selecting a backend through the `model` parameter:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # placeholder: the client requires a key, the proxy does not use it
)

# use the phi3 model ("phi3" matches [models.phi3] in config.toml)
response = client.chat.completions.create(
    model="phi3",
    messages=[{"role": "user", "content": "What is 2 + 3?"}],
)
print(response.choices[0].message.content)

# use the gemma2 model
response = client.chat.completions.create(
    model="gemma2",
    messages=[{"role": "user", "content": "How can I add 2 and 3 in Python?"}],
)
print(response.choices[0].message.content)
```
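The same request can be sent without the `openai` package. A minimal standard-library sketch, assuming the proxy accepts a plain JSON `POST` to `/v1/chat/completions` (the path the `openai` client uses with the `base_url` above) and does not check authentication; `build_chat_request` and `chat` are hypothetical helpers:

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    # OpenAI-style chat-completions body; "model" selects the configured backend.
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    # Send the request and return the text of the first choice.
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the proxy running, `chat("http://localhost:8080/v1", "phi3", "What is 2 + 3?")` should return the same kind of answer as the first `openai` call above.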