https://github.com/bjodah/llm-multi-backend-container
Docker/podman container for llama.cpp/vllm/exllamav2 orchestrated using llama-swap
- Host: GitHub
- URL: https://github.com/bjodah/llm-multi-backend-container
- Owner: bjodah
- License: bsd-2-clause
- Created: 2025-03-27T08:46:49.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-04-15T14:33:59.000Z (2 months ago)
- Last Synced: 2025-04-15T15:39:24.123Z (2 months ago)
- Language: Python
- Size: 16.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome - bjodah/llm-multi-backend-container - Docker/podman container for llama.cpp/vllm/exllamav2 orchestrated using llama-swap (Python)
README
# llm-multi-backend-container
Use llama-swap inside a container with vllm, llama.cpp, and exllamav2+tabbyAPI.

## Usage
```console
$ head ./bin/host-llm-multi-backend-container.sh
$ ./bin/host-llm-multi-backend-container.sh --build --force-recreate
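# Once the container is up, llama-swap should expose an OpenAI-compatible
# endpoint. A rough sketch of talking to it with curl (the port 8686, the
# placeholder key sk-empty and the model name are assumptions taken from
# the grep hints and examples further down in this README):
$ curl -s http://localhost:8686/v1/chat/completions \
    -H "Authorization: Bearer sk-empty" \
    -H "Content-Type: application/json" \
    -d '{"model": "llamacpp-gemma-3-27b-it", "messages": [{"role": "user", "content": "Hello!"}]}'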
```

## Useful(?) tools
```console
$ ./bin/prompt-llm-multi-backend.py stream --model llamacpp-gemma-3-27b-it -t "Write a poem about a bear on a unicycle" --
```

## Testing
```console
$ bash -x scripts/test-chat-completions.sh
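# llama-swap acts as an OpenAI-compatible proxy, so listing the configured
# models is another quick sanity check (port and key are the same
# assumptions as in the Usage sketch above):
$ curl -s http://localhost:8686/v1/models -H "Authorization: Bearer sk-empty"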
```

## Notes
- Right now the config for vLLM struggles with allocating VRAM; it is unclear why.
- For customization, you might want to grep for a few keywords:
```console
$ git grep 8686
$ git grep sk-empty
```
- The `seed` option does not seem to have any effect:
```console
$ ./bin/prompt-llm-multi-backend.py stream -t "write a poem about a bear on a unicycle" --opts 'temperature=2.0;max_tokens=1000;seed=42'
```
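A minimal way to check whether `seed` is being honored is to request the same completion twice and compare. The snippet below is only a sketch using the official `openai` Python client; the base URL (port 8686), the placeholder key `sk-empty` and the model name are assumptions taken from the grep keywords and examples above.

```python
# Sketch: request the same completion twice with a fixed seed; if seeding
# works, the two responses should be identical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8686/v1", api_key="sk-empty")  # assumed endpoint/key

def sample(seed: int) -> str:
    resp = client.chat.completions.create(
        model="llamacpp-gemma-3-27b-it",  # assumed model name, taken from the example above
        messages=[{"role": "user", "content": "write a poem about a bear on a unicycle"}],
        temperature=2.0,
        max_tokens=100,
        seed=seed,
    )
    return resp.choices[0].message.content

print("identical:", sample(42) == sample(42))
```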