https://github.com/bjodah/llm-multi-backend-container
Docker/podman container for llama.cpp/vllm/exllamav2 orchestrated using llama-swap
- Host: GitHub
- URL: https://github.com/bjodah/llm-multi-backend-container
- Owner: bjodah
- License: bsd-2-clause
- Created: 2025-03-27T08:46:49.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-04-15T14:33:59.000Z (2 months ago)
- Last Synced: 2025-04-15T15:39:24.123Z (2 months ago)
- Language: Python
- Size: 16.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome - bjodah/llm-multi-backend-container - Docker/podman container for llama.cpp/vllm/exllamav2 orchestrated using llama-swap (Python)
README
# llm-multi-backend-container
Use llama-swap inside a container with vllm, llama.cpp, and exllamav2+tabbyAPI.

## Usage
```console
$ head ./bin/host-llm-multi-backend-container.sh
$ ./bin/host-llm-multi-backend-container.sh --build --force-recreate
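# Once the container is up, llama-swap should expose an OpenAI-compatible
# endpoint. A rough sketch of talking to it with curl (the port 8686, the
# placeholder key sk-empty and the model name are assumptions taken from
# the grep hints and examples further down in this README):
$ curl -s http://localhost:8686/v1/chat/completions \
    -H "Authorization: Bearer sk-empty" \
    -H "Content-Type: application/json" \
    -d '{"model": "llamacpp-gemma-3-27b-it", "messages": [{"role": "user", "content": "Hello!"}]}'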
```

## Useful(?) tools
```console
$ ./bin/prompt-llm-multi-backend.py stream --model llamacpp-gemma-3-27b-it -t "Write a poem about a bear on a unicycle" --
```

## Testing
```console
$ bash -x scripts/test-chat-completions.sh
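# llama-swap acts as an OpenAI-compatible proxy, so listing the configured
# models is another quick sanity check (port and key are the same
# assumptions as in the Usage sketch above):
$ curl -s http://localhost:8686/v1/models -H "Authorization: Bearer sk-empty"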
```

## Notes
- Right now the config for vLLM struggles with allocating VRAM; it is unclear why.
- For customization, you might want to grep for a few keywords:
```console
$ git grep 8686
$ git grep sk-empty
```
- The `seed` option does not seem to have any effect:
```console
$ ./bin/prompt-llm-multi-backend.py stream -t "write a poem about a bear on a unicycle" --opts 'temperature=2.0;max_tokens=1000;seed=42'
```
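A minimal way to check whether `seed` is being honored is to request the same completion twice and compare. The snippet below is only a sketch using the official `openai` Python client; the base URL (port 8686), the placeholder key `sk-empty` and the model name are assumptions taken from the grep keywords and examples above.

```python
# Sketch: request the same completion twice with a fixed seed; if seeding
# works, the two responses should be identical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8686/v1", api_key="sk-empty")  # assumed endpoint/key

def sample(seed: int) -> str:
    resp = client.chat.completions.create(
        model="llamacpp-gemma-3-27b-it",  # assumed model name, taken from the example above
        messages=[{"role": "user", "content": "write a poem about a bear on a unicycle"}],
        temperature=2.0,
        max_tokens=100,
        seed=seed,
    )
    return resp.choices[0].message.content

print("identical:", sample(42) == sample(42))
```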