https://github.com/open-webui/llama-cpp-runner
- Host: GitHub
- URL: https://github.com/open-webui/llama-cpp-runner
- Owner: open-webui
- License: mit
- Created: 2025-01-26T02:08:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-19T02:03:02.000Z (about 1 year ago)
- Last Synced: 2025-06-03T10:05:51.240Z (about 1 year ago)
- Language: Python
- Size: 23.4 KB
- Stars: 22
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
# llama-cpp-runner
`llama-cpp-runner` is a Python library for running [llama.cpp](https://github.com/ggerganov/llama.cpp) without a manual build step. It downloads prebuilt binaries from the upstream repo, keeping you **up to date** with the latest developments, and it needs no complicated setup: everything works **out of the box**.
## Key Features
1. **Always Up-to-Date**: Automatically fetches the latest prebuilt binaries from the upstream llama.cpp GitHub repo. No need to worry about staying current.
2. **Zero Dependencies**: No need to manually install compilers or build binaries. Everything is handled for you during installation.
3. **Model Flexibility**: Load and serve **GGUF** models stored locally or fetched from Hugging Face.
4. **Built-in HTTP Server**: Automatically spins up a server for chat interactions and manages idle timeouts to save resources.
5. **Cross-Platform Support**: Works on **Windows**, **Linux**, and **macOS** with automatic detection for AVX/AVX2/AVX512/ARM architectures.
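The idle-timeout behavior mentioned in feature 4 can be pictured with a small sketch. This is not the library's implementation: the `IdleTimer` class, its method names, and the 300-second default are all hypothetical, chosen only to illustrate how a server might be stopped after a period without requests.

```python
import time


class IdleTimer:
    """Hypothetical helper: remembers the last request time and reports idleness."""

    def __init__(self, timeout_seconds=300.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the logic can be exercised without sleeping
        self._last_used = clock()

    def touch(self):
        """Record that a request was just served."""
        self._last_used = self._clock()

    def is_idle(self):
        """Return True once no request has arrived for timeout_seconds."""
        return self._clock() - self._last_used >= self.timeout_seconds
```

A supervising loop could call `touch()` on every request and shut the server process down when `is_idle()` returns `True`.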
## Why Use `llama-cpp-runner`?
- **Out-of-the-box experience**: Forget about setting up complex environments for building. Just install and get started!
- **Streamlined Model Serving**: Effortlessly manage multiple models and serve them with an integrated HTTP server.
- **Fast Integration**: Use prebuilt binaries from upstream so you can spend more time building and less time troubleshooting.
## Installation
Install `llama-cpp-runner` with pip:
```bash
pip install llama-cpp-runner
```
## Optional Installation (Docker)
Clone the repository
```bash
git clone https://github.com/open-webui/llama-cpp-runner
```
Build and run from the repository root
```bash
cd llama-cpp-runner
docker compose up -d
```
## Usage
### Initialize the Runner
```python
from llama_cpp_runner import LlamaCpp

# Point the runner at the directory that holds your GGUF files
llama_runner = LlamaCpp(models_dir="path/to/models", verbose=True)

# List all available GGUF models
models = llama_runner.list_models()
print("Available Models:", models)
```
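To see which files the runner is likely to pick up before starting it, you can scan the directory yourself. The sketch below uses only the standard library. `find_gguf_models` is a hypothetical helper, and whether `list_models()` returns file names, full paths, or something else is not stated in this README, so treat the output format as an assumption.

```python
from pathlib import Path


def find_gguf_models(models_dir):
    """Return the sorted names of all .gguf files directly inside models_dir."""
    # Hypothetical stand-in for inspecting a models directory by hand;
    # it does not recurse into subdirectories.
    return sorted(p.name for p in Path(models_dir).glob("*.gguf") if p.is_file())
```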
### Chat Completion
```python
response = llama_runner.chat_completion({
    "model": "your-model-name.gguf",
    "messages": [{"role": "user", "content": "Hello, Llama!"}],
    "stream": False
})
print(response)
```
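This README does not document the shape of `response`. The server bundled with llama.cpp exposes an OpenAI-compatible chat completions endpoint, so if `chat_completion` hands back that JSON as a plain dict (an assumption, not something confirmed here), the reply text could be extracted as sketched below. `extract_reply` is a hypothetical helper, not part of the library.

```python
def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict, or None."""
    # Assumes the layout {"choices": [{"message": {"content": ...}}]}.
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")
```

Returning `None` instead of raising keeps the helper safe to call if the response turns out to have a different shape.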
## How It Works
1. Automatically detects your system architecture (e.g., AVX, AVX2, ARM) and platform.
2. Downloads and extracts the prebuilt llama.cpp binaries from the official repo.
3. Spins up a lightweight HTTP server for chat interactions.
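Step 1 can be approximated with Python's standard `platform` module. The sketch below is illustrative only: `describe_host` and its label tables are hypothetical, and it does not detect CPU features such as AVX2 or AVX512, which the standard library does not expose portably. How the library itself performs detection is not described in this README.

```python
import platform


def describe_host(system=None, machine=None):
    """Map platform.system()/platform.machine() values to (os, arch) labels."""
    # Arguments default to the current host; passing them makes the mapping testable.
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()

    # Hypothetical label tables; unknown values fall through to "unknown".
    os_names = {"windows": "windows", "linux": "linux", "darwin": "macos"}
    arch_names = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}

    return os_names.get(system, "unknown"), arch_names.get(machine, "unknown")
```

Calling `describe_host()` with no arguments reports the machine it runs on; a downloader could then use the pair to choose a matching release archive.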
## Advantages
- **Hassle-Free**: No need to compile binaries or manage system-specific dependencies.
- **Latest Features, Always**: Stay up to date with llama.cpp's improvements with every release.
- **Optimized for Your System**: Automatically fetches the best binary for your architecture.
## Supported Platforms
- Windows
- macOS
- Linux
## Contributing
We'd love your contributions! Bug reports, feature requests, and pull requests are all welcome.
## License
This library is open-source and distributed under the MIT license.
Happy chatting with llama.cpp!