https://github.com/iakashpaul/ghudsavar
Ghudsavar (Horse rider) - Is a quick llama.cpp server for CPU only runtimes
https://github.com/iakashpaul/ghudsavar
gemma gemma-2b ggml gguf google huggingface llama llamacpp server
Last synced: 3 months ago
JSON representation
Ghudsavar (Horse rider) - Is a quick llama.cpp server for CPU only runtimes
- Host: GitHub
- URL: https://github.com/iakashpaul/ghudsavar
- Owner: iakashpaul
- License: mit
- Created: 2024-02-27T09:25:27.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-04T05:05:59.000Z (over 1 year ago)
- Last Synced: 2025-07-18T22:34:33.858Z (3 months ago)
- Topics: gemma, gemma-2b, ggml, gguf, google, huggingface, llama, llamacpp, server
- Language: Dockerfile
- Homepage: https://huggingface.co/spaces/iAkashPaul/Ghudsavar
- Size: 16.6 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Ghudsavar 🏇🏻
Ghudsavar (Horse rider) - Helps you spin up a quick [llama.cpp](https://github.com/ggerganov/llama.cpp) server (OpenAI API compatbile) which plugs into ```langchain``` & ```llamaindex``` w/o sweat. Currently for CPU only runtimes, made available as a docker image. Duplicate this [🤗 HF-space](https://huggingface.co/spaces/iAkashPaul/server.cpp) as your own CPU or GPU(with suitable build flags & ngl params) space & change the model weights to your own GGUF file.
> BTW the free tier with 2 CPU-cores runs between 5-8tok/s with Gemma-2B-Instruct@Q8, which is alright for quick testing.
## Local setup
```bash
git clone https://github.com/iakashpaul/Ghudsavar.git
cd Ghudsavar
docker build -t iakashpaul/Ghudsavar:latest .
```