https://github.com/klutchell/balena-serge
Run an LLM on your edge device with balena.io
- Host: GitHub
- URL: https://github.com/klutchell/balena-serge
- Owner: klutchell
- Created: 2023-05-01T19:04:05.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-07-29T13:19:39.000Z (almost 2 years ago)
- Last Synced: 2025-04-04T14:03:43.279Z (about 1 year ago)
- Size: 142 KB
- Stars: 3
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# balena-serge
A chat interface based on llama.cpp for running Alpaca models.
Entirely self-hosted, no API keys needed. Fits on 4GB of RAM and runs on the CPU.
You can read more on the [official project README](https://github.com/serge-chat/serge).
## Hardware required
LLaMA will just crash if you don't have enough available memory for your model.
- 7B requires about 4.5GB of free RAM
- 13B requires about 12GB free
- 30B requires about 20GB free
I've tested on an Intel NUC, but any `amd64` or `aarch64` device with at least 5GB of memory should work!
In theory, the Raspberry Pi 4 8GB model should work, but I haven't tried it myself!
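
As a rough pre-flight check, the memory figures above can be turned into a small script. This is only a sketch: the thresholds are copied from the list above, and reading `MemAvailable` from `/proc/meminfo` assumes a Linux host. The function names are illustrative and not part of this project.

```python
# Approximate free-RAM requirements from the list above, in GB.
MODEL_RAM_GB = {"7B": 4.5, "13B": 12.0, "30B": 20.0}


def models_that_fit(free_gb: float) -> list:
    """Return the model sizes whose stated requirement fits in free_gb."""
    return [name for name, need in MODEL_RAM_GB.items() if need <= free_gb]


def available_ram_gb(meminfo_path: str = "/proc/meminfo") -> float:
    """Read MemAvailable (reported in kB) from a Linux meminfo file."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)
    raise RuntimeError("MemAvailable not found in " + meminfo_path)


if __name__ == "__main__":
    try:
        free = available_ram_gb()
        print(f"{free:.1f} GB available; should fit: {models_that_fit(free)}")
    except (OSError, RuntimeError) as err:
        # Non-Linux hosts have no /proc/meminfo; report instead of crashing.
        print(f"Could not read available memory: {err}")
```

For example, a device with about 5GB free would be expected to fit only the 7B model.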
## Getting Started
You can one-click-deploy this project to balena using the button below:
[Deploy with balena](https://dashboard.balena-cloud.com/deploy?repoUrl=https://github.com/klutchell/balena-serge)
## Manual Deployment
Alternatively, you can deploy manually: create a [balenaCloud account](https://dashboard.balena-cloud.com) and application, flash a device,
download the project, and push it with the [balena CLI](https://github.com/balena-io/balena-cli).
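
The download-and-push part of these steps could be scripted roughly as follows. This is a sketch under assumptions: `git` and the balena CLI are installed, the fleet already exists in balenaCloud, and the fleet name `my-serge-fleet` is a hypothetical placeholder. The helper functions are illustrative and not part of this project.

```python
import subprocess

REPO_URL = "https://github.com/klutchell/balena-serge"


def deployment_commands(fleet: str, repo_url: str = REPO_URL) -> list:
    """Build (argv, working_directory) pairs for a manual deployment."""
    # git clones into a directory named after the last URL path segment.
    repo_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]
    return [
        (["git", "clone", repo_url], None),      # download the project
        (["balena", "login"], None),             # authenticate the CLI
        (["balena", "push", fleet], repo_dir),   # build and release to the fleet
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv, cwd in commands:
        print(f"[{cwd or '.'}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# Hypothetical fleet name; replace it with the fleet you created.
# run(deployment_commands("my-serge-fleet"), dry_run=True)
```

Keeping `dry_run=True` by default prints the commands first, so they can be reviewed before anything is cloned or pushed.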
### Environment Variables
| Name | Default | Purpose |
| ---- | ------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `TZ` | `UTC` | The timezone in your location. Find a [list of all timezone values here](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). |
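
Before setting `TZ`, it can help to confirm that the value is a recognised tz database name. The sketch below uses Python's standard `zoneinfo` module (Python 3.9+). It assumes the machine running it has a tz database available, which may differ from the one on the device. The function name `is_valid_tz` is illustrative.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def is_valid_tz(name: str) -> bool:
    """Return True if name resolves to a zone in the local tz database."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, ValueError):
        # Unknown zone names and malformed keys both end up here.
        return False


if __name__ == "__main__":
    for candidate in ("UTC", "America/Toronto", "Not/AZone"):
        print(candidate, is_valid_tz(candidate))
```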
## Usage
Once your device joins the fleet, you'll need to allow some time for it to download the application containers.
When it's done, you should be able to access the app on port 80 of the device.
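
Because the first download can take a while, one option is to poll the device until port 80 accepts connections. This is a minimal sketch using only the standard library. The address `192.168.1.50` is a hypothetical placeholder for your device's IP, and the function names are illustrative.

```python
import socket
import time


def port_open(host: str, port: int = 80, connect_timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int = 80, timeout: float = 600.0,
                  interval: float = 5.0) -> bool:
    """Poll host:port until it accepts connections or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if port_open(host, port):
            return True
        time.sleep(interval)
    return False


# Hypothetical device address; replace it with your device's IP.
# if wait_for_port("192.168.1.50"):
#     print("App is reachable at http://192.168.1.50/")
```

An open port only shows that something is listening. Loading the page in a browser is still the real test that the app is up.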
You can read more on the [official project README](https://github.com/serge-chat/serge).
## Contributing
Please open an issue or submit a pull request with any features, fixes, or changes.