Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/patw/bumblebee
A small LLM driven home assistant
https://github.com/patw/bumblebee
home-assistant picovoice
Last synced: 29 days ago
JSON representation
A small LLM driven home assistant
- Host: GitHub
- URL: https://github.com/patw/bumblebee
- Owner: patw
- License: mit
- Created: 2024-04-28T22:30:14.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-04-30T01:24:09.000Z (8 months ago)
- Last Synced: 2024-05-07T18:23:06.941Z (8 months ago)
- Topics: home-assistant, picovoice
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Bumblebee
Bumblebee is an LLM driven home assistant. Say the wake word "bumblebee" and ask any question you want to a local LLM model.
Requires a valid API key for PicoVoice (https://picovoice.ai/)
## Local Installation
```
pip install -r requirements.txt
```Copy one of the model.json.type files to model.json. This file is used to set the prompt format and ban tokens, the default
is ChatML format so it should work with most recent models. Set the llama_endpoint to point to your llama.cpp running
in server mode, if it's not on the same container/server as your SumBot service (see below!)Copy sample.env to .env and put in your PicoVoice API key and the URL for your Bumblebee API (usually localhost port 3000)
## Running Bumblebee API
```
uvicorn main:app --host 0.0.0.0 --port 3000 --reload
```## Running Bumblebee Voice App
```
python3 bumblebee.py
```## Downloading an LLM model
We highly recommend OpenHermes 2.5 Mistral-7b fine tune for this task, as it's currently the best (Nov 2023) that
we've tested personally. You can find different quantized versions of the model here:https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF/tree/main
I'd suggest the Q6 quant for GPU and Q4_K_M for CPU
## Running a model on llama.cpp in API mode
### Windows
Go to the llama.cpp releases and download either the win-avx2 package for CPU or the cublas for nvidia cards:
https://github.com/ggerganov/llama.cpp/releases
Extract the files out and run the following for nvidia GPUs:
```
server.exe -m .gguf -t 4 -c 2048 -ngl 33 --host 0.0.0.0 --port 8086
```For CPU only:
```
server.exe -m .gguf -c 2048 --host 0.0.0.0 --port 8086
```Replace with whatever model you downloaded and put into the llama.cpp directory
### Linux, MacOS or WSL2
Follow the install instructions for llama.cpp at https://github.com/ggerganov/llama.cppGit clone, compile and run the following for GPU:
```
./server -m models/.gguf -t 4 -c 2048 -ngl 33 --host 0.0.0.0 --port 8086
```For CPU only:
```
./server -m models/.gguf -c 2048 --host 0.0.0.0 --port 8086
```Replace with whatever model you downloaded and put into the llama.cpp/models directory
## Accessing API Directly for Testing
http://localhost:3000/docs