Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/patw/discord_llama
Run llama.cpp based chatbots on Discord
https://github.com/patw/discord_llama
discord discord-bot llamacpp openhermes python
Last synced: 29 days ago
JSON representation
Run llama.cpp based chatbots on Discord
- Host: GitHub
- URL: https://github.com/patw/discord_llama
- Owner: patw
- License: mit
- Created: 2023-11-26T15:02:18.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-11T15:37:05.000Z (7 months ago)
- Last Synced: 2024-06-11T23:10:37.838Z (7 months ago)
- Topics: discord, discord-bot, llamacpp, openhermes, python
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Discord Llama
Build discord bots that respond with a locally running llama.cpp server. This allows you to run your own models, on CPU or GPU
as long as you have the hardware resources. Bots can be given identies and respond to trigger words. They can take advantage
of the discord channel history to act conversational.## Local Installation
```
pip install -r requirements.txt
```## Downloading an LLM model
We highly recommend OpenHermes 2.5 Mistral-7b fine tune for this task, as it's currently the best (Nov 2023) that
we've tested personally. You can find different quantized versions of the model here:https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF/tree/main
I'd suggest the Q6 quant for GPU and Q4_K_M for CPU
## Running a model on llama.cpp in API mode
### Windows
Go to the llama.cpp releases and download either the win-avx2 package for CPU or the cublas for nvidia cards:
https://github.com/ggerganov/llama.cpp/releases
Extract the files out and run the following for nvidia GPUs:
```
server.exe -m .gguf -t 4 -c 2048 -ngl 33 --host 0.0.0.0 --port 8086
```For CPU only:
```
server.exe -m .gguf -c 2048 --host 0.0.0.0 --port 8086
```Replace with whatever model you downloaded and put into the llama.cpp directory
### Linux, MacOS or WSL2
Follow the install instructions for llama.cpp at https://github.com/ggerganov/llama.cppGit clone, compile and run the following for GPU:
```
./server -m models/.gguf -t 4 -c 2048 -ngl 33 --host 0.0.0.0 --port 8086
```For CPU only:
```
./server -m models/.gguf -c 2048 --host 0.0.0.0 --port 8086
```Replace with whatever model you downloaded and put into the llama.cpp/models directory
## Running discord bot
* Copy the model and wizard sample files to .json files
* Paste your discord token into the wizard.json file, you can create a discord bot application here: https://discord.com/developers/applications
* Join your bot to the server with the proper chat permissions
* Run the following:```
python discord_llama.py model.json wizard.json
```### Configuration Parameters
#### model.json
* llama_endpoint - Usually localhost, but if you have a dedicated machine to run it on change URL to that one
* prompt_format - For instruct tuned models, the format provided is ChatML format which works fine with OpenHermes
* stop_tokens - Stopping token for generation, should match the model's prompt format#### wizard.json
* discord_token - The discord token for the bot, you can make one following this: https://discordpy.readthedocs.io/en/stable/discord.html#discord-intro
* identity - This is where your bot's personality comes from you can get very creative with prompt engineering here and make the bot anything you want
* triggers - These are words the bot will respond to if they show up in any conversation.
* trigger_level - This is how often the bot will respond to triggers 1.0 = 100%, 0.25 = 25%. Anything > 0.5 is annoying
* temperature - This is the LLM temperature which you can think of as the creativity. 0.7 is good for chatbots.
* tokens - This is the number of words it can output. You don't want it writing 10 page novels in discord, so keep it low
* history_lines - This is how much history it can see when answering questions. Too much and it can get confused.
* question_prompt - This is the prompt it processes when @ msged a question
* trigger_prompt - This is the prompt it processes when it's commenting on a trigger word.### Running multiple bots
Each bot needs it's own Discord application and key generated. Once you do that, create multiple bot identity json files like the
sample wizard.json. Create multiple .sh or .bat files to start each one up.### Known issues
* The bot will use your discord account name, not the alias you've set for the server. I tried to get this working with display_name but it doesn't seem possible to get that in the history object.