## Botality II

This project is an implementation of a modular **telegram bot** based on [aiogram](https://github.com/aiogram/aiogram), designed for local ML inference with remote-service support. It is currently integrated with:
- **Stable Diffusion** (using the [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) API)
- **TTS** text-to-speech (using [TTS (VITS)](https://github.com/coqui-ai/TTS) and [so-vits-SVC](https://github.com/svc-develop-team/so-vits-svc/tree/4.0)) as well as OS voices
- **STT** speech recognition via multiple engines, including [whisper.cpp](https://github.com/ggerganov/whisper.cpp) ([whispercpp.py](https://github.com/stlukey/whispercpp.py)), [whisperS2T](https://github.com/shashikg/WhisperS2T), [silero](https://github.com/snakers4/silero-models) and [wav2vec2](https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
- **LLMs** such as [llama (1-3)](https://github.com/facebookresearch/llama), [gpt-j](https://github.com/kingoflolz/mesh-transformer-jax#gpt-j-6b) and [gpt-2](https://huggingface.co/gpt2), with support for assistant mode via instruct-tuned lora models and multimodality via an [adapter model](https://github.com/OpenGVLab/LLaMA-Adapter)
- **TTA** experimental text-to-audio support via [audiocraft](https://github.com/facebookresearch/audiocraft)

- Accelerated LLM inference support: [llama.cpp](https://github.com/ggerganov/llama.cpp), [mlc-llm](https://github.com/mlc-ai/mlc-llm) and [llama-mps](https://github.com/remixer-dec/llama-mps/)
- Remote LLM inference support: [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/), [LostRuins/koboldcpp](https://github.com/LostRuins/koboldcpp) and the [llama.cpp server](https://github.com/ggerganov/llama.cpp/tree/master/examples/server)

A compatibility table is available [here](COMPATIBILITY.md).

Evolved from its predecessor, [Botality I](https://github.com/remixer-dec/ru-gpt3-telegram-bot). It ships with an easy-to-use webui where you can run commands and talk to the bot directly.

*(preview image)*

### Documentation

You can find it [here](https://github.com/remixer-dec/botality-ii/wiki/Getting-started) (coming soon).

### Changelog
Some versions have breaking changes; see the [changelog](CHANGELOG.md) for more information.

### Features
[Bot]
- User-based queues and delayed task processing
- Multiple modes to filter access scopes (whitelist / blacklist / both / admin-only)
- Support for accelerated inference on M1 Macs
- Memory manager that keeps track of simultaneously loaded models and loads/unloads them on demand (see the sketch below)
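
The memory-manager idea can be illustrated with a minimal sketch. This is not the project's actual implementation; the LRU eviction policy and the loader callables here are assumptions:

```python
from collections import OrderedDict
from typing import Any, Callable

class ModelManager:
    """Keeps at most `max_loaded` models in memory, loading them on demand
    and evicting the least recently used one when over the limit."""

    def __init__(self, max_loaded: int = 1) -> None:
        self.max_loaded = max_loaded
        self._loaded: OrderedDict[str, Any] = OrderedDict()  # name -> model, in LRU order

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
            return self._loaded[name]
        while len(self._loaded) >= self.max_loaded:
            self._loaded.popitem(last=False)  # unload the least recently used model
        self._loaded[name] = loader()  # load on demand
        return self._loaded[name]

# usage: manager.get("sd", load_sd_pipeline) loads on first call, reuses it afterwards
```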

[LLM]
- Supports dialog mode, casually playing a role described in a character file, keeping chat history either with all users in a group chat or with each user separately
- Character files can easily be localized into any language for non-English models
- Assistant mode via the /ask command or with direct replies (configurable)
- Single-reply short-term memory for assistant feedback
- Supports visual question answering when a multimodal adapter is available

[SD]
- CLI-like way to pass stable diffusion parameters
- pre-defined prompt wrappers
- lora integration with easy syntax (`lora_name100` => `<lora:lora_name:1.0>`) and custom lora activators; see the sketch below
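
As an illustration of that shorthand, here is a minimal re-implementation; this is not the bot's actual parser, and the rule that the trailing digits encode the weight divided by 100 is an assumption generalized from the single example above:

```python
import re

def expand_lora(prompt: str) -> str:
    # "lora_name100" -> "<lora:lora_name:1.0>"; the trailing digits are read
    # as the weight times 100 (assumed generalization of the example above)
    def repl(m: re.Match) -> str:
        return f"<lora:{m.group(1)}:{int(m.group(2)) / 100}>"
    return re.sub(r"\b([A-Za-z_]+)(\d{2,3})\b", repl, prompt)

print(expand_lora("a castle on a hill, lora_name100"))
# -> a castle on a hill, <lora:lora_name:1.0>
```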

[TTS]
- can be run remotely, or on the same machine
- tts output is sent as voice messages
- can be used on voice messages (speech and a cappella songs) to dub them with a different voice

[STT]
- can be activated as a speech recognition tool via the /stt command in reply to voice messages
- if the `stt_autoreply_mode` parameter is not `none`, it recognizes voice messages and replies to them using the LLM and TTS modules

[TTA]
- can be used with the /sfx and /music commands after adding `tta` to `active_modules`

*(multimodality demo)*

### Setup
- copy the `.env.example` file and rename the copy to `.env`; do NOT add the `.env` file to your commits!
- set your telegram bot token and other configuration options in the `.env` file (see the example below)
- install the requirements: `pip install -r requrements.txt`
- install the optional requirements for the modules you want to use:
  - TTS and tts_server: `pip install -r requrements-tts.txt`
  - LLM: `pip install -r requrements-llm.txt` (you will probably also need a fresh version of [pytorch](https://pytorch.org/get-started/locally/))
  - STT: `pip install -r requrements-stt.txt`
  - TTA: `pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft`
- you can continue configuration in the webui; it has helpful tips about each configuration option
- for the stable diffusion module, make sure that you have webui installed and that it is running with the `--api` flag
- for the text-to-speech module, download VITS models, put their names in the `tts_voices` configuration option and the path to their directory in `tts_path`
- for the llm module, see the LLM Setup section below
- if you want to use the webui + API, run it with `python dashboard.py`; otherwise run the bot with `python bot.py`
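
A hypothetical minimal `.env` to start from. Option names are taken from this README where possible, but the values are placeholders and the token variable name is an assumption; check `.env.example` and the webui for the real schema:

```env
# hypothetical example -- the token variable name is assumed, see .env.example
bot_token=123456:ABC-DEF_your-telegram-token
# `llm` and `tta` are confirmed module names in this README; `sd`/`tts` are assumptions
active_modules=sd,tts,llm
# names of the VITS models you downloaded, and the directory that holds them
tts_voices=voice1,voice2
tts_path=/path/to/vits/models
```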

Python 3.10+ is recommended due to aiogram compatibility. If you are experiencing problems with whisper or logging, please update numpy.
### Supported language models (tested)
#### Python/Pytorch backend
- [original llama](https://github.com/facebookresearch/llama/blob/main/example.py) (the 7b version was tested on the [llama-mps fork](https://github.com/remixer-dec/llama-mps/tree/multimodal-adapter) for Macs); requires running the bot with `python3.10 -m torch.distributed.launch --use_env bot.py`. Assistant mode for original llama is available with [LLaMA-Adapter](https://github.com/ZrrSkywalker/LLaMA-Adapter); to use both chat and assistant mode, some changes [[1]](https://github.com/remixer-dec/llama-mps/commit/a9b319a927461e4d9b5d74789b3b4a079cb90620) [[2]](https://github.com/remixer-dec/llama-mps/commit/74e9734eefaba721d03974924d0a43175237f32c) are necessary for non-Mac users.
- [hf llama](https://huggingface.co/decapoda-research/llama-7b-hf/tree/main) (tests outdated) + [alpaca-lora](https://github.com/tloen/alpaca-lora) / [ru-turbo-alpaca-lora](https://huggingface.co/IlyaGusev/llama_7b_ru_turbo_alpaca_lora)
- [gpt-2](https://huggingface.co/gpt2) (tested on [ru-gpt3](https://github.com/ai-forever/ru-gpts)), nanoGPT (tested on [minChatGPT](https://github.com/ethanyanjiali/minChatGPT) [[weights](https://huggingface.co/ethanyanjiali/minChatGPT/blob/main/final_ppo_model_gpt2medium.pt)])
- [gpt-j](https://github.com/kingoflolz/mesh-transformer-jax#gpt-j-6b) (tested on a custom model)

#### C++ / TVM backend
- [llama.cpp](https://github.com/abetlen/llama-cpp-python) (tested on many models) [[models]](https://huggingface.co/models?sort=downloads&search=GGUF)
- [mlc-llm-chat](https://mlc.ai/mlc-llm/#windows-linux-mac) (tested using prebuilt binaries on demo-vicuna-v1-7b-int3 model, M1 GPU acceleration confirmed, integrated via [mlc-chatbot](https://github.com/XinyuSun/mlc-chatbot))

#### Remote API backend
- [oobabooga webui](https://github.com/oobabooga/text-generation-webui/) via the `remote_ob` backend
- [kobold.cpp](https://github.com/LostRuins/koboldcpp/) via the same `remote_ob` backend
- [llama.cpp server](https://github.com/ggerganov/llama.cpp/tree/master/examples/server) via the `remote_lcpp` llm backend option (Obsidian model with multimodality tested)
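
To sanity-check a `remote_lcpp` setup, you can query a running llama.cpp server directly. The `/completion` endpoint and field names below follow llama.cpp's example server, but verify them against the version you run; the host/port are assumptions:

```python
import json
import urllib.request

# assumes `./server` is listening on the default host/port
payload = {"prompt": "Q: What does this bot do?\nA:", "n_predict": 64}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```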

### LLM Setup
- Make sure that you have enough RAM/VRAM to run models.
- Download the weights (and the code if needed) for any large language model
- in the `.env` file, make sure that `"llm"` is in `active_modules`, then set (a complete example follows this section's list):
  - `llm_paths` - change the path(s) of the model(s) that you downloaded
  - `llm_backend` - select from `pytorch`, `llama.cpp`, `mlc_pb`, `remote_ob`, `remote_lcpp`
  - `llm_python_model_type` - if you set `pytorch` in the previous option, set the model type that you want to use; it can be `gpt2`, `gptj`, `llama_orig`, `llama_hf` or `auto_hf`
  - `llm_character` - a character of your choice from the `characters` directory, for example `characters.gptj_6B_default`; character files also contain prompt templates and model configuration options optimal for a specific model. Feel free to change the character files, edit their personality and use them with other models
  - `llm_assistant_chronicler` - an input/output formatter/parser for assistant tasks; can be `instruct` or `raw`. Do not change it if you do not use `mlc_pb`
  - `llm_history_grouping` - `user` to store history with each user separately, or `chat` to store group chat history with all users in that chat
  - `llm_assistant_use_in_chat_mode` - `True`/`False`; when `False`, use the /ask command to ask the model questions without any input history; when `True`, all messages are treated as questions

- For llama.cpp: make sure that you have a C++ compiler, set the flags needed to enable GPU support, then install it with `pip install llama-cpp-python`; download the model weights and change the path in `llm_paths`.
- For mlc-llm: follow the installation instructions in the docs, then clone [mlc-chatbot](https://github.com/XinyuSun/mlc-chatbot) and put the 3 paths in `llm_paths`. Use it with `llm_assistant_use_in_chat_mode=True` and the `raw` chronicler.
- For oobabooga webui and kobold.cpp: instead of specifying `llm_paths`, set `llm_host`, set `llm_active_model_type` to `remote_ob`, and set `llm_character` to one that has the same prompt format / preset as your model. Run the server with the `--api` flag.
- For the llama.cpp C++ server: start `./server`, set its URL in `llm_host` and set `llm_active_model_type` to `remote_lcpp`; for multimodality, please refer to this [thread](https://www.reddit.com/r/LocalLLaMA/comments/17jus3h/obsidian_worlds_first_3b_multimodal_opensource_llm/).
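
Put together, a hypothetical LLM section of the `.env` might look like this. The option names and allowed values come from the list above, but the `llm_paths` format is an assumption; check the webui for the real schema:

```env
# illustrative only -- llm_paths format is assumed
active_modules=llm
llm_backend=llama.cpp
llm_paths={"llama.cpp": "/path/to/model.gguf"}
llm_character=characters.gptj_6B_default
llm_assistant_chronicler=instruct
llm_history_grouping=chat
llm_assistant_use_in_chat_mode=False
```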


### Bot commands
Send a message to your bot with the command **/tti -h** for more info on how to use stable diffusion in the bot, and **/tts -h** for the tts module. For tts, the voice names from the configuration file double as commands. Try the **/llm** command for llm module details. LLM defaults to chat mode for models that support it; the assistant can be called with the /ask command.
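
For quick reference, here are the commands mentioned in this README (the text after /ask is just a placeholder):

```
/tti -h        show stable diffusion usage and options
/tts -h        show text-to-speech usage and options
/llm           show llm module details
/ask <text>    ask the assistant a question
/stt           transcribe a voice message (send as a reply to it)
/sfx, /music   text-to-audio (requires `tta` in active_modules)
```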

License: the code of this project is currently distributed under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license; third-party libraries may have different licenses.