Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/armaggheddon/whisper2me
whisper2me is a Telegram bot written with pyTelegramBotAPI that uses OpenAI's whisper to perform speech2text so you no longer have to listen to voice messages
docker openia pytelegrambotapi python whisper
- Host: GitHub
- URL: https://github.com/armaggheddon/whisper2me
- Owner: Armaggheddon
- License: mit
- Created: 2024-01-03T18:31:10.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-01-06T14:32:32.000Z (12 months ago)
- Last Synced: 2024-04-21T11:26:01.076Z (8 months ago)
- Topics: docker, openia, pytelegrambotapi, python, whisper
- Language: Python
- Homepage:
- Size: 306 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
whisper2me
Hate voice messages? Let whisper2me handle them! Just forward the audio and get smooth transcriptions. Fast, simple, and ready for action!

## Table of Contents
* [Prerequisites](#prerequisites)
* [Setup](#setup)
* [CUDA Setup](#cuda-setup)
* [Usage](#usage)
* [Available commands](#available-commands)
* [How it works](#how-it-works)
* [Known issues](#known-issues)
* [Task list](#task-list)

## Prerequisites
The easiest way to get **whisper2me** up and running is via Docker. Check out the official guide to install Docker Compose [here](https://docs.docker.com/compose/install/).
Here's what you'll need:
- The **bot token** from BotFather on Telegram (find out how [here](https://core.telegram.org/bots/tutorial#getting-ready))
- Your **user_id** from Telegram
- An Nvidia GPU if you're planning to run the CUDA version with the NVIDIA Container Toolkit (see installation steps [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html))

> [!NOTE]
> Heads-up! Tested on Ubuntu and WSL. No guarantees for other operating systems. CUDA tests were done on an Nvidia Orin AGX and on an RTX 3070 Ti via WSL.

## Setup
1. Clone the repository on your machine with:
```bash
git clone https://github.com/Armaggheddon/whisper2me.git
```
1. Enter the folder:
```bash
cd whisper2me
```
1. Rename `bot_config.env.example` to `bot_config.env` and replace the fields with your own:
- Replace `YOUR_BOT_TOKEN` and `ADMIN_USER_ID`:
```dosini
BOT_TOKEN=0000000000:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ADMIN_USER_ID=000000000
```
- By default, the bot uses the **TINY** model, but you can pick a larger one if your system can handle it. Here are your options:
  - TINY
  - TINY_EN
  - BASE
  - BASE_EN
  - SMALL
  - SMALL_EN
  - MEDIUM
  - MEDIUM_EN
  - LARGE_V1
  - LARGE_V2
  - LARGE_V3
  - LARGE
  - LARGE_V3_TURBO
  - TURBO
To try different models, replace `TINY` with one of the above options in `bot_config.env`:
```dosini
# Available values (defaults to TINY if misspelled):
# >TINY >TINY_EN
# >BASE >BASE_EN
# >SMALL >SMALL_EN
# >MEDIUM >MEDIUM_EN
# >LARGE_V1 >LARGE_V2
# >LARGE_V3 >LARGE
# >LARGE_V3_TURBO >TURBO
MODEL_NAME=TINY
```
> [!NOTE]
> Refer to the official OpenAI Whisper paper for a performance comparison of the different models, available [here](https://arxiv.org/abs/2212.04356).

1. Build the image:
```bash
docker compose build
```
The resulting image is named `whisper2me_bot:latest`.
1. Run the container with:
```bash
docker compose up -d
```
`-d` runs the container in detached mode.
> [!TIP]
> By default, the container is set to restart automatically on failure and when the device restarts. This can be changed via the `deploy.restart_policy.condition` setting in the `docker-compose.yml` file.

1. When the container starts, the model is downloaded. Depending on your internet connection and the selected model, this might take a while. The model's weights and the list of allowed users (other than the administrator) are stored in a volume named `whisper2me_bot_data`.
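
The `MODEL_NAME` fallback described in the setup steps can be sketched as follows. This is a minimal illustration, not the bot's actual code: the `resolve_model_name` helper and the `WHISPER_MODELS` table are hypothetical, and the sketch assumes each option maps to the matching model identifier of the `openai-whisper` package (for example `TINY_EN` to `tiny.en`).

```python
# Options accepted in bot_config.env, mapped to the model identifiers used
# by the openai-whisper package. The mapping is an assumption about how the
# bot translates its config values.
WHISPER_MODELS = {
    "TINY": "tiny",
    "TINY_EN": "tiny.en",
    "BASE": "base",
    "BASE_EN": "base.en",
    "SMALL": "small",
    "SMALL_EN": "small.en",
    "MEDIUM": "medium",
    "MEDIUM_EN": "medium.en",
    "LARGE_V1": "large-v1",
    "LARGE_V2": "large-v2",
    "LARGE_V3": "large-v3",
    "LARGE": "large",
    "LARGE_V3_TURBO": "large-v3-turbo",
    "TURBO": "turbo",
}

DEFAULT_MODEL = "TINY"


def resolve_model_name(raw_value):
    """Return the whisper identifier for a MODEL_NAME value.

    Missing, unknown, or misspelled values fall back to TINY.
    """
    key = (raw_value or "").strip().upper()
    return WHISPER_MODELS.get(key, WHISPER_MODELS[DEFAULT_MODEL])
```

With this sketch, a typo such as `MODEL_NAME=TINNY` resolves to the `tiny` model instead of stopping the container at startup.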
## CUDA Setup
To run whisper2me with CUDA acceleration, follow the regular setup, but use these commands for building and running the container:
- Build:
```bash
docker compose -f cuda-docker-compose.yml build
```
- Run:
```bash
docker compose -f cuda-docker-compose.yml up -d
```

> [!NOTE]
> Tested on an Nvidia Orin AGX running Jetpack 5.1.2 with the NVIDIA L4T PyTorch r35.2.1-pth2.0-py3 image and on an RTX 3070 Ti running in WSL.

## Usage
Once everything's running, open your bot's chat and send `/start`. Ready to roll!
![](/doc/images/start_example.png)
To transcribe, just forward any voice message, and voilà, you'll receive the transcription.
![](/doc/images/test_message.png)
When a non-admin user tries a restricted command, the admin will be notified with a message containing the `user_id` and the `command` that the user sent.
![](/doc/images/admin_warning.png)
## Available commands
**For all users:**
- `/start` begins the conversation with the bot
- `/info` shows the current bot settings
- `/help` shows a list of available commands

**For the admin only:**
- `/language` changes the model's target language. Currently only the following are listed:
  - 🇺🇸 English
  - 🇫🇷 French
  - 🇩🇪 German
  - 🇮🇹 Italian
  - 🇪🇸 Spanish
- `/task` changes the model task to:
  - Transcribe: the input voice message is transcribed using the automatically detected language
  - Translate: the input voice message is translated using the language selected with the `/language` command
- `/users` lists the users that are currently allowed to use the bot
- `/add_user` starts the interaction to allow a new user. You can either:
  - Send the `user_id` of the user you want to add
  - Forward a text message from the desired user so that the `user_id` is retrieved automatically, which is much simpler
- `/remove_user` starts the interaction to remove a user. A list of currently allowed users is displayed; simply click the one you want to remove
- `/purge` removes all users from the allowed list. Requires a confirmation message that spells exactly `YES`
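
The permission model behind these commands can be sketched in plain Python. This is an illustrative sketch only: `AccessList`, `ADMIN_COMMANDS`, and `check_command` are hypothetical names, the list is kept in memory rather than in the Docker volume, and no Telegram API calls are made.

```python
# Commands reserved for the admin, as listed above.
ADMIN_COMMANDS = {
    "/language", "/task", "/users", "/add_user", "/remove_user", "/purge",
}


class AccessList:
    """In-memory allow list (hypothetical; the bot persists its own list)."""

    def __init__(self, admin_id):
        self.admin_id = admin_id
        self.allowed = set()

    def is_admin(self, user_id):
        return user_id == self.admin_id

    def can_use(self, user_id):
        # The admin is always allowed; other users must be on the list.
        return self.is_admin(user_id) or user_id in self.allowed

    def add_user(self, user_id):
        self.allowed.add(user_id)

    def remove_user(self, user_id):
        self.allowed.discard(user_id)

    def purge(self, confirmation):
        """Clear the list only if the confirmation is exactly 'YES'."""
        if confirmation != "YES":
            return False
        self.allowed.clear()
        return True


def check_command(access, user_id, command):
    """Return (allowed, admin_notice).

    admin_notice is the text to send to the admin when a non-admin user
    tries a restricted command, otherwise None.
    """
    if command in ADMIN_COMMANDS and not access.is_admin(user_id):
        return False, f"User {user_id} tried restricted command {command}"
    return True, None
```

Keeping the check separate from the message handlers means every admin command can share one guard, and the notice text always carries both the `user_id` and the command.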
## How it works
whisper2me combines the magic of OpenAI's [whisper](https://github.com/openai/whisper) and [pyTelegramBotAPI](https://github.com/eternnoir/pyTelegramBotAPI).
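
As a rough sketch of the whisper side of this combination, the snippet below loads a model and runs it on one audio file. It is not the bot's actual code: `run_whisper` and `supports_translation` are hypothetical helpers, and the sketch assumes the `openai-whisper` package and `ffmpeg` are installed. Only `whisper.load_model` and `model.transcribe` come from the library.

```python
def supports_translation(model_name):
    """English-only checkpoints (names ending in '.en') cannot translate."""
    return not model_name.endswith(".en")


def run_whisper(audio_path, model_name="tiny", task="transcribe", language=None):
    """Transcribe or translate one audio file and return the text."""
    if task == "translate" and not supports_translation(model_name):
        raise ValueError(f"{model_name} is English-only and cannot translate")

    # Imported lazily so the helper above works without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    # language=None lets whisper detect the spoken language automatically.
    result = model.transcribe(audio_path, task=task, language=language)
    return result["text"].strip()
```

In a bot, a function like this would be called on the voice file downloaded from Telegram, and the returned text sent back as a reply.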
> [!NOTE]
> Translation works only with non-`_EN` models

The code can run on both ARM-64 and X64 architectures. It has been tested on:
- Raspberry Pi 3B with 1GB of RAM (using [Raspberry Pi OS(64-bit) Lite](https://www.raspberrypi.com/software/operating-systems/)). Only the `TINY` model is runnable; it uses almost all of the Pi's available resources and runs approximately 6x slower than real-time.
- Nvidia Orin AGX with 64GB of RAM (using [Jetpack 5.1.2](https://developer.nvidia.com/embedded/jetpack-sdk-512)). All models run without any issue. Using the `LARGE_V3` model requires around 25-30 GB of combined RAM (both CPU and GPU). Execution time is faster than real-time.
- WSL on a desktop in both the standard and CUDA versions with an RTX 3070 Ti. Execution time is faster than real-time.
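
To put figures such as "6x slower than real-time" into concrete terms, the ratio of processing time to audio length (often called the real-time factor) can be computed as below. The helper names are hypothetical and the numbers in the usage note are plain arithmetic, not additional measurements.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio length; above 1 means slower than real-time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds, rtf):
    """Estimate how long a message of the given length takes at a given factor."""
    return audio_seconds * rtf
```

At a factor of 6, a 30-second voice message would take roughly `estimated_processing_seconds(30, 6)`, that is 180 seconds, to transcribe.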