https://github.com/mgonzs13/whisper_ros
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2
- Host: GitHub
- URL: https://github.com/mgonzs13/whisper_ros
- Owner: mgonzs13
- License: mit
- Created: 2023-05-01T19:12:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-08T18:40:53.000Z (about 1 month ago)
- Last Synced: 2024-10-11T18:55:24.337Z (about 1 month ago)
- Topics: ggml, ros2, speech-recognition, speech-to-text, vad, voice-activity-detection, whisper-cpp
- Language: C++
- Homepage:
- Size: 193 KB
- Stars: 44
- Watchers: 4
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
- awesome-foundation-model-ros - whisper_ros - ROS 2 wrapper for whisper.cpp. Also provides Voice Activity Detection. (Research-Grade Frameworks)
README
# whisper_ros
This repository provides a set of ROS 2 packages that integrate [whisper.cpp](https://github.com/ggerganov/whisper.cpp) into ROS 2 using [audio_common](https://github.com/mgonzs13/audio_common). In addition, [silero-vad](https://github.com/snakers4/silero-vad) is used to perform VAD (Voice Activity Detection).
## Table of Contents
1. [Related Projects](#related-projects)
2. [Installation](#installation)
3. [Docker](#docker)
4. [Usage](#usage)
5. [Demos](#demos)
## Related Projects
- [chatbot_ros](https://github.com/mgonzs13/chatbot_ros) → This chatbot, integrated into ROS 2, uses whisper_ros to listen to people's speech and [llama_ros](https://github.com/mgonzs13/llama_ros/tree/main) to generate responses. The chatbot is controlled by a state machine created with [YASMIN](https://github.com/uleroboticsgroup/yasmin).
## Installation
To run whisper_ros with CUDA, you must first install the [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit).
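Before enabling the CUDA build, it can help to confirm that the toolkit's compiler is actually visible. The following is a minimal sketch, not part of whisper_ros: `cuda_cmake_args` is a hypothetical helper that assumes an installed CUDA Toolkit exposes `nvcc` on `PATH`, and it prints the CMake arguments for the build step only when `nvcc` is found.

```shell
# Hypothetical helper (not part of whisper_ros): print the extra colcon
# arguments for a CUDA build only if the CUDA compiler can be found.
# Assumption: an installed CUDA Toolkit puts `nvcc` on PATH.
cuda_cmake_args() {
  if command -v nvcc >/dev/null 2>&1; then
    echo "--cmake-args -DGGML_CUDA=ON"
  fi
}

# Example use from the workspace root (word splitting is intended here):
#   colcon build $(cuda_cmake_args)
```

If the function prints nothing, `colcon build` simply runs without the CUDA flag.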
```shell
$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/audio_common.git
$ git clone https://github.com/mgonzs13/whisper_ros.git
$ sudo apt install portaudio19-dev
$ pip3 install -r audio_common/requirements.txt
$ pip3 install -r whisper_ros/requirements.txt
$ cd ~/ros2_ws
$ colcon build --cmake-args -DGGML_CUDA=ON # omit "--cmake-args -DGGML_CUDA=ON" for a build without CUDA
```
## Docker
Build the whisper_ros Docker image. Optionally, you can build whisper_ros with CUDA (`USE_CUDA`) and choose the CUDA version (`CUDA_VERSION`). Note that `DOCKER_BUILDKIT=0` is required to compile whisper_ros with CUDA when building the image.
```shell
$ DOCKER_BUILDKIT=0 docker build -t whisper_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .
```
Run the Docker container. To use CUDA, you must install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and add `--gpus all`.
```shell
$ docker run -it --rm --gpus all whisper_ros
```
## Usage
Run Silero for VAD and Whisper for STT:
```shell
$ ros2 launch whisper_bringup whisper.launch.py
```
## Demos
Send an action goal to start listening:
```shell
$ ros2 action send_goal /whisper/listen whisper_msgs/action/STT "{}"
```
Or try the example of a whisper client:
```shell
$ ros2 run whisper_demos whisper_demo_node
```
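A script that sends the listen goal may want to confirm first that the `/whisper/listen` action server is running. The sketch below is one way to do that with `ros2 action list`. The helper names `action_available` and `listen_once` are hypothetical and not part of whisper_ros, and the error message wording is only illustrative.

```shell
# Hypothetical helpers (not part of whisper_ros).
ACTION_NAME="/whisper/listen"

# Succeeds when the action name given as $1 appears as a whole line
# in the list of action names read from stdin.
action_available() {
  grep -Fxq "$1"
}

# Send one empty STT goal, but only if the action server is listed.
listen_once() {
  if ros2 action list | action_available "$ACTION_NAME"; then
    ros2 action send_goal "$ACTION_NAME" whisper_msgs/action/STT "{}"
  else
    echo "Action $ACTION_NAME not found; is whisper_bringup running?" >&2
    return 1
  fi
}
```

Calling `listen_once` after the launch file from the Usage section has started should send the same goal as the manual command shown earlier.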