https://github.com/mgonzs13/audio_common
A PortAudio based audio_common with text to speech for ROS 2
audio espeak pyaudio ros2 text-to-speech tts
- Host: GitHub
- URL: https://github.com/mgonzs13/audio_common
- Owner: mgonzs13
- License: mit
- Created: 2021-11-19T23:51:45.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-06-28T16:05:40.000Z (12 months ago)
- Last Synced: 2025-09-05T05:45:36.282Z (10 months ago)
- Topics: audio, espeak, pyaudio, ros2, text-to-speech, tts
- Language: C++
- Homepage:
- Size: 2.36 MB
- Stars: 21
- Watchers: 2
- Forks: 14
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
# audio_common
This repository provides a set of ROS 2 packages for audio. It includes a C++ implementation for capturing and playing audio data using PortAudio.
| ROS 2 Distro | Branch | Build status | Docker Image | Documentation |
| :----------: | :----: | :----------: | :----------: | ------------- |
| **Foxy** | [`main`](https://github.com/mgonzs13/audio_common/tree/main) | [workflow](https://github.com/mgonzs13/audio_common/actions/workflows/foxy-docker-build.yml?branch=main) | [Docker Hub](https://hub.docker.com/r/mgons/audio_common/tags?name=foxy) | [docs](https://mgonzs13.github.io/audio_common/latest) |
| **Galactic** | [`main`](https://github.com/mgonzs13/audio_common/tree/main) | [workflow](https://github.com/mgonzs13/audio_common/actions/workflows/galactic-docker-build.yml?branch=main) | [Docker Hub](https://hub.docker.com/r/mgons/audio_common/tags?name=galactic) | [docs](https://mgonzs13.github.io/audio_common/latest) |
| **Humble** | [`main`](https://github.com/mgonzs13/audio_common/tree/main) | [workflow](https://github.com/mgonzs13/audio_common/actions/workflows/humble-docker-build.yml?branch=main) | [Docker Hub](https://hub.docker.com/r/mgons/audio_common/tags?name=humble) | [docs](https://mgonzs13.github.io/audio_common/latest) |
| **Iron** | [`main`](https://github.com/mgonzs13/audio_common/tree/main) | [workflow](https://github.com/mgonzs13/audio_common/actions/workflows/iron-docker-build.yml?branch=main) | [Docker Hub](https://hub.docker.com/r/mgons/audio_common/tags?name=iron) | [docs](https://mgonzs13.github.io/audio_common/latest) |
| **Jazzy** | [`main`](https://github.com/mgonzs13/audio_common/tree/main) | [workflow](https://github.com/mgonzs13/audio_common/actions/workflows/jazzy-docker-build.yml?branch=main) | [Docker Hub](https://hub.docker.com/r/mgons/audio_common/tags?name=jazzy) | [docs](https://mgonzs13.github.io/audio_common/latest) |
| **Kilted** | [`main`](https://github.com/mgonzs13/audio_common/tree/main) | [workflow](https://github.com/mgonzs13/audio_common/actions/workflows/kilted-docker-build.yml?branch=main) | [Docker Hub](https://hub.docker.com/r/mgons/audio_common/tags?name=kilted) | [docs](https://mgonzs13.github.io/audio_common/latest) |
| **Rolling** | [`main`](https://github.com/mgonzs13/audio_common/tree/main) | [workflow](https://github.com/mgonzs13/audio_common/actions/workflows/rolling-docker-build.yml?branch=main) | [Docker Hub](https://hub.docker.com/r/mgons/audio_common/tags?name=rolling) | [docs](https://mgonzs13.github.io/audio_common/latest) |
## Table of Contents
1. [Installation](#installation)
2. [Docker](#docker)
3. [Nodes](#nodes)
4. [Demos](#demos)
## Installation
```shell
cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/audio_common.git
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build
```
## Docker
You can build a Docker image to test audio_common. Run the following command from the root of the audio_common repository.
```shell
docker build -t audio_common .
```
After the image is created, run a docker container with the following command.
```shell
docker run -it --rm --device /dev/snd audio_common
```
## Nodes
### audio_capturer_node
Node to obtain audio data from a microphone and publish it into the `audio` topic.
#### Parameters
- **format**: Specifies the audio format to be used for capturing. The integer values correspond to PortAudio sample format flags. Default: `8` (paInt16). Possible values are:
  - `1` (paFloat32 - 32-bit floating point)
  - `2` (paInt32 - 32-bit integer)
  - `8` (paInt16 - 16-bit integer)
  - `16` (paInt8 - 8-bit integer)
  - `32` (paUInt8 - 8-bit unsigned integer)
- **channels**: The number of audio channels to capture. Typically, `1` for mono and `2` for stereo. Default: `1`
- **rate**: The sample rate, that is, the number of samples captured per second. Default: `16000`
- **chunk**: The size of each audio frame. Default: `512`
- **device**: The ID of the audio input device. A value of `-1` indicates that the default audio input device should be used. Default: `-1`
- **frame_id**: An identifier for the audio frame. This can be useful for synchronizing audio data with other data streams. Default: `""`
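
The parameters above determine how much data each captured buffer holds. The following is a minimal sketch of that arithmetic, assuming `chunk` counts frames per buffer (one sample for every channel), which the parameter description does not state explicitly. `SAMPLE_WIDTH_BYTES` and `chunk_stats` are illustrative names and are not part of audio_common.

```python
# Bytes per sample for each PortAudio sample format flag listed above.
SAMPLE_WIDTH_BYTES = {
    1: 4,   # paFloat32
    2: 4,   # paInt32
    8: 2,   # paInt16
    16: 1,  # paInt8
    32: 1,  # paUInt8
}


def chunk_stats(fmt=8, channels=1, rate=16000, chunk=512):
    """Return (bytes per chunk, chunk duration in milliseconds).

    Assumption: `chunk` is the number of frames per buffer, where one
    frame holds one sample for every channel.
    """
    if fmt not in SAMPLE_WIDTH_BYTES:
        raise ValueError(f"unsupported format flag: {fmt}")
    size_bytes = chunk * channels * SAMPLE_WIDTH_BYTES[fmt]
    duration_ms = 1000.0 * chunk / rate
    return size_bytes, duration_ms


# Defaults of audio_capturer_node: paInt16, mono, 16000 Hz, 512 frames.
print(chunk_stats())  # (1024, 32.0)
```

Under that assumption, the default settings produce a 1024-byte buffer roughly every 32 ms.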
#### ROS 2 Interfaces
- **audio**: Topic to publish the audio data captured from the microphone. Type: `audio_common_msgs/msg/AudioStamped`
### audio_player_node
Node to play the audio data obtained from the `audio` topic.
#### Parameters
- **channels**: The number of audio channels to play. Typically, `1` for mono and `2` for stereo. Default: `2`
  - The node automatically handles conversion between mono and stereo formats if needed.
- **device**: The ID of the audio output device. A value of `-1` indicates that the default audio output device should be used. Default: `-1`
#### ROS 2 Interfaces
- **audio**: Topic subscriber to get the audio data to be played. Type: `audio_common_msgs/msg/AudioStamped`
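
The **channels** parameter notes that the player converts between mono and stereo when needed. How the node does this is not described here, so the following is only a sketch of one common approach for interleaved integer samples: duplicating each mono sample, or averaging each left/right pair. The function names are hypothetical.

```python
def mono_to_stereo(samples):
    """Duplicate every mono sample into a left/right interleaved pair."""
    stereo = []
    for s in samples:
        stereo.extend((s, s))
    return stereo


def stereo_to_mono(samples):
    """Average each interleaved left/right pair into one mono sample."""
    if len(samples) % 2 != 0:
        raise ValueError("interleaved stereo data needs an even sample count")
    return [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]


print(mono_to_stereo([100, -200]))          # [100, 100, -200, -200]
print(stereo_to_mono([100, 300, -50, 50]))  # [200, 0]
```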
### music_node
Node to play music from audio files in `wav` format.
#### Parameters
- **chunk**: The size of each audio frame. Default: `2048`
- **frame_id**: An identifier for the audio frame. This can be useful for synchronizing audio data with other data streams. Default: `""`
#### ROS 2 Interfaces
- **audio**: Topic to publish the audio data from the files. Type: `audio_common_msgs/msg/AudioStamped`
- **music_play**: Service to play audio files. Type: `audio_common_msgs/srv/MusicPlay`
  - Parameters:
    - `audio`: Name of a built-in audio sample (e.g., "elevator")
    - `file_path`: Path to a custom WAV file (ignored if `audio` is specified)
    - `loop`: Boolean to indicate if the audio should loop. Default: `false`
- **music_stop**: Service to stop the currently playing music. Type: `std_srvs/srv/Trigger`
- **music_pause**: Service to pause the currently playing music. Type: `std_srvs/srv/Trigger`
- **music_resume**: Service to resume paused music. Type: `std_srvs/srv/Trigger`
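
To illustrate what the **chunk** parameter means for a WAV file, the sketch below reads a file in buffers of at most 2048 frames using Python's standard `wave` module. It is not the node's C++ implementation. `read_wav_chunks` and the generated `demo.wav` file are hypothetical and exist only so the example runs on its own.

```python
import os
import tempfile
import wave


def read_wav_chunks(path, chunk=2048):
    """Yield raw byte buffers holding at most `chunk` frames each."""
    with wave.open(path, "rb") as wav:
        while True:
            data = wav.readframes(chunk)
            if not data:
                break
            yield data


# Build a short silent mono 16-bit WAV file so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)      # 2 bytes per sample (16-bit)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 5000)

# Each frame is 2 bytes here, so divide the buffer length by 2.
frames_per_chunk = [len(c) // 2 for c in read_wav_chunks(demo_path)]
print(frames_per_chunk)  # [2048, 2048, 904]
```

The last buffer is shorter than `chunk` whenever the file length is not a multiple of the chunk size.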
### tts_node
Node to generate audio from text (TTS) using espeak.
#### Parameters
- **chunk**: The size of each audio frame. Default: `4096`
- **frame_id**: An identifier for the audio frame. This can be useful for synchronizing audio data with other data streams. Default: `""`
#### ROS 2 Interfaces
- **audio**: Topic publisher to send the audio data generated by the TTS. Type: `audio_common_msgs/msg/AudioStamped`
- **say**: Action to generate audio data from a text. Type: `audio_common_msgs/action/TTS`
  - Goal:
    - `text`: The text to convert to speech
    - `language`: The language to use for speech synthesis. Default: `"en"`
    - `volume`: The volume of the generated speech (0.0-1.0). Default: `1.0`
    - `rate`: The speech rate (1.0 is normal speed). Default: `1.0`
  - Feedback:
    - `audio`: The audio being currently played
  - Result:
    - `text`: The text that was converted to speech
## Demos
### Audio Capturer/Player
```shell
ros2 run audio_common audio_capturer_node
```
```shell
ros2 run audio_common audio_player_node
```
### TTS
```shell
ros2 run audio_common tts_node
```
```shell
ros2 run audio_common audio_player_node
```
```shell
ros2 action send_goal /say audio_common_msgs/action/TTS "{'text': 'Hello World'}"
```
Advanced TTS example with additional parameters:
```shell
ros2 action send_goal /say audio_common_msgs/action/TTS "{'text': 'Hello World', 'language': 'en-us', 'volume': 0.8, 'rate': 1.2}"
```
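
When the goal is built from a script, quoting the YAML dictionary by hand is error-prone. The sketch below assembles the same command as an argument list, assuming the `ros2` CLI's YAML parser accepts a JSON-style mapping (JSON objects of this shape are valid YAML flow mappings). `build_say_command` is a hypothetical helper; the range check mirrors the documented `volume` range of 0.0-1.0.

```python
import json
import shlex


def build_say_command(text, language="en", volume=1.0, rate=1.0):
    """Return the `ros2 action send_goal` argument list for the `say` action."""
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    goal = {"text": text, "language": language, "volume": volume, "rate": rate}
    # json.dumps takes care of quoting and escaping the text field.
    return ["ros2", "action", "send_goal", "/say",
            "audio_common_msgs/action/TTS", json.dumps(goal)]


cmd = build_say_command("Hello World", language="en-us", volume=0.8, rate=1.2)
print(shlex.join(cmd))  # shell-safe form of the command
```

On a machine with ROS 2 sourced and `tts_node` running, the list can be passed to `subprocess.run(cmd, check=True)`.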
### Music Player
```shell
ros2 run audio_common music_node
```
```shell
ros2 run audio_common audio_player_node
```
Play a built-in sample:
```shell
ros2 service call /music_play audio_common_msgs/srv/MusicPlay "{audio: 'elevator'}"
```
Play a custom WAV file:
```shell
ros2 service call /music_play audio_common_msgs/srv/MusicPlay "{file_path: '/path/to/your/file.wav'}"
```
Play with looping enabled:
```shell
ros2 service call /music_play audio_common_msgs/srv/MusicPlay "{audio: 'elevator', loop: true}"
```
Control playback:
```shell
ros2 service call /music_pause std_srvs/srv/Trigger "{}"
ros2 service call /music_resume std_srvs/srv/Trigger "{}"
ros2 service call /music_stop std_srvs/srv/Trigger "{}"
```