https://github.com/rhasspy/piper-sample-generator
Generate samples using Piper to train wake word models
- Host: GitHub
- URL: https://github.com/rhasspy/piper-sample-generator
- Owner: rhasspy
- License: MIT
- Created: 2023-05-17T17:03:32.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-03-04T17:40:42.000Z (about 1 year ago)
- Last Synced: 2025-03-28T08:11:53.265Z (27 days ago)
- Language: Python
- Size: 1.02 MB
- Stars: 28
- Watchers: 6
- Forks: 26
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Piper Sample Generator
Generates samples using [Piper](https://github.com/rhasspy/piper/) for training a wake word system like [openWakeWord](https://github.com/dscripka/openWakeWord).
Available models:
* [English](https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt)
* [French](https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/fr_FR-mls-medium.pt)
* [German](https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/de_DE-mls-medium.pt)
* [Dutch](https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/nl_NL-mls-medium.pt)

## Install
Create a virtual environment and install the requirements:
``` sh
git clone https://github.com/rhasspy/piper-sample-generator.git
cd piper-sample-generator/
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
```

Download the LibriTTS-R generator (exported from [checkpoint](https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main/en/en_US/libritts_r/medium)):
``` sh
wget -O models/en_US-libritts_r-medium.pt 'https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt'
```

See the links above for models in other languages.
## Run
Generate a small set of samples with the CLI:
``` sh
python3 generate_samples.py 'okay, piper.' --max-samples 10 --output-dir okay_piper/
```

Check the `okay_piper/` directory for 10 WAV files (named `0.wav` to `9.wav`).
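If you want to sanity-check what was written, a quick inspection with Python's standard-library `wave` module might look like the sketch below (not part of the repository; it assumes the samples landed in `okay_piper/`):

```python
# Sketch: inspect generated WAV files using only the standard library.
import pathlib
import wave


def describe_wavs(sample_dir):
    """Return (filename, sample_rate_hz, duration_seconds) for each WAV file."""
    results = []
    for path in sorted(pathlib.Path(sample_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
            duration = wav_file.getnframes() / rate
        results.append((path.name, rate, duration))
    return results


if __name__ == "__main__":
    for name, rate, duration in describe_wavs("okay_piper"):
        print(f"{name}: {rate} Hz, {duration:.2f} s")
```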
Generation can be much faster and more efficient if you have a GPU available and PyTorch is configured to use it. In this case, increase the batch size:
``` sh
python3 generate_samples.py 'okay, piper.' --max-samples 100 --batch-size 10 --output-dir okay_piper/
```

On an NVIDIA 2080 Ti with 11 GB of VRAM, a batch size of 100 was possible (generating approximately 100 samples per second).
Setting `--max-speakers` to a value less than 904 (the number of speakers in LibriTTS) is recommended. Very few samples from the later speakers were present in the original dataset, so using them can cause audio artifacts.
See `--help` for more options, including adjusting the `--length-scales` (speaking speeds) and `--slerp-weights` (speaker blending), which are cycled per batch.
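The per-batch cycling works roughly like the sketch below (illustrative only; the actual logic lives in `generate_samples.py`, and the values here are hypothetical, not the script's defaults):

```python
# Illustration of cycling multiple --length-scales / --slerp-weights
# values across batches (hypothetical values).
from itertools import cycle

length_scales = cycle([0.75, 1.0, 1.25])  # speaking speeds
slerp_weights = cycle([0.25, 0.5])        # speaker-blending weights

# Each batch gets the next (length_scale, slerp_weight) pair in the cycle.
schedule = [(next(length_scales), next(slerp_weights)) for _ in range(4)]
```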
Alternatively, you can import the generate function into another Python script:
```python
from generate_samples import generate_samples  # make sure generate_samples.py is on your Python path

generate_samples(text=["okay, piper"], max_samples=100, output_dir=output_dir, batch_size=10)
```

There are some additional arguments available when importing the function directly; see the docstring of `generate_samples` for more information.
### Augmentation
Once you have generated samples, you can augment them using [audiomentations](https://iver56.github.io/audiomentations/):
``` sh
python3 augment.py --sample-rate 16000 okay_piper/ okay_piper_augmented/
```

This will do several things to each sample:
1. Randomly decrease the volume
* The original samples are normalized, so different volume levels are needed
2. Randomly [apply an impulse response](https://iver56.github.io/audiomentations/waveform_transforms/apply_impulse_response/) using the files in `impulses/`
* Change the acoustics of the sample to sound like the speaker was in a room with echo or using a poor quality microphone
3. Resample to 16 kHz for training (e.g., with [openWakeWord](https://github.com/dscripka/openWakeWord))

## Short Phrases
Models that were trained on audio books tend to perform poorly when speaking short phrases or single words.
The French, German, and Dutch models trained on the [MLS](http://openslr.org/94/) dataset have this problem. It can be mitigated by repeating the phrase over and over and then clipping out a single sample.
To do this automatically, follow these steps:

1. Ensure your short phrase ends with a comma (`,`)
2. Lower the noise settings with `--noise-scales 0.333` and `--noise-scale-ws 0.333`
3. Use `--min-phoneme-count 300` (the value 300 was determined empirically and may be lower for some models)

For example:
``` sh
python3 generate_samples.py \
'framboise,' \
--model models/fr_FR-mls-medium.pt \
--noise-scales 0.333 \
--noise-scale-ws 0.333 \
--min-phoneme-count 300 \
--max-samples 1 \
--output-dir .
```
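The repeat-and-clip idea can be approximated in plain Python. The sketch below (not from the repository) pads a short phrase by repetition until a rough length threshold is met; it uses a character count as a stand-in, whereas the real `--min-phoneme-count` option counts phonemes inside the generator:

```python
# Sketch: repeat a short phrase until it is "long enough" for the model.
# --min-phoneme-count counts phonemes; a character count is used here
# purely to illustrate the idea.
def pad_phrase(phrase, min_length=300):
    """Repeat a comma-terminated phrase until the text reaches min_length."""
    if not phrase.endswith(","):
        phrase += ","            # step 1: ensure a trailing comma
    padded = phrase
    while len(padded) < min_length:
        padded += " " + phrase   # repeat the phrase over and over
    return padded


text = pad_phrase("framboise")
# The generator then synthesizes the long text, and a single repetition
# is clipped out of the resulting audio.
```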