https://github.com/fatchord/wavernn
WaveRNN Vocoder + TTS
neural-vocoder pytorch speech-synthesis tacotron text-to-speech tts wavernn
- Host: GitHub
- URL: https://github.com/fatchord/wavernn
- Owner: fatchord
- License: mit
- Created: 2018-03-16T14:03:52.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-07-02T14:21:35.000Z (over 2 years ago)
- Last Synced: 2024-12-21T03:03:05.049Z (3 days ago)
- Topics: neural-vocoder, pytorch, speech-synthesis, tacotron, text-to-speech, tts, wavernn
- Language: Python
- Homepage: https://fatchord.github.io/model_outputs/
- Size: 236 MB
- Stars: 2,144
- Watchers: 86
- Forks: 698
- Open Issues: 107
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# WaveRNN
##### (Update: Vanilla Tacotron One TTS system just implemented - more coming soon!)
![Tacotron with WaveRNN diagrams](assets/tacotron_wavernn.png)
PyTorch implementation of DeepMind's WaveRNN model from [Efficient Neural Audio Synthesis](https://arxiv.org/abs/1802.08435v1)
# Installation
Ensure you have:
* Python >= 3.6
* [PyTorch 1 with CUDA](https://pytorch.org/)

Then install the rest with pip:
> pip install -r requirements.txt
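Before installing the remaining requirements, it can help to confirm the two prerequisites listed above. The following is a minimal sketch and not part of this repository: `check_environment` is a hypothetical helper name, and it only reports what it finds rather than installing anything.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6)):
    """Report whether the prerequisites listed above appear to be met.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            # Imported lazily so the check also works without PyTorch.
            import torch

            report["cuda_available"] = torch.cuda.is_available()
        except ImportError:
            report["torch_installed"] = False
    return report


for name, ok in check_environment().items():
    print(f"{name}: {ok}")
```

If `cuda_available` is reported as false, training may still start on the CPU but is likely to be impractically slow.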
# How to Use
### Quick Start
If you want to use TTS functionality immediately you can simply use:
> python quick_start.py
This will generate everything in the default sentences.txt file and output to a new 'quick_start' folder, where you can play back the wav files and take a look at the attention plots.
You can also use that script to generate custom TTS sentences and/or use '-u' to generate unbatched audio (better audio quality):
> python quick_start.py -u --input_text "What will happen if I run this command?"
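To synthesize several custom sentences in one go, the command above can be wrapped in a small script. This is a sketch under assumptions: `build_quick_start_cmd` and `synthesize_all` are hypothetical helpers, only the `-u` and `--input_text` options shown above are used, and the script is assumed to run from the repository root.

```python
import subprocess
import sys


def build_quick_start_cmd(text, unbatched=False):
    """Build the argument list for one quick_start.py call (hypothetical helper)."""
    cmd = [sys.executable, "quick_start.py"]
    if unbatched:
        cmd.append("-u")  # unbatched generation, as described above
    cmd += ["--input_text", text]
    return cmd


def synthesize_all(sentences, unbatched=False):
    """Run quick_start.py once per sentence; stops at the first failure."""
    for sentence in sentences:
        subprocess.run(build_quick_start_cmd(sentence, unbatched), check=True)


# Show the arguments that would be passed, without executing anything.
print(build_quick_start_cmd("What will happen if I run this command?", unbatched=True)[1:])
```

Passing the arguments as a list avoids shell quoting problems when a sentence contains quotes or punctuation. Calling `synthesize_all([...])` runs the script once per sentence.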
### Training your own Models
![Attention and Mel Training GIF](assets/training_viz.gif)

Download the [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) Dataset.
Edit **hparams.py**, point **wav_path** to your dataset and run:
> python preprocess.py
Alternatively, use preprocess.py --path to point directly to the dataset.
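Before running preprocessing, a quick check that the dataset path really contains audio can save a failed run. The sketch below is an illustration, not code from this repository: `count_wavs` is a hypothetical helper, and the example path is only an assumption about where the dataset was unpacked.

```python
from pathlib import Path


def count_wavs(dataset_path):
    """Count .wav files anywhere under dataset_path (hypothetical helper)."""
    root = Path(dataset_path).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    return sum(1 for _ in root.rglob("*.wav"))


# Assumed location; replace with the folder you point wav_path at.
example_path = Path("~/datasets/LJSpeech-1.1")
if example_path.expanduser().is_dir():
    print(f"Found {count_wavs(example_path)} wav files")
else:
    print(f"{example_path} does not exist yet")
```

A count of zero usually means the path points at the wrong folder level.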
___

Here's my recommendation on what order to run things:
1 - Train Tacotron with:
> python train_tacotron.py
2 - You can let that finish training, or at any point you can use:
> python train_tacotron.py --force_gta
This will force Tacotron to create a GTA (ground-truth aligned) dataset even if it hasn't finished training.
3 - Train WaveRNN with:
> python train_wavernn.py --gta
NB: You can always just run train_wavernn.py without --gta if you're not interested in TTS.
4 - Generate Sentences with both models using:
> python gen_tacotron.py wavernn
This will generate the default sentences. If you want to generate custom sentences you can use:
> python gen_tacotron.py --input_text "this is whatever you want it to be" wavernn
And finally, you can always use --help on any of those scripts to see what options are available :)
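The recommended order above can be sketched as a small driver script. This is an illustration under assumptions, not part of the repository: `PIPELINE` and `run_pipeline` are hypothetical names, the script is assumed to run from the repository root with preprocessing already done, and it lets Tacotron train to completion instead of using `--force_gta`. It also assumes the GTA dataset is created when Tacotron training finishes, which the description of `--force_gta` suggests but does not state outright.

```python
import subprocess
import sys

# Commands taken from the steps above, in the recommended order.
PIPELINE = [
    ["train_tacotron.py"],
    ["train_wavernn.py", "--gta"],
    ["gen_tacotron.py", "wavernn"],
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each script in order, stopping at the first one that fails."""
    for args in steps:
        cmd = [sys.executable, *args]
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier one.
        runner(cmd, check=True)
```

Calling `run_pipeline()` starts the full sequence. Since each training step can take a long time, running the commands by hand remains the more practical choice if you want to stop Tacotron early.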
# Samples
[Can be found here.](https://fatchord.github.io/model_outputs/)
# Pretrained Models
Currently there are two pretrained models available in the /pretrained/ folder. Both are trained on LJSpeech:
* WaveRNN (Mixture of Logistics output) trained to 800k steps
* Tacotron trained to 180k steps

____
### References
* [Efficient Neural Audio Synthesis](https://arxiv.org/abs/1802.08435v1)
* [Tacotron: Towards End-to-End Speech Synthesis](https://arxiv.org/abs/1703.10135)
* [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)

### Acknowledgements
* [https://github.com/keithito/tacotron](https://github.com/keithito/tacotron)
* [https://github.com/r9y9/wavenet_vocoder](https://github.com/r9y9/wavenet_vocoder)
* Special thanks to GitHub users [G-Wang](https://github.com/G-Wang), [geneing](https://github.com/geneing) & [erogol](https://github.com/erogol)