Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bshall/universalvocoding
A PyTorch implementation of "Robust Universal Neural Vocoding"
- Host: GitHub
- URL: https://github.com/bshall/universalvocoding
- Owner: bshall
- License: mit
- Created: 2019-04-18T08:37:39.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-11-14T09:30:15.000Z (about 4 years ago)
- Last Synced: 2025-01-08T03:19:47.092Z (10 days ago)
- Topics: neural-vocoder, pytorch, speech-synthesis, wavernn
- Language: Python
- Homepage: https://bshall.github.io/UniversalVocoding/
- Size: 5.62 MB
- Stars: 238
- Watchers: 14
- Forks: 41
- Open Issues: 15
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Towards Achieving Robust Universal Neural Vocoding
A PyTorch implementation of [Towards Achieving Robust Universal Neural Vocoding](https://arxiv.org/abs/1811.06292).
Audio samples can be found [here](https://bshall.github.io/UniversalVocoding/). A Colab demo can be found [here](https://colab.research.google.com/github/bshall/Tacotron/blob/main/tacotron-demo.ipynb). An accompanying Tacotron implementation can be found [here](https://github.com/bshall/Tacotron).
Fig 1: Architecture of the vocoder.

## Quick Start
Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:
```
pip install univoc
```
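To confirm the interpreter and PyTorch meet the version floors stated above before installing, a quick check can be run (a minimal sketch; nothing here is part of the `univoc` package):

```python
import sys
import torch

# Check the requirements stated above: Python 3.6+ and PyTorch 1.7+.
assert sys.version_info >= (3, 6), "Python 3.6 or newer is required"
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 7), "PyTorch 1.7 or newer is required"
```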
## Example Usage

```python
import torch
import soundfile as sf
from univoc import Vocoder

# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# load log-Mel spectrogram from file or from tts (see https://github.com/bshall/Tacotron for an example)
mel = ...

# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)

# save output
sf.write("path/to/save.wav", wav, sr)
```
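The `mel = ...` placeholder is left to the caller. One illustrative way to fill it (an assumption-laden sketch: the `.npy` path and the `(time, n_mels)` layout are guesses at the preprocessing output described below, not a documented API) is to load a precomputed log-Mel spectrogram:

```python
import numpy as np
import torch

# Hypothetical: load a log-Mel spectrogram saved as a NumPy array by the
# preprocessing step; the path and (time, n_mels) layout are assumptions.
logmel = np.load("datasets/LJSpeech-1.1/mels/LJ001-0001.npy")

# Add a batch dimension and move to the same device as the vocoder.
mel = torch.from_numpy(logmel).unsqueeze(0).float().cuda()
```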
## Train from Scratch

1. Clone the repo:
```
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
```
2. Install requirements:
```
pip install -r requirements.txt
```
3. Download and extract the [LJ-Speech dataset](https://keithito.com/LJ-Speech-Dataset/):
```
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
```
4. Download the train split [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.2) and extract it in the root directory of the repo.
5. Extract Mel spectrograms and preprocess audio (a quick sanity check on the output follows after step 6):
```
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
```
6. Train the model:
```
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1
```
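Before launching training, it can be worth verifying that step 5 actually produced feature files. A minimal check (the assumption that features are written as `.npy` files somewhere under `out_dir` is mine, not documented here):

```python
from pathlib import Path

# Count feature files under the preprocessing output directory.
# Assumption: preprocess.py writes NumPy .npy files below out_dir.
out_dir = Path("datasets/LJSpeech-1.1")
n_files = sum(1 for _ in out_dir.rglob("*.npy"))
print(f"found {n_files} feature files under {out_dir}")
```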
## Pretrained Models
Pretrained weights for the 10-bit LJ-Speech model are available [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.2).
## Notable Differences from the Paper
1. Trained on 16kHz audio from a single speaker. For an older version trained on 102 different speakers from the [ZeroSpeech 2019: TTS without T](https://zerospeech.com/2019/) English dataset, click [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.1).
2. Uses an embedding layer instead of one-hot encoding (see the sketch below).
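To illustrate the difference (a standalone sketch, not code from this repo; the embedding size and batch shapes are arbitrary placeholders), the 10-bit model quantizes each sample into one of 2^10 = 1024 classes, and the two ways of encoding those class indices look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 2 ** 10                             # 10-bit quantization -> 1024 classes
samples = torch.randint(0, num_classes, (1, 16))  # dummy quantized sample indices

# One-hot encoding (as in the paper): a sparse 1024-dim vector per sample.
one_hot = F.one_hot(samples, num_classes).float()  # shape (1, 16, 1024)

# Embedding layer (this implementation): a learned dense vector per sample.
# The embedding dimension (256) is a placeholder, not the repo's actual value.
embed = nn.Embedding(num_classes, 256)
dense = embed(samples)                             # shape (1, 16, 256)
```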
### Acknowledgements
- https://github.com/fatchord/WaveRNN