https://github.com/bshall/universalvocoding

A PyTorch implementation of "Robust Universal Neural Vocoding"

neural-vocoder pytorch speech-synthesis wavernn

# Towards Achieving Robust Universal Neural Vocoding

A PyTorch implementation of [Towards Achieving Robust Universal Neural Vocoding](https://arxiv.org/abs/1811.06292).
Audio samples can be found [here](https://bshall.github.io/UniversalVocoding/). A Colab demo can be found [here](https://colab.research.google.com/github/bshall/Tacotron/blob/main/tacotron-demo.ipynb). An accompanying Tacotron implementation can be found [here](https://github.com/bshall/Tacotron).


Fig 1: Architecture of the vocoder.

## Quick Start

Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:
```
pip install univoc
```

## Example Usage


```python
import torch
import soundfile as sf
from univoc import Vocoder

# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
"https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# load a log-Mel spectrogram from file or generate one with a TTS model
# (see https://github.com/bshall/Tacotron; a rough extraction sketch follows this block)
mel = ...

# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)

# save output
sf.write("path/to/save.wav", wav, sr)
```
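
The `mel = ...` placeholder above is left to the caller. As a rough illustration only, the sketch below computes a log-Mel spectrogram from a wav file with librosa; the sample rate, filterbank parameters, and tensor shape used here are assumptions, so match them to the repo's `preprocess.py` before passing the result to `vocoder.generate`.

```python
import librosa
import numpy as np
import torch

# Assumed parameters -- align these with the settings used in preprocess.py.
sr_target = 16000   # the model is trained on 16 kHz audio
n_fft = 2048
hop_length = 200
n_mels = 80

# Load audio and compute a log-Mel spectrogram.
wav, _ = librosa.load("path/to/utterance.wav", sr=sr_target)
mel = librosa.feature.melspectrogram(
    y=wav, sr=sr_target, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
)
logmel = np.log(np.maximum(mel, 1e-5))

# Assumed shape convention: (batch, n_mels, time) on the same device as the model.
mel = torch.from_numpy(logmel).float().unsqueeze(0).cuda()
```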

## Train from Scratch

1. Clone the repo:
```
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
```
2. Install requirements:
```
pip install -r requirements.txt
```
3. Download and extract the [LJ-Speech dataset](https://keithito.com/LJ-Speech-Dataset/):
```
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
```
4. Download the train split [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.2) and extract it in the root directory of the repo.
5. Extract Mel spectrograms and preprocess audio:
```
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
```
6. Train the model:
```
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1
```
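
Once training finishes, the weights in `checkpoint_dir` can be loaded back for inference. The snippet below is only a sketch: the checkpoint filename, the dictionary key holding the model weights, and the no-argument `Vocoder()` constructor are all assumptions, so check `train.py` and the `univoc` source for the actual names and settings.

```python
import torch
import soundfile as sf
from univoc import Vocoder

# Build the model and restore locally trained weights.
# NOTE: the checkpoint path and the "model" key are assumptions -- inspect the
# files that train.py writes to checkpoint_dir and adjust to match.
vocoder = Vocoder().cuda()
checkpoint = torch.load("ljspeech/model.pt", map_location="cuda")
vocoder.load_state_dict(checkpoint["model"])
vocoder.eval()

mel = ...  # log-Mel spectrogram, prepared as in the Example Usage section

with torch.no_grad():
    wav, sr = vocoder.generate(mel)
sf.write("from-scratch.wav", wav, sr)
```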

## Pretrained Models

Pretrained weights for the 10-bit LJ-Speech model are available [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.2).

## Notable Differences from the Paper

1. Trained on 16kHz audio from a single speaker. For an older version trained on 102 different speakers from the [ZeroSpeech 2019: TTS without T](https://zerospeech.com/2019/) English dataset, click [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.1).
2. Uses an embedding layer instead of one-hot encoding (see the sketch below).
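
As an illustration of point 2, the sketch below contrasts a one-hot encoding of the previous quantized sample with an embedding lookup; the 10-bit quantization (1024 classes) matches the pretrained LJ-Speech model, while the embedding dimension is an arbitrary placeholder, not the repo's actual setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 1024      # 10-bit quantized samples
embedding_dim = 256   # placeholder value, not the repo's actual setting

prev_sample = torch.randint(0, n_classes, (8, 100))  # (batch, time) class indices

# One-hot encoding: a large sparse vector projected by a weight matrix.
one_hot = F.one_hot(prev_sample, n_classes).float()   # (8, 100, 1024)
proj = nn.Linear(n_classes, embedding_dim, bias=False)
x_onehot = proj(one_hot)                               # (8, 100, 256)

# Embedding layer: a direct table lookup, equivalent but cheaper.
embed = nn.Embedding(n_classes, embedding_dim)
x_embed = embed(prev_sample)                           # (8, 100, 256)
```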

### Acknowledgements

- https://github.com/fatchord/WaveRNN