https://github.com/bshall/universalvocoding

A PyTorch implementation of "Robust Universal Neural Vocoding"

neural-vocoder pytorch speech-synthesis wavernn

# Towards Achieving Robust Universal Neural Vocoding

A PyTorch implementation of [Towards Achieving Robust Universal Neural Vocoding](https://arxiv.org/abs/1811.06292).
Audio samples can be found [here](https://bshall.github.io/UniversalVocoding/). A Colab demo can be found [here](https://colab.research.google.com/github/bshall/Tacotron/blob/main/tacotron-demo.ipynb). An accompanying Tacotron implementation can be found [here](https://github.com/bshall/Tacotron).


Fig 1: Architecture of the vocoder.

## Quick Start

Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:
```
pip install univoc
```

## Example Usage


```python
import torch
import soundfile as sf
from univoc import Vocoder

# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
"https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# load a log-Mel spectrogram from file or generate one with a TTS model
# (see https://github.com/bshall/Tacotron; a rough extraction sketch follows this block)
mel = ...

# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)

# save output
sf.write("path/to/save.wav", wav, sr)
```
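
The `mel = ...` placeholder above is left to the caller. As a rough illustration only, the sketch below computes a log-Mel spectrogram from a wav file with librosa; the sample rate, filterbank parameters, and tensor shape used here are assumptions, so match them to the repo's `preprocess.py` before passing the result to `vocoder.generate`.

```python
import librosa
import numpy as np
import torch

# Assumed parameters -- align these with the settings used in preprocess.py.
sr_target = 16000   # the model is trained on 16 kHz audio
n_fft = 2048
hop_length = 200
n_mels = 80

# Load audio and compute a log-Mel spectrogram.
wav, _ = librosa.load("path/to/utterance.wav", sr=sr_target)
mel = librosa.feature.melspectrogram(
    y=wav, sr=sr_target, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
)
logmel = np.log(np.maximum(mel, 1e-5))

# Assumed shape convention: (batch, n_mels, time) on the same device as the model.
mel = torch.from_numpy(logmel).float().unsqueeze(0).cuda()
```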

## Train from Scratch

1. Clone the repo:
```
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
```
2. Install requirements:
```
pip install -r requirements.txt
```
3. Download and extract the [LJ-Speech dataset](https://keithito.com/LJ-Speech-Dataset/):
```
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
```
4. Download the train split [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.2) and extract it in the root directory of the repo.
5. Extract Mel spectrograms and preprocess audio:
```
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
```
6. Train the model:
```
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1
```
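
Once training finishes, the weights in `checkpoint_dir` can be loaded back for inference. The snippet below is only a sketch: the checkpoint filename, the dictionary key holding the model weights, and the no-argument `Vocoder()` constructor are all assumptions, so check `train.py` and the `univoc` source for the actual names and settings.

```python
import torch
import soundfile as sf
from univoc import Vocoder

# Build the model and restore locally trained weights.
# NOTE: the checkpoint path and the "model" key are assumptions -- inspect the
# files that train.py writes to checkpoint_dir and adjust to match.
vocoder = Vocoder().cuda()
checkpoint = torch.load("ljspeech/model.pt", map_location="cuda")
vocoder.load_state_dict(checkpoint["model"])
vocoder.eval()

mel = ...  # log-Mel spectrogram, prepared as in the Example Usage section

with torch.no_grad():
    wav, sr = vocoder.generate(mel)
sf.write("from-scratch.wav", wav, sr)
```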

## Pretrained Models

Pretrained weights for the 10-bit LJ-Speech model are available [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.2).

## Notable Differences from the Paper

1. Trained on 16kHz audio from a single speaker. For an older version trained on 102 different speakers from the [ZeroSpeech 2019: TTS without T](https://zerospeech.com/2019/) English dataset, click [here](https://github.com/bshall/UniversalVocoding/releases/tag/v0.1).
2. Uses an embedding layer instead of one-hot encoding (see the sketch below).
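
As an illustration of point 2, the sketch below contrasts a one-hot encoding of the previous quantized sample with an embedding lookup; the 10-bit quantization (1024 classes) matches the pretrained LJ-Speech model, while the embedding dimension is an arbitrary placeholder, not the repo's actual setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 1024      # 10-bit quantized samples
embedding_dim = 256   # placeholder value, not the repo's actual setting

prev_sample = torch.randint(0, n_classes, (8, 100))  # (batch, time) class indices

# One-hot encoding: a large sparse vector projected by a weight matrix.
one_hot = F.one_hot(prev_sample, n_classes).float()   # (8, 100, 1024)
proj = nn.Linear(n_classes, embedding_dim, bias=False)
x_onehot = proj(one_hot)                               # (8, 100, 256)

# Embedding layer: a direct table lookup, equivalent but cheaper.
embed = nn.Embedding(n_classes, embedding_dim)
x_embed = embed(prev_sample)                           # (8, 100, 256)
```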

### Acknowledgements

- https://github.com/fatchord/WaveRNN