https://github.com/seungwonpark/melgan
MelGAN vocoder (compatible with NVIDIA/tacotron2)
gan neural-vocoder pytorch tts
- Host: GitHub
- URL: https://github.com/seungwonpark/melgan
- Owner: seungwonpark
- License: bsd-3-clause
- Created: 2019-10-17T11:32:20.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-03T15:08:45.000Z (over 4 years ago)
- Last Synced: 2024-12-28T22:14:33.335Z (14 days ago)
- Topics: gan, neural-vocoder, pytorch, tts
- Language: Python
- Homepage: http://swpark.me/melgan/
- Size: 17.6 MB
- Stars: 639
- Watchers: 29
- Forks: 116
- Open Issues: 34
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-colab-project - MelGAN
README
# MelGAN
Unofficial PyTorch implementation of [MelGAN vocoder](https://arxiv.org/abs/1910.06711)

## Key Features
- MelGAN is lighter, faster, and better at generalizing to unseen speakers than [WaveGlow](https://github.com/NVIDIA/waveglow).
- This repository uses the same mel-spectrogram function as [NVIDIA/tacotron2](https://github.com/NVIDIA/tacotron2), so it can directly convert the output of NVIDIA's tacotron2 into raw audio.
- A model pretrained on LJSpeech-1.1 is available via [PyTorch Hub](https://pytorch.org/hub).

![](./assets/gd.png)
## Prerequisites
Tested on Python 3.6
```bash
pip install -r requirements.txt
```

## Prepare Dataset
- Download a dataset for training. Any set of wav files with a 22050 Hz sample rate will work (e.g. LJSpeech, which was used in the paper).
- Preprocess: `python preprocess.py -c config/default.yaml -d [data's root path]`
- Edit the configuration `yaml` file

## Train & Tensorboard
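Once preprocessing has run, a short check of the data directory can catch problems before a long training run. The sketch below is not part of this repository: `check_dataset` is a hypothetical helper that walks a directory recursively, flags any wav whose sample rate is not 22050 Hz, and flags any wav without a `.mel` file beside it. It assumes uncompressed PCM wav files (the only kind Python's standard `wave` module reads) and assumes each mel file shares its wav's name with a `.mel` suffix; adjust both if your data differs.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 22050  # Hz, the sample rate named in the dataset notes


def check_dataset(root, expected_rate=EXPECTED_RATE):
    """Return (wrong_rate, missing_mel): two lists of wav paths under root.

    Hypothetical helper for illustration; not part of this repository.
    """
    wrong_rate, missing_mel = [], []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        # The stdlib wave module only reads uncompressed PCM wav files.
        with wave.open(str(wav_path), "rb") as wav_file:
            if wav_file.getframerate() != expected_rate:
                wrong_rate.append(wav_path)
        # Assumption: the mel file sits next to the wav with the same stem.
        if not wav_path.with_suffix(".mel").exists():
            missing_mel.append(wav_path)
    return wrong_rate, missing_mel
```

Calling `check_dataset("path/to/train")` returns two lists of paths; if both are empty, every wav found passed both checks.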
- `python trainer.py -c [config yaml file] -n [name of the run]`
  - `cp config/default.yaml config/config.yaml` and then edit `config.yaml`
  - Write the root paths of the train/validation files on the 2nd/3rd lines.
  - Each path should contain pairs of `*.wav` files with their corresponding (preprocessed) `*.mel` files.
  - The data loader parses the list of files within the path recursively.
- `tensorboard --logdir logs/`

## Pretrained model
Try with Google Colab: TODO
```python
import torch

vocoder = torch.hub.load('seungwonpark/melgan', 'melgan')
vocoder.eval()
mel = torch.randn(1, 80, 234)  # use your own mel-spectrogram here

if torch.cuda.is_available():
    vocoder = vocoder.cuda()
    mel = mel.cuda()

with torch.no_grad():
    audio = vocoder.inference(mel)
```

## Inference
- `python inference.py -p [checkpoint path] -i [input mel path]`
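To sanity-check the result of either route above (PyTorch Hub or `inference.py`), the output length can be estimated from the number of mel frames. The sketch below assumes a hop length of 256 samples per frame, which is the NVIDIA/tacotron2 default and is not stated in this README, so confirm it against your config `yaml`. The helper names are illustrative, and the real output may differ slightly if the implementation pads or trims.

```python
SAMPLING_RATE = 22050  # Hz, the sample rate named in the dataset notes
HOP_LENGTH = 256       # assumption: NVIDIA/tacotron2 default, check your config


def expected_samples(n_frames, hop_length=HOP_LENGTH):
    """Approximate number of audio samples generated from n_frames mel frames."""
    return n_frames * hop_length


def expected_seconds(n_frames, hop_length=HOP_LENGTH, sampling_rate=SAMPLING_RATE):
    """Approximate duration in seconds of the generated audio."""
    return expected_samples(n_frames, hop_length) / sampling_rate


frames = 234  # last dimension of the example mel tensor of shape (1, 80, 234)
print(expected_samples(frames))            # 59904
print(round(expected_seconds(frames), 2))  # 2.72
```

If the generated audio is much shorter or longer than this estimate, the hop length or sample rate in the config probably does not match the ones used to compute the mel-spectrogram.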
## Results
See audio samples at: http://swpark.me/melgan/.
The model was trained on a V100 GPU for 14 days using LJSpeech-1.1.

![](./assets/lj-tensorboard-v0.3-alpha.png)
## Implementation Authors
- [Seungwon Park](http://swpark.me) @ MINDsLab Inc.
- Myunchul Joe @ MINDsLab Inc.
- [Rishikesh](https://github.com/rishikksh20) @ DeepSync Technologies Pvt Ltd.

## License
BSD 3-Clause License.
- [utils/stft.py](./utils/stft.py) by Prem Seetharaman (BSD 3-Clause License)
- [datasets/mel2samp.py](./datasets/mel2samp.py) from https://github.com/NVIDIA/waveglow (BSD 3-Clause License)
- [utils/hparams.py](./utils/hparams.py) from https://github.com/HarryVolek/PyTorch_Speaker_Verification (No License specified)

## Useful resources
- [How to Train a GAN? Tips and tricks to make GANs work](https://github.com/soumith/ganhacks) by Soumith Chintala
- [Official MelGAN implementation by original authors](https://github.com/descriptinc/melgan-neurips)
- [Reproduction of MelGAN - NeurIPS 2019 Reproducibility Challenge (Ablation Track)](https://openreview.net/pdf?id=9jTbNbBNw0) by Yifei Zhao, Yichao Yang, and Yang Gao
- "replacing the average pooling layer with max pooling layer and replacing reflection padding with replication padding improves the performance significantly, while combining them produces worse results"