https://github.com/l0sg/waveflow
A PyTorch implementation of "WaveFlow: A Compact Flow-based Model for Raw Audio" (ICML 2020)
- Host: GitHub
- URL: https://github.com/l0sg/waveflow
- Owner: L0SG
- License: bsd-3-clause
- Created: 2019-10-07T05:41:39.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-07-25T10:46:30.000Z (over 1 year ago)
- Last Synced: 2025-04-03T18:23:17.636Z (9 months ago)
- Topics: normalizing-flows, pytorch, speech-synthesis, waveflow
- Language: Jupyter Notebook
- Homepage: https://arxiv.org/abs/1912.01219
- Size: 1.41 MB
- Stars: 123
- Watchers: 9
- Forks: 16
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
## WaveFlow: A Compact Flow-based Model for Raw Audio
#### Update: Pretrained weights are now available. See links below.
This is an unofficial PyTorch implementation of the [WaveFlow] model (Ping et al., ICML 2020).
The aim of this repo is to provide an easy-to-use PyTorch version of WaveFlow as a drop-in alternative to the various neural vocoder models used with NVIDIA's [Tacotron2] audio processing backend.
Please refer to the [official implementation], written in PaddlePaddle, for the official results.
## Setup
1. Clone this repo and install requirements
```command
git clone https://github.com/L0SG/WaveFlow.git
cd WaveFlow
pip install -r requirements.txt
```
2. Install [Apex] for mixed-precision training
## Train your model
1. Download [LJ Speech Data]. This example assumes the audio clips are extracted to `data/`
2. Make a list of the file names to use for training/testing.
```command
ls data/*.wav | tail -n+11 > train_files.txt
ls data/*.wav | head -n10 > test_files.txt
```
`head -n10` collects the first 10 audio clips for model testing, and `tail -n+11` (start from line 11) gives the remainder to training.
3. Edit the configuration file and train the model.
Below are example commands using `waveflow-h16-r64-bipartize.json`:
```command
nano configs/waveflow-h16-r64-bipartize.json
python train.py -c configs/waveflow-h16-r64-bipartize.json
```
Single-node multi-GPU training is automatically enabled with [DataParallel] (instead of [DistributedDataParallel] for simplicity).
For mixed precision training, set `"fp16_run": true` on the configuration file.
You can load trained weights from saved checkpoints by setting the `checkpoint_path` variable in the config file.
`checkpoint_path` accepts either an explicit checkpoint path, or a parent directory when resuming from weights averaged over multiple checkpoints.
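For reference, the two options discussed above appear as plain JSON keys in the config. This fragment is illustrative only; the full schema is whatever `configs/waveflow-h16-r64-bipartize.json` actually contains:

```json
{
    "fp16_run": true,
    "checkpoint_path": "experiments/waveflow-h16-r64-bipartize/waveflow_5000"
}
```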
### Examples
Insert `checkpoint_path: "experiments/waveflow-h16-r64-bipartize/waveflow_5000"` in the config file, then run
```command
python train.py -c configs/waveflow-h16-r64-bipartize.json
```
For loading averaged weights over the 10 most recent checkpoints, insert `checkpoint_path: "experiments/waveflow-h16-r64-bipartize"` in the config file, then run
```command
python train.py -a 10 -c configs/waveflow-h16-r64-bipartize.json
```
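The `-a` flag averages model parameters across recent checkpoints. The idea can be sketched as follows; plain Python dicts stand in for PyTorch `state_dict`s, and `average_checkpoints` is a hypothetical helper for illustration, not this repo's actual API:

```python
def average_checkpoints(state_dicts):
    """Average parameter values elementwise across several checkpoints.

    Each element of `state_dicts` maps parameter names to values;
    here plain floats stand in for tensors.
    """
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n
            for k in state_dicts[0]}

# Toy example: three checkpoints of the same (tiny) model.
ckpts = [
    {"conv.weight": 1.0, "conv.bias": 0.0},
    {"conv.weight": 2.0, "conv.bias": 0.3},
    {"conv.weight": 3.0, "conv.bias": 0.6},
]
avg = average_checkpoints(ckpts)
print(avg["conv.weight"])  # 2.0
```

Averaging late-training checkpoints smooths out step-to-step noise from the optimizer, which tends to improve final audio quality slightly over any single checkpoint.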
You can reset the optimizer and training scheduler (while keeping the weights) by providing `--warm_start`
```command
python train.py --warm_start -c configs/waveflow-h16-r64-bipartize.json
```
4. Synthesize waveform from the trained model.
Insert `checkpoint_path` in the config file and pass `--synthesize` to `train.py`. The model generates waveforms by looping over `test_files.txt`.
```command
python train.py --synthesize -c configs/waveflow-h16-r64-bipartize.json
```
If `fp16_run: true`, the model uses FP16 (half-precision) arithmetic for faster synthesis on GPUs equipped with Tensor Cores.
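Half precision trades significand bits for speed: IEEE 754 FP16 keeps only a 10-bit significand, which is why mixed-precision setups such as Apex maintain FP32 master weights during training. A standard-library illustration of the rounding involved (unrelated to this repo's code):

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# With a 10-bit significand, representable values near 1.0 are spaced
# 2**-10 apart, so a small perturbation is rounded away entirely.
assert to_fp16(1.0) == 1.0
assert to_fp16(1.0001) == 1.0  # rounded back down to 1.0
```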
### Pretrained Weights
We provide pretrained weights via Google Drive. The models were trained for 5M steps, and the weights were then averaged over the last 20 checkpoints with `-a 20`. Audio quality almost matches the original paper.
| Models | Download |
|:-------------:|:-------------:|
| waveflow-h16-r64-bipartize |[Link](https://drive.google.com/file/d/1z402Lvb3D3no469NpC_7PkIHB8V140gj/view?usp=sharing) |
| waveflow-h16-r128-bipartize |[Link](https://drive.google.com/file/d/12tKPQMu79kr29oMloNLIl0I0l86SyPdX/view?usp=sharing) |
## Reference
NVIDIA Tacotron2: https://github.com/NVIDIA/tacotron2
NVIDIA WaveGlow: https://github.com/NVIDIA/waveglow
r9y9 wavenet-vocoder: https://github.com/r9y9/wavenet_vocoder
FloWaveNet: https://github.com/ksw0306/FloWaveNet
Parakeet: https://github.com/PaddlePaddle/Parakeet
[Tacotron2]: https://github.com/NVIDIA/tacotron2
[DataParallel]: https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html
[DistributedDataParallel]: https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html
[WaveFlow]: https://arxiv.org/abs/1912.01219
[LJ Speech Data]: https://keithito.com/LJ-Speech-Dataset
[Apex]: https://github.com/nvidia/apex
[official implementation]: https://github.com/PaddlePaddle/Parakeet