https://github.com/thuhcsi/VAENAR-TTS

The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.

# VAENAR-TTS
This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

## [Samples](https://light1726.github.io/vaenar-tts/) | [Paper](https://arxiv.org/abs/2107.03298) | [Pretrained Models](https://drive.google.com/drive/folders/1Xe9v_XfMUgqC0ztIW0evsUq9SNx3GLlS?usp=sharing)

## Usage

### 0. Datasets
1. English: [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
2. Mandarin: [DataBaker(标贝)](https://www.data-baker.com/data/index/source/)

### 1. Environment setup
```bash
conda env create -f environment.yml
conda activate vaenartts-env
```

### 2. Data pre-processing

For English using LJSpeech (the empty `CUDA_VISIBLE_DEVICES=` hides all GPUs, so pre-processing runs on the CPU):
```bash
CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech
```
For Mandarin using DataBaker (标贝):
```bash
CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
```
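
Before starting training, it can help to confirm that pre-processing actually produced output. The sketch below assumes that `preprocess.py` writes its TFRecord files under `<save_dir>/tfrecords`, which is inferred from the `--data_dir` paths used in the training commands rather than stated explicitly. The helper name `has_files` is hypothetical.

```bash
# Return success if the given directory exists and contains at least one entry.
# Assumption: preprocess.py writes TFRecord files under <save_dir>/tfrecords,
# the same path the training commands pass as --data_dir.
has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Check the save directories used in the commands above.
for save_dir in ./ljspeech ./databaker; do
    if has_files "$save_dir/tfrecords"; then
        echo "ok: $save_dir/tfrecords"
    else
        echo "missing or empty: $save_dir/tfrecords"
    fi
done
```

A dataset you have not pre-processed is simply reported as missing, so the loop is safe to run when only one language is in use.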

### 3. Training
For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir
```
For Mandarin using DataBaker (标贝):
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
```
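
The two environment variables in the training commands above do different jobs: `CUDA_VISIBLE_DEVICES=0` exposes only the first GPU to the process, and `TF_FORCE_GPU_ALLOW_GROWTH=true` makes TensorFlow allocate GPU memory on demand instead of reserving it all at start-up. As a convenience, they can be wrapped in a small helper. The function name `with_gpu` is hypothetical and not part of this repo.

```bash
# Run a command with a single GPU visible and on-demand GPU memory allocation.
# Usage: with_gpu <gpu_id> <command> [args...]
with_gpu() {
    local gpu_id="$1"
    shift
    CUDA_VISIBLE_DEVICES="$gpu_id" TF_FORCE_GPU_ALLOW_GROWTH=true "$@"
}

# Example (same flags as the LJSpeech training command above):
# with_gpu 0 python train.py --dataset ljspeech --log_dir ./lj-log_dir \
#     --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir
```

Passing a different GPU index as the first argument lets two trainings (for example, English and Mandarin) run side by side on separate devices.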

### 4. Inference (synthesize speech for the whole test set)
For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000
```
For Mandarin using DataBaker (标贝):
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
```
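
To compare several checkpoints, the same command can be generated once per training step. The sketch below is a dry run that only prints the commands. It assumes checkpoints are named `ckpt-<step>`, following the `ckpt-2000` example above, and the step values in the loop are hypothetical placeholders. The helper name `inference_cmd` is also not part of this repo.

```bash
# Print the LJSpeech inference command for one checkpoint step (dry run only).
# The flags mirror the command above; only the step number varies.
inference_cmd() {
    local step="$1"
    echo "python inference.py --dataset ljspeech --test_dir ./lj-test-${step}" \
         "--data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true" \
         "--draw_alignments true --ckpt_path ./lj-model_dir/ckpt-${step}"
}

# Hypothetical steps; replace with the checkpoints present in ./lj-model_dir.
for step in 1000 2000; do
    inference_cmd "$step"
done
```

Once the printed commands look right, they can be run with the `CUDA_VISIBLE_DEVICES` and `TF_FORCE_GPU_ALLOW_GROWTH` settings shown above.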

## References
1. [XuezheMax/flowseq](https://github.com/XuezheMax/flowseq)
2. [keithito/tacotron](https://github.com/keithito/tacotron)