https://github.com/thuhcsi/VAENAR-TTS

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.
https://github.com/thuhcsi/VAENAR-TTS

Last synced: 5 months ago
JSON representation

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Host: GitHub
URL: https://github.com/thuhcsi/VAENAR-TTS
Owner: thuhcsi
License: mit
Created: 2021-06-15T23:27:24.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2021-07-08T01:43:31.000Z (almost 4 years ago)
Last Synced: 2024-08-09T13:19:35.534Z (9 months ago)
Language: Python
Size: 46.9 KB
Stars: 144
Watchers: 8
Forks: 20
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

StarryDivineSky - thuhcsi/VAENAR-TTS

README

# VAENAR-TTS
This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

## [Samples](https://light1726.github.io/vaenar-tts/) | [Paper](https://arxiv.org/abs/2107.03298) | [Pretrained Models](https://drive.google.com/drive/folders/1Xe9v_XfMUgqC0ztIW0evsUq9SNx3GLlS?usp=sharing)

## Usage

### 0. Dataset
1. English: [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
2. Mandarin: [DataBaker(标贝)](https://www.data-baker.com/data/index/source/)

### 1. Environment setup
```bash
conda env create -f environment.yml
conda activate vaenartts-env
```

### 2. Data pre-processing

For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech
```
For Mandarin using Databaker(标贝):
```bash
CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
```

### 3. Training
For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir
```
For Mandarin using Databaker(标贝):
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
```

### 4. Inference (synthesize speech for the whole test set)
For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000
```
For Mandarin using Databaker(标贝):
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
```

## Reference
1. [XuezheMax/flowseq](https://github.com/XuezheMax/flowseq)
2. [keithito/tacotron](https://github.com/keithito/tacotron)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/thuhcsi/VAENAR-TTS

Awesome Lists containing this project

README