Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/thuhcsi/VAENAR-TTS
The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.
- Host: GitHub
- URL: https://github.com/thuhcsi/VAENAR-TTS
- Owner: thuhcsi
- License: MIT
- Created: 2021-06-15T23:27:24.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-07-08T01:43:31.000Z (over 3 years ago)
- Last Synced: 2024-06-29T07:47:15.046Z (4 months ago)
- Language: Python
- Size: 46.9 KB
- Stars: 144
- Watchers: 8
- Forks: 20
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - thuhcsi/VAENAR-TTS
README
# VAENAR-TTS
This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

## [Samples](https://light1726.github.io/vaenar-tts/) | [Paper](https://arxiv.org/abs/2107.03298) | [Pretrained Models](https://drive.google.com/drive/folders/1Xe9v_XfMUgqC0ztIW0evsUq9SNx3GLlS?usp=sharing)
## Usage
### 0. Dataset
1. English: [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
2. Mandarin: [DataBaker(标贝)](https://www.data-baker.com/data/index/source/)
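The corpora are not fetched by this repo itself. Below is a minimal sketch for downloading and unpacking LJSpeech on Linux; the URL is the archive link published on the LJSpeech page and may change, and DataBaker (标贝) has to be requested through its website after registration, so only the English corpus is shown.

```bash
# Fetch and extract LJSpeech-1.1 (~2.6 GB compressed); link taken from the LJSpeech page.
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjf LJSpeech-1.1.tar.bz2
# Pass the extracted folder to preprocess.py as --data_dir in the pre-processing step.
```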
### 1. Environment setup

```bash
conda env create -f environment.yml
conda activate vaenartts-env
```
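Before moving on, it can be worth confirming that the created environment provides TensorFlow and can see a GPU; this quick check is a suggestion rather than part of the original instructions.

```bash
# Print the TensorFlow version and the visible GPUs (an empty list means CPU-only).
python -c "import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices('GPU'))"
```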
### 2. Data pre-processing

For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech
```
For Mandarin using Databaker(标贝):
```bash
CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
```
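Note that the empty `CUDA_VISIBLE_DEVICES=` hides all GPUs, so pre-processing runs on the CPU. A quick way to confirm the step wrote its TFRecords (the exact file names inside the folder are not documented here, so this only checks that the directory was populated):

```bash
# preprocess.py stores its output under --save_dir; training later reads ./ljspeech/tfrecords/.
ls -lh ./ljspeech/tfrecords/
```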
### 3. Training

For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir
```
For Mandarin using Databaker(标贝):
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
```
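Training progress can be followed with TensorBoard, assuming train.py writes standard TensorFlow summaries into `--log_dir` (a reasonable guess given the TF-style flags above, not something this README states explicitly):

```bash
# Point TensorBoard at the LJSpeech run's log directory, then open http://localhost:6006
tensorboard --logdir ./lj-log_dir
```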
### 4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000
```
For Mandarin using Databaker(标贝):
```bash
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
```
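With `--write_wavs true` and `--draw_alignments true`, the synthesized audio and alignment plots should end up under `--test_dir`; the exact file layout is an assumption here:

```bash
# Inspect the outputs written for the LJSpeech test set.
ls ./lj-test-2000/ | head
```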
## Reference

1. [XuezheMax/flowseq](https://github.com/XuezheMax/flowseq)
2. [keithito/tacotron](https://github.com/keithito/tacotron)