https://github.com/candlewill/aivoice

Deep CNN networks for Speech Synthesis
cnn deep-learning tts

Host: GitHub
URL: https://github.com/candlewill/aivoice
Owner: candlewill
Created: 2017-11-10T02:06:13.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-11-15T03:23:42.000Z (over 8 years ago)
Last Synced: 2025-03-30T05:41:16.481Z (over 1 year ago)
Topics: cnn, deep-learning, tts
Language: Python
Size: 29.3 KB
Stars: 49
Watchers: 6
Forks: 15
Open Issues: 3
Metadata Files:
- Readme: README.md

# Deep Voice 3

This is a TensorFlow implementation of [DEEP VOICE 3: 2000-SPEAKER NEURAL TEXT-TO-SPEECH](https://arxiv.org/pdf/1710.07654.pdf). For now, we focus on single-speaker synthesis only.

## Requirements

* TensorFlow >= 1.2
* Python >= 3.0

## Dataset

[The LJ Speech Dataset](https://keithito.com/LJ-Speech-Dataset)
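
The dataset ships a `metadata.csv` file with one clip per line, in the form `ID|transcription|normalized transcription`. A minimal sketch of reading it with the standard library follows. The helper name `read_metadata` and the sample rows are hypothetical and are not part of this repository.

```python
import csv
import io


def read_metadata(lines):
    """Yield (clip_id, text) pairs from pipe-delimited LJ Speech metadata rows.

    `lines` is any iterable of strings, e.g. an open file object.
    """
    # QUOTE_NONE: transcriptions contain literal double quotes, so quote
    # handling is disabled and "|" is treated as the only delimiter.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        clip_id = row[0]
        # Prefer the normalized transcription (last field) when present.
        text = row[-1] if len(row) >= 3 else row[1]
        yield clip_id, text


# Illustrative rows only, not copied from the dataset.
sample = io.StringIO(
    'CLIP-0001|He said "hello" 2 times.|He said "hello" two times.\n'
    "CLIP-0002|A short line.|A short line.\n"
)
pairs = list(read_metadata(sample))
print(pairs[0])
```

With a real copy of the dataset, the same function can be given a file opened with `open(path, encoding="utf-8", newline="")` in place of the in-memory sample.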

## Pre-process

Download and unzip the LJ Speech Dataset. Run:

```
python prepro.py
```

Note: Make sure the dataset is unzipped into the same folder as `prepro.py`.

After this, we get three new folders:

```
├── dones [New]
├── mags [New]
├── mels [New]
├── metadata.csv
├── README
└── wavs
```

## Training

By default, training data is loaded from `./LJSpeech-1.0/metadata.csv`, `./LJSpeech-1.0/mels`, `./LJSpeech-1.0/dones`, and `./LJSpeech-1.0/mags`. To change the loading path, edit the config in `class Hyperparams`.
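
As an illustration of the kind of change involved, the sketch below derives the four default locations from a single root, so that a different dataset folder needs only one edit. The class name `ExampleHyperparams` and its attributes (`data_root`, `paths`) are assumptions made for this example. Check `hyperparams.py` for the names the code actually uses.

```python
import os


class ExampleHyperparams:
    """Hypothetical path settings; the real attribute names may differ."""

    data_root = "./LJSpeech-1.0"  # default location named above

    @classmethod
    def paths(cls):
        """Build the four default training inputs from `data_root`."""
        return {
            "metadata": os.path.join(cls.data_root, "metadata.csv"),
            "mels": os.path.join(cls.data_root, "mels"),
            "dones": os.path.join(cls.data_root, "dones"),
            "mags": os.path.join(cls.data_root, "mags"),
        }


# Pointing training at another copy of the dataset is then one assignment.
ExampleHyperparams.data_root = "/data/LJSpeech-1.0"  # example path, not a real default
print(ExampleHyperparams.paths()["mels"])
```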

To train the model, we use this command:

```
python train.py
```

## Pre-trained Model

Currently, we cannot get good results. However, we still provide our pre-trained model in case someone is interested in it.

[Pre-trained Model](https://cnbj1.fds.api.xiaomi.com/tts/ExternalLink/Github/pre_trained_model.tar.gz).

Its attention figure is as follows:

![Image of attention](https://cnbj1.fds.api.xiaomi.com/tts/ExternalLink/Github/alignment.png)

All the attention figures generated during training are included in the pre-trained model archive.

## File Description

* hyperparams.py: hyperparameters
* prepro.py: creates inputs and targets, i.e., mel spectrogram, magnitude, and dones.
* data_load.py
* utils.py: several custom operational functions.
* modules.py: building blocks for the networks.
* networks.py: encoder, decoder, and converter
* train.py: train
* synthesize.py: inference
* test_sents.txt: some test sentences from the paper.

## Reference

Most of the code is borrowed from [Kyubyong/deepvoice3](https://github.com/Kyubyong/deepvoice3).
