https://github.com/syoyo/tacotron-tts-cpp

Tacotron text to speech in C++ (synthesis only)

# Text-to-speech in (partially) C++ using Tacotron model + TensorFlow

Runs a Tacotron model through the TensorFlow C++ API.

It is suited to running TTS on mobile or embedded devices.

Code is based on keithito's tacotron implementation: https://github.com/keithito/tacotron

## Status

Experimental.

Python preprocessing is still required to generate the sequence data from text.

## Requirements

* TensorFlow r1.8+
* Ubuntu 16.04 or later
* C++ compiler + cmake

## Dump graph

In keithito's tacotron repo, append a `tf.train.write_graph` call to `Synthesizer.load` to save the TensorFlow graph.

```
class Synthesizer:
  def load(self, checkpoint_path, model_name='tacotron'):

    ...

    # Write the graph definition to models/graph.pb
    # (text protobuf, since as_text defaults to True).
    tf.train.write_graph(self.session.graph.as_graph_def(), "models/", "graph.pb")
```

## Freeze graph

Freeze the graph, for example:

```
freeze_graph \
--input_graph=models/graph.pb \
--input_checkpoint=./tacotron-20180906/model.ckpt \
--output_graph=models/tacotron_frozen.pb \
--output_node_names=model/griffinlim/Squeeze
```

An example frozen graph file is included in this repo.

## Build

Edit the `libtensorflow_cc.so` path in `bootstrap.sh` (this assumes you built TensorFlow from source), then run:

```
$ ./bootstrap.sh
$ cd build
$ make
```

### Note on libtensorflow_cc

Make sure you build libtensorflow_cc with `--config=monolithic`. Otherwise you will get undefined symbol errors at the linking stage.

https://www.tensorflow.org/install/source#preconfigured_configurations

## Run

Prepare a sequence JSON file.
The sequence can be generated with the `text_to_sequence()` function in keithito's tacotron repo.

See `sample/sequence01.json` for generated example.
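
To give a feel for what the sequence contains, here is a minimal C++ sketch of the character-to-ID mapping. It is not the code from either repo: `TextToSequence` and `kSymbols` are hypothetical names, and the symbol table is an assumption (pad `_`, end-of-sequence `~`, then letters and punctuation) that should be checked against `text/symbols.py` for the checkpoint you use. The sketch expects already-cleaned text and ignores ARPAbet input in curly braces.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed symbol table: index in this string == symbol ID.
// Verify against text/symbols.py of the model you actually use.
static const std::string kSymbols =
    "_~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!'(),-.:;? ";

// Hypothetical helper: maps cleaned text to symbol IDs and appends the
// end-of-sequence ID. Characters outside the table, and the reserved
// '_' and '~' symbols, are skipped.
std::vector<int> TextToSequence(const std::string& text) {
  std::unordered_map<char, int> symbol_to_id;
  for (std::size_t i = 0; i < kSymbols.size(); ++i) {
    symbol_to_id[kSymbols[i]] = static_cast<int>(i);
  }

  std::vector<int> sequence;
  for (char c : text) {
    auto it = symbol_to_id.find(c);
    if (it != symbol_to_id.end() && c != '_' && c != '~') {
      sequence.push_back(it->second);
    }
  }
  sequence.push_back(symbol_to_id.at('~'));  // end-of-sequence marker
  return sequence;
}
```

The resulting IDs would still have to be written out in the layout used by `sample/sequence01.json`, which this sketch does not cover.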

Then run the synthesizer:

```
$ ./tts -i ../sample/sequence01.json -g ../tacotron_frozen.pb output.wav
```

Example outputs `output01.wav` and `processed01.wav` are included in `sample/`.

### Optional parameter

You can specify hyperparameter settings (in JSON format) with the `-h` option.
See `sample/hparams.json` for an example.

```
$ ./tts -i ../sample/sequence01.json -h ../sample/hparams.json -g ../tacotron_frozen.pb output.wav
```

## Performance

Currently the TensorFlow C++ code path uses only a single CPU core, so it is slow.
On a 2018 CPU, synthesis takes roughly 10x the length of the synthesized audio (e.g. 60 secs for 6 secs of audio).
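
The 10x figure is a real-time factor: synthesis time divided by audio duration. A small sketch of that arithmetic, using a hypothetical helper `RealTimeFactor` that is not part of this repo:

```cpp
#include <stdexcept>

// Real-time factor = synthesis time / audio duration.
// Values above 1.0 mean synthesis is slower than real time.
double RealTimeFactor(double synthesis_secs, double audio_secs) {
  if (audio_secs <= 0.0) {
    throw std::invalid_argument("audio_secs must be positive");
  }
  return synthesis_secs / audio_secs;
}
```

With the numbers above, `RealTimeFactor(60.0, 6.0)` gives 10.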

## TODO

* Write the whole TTS pipeline in C++
  * [ ] Text to sequence (Issue #1)
    * [ ] Convert to lower case
    * [ ] Expand abbreviations
    * [ ] Normalize numbers (`number_to_words`, an equivalent of Python's inflect)
    * [ ] Remove extra whitespace
  * [ ] Use CPU implementation of Griffin-Lim
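
As a starting point for the text-cleaning items, the following is a minimal C++ sketch of lower-casing, abbreviation expansion, and whitespace removal. All function names are hypothetical, the abbreviation table is a short illustrative assumption rather than the list used by the Python cleaners, and number normalization is left out entirely. It handles ASCII only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Convert ASCII letters to lower case.
std::string ToLower(std::string s) {
  for (char& c : s) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return s;
}

// Replace whitespace-delimited tokens that exactly match an entry in the
// table. The table below is illustrative only (an assumption).
std::string ExpandAbbreviations(const std::string& s) {
  static const std::unordered_map<std::string, std::string> kTable = {
      {"mr.", "mister"}, {"dr.", "doctor"}, {"st.", "saint"}};

  std::string out;
  std::size_t i = 0;
  while (i < s.size()) {
    if (std::isspace(static_cast<unsigned char>(s[i]))) {
      out += s[i++];  // keep whitespace as-is; it is collapsed later
      continue;
    }
    std::size_t j = i;
    while (j < s.size() && !std::isspace(static_cast<unsigned char>(s[j]))) {
      ++j;
    }
    const std::string token = s.substr(i, j - i);
    auto it = kTable.find(token);
    out += (it != kTable.end()) ? it->second : token;
    i = j;
  }
  return out;
}

// Collapse runs of whitespace into one space and trim both ends.
std::string CollapseWhitespace(const std::string& s) {
  std::string out;
  bool pending_space = false;
  for (char c : s) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      pending_space = !out.empty();  // never emit a leading space
    } else {
      if (pending_space) out += ' ';
      pending_space = false;
      out += c;
    }
  }
  return out;  // a trailing pending space is dropped
}

// Apply the steps in order: lower case, abbreviations, whitespace.
std::string CleanText(const std::string& text) {
  return CollapseWhitespace(ExpandAbbreviations(ToLower(text)));
}
```

Lower-casing runs first so the abbreviation table only needs lower-case keys. Matching whole tokens keeps the sketch free of a regex dependency, at the cost of missing abbreviations that are directly followed by punctuation.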

## License

MIT license.

The pretrained model used to produce the frozen graph was obtained from keithito's repo.

### Third party licenses

- json.hpp : MIT license
- cxxopts.hpp : MIT license
- dr_wav : Public domain