https://github.com/syoyo/tacotron-tts-cpp

Tacotron text to speech in C++(synthesize only)

Host: GitHub
URL: https://github.com/syoyo/tacotron-tts-cpp
Owner: syoyo
License: mit
Created: 2018-10-04T16:01:11.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-10-17T05:20:01.000Z (about 5 years ago)
Last Synced: 2023-04-11T17:06:28.730Z (over 1 year ago)
Language: C++
Size: 25.3 MB
Stars: 71
Watchers: 9
Forks: 24
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Text-to-speech in (partially) C++ using Tacotron model + TensorFlow

Runs the Tacotron model through the TensorFlow C++ API.

It's good for running TTS on mobile or embedded devices.

Code is based on keithito's tacotron implementation: https://github.com/keithito/tacotron

## Status

Experimental.

Python preprocessing is still required to generate sequence data from text.

## Requirements

* TensorFlow r1.8+

* Ubuntu 16.04 or later

* C++ compiler + cmake

## Dump graph

In keithito's tacotron repo, append a `tf.train.write_graph` call to `Synthesizer.load` to save the TensorFlow graph.

```
class Synthesizer:
  def load(self, checkpoint_path, model_name='tacotron'):
    ...
    # write graph (as_text defaults to True, so graph.pb is a text-format GraphDef)
    tf.train.write_graph(self.session.graph.as_graph_def(), "models/", "graph.pb")
```

## Freeze graph

Freeze the graph, for example:

```
freeze_graph \
        --input_graph=models/graph.pb \
        --input_checkpoint=./tacotron-20180906/model.ckpt \
        --output_graph=models/tacotron_frozen.pb \
        --output_node_names=model/griffinlim/Squeeze
```

An example frozen graph file is included in this repo.

## Build

Edit the `libtensorflow_cc.so` path in `bootstrap.sh` (this assumes you built TensorFlow from source), then run:

```
$ ./bootstrap.sh
$ cd build
$ make
```

### Note on libtensorflow_cc

Make sure you build libtensorflow_cc with `--config=monolithic`. Otherwise you'll get undefined symbol errors at the linking stage.

https://www.tensorflow.org/install/source#preconfigured_configurations

## Run

Prepare a sequence JSON file.

The sequence can be generated using the `text_to_sequence()` function in keithito's tacotron repo.

See `sample/sequence01.json` for generated example.
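A minimal sketch of the serialization step, assuming the file holds a single top-level key with a list of integer symbol IDs. The helper `sequence_to_json` and the key name `"sequence"` are assumptions made for illustration, not taken from this repo; check `sample/sequence01.json` for the actual layout before relying on it.

```python
import json


def sequence_to_json(ids, key="sequence"):
    """Serialize a list of symbol IDs to a JSON string.

    `key` is an ASSUMED top-level key name; compare against
    sample/sequence01.json and adjust if the layout differs.
    """
    # Force plain Python ints so NumPy integer types also serialize.
    return json.dumps({key: [int(i) for i in ids]})


# Usage inside keithito's tacotron repo (not runnable outside it):
#   from text import text_to_sequence
#   ids = text_to_sequence("Hello, world.", ["english_cleaners"])
#   with open("sequence.json", "w") as f:
#       f.write(sequence_to_json(ids))
```

The IDs themselves come from `text_to_sequence()`; this helper only covers writing them out.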

Then,

```
$ ./tts -i ../sample/sequence01.json -g ../tacotron_frozen.pb output.wav
```

Example outputs `output01.wav` and `processed01.wav` are included in `sample/`.

### Optional parameter

You can specify hyperparameter settings (JSON format) using the `-h` option.

See `sample/hparams.json` for an example.

```
$ ./tts -i ../sample/sequence01.json -h ../sample/hparams.json -g ../tacotron_frozen.pb output.wav
```

## Performance

Currently the TensorFlow C++ code path uses only a single CPU core, so it's slow.

On a 2018-era CPU, synthesis takes roughly 10x the length of the synthesized audio (e.g. 60 secs for 6 secs of audio).
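The ratio above is commonly called the real-time factor. A small sketch of the arithmetic follows; the function name `real_time_factor` is made up for illustration and is not part of this repo.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Return synthesis time divided by audio length.

    Values above 1.0 mean synthesis is slower than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# The example from the text: 60 secs of compute for 6 secs of audio.
print(real_time_factor(60.0, 6.0))
```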

## TODO

* Write the whole TTS pipeline in C++
  * [ ] Text to sequence (Issue #1)
    * [ ] Convert to lower case
    * [ ] Expand abbreviations
    * [ ] Normalize numbers (`number_to_words`, an equivalent of Python's `inflect`)
    * [ ] Remove extra whitespace
  * [ ] Use CPU implementation of Griffin-Lim
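
As a reference for what a C++ port of the text-to-sequence cleaning would need to reproduce, here is a rough Python sketch of three of the steps listed above (lower-casing, abbreviation expansion, whitespace removal). The abbreviation table is a short illustrative subset chosen for this example, `clean_text` is a hypothetical name, and number normalization is left out because it depends on `inflect`.

```python
import re

# Illustrative subset only; a real cleaner would carry a longer table.
_ABBREVIATIONS = [
    (re.compile(r"\bmr\."), "mister"),
    (re.compile(r"\bdr\."), "doctor"),
    (re.compile(r"\bst\."), "saint"),
]
_WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Lower-case, expand a few abbreviations, and collapse whitespace."""
    text = text.lower()
    for pattern, replacement in _ABBREVIATIONS:
        text = pattern.sub(replacement, text)
    # Collapse runs of whitespace to one space and trim both ends.
    return _WHITESPACE.sub(" ", text).strip()


print(clean_text("Dr.  Smith met\tMr. Jones"))
```

Lower-casing runs first so the abbreviation patterns only need to match lower-case input, which keeps the table simple to port.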

## License

MIT license.

The pretrained model used for freezing the graph was obtained from keithito's repo.

### Third party licenses

- json.hpp : MIT license

- cxxopts.hpp : MIT license

- dr_wav : Public domain