https://github.com/seungwonpark/awesome-tts-samples

Awesome list of TTS papers with audio samples

# awesome-tts-samples

List of TTS papers **with audio samples** provided by the authors. The last line of each entry names the spectrogram inversion method (vocoder) used to produce the samples, or notes that the model generates waveforms end-to-end.

For a more comprehensive list of important TTS papers, I recommend reading [xcmyz/speech-synthesis-paper](https://github.com/xcmyz/speech-synthesis-paper) written by Zhengxi Liu.
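The vocoder line matters because most acoustic models predict only a magnitude spectrogram, so the phase has to be recovered before audio can be played. The simplest method on this list, Griffin-Lim (listed for Tacotron and Tacotron2+GST), needs no training: it alternates between imposing the known magnitude and taking the phase of the STFT of the current waveform estimate. The following is a minimal NumPy sketch, assuming a linear-magnitude STFT with a periodic Hann window; the settings `N_FFT=512` and `HOP=128` and the helper names `stft`, `istft`, and `griffin_lim` are illustrative choices, not taken from any of the listed papers.

```python
import numpy as np

# Illustrative STFT settings (assumed, not taken from any listed paper).
N_FFT = 512
HOP = 128


def _window(n_fft: int) -> np.ndarray:
    # Periodic Hann window: drop the last sample of a symmetric window of length n_fft + 1.
    return np.hanning(n_fft + 1)[:-1]


def stft(x: np.ndarray, n_fft: int = N_FFT, hop: int = HOP) -> np.ndarray:
    """Complex STFT of shape (frames, n_fft // 2 + 1); no padding, so a partial last frame is dropped."""
    w = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray, n_fft: int = N_FFT, hop: int = HOP) -> np.ndarray:
    """Weighted overlap-add: the least-squares signal whose STFT is closest to `spec`."""
    w = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame * w
        norm[i * hop : i * hop + n_fft] += w ** 2
    return out / np.maximum(norm, 1e-8)  # guard the zero-weight first sample


def griffin_lim(magnitude: np.ndarray, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from a random phase, since only the magnitude is known.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    signal = istft(magnitude * phase)
    for _ in range(n_iter):
        # Keep the phase of the current estimate, re-impose the target magnitude, invert again.
        phase = np.exp(1j * np.angle(stft(signal)))
        signal = istft(magnitude * phase)
    return signal
```

For a waveform `x`, `griffin_lim(np.abs(stft(x)))` returns a signal whose STFT magnitude approaches that of `x`. A mel spectrogram would first have to be mapped back to a linear-frequency magnitude, which this sketch does not do. Griffin-Lim is fast and training-free but tends to leave audible phase artifacts, one reason most later entries rely on neural vocoders such as WaveNet, WaveRNN, or WaveGlow. For practical use, librosa ships a tuned implementation as `librosa.griffinlim`.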

## 2020

- [FastPitch](https://arxiv.org/abs/2006.06873) - FastPitch: Parallel Text-to-speech with Pitch Prediction
  - https://fastpitch.github.io/
  - WaveGlow
- [EATS](https://arxiv.org/abs/2006.03575) - End-to-End Adversarial Text-to-Speech
  - https://deepmind.com/research/publications/End-to-End-Adversarial-Text-to-Speech
  - End-to-end model
- [Glow-TTS](https://arxiv.org/abs/2005.11129) - Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
  - https://jaywalnut310.github.io/glow-tts-demo
  - WaveGlow
- [Flowtron](https://arxiv.org/abs/2005.05957) - Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
  - https://nv-adlr.github.io/Flowtron
  - WaveGlow

## 2019

- [Tacotron2+DCA](https://arxiv.org/abs/1910.10288) - Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
  - https://google.github.io/tacotron/publications/location_relative_attention
  - WaveRNN
- [GAN-TTS](https://openreview.net/forum?id=r1gfQgSFDr) - High Fidelity Speech Synthesis with Adversarial Networks
  - https://storage.googleapis.com/deepmind-media/research/abstract.wav
  - End-to-end model (built on top of 200 Hz linguistic and log-pitch features)
- [Multi-lingual Tacotron2](https://arxiv.org/abs/1907.04448) - Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
  - https://google.github.io/tacotron/publications/multilingual
  - WaveRNN
- [MelNet](https://arxiv.org/abs/1906.01083) - MelNet: A Generative Model for Audio in the Frequency Domain
  - https://audio-samples.github.io
  - https://sjvasquez.github.io/blog/melnet
  - [Gradient-based spectrogram inversion](https://gist.github.com/carlthome/a4a8bf0f587da738c459d0d5a55695cd)
- [FastSpeech](https://arxiv.org/abs/1905.09263) - FastSpeech: Fast, Robust and Controllable Text to Speech
  - https://speechresearch.github.io/fastspeech
  - WaveGlow
- [ParaNet](https://arxiv.org/abs/1905.08459) - Parallel Neural Text-to-Speech
  - https://parallel-neural-tts-demo.github.io
  - WaveVAE, ClariNet, WaveNet

## 2018

- [Transformer-TTS](https://arxiv.org/abs/1809.08895) - Neural Speech Synthesis with Transformer Network
  - https://neuraltts.github.io/transformertts
  - WaveNet
- [Multi-speaker Tacotron2](https://arxiv.org/abs/1806.04558) - Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
  - https://google.github.io/tacotron/publications/speaker_adaptation
  - WaveNet
- [Tacotron2+GST](https://arxiv.org/abs/1803.09017) - Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
  - https://google.github.io/tacotron/publications/global_style_tokens
  - Griffin-Lim

## 2017

- [Tacotron2](https://arxiv.org/abs/1712.05884) - Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
  - https://google.github.io/tacotron/publications/tacotron2
  - WaveNet
- [Tacotron](https://arxiv.org/abs/1703.10135) - Tacotron: Towards End-to-End Speech Synthesis
  - https://google.github.io/tacotron/publications/tacotron
  - Griffin-Lim

# Contributing

TODO