
https://github.com/litee/tts-asr-corpora

Catalogue of TTS and ASR corpora that can be used for machine learning






# Text-To-Speech and Automatic Speech Recognition Corpora

Corpus | Languages/Voices | Audio Length / Clips | Download Size | URL
--- | --- | --- | --- | ---
CMU Arctic | 4 clean US English voices, 3 accented voices | 1150 clips | 100-150 MB per voice | [Here](http://festvox.org/cmu_arctic/)
CSTR VCTK | 109 English voices | 400 sentences per speaker | 10.5 GB | [Here](http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html)
LJSpeech | One English voice | 13,100 clips | 2.6 GB | [Here](https://keithito.com/LJ-Speech-Dataset/)
TEDLIUM | Multiple English voices | 452 hours | 50 GB | [Here](https://lium.univ-lemans.fr/en/ted-lium3/)
M-AILABS | German, US/UK English, Spanish, Italian, Russian, etc. | 999 hours | 110 GB | [Here](http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/)
LibriSpeech | Multiple English voices | 1000 hours | 60 GB | [Here](http://www.openslr.org/12/)
The World English Bible | One English voice | Unknown | 5 GB | [Here](https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset)
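
Several of these corpora ship a plain-text index that pairs each audio clip with its transcript. As a rough illustration, the sketch below reads an LJSpeech-style index, assuming the pipe-delimited layout (clip ID, transcription, normalized transcription) described on the LJSpeech page. The `Utterance` type, the `read_metadata` helper, and the sample rows are hypothetical stand-ins for illustration and are not part of any corpus; other corpora in the table use different layouts.

```python
import csv
import io
from typing import NamedTuple


class Utterance(NamedTuple):
    """One clip/transcript pair (hypothetical helper type)."""
    clip_id: str
    text: str
    normalized_text: str


def read_metadata(handle):
    """Parse pipe-delimited rows of the form: id|text|normalized text.

    Assumes the LJSpeech-style layout; quoting is disabled so that
    quotation marks inside transcripts are kept literally.
    """
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    utterances = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        clip_id, text = row[0], row[1]
        # Fall back to the raw text when no normalized column is present.
        normalized = row[2] if len(row) > 2 else text
        utterances.append(Utterance(clip_id, text, normalized))
    return utterances


# Illustrative rows only; real files are opened with
# open(path, encoding="utf-8", newline="").
sample = io.StringIO(
    "LJ001-0001|Printing, in the only sense|Printing, in the only sense\n"
    "LJ001-0002|in being comparatively modern.|in being comparatively modern.\n"
)
utterances = read_metadata(sample)
```

Each `clip_id` would then be matched to an audio file of the same name in the corpus's audio directory before training a TTS or ASR model.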