https://github.com/litee/tts-asr-corpora
Catalogue of TTS and ASR corpora that can be used for machine learning
- Host: GitHub
- URL: https://github.com/litee/tts-asr-corpora
- Owner: Litee
- Created: 2018-07-28T20:49:41.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-07-28T21:34:54.000Z (about 7 years ago)
- Last Synced: 2025-01-22T07:37:16.217Z (9 months ago)
- Topics: asr, corpora, corpus, corpus-linguistics, machine-learning, text-to-speech, tts
- Size: 1.95 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Text-to-Speech and Automatic Speech Recognition Corpora
Corpus | Languages/Voices | Audio Amount | Download Size | URL
--- | --- | --- | --- | ---
CMU Arctic | 4 US English voices, 3 accented English voices | ~1150 clips per voice | 100-150 MB per voice | [Here](http://festvox.org/cmu_arctic/)
CSTR VCTK | 109 English voices | ~400 sentences per speaker | 10.5 GB | [Here](http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html)
LJSpeech | One English voice | 13,100 clips | 2.6 GB | [Here](https://keithito.com/LJ-Speech-Dataset/)
TED-LIUM 3 | Multiple English voices | 452 hours | 50 GB | [Here](https://lium.univ-lemans.fr/en/ted-lium3/)
M-AILABS | German, US/UK English, Spanish, Italian, Russian, etc. | 999 hours | 110 GB | [Here](http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/)
LibriSpeech | Multiple English voices | 1000 hours | 60 GB | [Here](http://www.openslr.org/12/)
The World English Bible | One English voice | Unknown | 5 GB | [Here](https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset)
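As an illustration of how one of these corpora might be fed into a training pipeline, here is a minimal sketch that reads LJSpeech-style metadata. It assumes the LJSpeech layout: a `metadata.csv` file with no header, fields separated by `|` (clip ID, raw transcript, normalized transcript), and audio stored as `wavs/<clip ID>.wav`. The function names, the `Utterance` type, and the fallback for rows missing a normalized transcript are hypothetical choices for this example, not part of the dataset or any library.

```python
import csv
import io
from pathlib import Path
from typing import Iterator, NamedTuple


class Utterance(NamedTuple):
    # Hypothetical record type for one clip and its transcripts.
    clip_id: str
    transcript: str
    normalized: str
    wav_path: Path


def read_ljspeech_metadata(text: str, root: Path) -> Iterator[Utterance]:
    # Assumed format: no header row, "|" as separator. Transcripts may contain
    # double quotes, so CSV quoting is disabled and quotes are kept verbatim.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        clip_id, transcript = row[0], row[1]
        # Assumption: fall back to the raw transcript if no normalized column.
        normalized = row[2] if len(row) > 2 else transcript
        yield Utterance(clip_id, transcript, normalized,
                        root / "wavs" / f"{clip_id}.wav")


def load_ljspeech(root: Path) -> list[Utterance]:
    # Reads <root>/metadata.csv as UTF-8 and parses every row.
    text = (root / "metadata.csv").read_text(encoding="utf-8")
    return list(read_ljspeech_metadata(text, root))
```

For example, `load_ljspeech(Path("LJSpeech-1.1"))` would return one `Utterance` per row, each pointing at its WAV file. Parsing from a string keeps the logic testable without downloading the 2.6 GB archive.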