https://github.com/litee/tts-asr-corpora

Catalogue of TTS and ASR corpora that can be used for machine learning
Host: GitHub
URL: https://github.com/litee/tts-asr-corpora
Owner: Litee
Created: 2018-07-28T20:49:41.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2018-07-28T21:34:54.000Z (almost 8 years ago)
Last Synced: 2025-01-22T07:37:16.217Z (over 1 year ago)
Topics: asr, corpora, corpus, corpus-linguistics, machine-learning, text-to-speech, tts
Size: 1.95 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Text-To-Speech and Automatic Speech Recognition Corpora

Corpus | Languages/Voices | Audio Length | Download Size | URL
--- | --- | --- | --- | ---
CMU Arctic | 4 clean US English, 3 accented | 1150 clips | 100-150 MB per voice | [Here](http://festvox.org/cmu_arctic/)
CSTR VCTK | 109 English voices | 400 sentences per speaker | 10.5 GB | [Here](http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html)
LJSpeech | One English voice | 13,100 clips | 2.6 GB | [Here](https://keithito.com/LJ-Speech-Dataset/)
TEDLIUM | Multiple English voices | 452 hours | 50 GB | [Here](https://lium.univ-lemans.fr/en/ted-lium3/)
M-AILABS | German, US/UK English, Spanish, Italian, Russian, etc. | 999 hours | 110 GB | [Here](http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/)
LibriSpeech | Multiple English voices | 1000 hours | 60 GB | [Here](http://www.openslr.org/12/)
The World English Bible | One English voice | ??? | 5 GB | [Here](https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset)
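
When choosing a corpus, download size is often the first constraint. The sketch below is only an illustration: it copies the download sizes from the table into a small Python list and filters them by a disk budget. The `Corpus` class, the `CATALOGUE` list, and the `corpora_within` helper are hypothetical names introduced here and are not part of this repository. CMU Arctic is left out because its size is listed per voice rather than as a single total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """One row of the catalogue (hypothetical helper type)."""
    name: str
    voices: str
    download_gb: float  # approximate download size, copied from the table


# Sizes are taken from the table; CMU Arctic is omitted because the table
# gives its size per voice, not as one total.
CATALOGUE = [
    Corpus("CSTR VCTK", "109 English voices", 10.5),
    Corpus("LJSpeech", "One English voice", 2.6),
    Corpus("TEDLIUM", "Multiple English voices", 50),
    Corpus("M-AILABS", "German, US/UK English, Spanish, Italian, Russian, etc.", 110),
    Corpus("LibriSpeech", "Multiple English voices", 60),
    Corpus("The World English Bible", "One English voice", 5),
]


def corpora_within(budget_gb):
    """Return the corpora whose download fits in budget_gb, smallest first."""
    fitting = [c for c in CATALOGUE if c.download_gb <= budget_gb]
    return sorted(fitting, key=lambda c: c.download_gb)


if __name__ == "__main__":
    # List every corpus that fits in a 20 GB budget.
    for corpus in corpora_within(20):
        print(f"{corpus.name}: {corpus.download_gb} GB")
```

The sizes are the approximate figures from the table, so treat the result as a rough shortlist and confirm the actual size on each corpus's download page.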
