https://github.com/double22a/speech_dataset

The dataset of Speech Recognition

asr audio automatic-speech-recognition dataset deep-learning deep-neural-networks speech speech-diarization speech-enhancement speech-recognition speech-segmentation speech-separation speech-synthesis speech-to-text speech-translation text-to-speech tts voice-conversion wav

Host: GitHub
URL: https://github.com/double22a/speech_dataset
Owner: double22a
License: apache-2.0
Created: 2021-04-07T06:48:08.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-03-07T03:56:11.000Z (over 1 year ago)
Last Synced: 2024-03-05T07:35:30.720Z (4 months ago)
Topics: asr, audio, automatic-speech-recognition, dataset, deep-learning, deep-neural-networks, speech, speech-diarization, speech-enhancement, speech-recognition, speech-segmentation, speech-separation, speech-synthesis, speech-to-text, speech-translation, text-to-speech, tts, voice-conversion, wav
Homepage:
Size: 62.5 KB
Stars: 333
Watchers: 10
Forks: 66
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

awesome-ai-list-guide - speech_dataset

README

## The Dataset of Speech Recognition

**Chinese**

| name | duration/h | address | remark | application |
| --- | --- | --- | --- | --- |
| THCHS-30 | 30 | https://openslr.org/18/ | | |
| Aishell | 150 | https://openslr.org/33/ | | |
| ST-CMDS | 110 | https://openslr.org/38/ | | |
| Primewords | 99 | https://openslr.org/47/ | | |
| aidatatang | 200 | https://openslr.org/62/ | | |
| MagicData | 755 | https://openslr.org/68/ | | |
| ASR&SD | 160 | http://ncmmsc2021.org/competition2.html | if available | |
| Aishell2 | 1000 | http://www.aishelltech.com/aishell_2 | if available | |
| TAL ASR | 100 | https://ai.100tal.com/dataset | | |
| Common Voice | 63 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 | |
| ASRU2019 ASR | 500 | https://www.datatang.com/competition | if available | |
| 2021 SLT CSRC | 398 | https://www.data-baker.com/csrc_challenge.html | if available | |
| aidatatang_1505zh | 1505 | https://datatang.com/opensource | if available | |
| WenetSpeech | 10000 | https://github.com/wenet-e2e/WenetSpeech | | |
| KeSpeech | 1542 | https://openreview.net/forum?id=b3Zoeq2sCLq | | speech recognition, speaker verification, subdialect identification, voice conversion |
| MagicData-RAMC | 180 | https://arxiv.org/pdf/2203.16844.pdf | conversational speech data recorded from native speakers of Mandarin Chinese | |
| Mandarin Heavy Accent Conversational Speech Corpus | 58.78 | https://magichub.com/datasets/mandarin-heavy-accent-conversational-speech-corpus/ | | |
| Free ST Chinese Mandarin Corpus | - | https://openslr.org/38/ | | |

**English**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| Common Voice | 2015 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
| LibriSpeech | 960 | https://openslr.org/12/ | |
| ST-AEDS-20180100 | 4.7 | http://www.openslr.org/45/ | |
| TED-LIUM Release 3 | 430 | https://openslr.org/51/ | |
| Multilingual LibriSpeech | 44659 | https://openslr.org/94/ | limited supervision |
| SPGISpeech | 5000 | https://datasets.kensho.com/datasets/scribe | if available |
| Speech Commands | 10 | https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data | |
| 2020AESRC | 160 | https://datatang.com/INTERSPEECH2020 | if available |
| GigaSpeech | 10000 | https://github.com/SpeechColab/GigaSpeech | |
| The People’s Speech | 31400 | https://openreview.net/pdf?id=R8CwidgJ0yT | |
| Earnings-21 | 39 | https://arxiv.org/abs/2104.11348 | |
| VoxPopuli | 24100+543 | https://arxiv.org/pdf/2101.00390.pdf | 24100 (unlabeled), 543 (transcribed) |
| CMU Wilderness Multilingual Speech Dataset | 13 | http://festvox.org/cmu_wilderness/ | Multilingual |

**Chinese-English**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| SEAME | 30 | https://www.isca-speech.org/archive_v0/archive_papers/interspeech_2010/i10_1986.pdf | |
| TAL CSASR | 587 | https://ai.100tal.com/dataset | |
| ASRU2019 CSASR | 200 | https://www.datatang.com/competition | if available |
| ASCEND | 10.62 | https://arxiv.org/pdf/2112.06223.pdf | |

**Japanese (ja-JP)**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| Common Voice | 26 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
| Japanese_Scripted_Speech_Corpus_Daily_Use_Sentence | 18 | https://magichub.io/cn/datasets/japanese-scripted-speech-corpus-daily-use-sentence/ | |
| LaboroTVSpeech | 2000 | https://arxiv.org/pdf/2103.14736.pdf | |
| CSJ | 650 | https://github.com/kaldi-asr/kaldi/tree/master/egs/csj | |
| JTubeSpeech | 1300 | https://arxiv.org/pdf/2112.09323.pdf | |

**Korean (ko-KR)**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| korean-scripted-speech-corpus-daily-use-sentence | 4.3 | https://magichub.io/cn/datasets/korean-scripted-speech-corpus-daily-use-sentence/ | |
| korean-conversational-speech-corpus | 5.22 | https://magichub.io/cn/datasets/korean-conversational-speech-corpus/ | |

**Russian (ru-RU)**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| Common Voice | 148 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 |
| OpenSTT | 20000 | https://arxiv.org/pdf/2006.08274.pdf | limited supervision |

**French (fr-FR)**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

**Spanish (es-ES)**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

**Turkish (tr-TR)**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

**Arabic (ar)**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset |

**Noise & Non-speech**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| MUSAN | - | https://openslr.org/17/ | |
| Room Impulse Response and Noise Database | - | https://openslr.org/28/ | |
| AudioSet | - | https://ieeexplore.ieee.org/document/7952261 | |

---------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------------------------

## The Dataset of Speech Synthesis

**Chinese**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| Aishell3 | 85 | https://openslr.org/93/ | |
| Opencpop | - | https://wenet.org.cn/opencpop/download/ | Singing Voice Synthesis |

**English**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| Hi-Fi Multi-Speaker English TTS Dataset | 291.6 | https://openslr.org/109/ | |
| LibriTTS corpus | 585 | https://openslr.org/60/ | |
| Speechocean762 | - | https://www.openslr.org/101/ | |
| RyanSpeech | 10 | http://mohammadmahoor.com/ryanspeech/ | |

---------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------------------------

## The Dataset of Speech Recognition & Speaker Diarization

**Chinese**

| name | duration/h | address | remark | application |
| --- | --- | --- | --- | --- |
| Aishell4 | 120 | https://openslr.org/111/ | 8-channel, conference scenarios | speech recognition, speaker diarization |
| ASR&SD | 160 | http://ncmmsc2021.org/competition2.html | if available | speech recognition, speaker diarization |
| zhijiangcup | - | https://zhijiangcup.zhejianglab.com/zhijiang/match/details/id/6.html | if available | speech recognition, speaker diarization |
| M2MET | 120 | https://arxiv.org/pdf/2110.07393.pdf | 8-channel, conference scenarios | speech recognition, speaker diarization |

**English**

| name | duration/h | address | remark | application |
| --- | --- | --- | --- | --- |
| CHiME-6 | - | https://chimechallenge.github.io/chime6/download.html | if available | speech recognition, speaker diarization |

---------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------------------------

## The Dataset of Speaker Recognition

**Chinese**

| name | duration/h | address | remark | application |
| --- | --- | --- | --- | --- |
| CN-Celeb | - | https://openslr.org/82/ | | |
| KeSpeech | 1542 | https://openreview.net/forum?id=b3Zoeq2sCLq | | speech recognition, speaker verification, subdialect identification, voice conversion |
| MTASS | 55.6 | https://github.com/Windstudent/Complex-MTASSNet | | |
| THCHS-30 | 40 | http://www.openslr.org/18/ | | |

**English**

| name | duration/h | address | remark |
| --- | --- | --- | --- |
| VoxCeleb Data | - | http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ | |