{"id":13704390,"url":"https://github.com/double22a/speech_dataset","last_synced_at":"2025-05-05T09:33:48.050Z","repository":{"id":37719449,"uuid":"355441065","full_name":"double22a/speech_dataset","owner":"double22a","description":"The dataset of Speech Recognition","archived":false,"fork":false,"pushed_at":"2024-12-26T01:58:39.000Z","size":72,"stargazers_count":392,"open_issues_count":1,"forks_count":76,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-12-26T02:39:20.717Z","etag":null,"topics":["asr","audio","automatic-speech-recognition","dataset","deep-learning","deep-neural-networks","speech","speech-diarization","speech-enhancement","speech-recognition","speech-segmentation","speech-separation","speech-synthesis","speech-to-text","speech-translation","text-to-speech","tts","voice-conversion","wav"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/double22a.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-07T06:48:08.000Z","updated_at":"2024-12-26T01:58:43.000Z","dependencies_parsed_at":"2024-01-14T20:49:18.858Z","dependency_job_id":"75c73de4-5011-4192-8b3d-577988e22e00","html_url":"https://github.com/double22a/speech_dataset","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/double22a%2Fspeech_dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/double22a%2Fspeech_dataset/tags","releases_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/double22a%2Fspeech_dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/double22a%2Fspeech_dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/double22a","download_url":"https://codeload.github.com/double22a/speech_dataset/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252471724,"owners_count":21753239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","audio","automatic-speech-recognition","dataset","deep-learning","deep-neural-networks","speech","speech-diarization","speech-enhancement","speech-recognition","speech-segmentation","speech-separation","speech-synthesis","speech-to-text","speech-translation","text-to-speech","tts","voice-conversion","wav"],"created_at":"2024-08-02T21:01:08.711Z","updated_at":"2025-05-05T09:33:48.032Z","avatar_url":"https://github.com/double22a.png","language":null,"funding_links":[],"categories":["Speech"],"sub_categories":[],"readme":"## The Dataset of Speech Recognition\n\n**Chinese**\n| name | duration/h | address | remark | application \n| --- | --- | --- | --- | ---\n| THCHS-30 | 30 | https://openslr.org/18/ |\n| Aishell | 150 | https://openslr.org/33/ |\n| ST-CMDS | 110 | https://openslr.org/38/ |\n| Primewords | 99 | https://openslr.org/47/ |\n| aidatatang | 200 | https://openslr.org/62/ |\n| MagicData | 755 | https://openslr.org/68/ |\n| ASR\u0026SD | 160 | http://ncmmsc2021.org/competition2.html | if available\n| Aishell2 | 1000 | 
http://www.aishelltech.com/aishell_2 | if available\n| TAL ASR | 100 | https://ai.100tal.com/dataset |\n| Common Voice | 63 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 \n| ASRU2019 ASR | 500 | https://www.datatang.com/competition | if available\n| 2021 SLT CSRC | 398 | https://www.data-baker.com/csrc_challenge.html | if available\n| aidatatang_1505zh | 1505 | https://datatang.com/opensource | if available\n| WenetSpeech | 10000 | https://github.com/wenet-e2e/WenetSpeech | \n| KeSpeech | 1542 | https://openreview.net/forum?id=b3Zoeq2sCLq |  | speech recognition, speaker verification, subdialect identification, voice conversion\n| MagicData-RAMC | 180 | https://arxiv.org/pdf/2203.16844.pdf | conversational speech data recorded from native speakers of Mandarin Chinese |\n| Mandarin Heavy Accent Conversational Speech Corpus | 58.78 | https://magichub.com/datasets/mandarin-heavy-accent-conversational-speech-corpus/ |\n| Free ST Chinese Mandarin Corpus | - | https://openslr.org/38/ |\n\n**English**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| Common Voice | 2015 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 \n| LibriSpeech | 960 | https://openslr.org/12/ | \n| ST-AEDS-20180100 | 4.7 | http://www.openslr.org/45/ |\n| TED-LIUM Release 3 | 430 | https://openslr.org/51/ |\n| Multilingual LibriSpeech | 44659 | https://openslr.org/94/ | limited supervision\n| SPGISpeech | 5000 | https://datasets.kensho.com/datasets/scribe | if available\n| Speech Commands | 10 | https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data | \n| 2020AESRC | 160 | https://datatang.com/INTERSPEECH2020 | if available\n| GigaSpeech | 10000 | https://github.com/SpeechColab/GigaSpeech | \n| The People’s Speech | 31400 | https://openreview.net/pdf?id=R8CwidgJ0yT |\n| Earnings-21 | 39 | https://arxiv.org/abs/2104.11348 | \n| VoxPopuli | 24100+543 | https://arxiv.org/pdf/2101.00390.pdf | 24100(unlabeled), 
543(transcribed)\n| CMU Wilderness Multilingual Speech Dataset | 13 | http://festvox.org/cmu_wilderness/ | Multilingual\n| MSR-86K | 9795.46 | https://huggingface.co/datasets/Alex-Song/MSR-86K | Multilingual\n\n**Chinese-English**\n| name | duration/h | address | remark\n| --- | --- | --- | --- \n| SEAME | 30 | https://www.isca-speech.org/archive_v0/archive_papers/interspeech_2010/i10_1986.pdf |\n| TAL CSASR | 587 | https://ai.100tal.com/dataset |\n| ASRU2019 CSASR | 200 | https://www.datatang.com/competition | if available\n| ASCEND | 10.62 | https://arxiv.org/pdf/2112.06223.pdf |\n\n**Japanese (ja-JP)**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| Common Voice | 26 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 \n| Japanese_Scripted_Speech_Corpus_Daily_Use_Sentence | 18 | https://magichub.io/cn/datasets/japanese-scripted-speech-corpus-daily-use-sentence/ | \n| LaboroTVSpeech | 2000 | https://arxiv.org/pdf/2103.14736.pdf | \n| CSJ | 650 | https://github.com/kaldi-asr/kaldi/tree/master/egs/csj |\n| JTubeSpeech | 1300 | https://arxiv.org/pdf/2112.09323.pdf\n| MSR-86K | 1779.03 | https://huggingface.co/datasets/Alex-Song/MSR-86K | Multilingual\n\n**Korean (ko-KR)**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| korean-scripted-speech-corpus-daily-use-sentence | 4.3 | https://magichub.io/cn/datasets/korean-scripted-speech-corpus-daily-use-sentence/ | \n| korean-conversational-speech-corpus | 5.22 | https://magichub.io/cn/datasets/korean-conversational-speech-corpus/ |\n| MSR-86K | 10338.66 | https://huggingface.co/datasets/Alex-Song/MSR-86K | Multilingual\n\n**Russian (ru-RU)**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| Common Voice | 148 | https://commonvoice.mozilla.org/zh-CN/datasets | Common Voice Corpus 7.0 \n| OpenSTT | 20000 | https://arxiv.org/pdf/2006.08274.pdf | limited supervision\n| MSR-86K | 3188.52 | https://huggingface.co/datasets/Alex-Song/MSR-86K | 
Multilingual\n\n**French (fr-FR)**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset\n| MSR-86K | 8316.70 | https://huggingface.co/datasets/Alex-Song/MSR-86K | Multilingual\n\n**Spanish (es-ES)**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset\n| MSR-86K | 13976.84 | https://huggingface.co/datasets/Alex-Song/MSR-86K | Multilingual\n\n**Turkish (tr-TR)**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset\n\n**Arabic (ar)**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| MediaSpeech | 10 | https://arxiv.org/pdf/2103.16193.pdf | ASR system evaluation dataset\n| MSR-86K | 873.84 | https://huggingface.co/datasets/Alex-Song/MSR-86K | Multilingual\n\n**noise \u0026 nonspeech**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| MUSAN | - | https://openslr.org/17/ |\n| Room Impulse Response and Noise Database | - | https://openslr.org/28/ | \n| AudioSet | - | https://ieeexplore.ieee.org/document/7952261 |\n\n---------------------------------------------------------------------------------------------------------------------\n---------------------------------------------------------------------------------------------------------------------\n\n## The Dataset of Speech Synthesis\n\n**Chinese**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| Aishell3 | 85 | https://openslr.org/93/ | \n| Opencpop | - | https://wenet.org.cn/opencpop/download/ | Singing Voice Synthesis\n\n**English**\n| name | duration/h | address | remark\n| --- | --- | --- | ---\n| Hi-Fi Multi-Speaker English TTS Dataset | 291.6 | https://openslr.org/109/ | \n| LibriTTS corpus | 585 | https://openslr.org/60/ | \n| Speechocean762 | - | 
https://www.openslr.org/101/ | \n| RyanSpeech | 10 | http://mohammadmahoor.com/ryanspeech/ |\n\n---------------------------------------------------------------------------------------------------------------------\n---------------------------------------------------------------------------------------------------------------------\n\n## The Dataset of Speech Recognition \u0026 Speaker Diarization\n**Chinese**\n| name | duration/h | address | remark | application\n| --- | --- | --- | --- | ---\n| Aishell4 | 120 | https://openslr.org/111/ | 8-channel, conference scenarios | speech recognition, speaker diarization\n| ASR\u0026SD | 160 | http://ncmmsc2021.org/competition2.html | if available | speech recognition, speaker diarization\n| zhijiangcup | - | https://zhijiangcup.zhejianglab.com/zhijiang/match/details/id/6.html | if available | speech recognition, speaker diarization\n| M2MET | 120 | https://arxiv.org/pdf/2110.07393.pdf | 8-channel, conference scenarios | speech recognition, speaker diarization\n\n**English**\n| name | duration/h | address | remark | application\n| --- | --- | --- | --- | ---\n| CHiME-6 | - | https://chimechallenge.github.io/chime6/download.html | if available | speech recognition, speaker diarization\n\n---------------------------------------------------------------------------------------------------------------------\n---------------------------------------------------------------------------------------------------------------------\n\n## The Dataset of Speaker Recognition\n**Chinese**\n| name | duration/h | address | remark | application\n| --- | --- | --- | --- | --- \n| CN-Celeb | - | https://openslr.org/82/ |\n| KeSpeech | 1542 | https://openreview.net/forum?id=b3Zoeq2sCLq |  | speech recognition, speaker verification, subdialect identification, voice conversion\n| MTASS | 55.6 | https://github.com/Windstudent/Complex-MTASSNet |  |\n| THCHS-30 | 40 | http://www.openslr.org/18/ |  |\n\n**English**\n| name | duration/h | address | 
remark\n| --- | --- | --- | ---\n| VoxCeleb Data | - | http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ |\n\n## The Dataset of Voice Activity Detection\n**French**\n| name | duration/h | address | remark | application\n| --- | --- | --- | --- | ---\n| InaGVAD | 5 | https://github.com/ina-foss/InaGVAD | 10 radio and 18 TV channels|  Voice Activity Detection, Speaker Gender Segmentation, Gender Monitoring\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdouble22a%2Fspeech_dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdouble22a%2Fspeech_dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdouble22a%2Fspeech_dataset/lists"}