Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/persiandataset/PersianSpeech
Persian ASR dataset
asr audio-to-text dataset persian-speech-dataset persian-speech-recognition
- Host: GitHub
- URL: https://github.com/persiandataset/PersianSpeech
- Owner: persiandataset
- License: mit
- Created: 2021-08-06T16:14:15.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-15T21:38:54.000Z (over 1 year ago)
- Last Synced: 2024-05-28T13:31:01.886Z (6 months ago)
- Topics: asr, audio-to-text, dataset, persian-speech-dataset, persian-speech-recognition
- Homepage:
- Size: 765 KB
- Stars: 23
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PersianSpeech
This repository contains a Persian speech dataset along with the corresponding transcripts.
The dataset at this link is intended for the ASR task in Persian and has a total duration of 3 hours.
Each audio file is labeled with a single sentence and is about 10 seconds long.
This dataset is not copied from anywhere; it is my personal project, which I publish freely. You can use it in your projects.
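As a rough sanity check on these figures, the sketch below estimates how many audio files a dataset of a given total duration would contain. `estimate_clip_count` is a hypothetical helper written for illustration, not part of this repository, and the 10-second clip length is only the approximate value stated above, so the result is an estimate rather than an exact file count.

```python
def estimate_clip_count(total_hours: float, seconds_per_clip: float = 10.0) -> int:
    """Estimate the number of clips in a dataset of the given total duration.

    Assumes every clip has roughly the same length (about 10 seconds here,
    which is an approximation taken from the description above).
    """
    if seconds_per_clip <= 0:
        raise ValueError("seconds_per_clip must be positive")
    total_seconds = total_hours * 3600  # hours -> seconds
    return round(total_seconds / seconds_per_clip)


# 3 hours * 3600 s/h / 10 s per clip = 1080 clips
print(estimate_clip_count(3))  # 1080
```

Under the same assumption, the helper can be called with the duration of any other subset to get a comparable estimate.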
Also, if you want an 86-hour dataset like this one, you can contact me at hubare.ra[at]gmail.com [not free].
myaudio_tiny is a tiny dataset with a duration of 3 hours.
myaudio_full is a big dataset with a duration of 30 hours.
persian_v2 is a big dataset with a duration of 56 hours.
Other sources:
1. Mozilla dataset :
Mozilla has started producing a large Persian dataset. In its version 7, 293 hours of Persian audio have been transcribed and published for free at this link. The recordings in this collection are usually short.
2. persianspeechcorpus :
You can also use this site. This ~2.5-hour single-speaker speech corpus was developed using the same methodologies used in the PhD work carried out by Nawar Halabi at the University of Southampton.
# Donation
I try to publish free Persian datasets on GitHub. Your financial support will encourage me.
Donation link: https://www.patreon.com/persiandataset