https://github.com/AsoSoft/Kurdish-G2P-dataset
Datasets for evaluation of Central Kurdish Grapheme-to-Phoneme Conversion systems
- Host: GitHub
- URL: https://github.com/AsoSoft/Kurdish-G2P-dataset
- Owner: AsoSoft
- Created: 2019-02-27T14:30:13.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-12-29T20:14:25.000Z (almost 4 years ago)
- Last Synced: 2024-01-28T23:09:08.649Z (11 months ago)
- Topics: g2p, grapheme-to-phoneme, kurdish-language-processing
- Size: 85 KB
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-kurdish - Evaluation datasets for Kurdish Grapheme-to-Phoneme Conversion systems
README
[![DOI](https://zenodo.org/badge/172930780.svg)](https://zenodo.org/badge/latestdoi/172930780)
# Kurdish-G2P-dataset
Datasets for evaluation of Central Kurdish Grapheme-to-Phoneme Conversion systems.
## Format
Each line contains a Central Kurdish word in standard Arabic script and its corresponding phoneme string, separated by a tab character. The start of each syllable is indicated by a full stop. For example:
`ئازادی .ʔa.za.dî`
## Datasets
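As a rough sketch of how a file in the tab-separated format described under Format might be read, the Python snippet below splits each line into the word, the phoneme string, and a list of syllables. The function names `parse_g2p_line` and `read_g2p` are hypothetical and not part of this repository, and the in-memory sample simply reuses the example line from the Format section.

~~~python
import io


def parse_g2p_line(line):
    """Split one 'word<TAB>phonemes' line into (word, phonemes, syllables)."""
    word, phonemes = line.rstrip("\n").split("\t", 1)
    # A full stop marks the start of each syllable, so splitting on "."
    # leaves an empty leading piece, which is dropped here.
    syllables = [s for s in phonemes.split(".") if s]
    return word, phonemes, syllables


def read_g2p(stream):
    """Yield parsed entries from an iterable of lines, skipping blank lines."""
    for line in stream:
        if line.strip():
            yield parse_g2p_line(line)


# Assumption: a real dataset file would be opened with encoding="utf-8";
# an in-memory sample is used here so the sketch is self-contained.
sample = io.StringIO("ئازادی\t.ʔa.za.dî\n")
entries = list(read_g2p(sample))
for word, phonemes, syllables in entries:
    print(word, phonemes, len(syllables))
~~~

Keeping the original phoneme string alongside the syllable list makes it possible to compare a system's output either as a whole string or syllable by syllable.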
### AsoSoft Kurdish Corpus most frequent tokens
The 5000 most frequent words of the AsoSoft Kurdish Corpus, manually converted. The corpus is presented in:

Veisi, H., MohammadAmini, M., & Hosseini, H. (2019). “Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus”. Digital Scholarship in the Humanities.
~~~
@article{veisi2020toward,
title={Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus},
author={Veisi, Hadi and MohammadAmini, Mohammad and Hosseini, Hawre},
journal={Digital Scholarship in the Humanities},
volume={35},
number={1},
pages={176--193},
year={2020},
publisher={Oxford University Press}
}
~~~
### Wergor dataset
5041 unique words, manually converted, from the document presented by https://github.com/sinaahmadi/wergor:

Ahmadi, S. (2019). “A Rule-Based Kurdish Text Transliteration System”. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(2), 18.
~~~
@article{ahmadi2019rule,
title={A Rule-Based Kurdish Text Transliteration System},
author={Ahmadi, Sina},
journal={ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)},
volume={18},
number={2},
pages={18},
year={2019},
publisher={ACM}
}
~~~