https://github.com/KurdishBLARK/KTC-Segmented
A segmented version of KTC
corpus kurdish kurdish-language-processing natural-language-processing
- Host: GitHub
- URL: https://github.com/KurdishBLARK/KTC-Segmented
- Owner: KurdishBLARK
- License: other
- Created: 2020-03-18T12:45:12.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-05-01T11:02:37.000Z (over 4 years ago)
- Last Synced: 2024-01-28T23:09:08.666Z (11 months ago)
- Topics: corpus, kurdish, kurdish-language-processing, natural-language-processing
- Homepage:
- Size: 2.08 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: history.zip
- License: LICENSE
Awesome Lists containing this project
- awesome-kurdish - A sentence-segmented dataset
README
# KTC-Segmented
This repository contains the sentence-segmented version of KTC.
It follows KTC's structure.
Each file is the line-segmented form of its counterpart in the raw corpus.
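
Because each file is line-segmented, the corpus can be loaded with plain file reading. The following is a minimal sketch, not part of the repository: it assumes the files are UTF-8 text, that each non-empty line holds one sentence, and that files use a `.txt` extension. The function names `read_sentences` and `count_sentences` and the `pattern` default are hypothetical and may need adjusting to the actual layout.

```python
from pathlib import Path


def read_sentences(path):
    """Return the sentences in one segmented file.

    Assumption: UTF-8 encoding, one sentence per non-empty line.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def count_sentences(corpus_root, pattern="*.txt"):
    """Map each matching file under corpus_root to its sentence count.

    Assumption: the '*.txt' pattern is illustrative; change it if the
    corpus files use a different extension.
    """
    root = Path(corpus_root)
    return {
        str(path.relative_to(root)): len(read_sentences(path))
        for path in sorted(root.rglob(pattern))
    }
```

Calling `count_sentences("KTC-Segmented")` on a local clone would return a dictionary keyed by relative file path, which makes it easy to compare sentence counts against the corresponding raw files.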
The segmentation process and related discussions have been presented in a paper entitled
"Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts".
The paper appeared at the AfricaNLP Workshop at ICLR 2020.
See the presentation of the related article here.
See the related poster here.

If you use this data, refer to it, or refer to its related paper, please cite it as follows:
~~~
@inproceedings{abdulrahman2020using,
title = "Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts",
author = "Abdulrahman, Roshna Omer and Hassani, Hossein",
booktitle = "Proceedings of the AfricaNLP Workshop at ICLR 2020",
month = "4",
year = "2020",
address = "Virtual",
url = "http://export.arxiv.org/pdf/2004.14134",
eprint = "2004.14134",
archivePrefix = "arXiv",
primaryClass = "cs.CL",
}
~~~