https://github.com/KurdishBLARK/KTC-Segmented

A segmented version of KTC

# KTC-Segmented
This repository contains the sentence-segmented version of KTC.
It follows KTC's structure.
Each file is the line-segmented form of its counterpart in the raw corpus.
The segmentation process and related discussions are presented in the paper
"Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts".
The paper appeared at the AfricaNLP Workshop at ICLR 2020.
See the presentation of the related article here.
See the related poster here.
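
Because each file is line-segmented, the corpus can be read sentence by sentence with plain file iteration. The following is a minimal sketch, not part of the repository: it assumes one sentence per line, UTF-8 encoding, and a `.txt` file extension, and the helper names `iter_sentences` and `count_sentences` are hypothetical. Adjust `pattern` if the files use a different extension.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_sentences(corpus_dir: str, pattern: str = "*.txt") -> Iterator[Tuple[str, str]]:
    """Yield (relative_path, sentence) pairs from a line-segmented corpus.

    Assumptions (not confirmed by the repository): files match `pattern`,
    are UTF-8 encoded, and hold one sentence per line.
    """
    root = Path(corpus_dir)
    # Walk the directory tree so the corpus's folder structure is preserved
    # in the relative paths that are yielded.
    for path in sorted(root.rglob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                sentence = line.strip()
                if sentence:  # skip blank lines
                    yield path.relative_to(root).as_posix(), sentence


def count_sentences(corpus_dir: str, pattern: str = "*.txt") -> Dict[str, int]:
    """Return the number of non-empty lines (sentences) per file."""
    counts: Dict[str, int] = {}
    for rel_path, _ in iter_sentences(corpus_dir, pattern):
        counts[rel_path] = counts.get(rel_path, 0) + 1
    return counts
```

For a local clone, `count_sentences("KTC-Segmented")` would return a mapping from each file's relative path to its sentence count, under the assumptions above.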

If you use this data, refer to it, or refer to its related paper, please cite it as follows:

~~~
@inproceedings{abdulrahman2020using,
title = "Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts",
author = "Abdulrahman, Roshna Omer and Hassani, Hossein",
booktitle = "Proceedings of the AfricaNLP Workshop at ICLR 2020",
month = "4",
year = "2020",
address = "Virtual",
url = "http://export.arxiv.org/pdf/2004.14134",
eprint = "2004.14134",
archivePrefix = "arXiv",
primaryClass = "cs.CL",
}
~~~