https://github.com/harisbinzia/urdu-word-segmentation

Urdu Word Segmentation using Conditional Random Fields (CRFs)

# Urdu Word Segmentation
This repository contains the code and dataset for Urdu word segmentation, as described in the paper [Urdu Word Segmentation using Conditional Random Fields (CRFs)](http://aclweb.org/anthology/C18-1217).

# Requirement(s)

The code is implemented in Python and requires [scikit-learn](http://scikit-learn.org/stable/index.html) and [python-crfsuite](https://github.com/scrapinghub/python-crfsuite).
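
For orientation, word segmentation is often framed for a CRF as character-level sequence labeling. The sketch below is only an illustration of that framing, not the feature set or label scheme used in the paper or in this repository: the `B`/`I` labels, the context window, and all function names are assumptions. It produces a list of feature dicts and a list of labels, which is the kind of input that `pycrfsuite.Trainer.append(xseq, yseq)` accepts.

```python
# Illustrative only: label scheme, features, and helper names are assumptions,
# not the method implemented in this repository.

def chars_and_labels(segmented_words):
    """Flatten gold-segmented words into characters with B/I labels.

    "B" marks the first character of a word, "I" any later character.
    """
    chars, labels = [], []
    for word in segmented_words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def char_features(chars, i, window=2):
    """Build a feature dict for position i from the neighbouring characters."""
    feats = {"bias": 1.0, "char": chars[i]}
    for off in range(1, window + 1):
        feats[f"char-{off}"] = chars[i - off] if i - off >= 0 else "<s>"
        feats[f"char+{off}"] = chars[i + off] if i + off < len(chars) else "</s>"
    return feats


def sequence_features(chars):
    """Feature dicts for every character in the sequence."""
    return [char_features(chars, i) for i in range(len(chars))]


def labels_to_words(chars, labels):
    """Rebuild words from characters and (gold or predicted) B/I labels."""
    words = []
    for ch, lab in zip(chars, labels):
        if lab == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


# Toy example sentence (not taken from the dataset).
words = ["یہ", "کتاب", "ہے"]
chars, labels = chars_and_labels(words)
xseq = sequence_features(chars)

# With python-crfsuite installed, (xseq, labels) could be passed to
# pycrfsuite.Trainer().append(xseq, labels) as one training sequence.
print(labels_to_words(chars, labels) == words)
```

Labelling characters rather than tokens lets one model decide both where a space is missing and where an existing space should be ignored, provided spaces are stripped or encoded as features before tagging; whether this repository does so is not stated here.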

# Dataset

A manually annotated corpus of approximately 111,000 tokens is [available for download](https://github.com/harisbinzia/Urdu-Word-Segmentation/tree/master/Data).

# Reference(s)

If you use this tool in your work, please cite the paper below.

[Urdu Word Segmentation using Conditional Random Fields (CRFs)](http://aclweb.org/anthology/C18-1217)

```
@InProceedings{C18-1217,
author = "Bin Zia, Haris
and Raza, Agha Ali
and Athar, Awais",
title = "Urdu Word Segmentation using Conditional Random Fields (CRFs)",
booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "2562--2569",
location = "Santa Fe, New Mexico, USA",
url = "http://aclweb.org/anthology/C18-1217"
}
```

# License(s)
Copyright (c) 2018 CSaLT, ITU

Code licensed under the MIT License: http://opensource.org/licenses/MIT

Data licensed under CC-BY 4.0: https://creativecommons.org/licenses/by/4.0/