https://github.com/harisbinzia/urdu-word-segmentation
Urdu Word Segmentation using Conditional Random Fields (CRFs)
- Host: GitHub
- URL: https://github.com/harisbinzia/urdu-word-segmentation
- Owner: harisbinzia
- License: mit
- Created: 2018-01-29T17:22:57.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-03T07:12:05.000Z (about 6 years ago)
- Last Synced: 2024-04-06T22:32:46.934Z (8 months ago)
- Topics: crf, nlp, segmentation, urdu
- Language: Jupyter Notebook
- Homepage:
- Size: 1.71 MB
- Stars: 12
- Watchers: 3
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Urdu Word Segmentation
This repository contains the code and dataset for Urdu word segmentation described in the paper [Urdu Word Segmentation using Conditional Random Fields (CRFs)](http://aclweb.org/anthology/C18-1217).
# Requirement(s)
It is implemented in Python and requires [scikit-learn](http://scikit-learn.org/stable/index.html) and [python-crfsuite](https://github.com/scrapinghub/python-crfsuite).
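One common way to frame word segmentation for a CRF is as character-level sequence labeling. The following is a minimal sketch under that assumption, not the repository's code: the `B`/`I` label scheme, the character-window feature template and the helper names (`words_to_labels`, `char_features`, `sentence_features`, `labels_to_words`) are all hypothetical. It only models where words begin in an unspaced character stream; the paper's treatment of space omission and space insertion may differ.

```python
from typing import List, Tuple

# Hypothetical label scheme (an assumption, not necessarily the paper's):
#   "B" = this character begins a word, "I" = it continues the current word.


def words_to_labels(words: List[str]) -> Tuple[str, List[str]]:
    """Join a gold-segmented sentence into one unspaced string plus one label per character."""
    chars: List[str] = []
    labels: List[str] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return "".join(chars), labels


def char_features(text: str, i: int, window: int = 2) -> List[str]:
    """Hypothetical feature template: the character at i, its neighbours, and one bigram."""
    feats = ["bias", f"c[0]={text[i]}"]
    for off in range(1, window + 1):
        left = text[i - off] if i - off >= 0 else "<BOS>"
        right = text[i + off] if i + off < len(text) else "<EOS>"
        feats.append(f"c[-{off}]={left}")
        feats.append(f"c[+{off}]={right}")
    prev = text[i - 1] if i > 0 else "<BOS>"
    feats.append(f"c[-1:0]={prev}{text[i]}")
    return feats


def sentence_features(text: str) -> List[List[str]]:
    """One feature list per character, i.e. the xseq a CRF trainer consumes."""
    return [char_features(text, i) for i in range(len(text))]


def labels_to_words(text: str, labels: List[str]) -> List[str]:
    """Decode a B/I label sequence back into words."""
    words: List[str] = []
    for ch, label in zip(text, labels):
        if label == "B" or not words:
            words.append(ch)  # start a new word
        else:
            words[-1] += ch  # extend the current word
    return words


if __name__ == "__main__":
    gold = ["یہ", "کتاب", "ہے"]  # "this is a book"
    text, labels = words_to_labels(gold)
    print(labels)
    print(sentence_features(text)[0])
    print(labels_to_words(text, labels) == gold)
```

Each feature is a plain string in `name=value` form, which python-crfsuite treats as a binary feature with weight 1.0. So `sentence_features(text)` and the label list could be passed to `pycrfsuite.Trainer.append(xseq, yseq)` for training, and the labels returned by `pycrfsuite.Tagger.tag(xseq)` could be decoded with `labels_to_words`.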
# Dataset
A manually annotated corpus of approximately 111,000 tokens is [available for download](https://github.com/harisbinzia/Urdu-Word-Segmentation/tree/master/Data).
# Reference(s)
If you use this tool in your work, please cite the paper below:
[Urdu Word Segmentation using Conditional Random Fields (CRFs)](http://aclweb.org/anthology/C18-1217)
```
@InProceedings{C18-1217,
author = "Bin Zia, Haris
and Raza, Agha Ali
and Athar, Awais",
title = "Urdu Word Segmentation using Conditional Random Fields (CRFs)",
booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "2562--2569",
location = "Santa Fe, New Mexico, USA",
url = "http://aclweb.org/anthology/C18-1217"
}
```
# License(s)
Copyright (c) 2018 CSaLT, ITU
Code licensed under the MIT License: http://opensource.org/licenses/MIT
Data licensed under CC-BY 4.0: https://creativecommons.org/licenses/by/4.0/