https://github.com/camel-lab/arabic-atb-closed-class-list

# A Modern Standard Arabic Closed-Class Word List

This repository contains a list of Modern Standard Arabic closed-class words,
which can be used as a stop list for a variety of natural language processing
applications. The list contains 740 inflected words and clitics in the Arabic
Treebank (ATB) tokenization scheme (Maamouri et al., 2004; Habash, 2010).
The inflected words are based on 309 lemmas from the Standard Arabic Morphological
Analyzer, SAMA (Graff et al., 2009).
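
As a rough illustration of the stop-list use case, the sketch below filters closed-class entries out of a token sequence. It assumes the list is available as plain text with one entry per line and that the input has already been tokenized in the ATB scheme. The repository's actual file name and layout are not specified here, so the helper names (`load_stop_list`, `remove_stop_words`) and the three sample entries are hypothetical stand-ins.

```python
from typing import Iterable, List, Set


def load_stop_list(lines: Iterable[str]) -> Set[str]:
    """Build a stop set from lines, one entry per line (assumed format)."""
    # Strip surrounding whitespace and skip blank lines.
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens that are not in the stop set, preserving order."""
    return [tok for tok in tokens if tok not in stop_words]


# Illustrative entries only (three common prepositions); this is NOT the
# real 740-entry list, which ships with the repository.
sample_lines = ["في\n", "من\n", "على\n", "\n"]
stop_words = load_stop_list(sample_lines)

# A small pre-tokenized input; real input should follow ATB tokenization
# so that clitics are split the same way as in the list.
tokens = ["ذهب", "الولد", "في", "الصباح"]
content_tokens = remove_stop_words(tokens, stop_words)
```

With the real list on disk, the same helper could be fed an open file handle, for example `load_stop_list(open(path, encoding="utf-8"))`, where `path` points to wherever the list file is stored locally. Exact string matching only works if the text and the list share the same tokenization and orthographic normalization.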

The list was created by Wael Salloum and Nizar Habash.
The repository contains a technical report detailing its design decisions.

If you use this resource, please cite:

* Wael Salloum and Nizar Habash. 2012. A Modern Standard Arabic Closed-Class Word List. [Columbia University's Center for Computational Learning Systems Tech Report #CCLS-12-03](https://academiccommons.columbia.edu/doi/10.7916/D8K93GSN).

## References
1. D. Graff, M. Maamouri, B. Bouziri, S. Krouna, S. Kulick, and T. Buckwalter. Standard Arabic Morphological Analyzer (SAMA) Version 3.1, 2009. Linguistic Data Consortium LDC2009E73.
2. N. Habash. Introduction to Arabic Natural Language Processing. Morgan & Claypool Publishers, 2010.
3. M. Maamouri, A. Bies, T. Buckwalter, and W. Mekki. The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus. In NEMLAR Conference on Arabic Language Resources and Tools, pages 102–109, Cairo, Egypt, 2004.