https://github.com/linuxscout/arabicnlptoolslist
Arabic NLP tools List inventory
- Host: GitHub
- URL: https://github.com/linuxscout/arabicnlptoolslist
- Owner: linuxscout
- License: gpl-3.0
- Created: 2018-12-14T09:27:30.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-17T20:09:48.000Z (over 2 years ago)
- Last Synced: 2023-03-11T10:12:27.121Z (about 2 years ago)
- Topics: arabic, catalogue, nlp, nlp-resources
- Size: 41 KB
- Stars: 63
- Watchers: 10
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Arabic NLP Tools and Resources Lists
Arabic NLP tools List inventory
## Tools
### STEMMING
- [Tashaphyne Light Stemmer](https://pypi.org/project/Tashaphyne/)
- [Khoja Arabic Stemmer](http://zeus.cs.pacificu.edu/shereen/research.htm#stemming)
- Arabic stemmers: [Sebawai and Al-Stem](http://tides.umiacs.umd.edu/software.html) (contact Dr. Kareem Darwish)
- [Larkey’s L-stem](http://www.springerlink.com/content/pr215t0701804h3g/) (contact authors)
- [Farasa Segmenter](https://github.com/qcri/FarasaSegmenter)
- [ARBML/tkseem](https://github.com/ARBML/tkseem/)
- Alkhalil [Lemmatizer](http://oujda-nlp-team.net/2022/04/27/alkhalil-lemmatizer/)
- Alkhalil [Stemmer](http://oujda-nlp-team.net/2022/04/27/alkhalil-stemmer/)
- Alkhalil [Root Extractor](http://oujda-nlp-team.net/2022/04/27/alkhalil-rootextractor/)
### MORPHOLOGICAL ANALYSIS AND GENERATION
- [Qalsadi](http://github.com/linuxscout/qalsadi): Arabic morphological analyzer library for Python.
- Buckwalter Arabic Morphological Analyzer ([BAMA](https://catalog.ldc.upenn.edu/LDC2004L02))
- Standard Arabic Morphological Analyzer ([SAMA](https://catalog.ldc.upenn.edu/LDC2010L01), version 3.0 of BAMA)
- [ElixirFM](https://sourceforge.net/projects/elixir-fm/): Functional Arabic Morphology
- [Xerox](http://ftp.xrce.xerox.com/competencies/content-analysis/arabic/input/paste_input.html) Arabic Morphological Analysis and Generation
- ~~[NMSU](http://crl.nmsu.edu/Resources/lang_res/arabic.html) NMSU’s Arabic Morphological Analyzer~~ (deprecated)
- ~~MAGEAD: Morphological Analysis and Generation for Arabic and its Dialects~~
- ~~Almorgeana: Arabic Lexeme-based Morphological Generation and Analysis, distributed as part of the MADA system.~~
- [Alkhalil](http://oujda-nlp-team.net/ar/programms-ar/alkhalil-morphology-2-ar/) Morphological Analyzer
- [Araflex](http://lexanalysis.com/araflex/araflex.html)
### MORPHOLOGICAL DISAMBIGUATION AND POS TAGGING
- [Khoja Arabic Tagger](http://zeus.cs.pacificu.edu/shereen/research.htm#tagging)
- [AMIRA](http://nlp.ldeo.columbia.edu/amira/): toolkit for Arabic tokenization, POS tagging and base phrase chunking
- [MADA](http://www1.ccls.columbia.edu/~cadim/MADA.html): Morphological Analysis and Disambiguation for Arabic – a tool for tokenization, lemmatization, diacritization and POS tagging
### PARSERS
- [The Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
- [The Bikel Parser](http://www.cis.upenn.edu/~dbikel/software.html#stat-parser)
- [MaltParser](http://maltparser.org/)
- [Mohammed Attia](http://www.attiaspace.com/)’s rule-based parser for MSA
### NAMED ENTITY RECOGNITION
- ~~Yassine Benajiba’s [ANER (Arabic Named Entity Recognition) system](http://www1.ccls.columbia.edu/~ybenajiba/downloads.html)~~
- ~~[BBN’s Identifinder](http://www.bbn.com/technology/speech/identifinder) (English, Arabic, Chinese)~~
### MACHINE TRANSLATION
- [Statistical MT public resources](http://www.statmt.org/): Giza alignment, Pharaoh and Moses decoders, etc.
- [Turjuman](https://github.com/UBC-NLP/turjuman): a neural machine translation toolkit for translating from 20 languages into Modern Standard Arabic. [Demo](https://demos.dlnlp.ai/turjuman/).
### TREE EDITING
- [Tred for Arabic](https://ufal.mff.cuni.cz/padt/PADT_1.0/docs/index.html): tree editor with Arabic support
### LEXICOGRAPHY
- [aConCorde](https://www.andy-roberts.net/coding/aconcorde): a concordance generation program for Arabic
### Verb conjugator
* [Qutrub](http://qutrub.arabeyes.org) (source on [GitHub](http://github.com/linuxscout/qutrub))
* [The CJKI Arabic Verb Conjugator](http://www.kanji.org/cjk/arabic/cave/cave.htm) (CAVE).
An interactive Arabic-English verb conjugation application for iOS devices that provides conjugation paradigms for over 1,600 Arabic verbs.
* [AraCon](https://github.com/JaouadMousser/Aracon): a verb conjugator for Arabic, implemented in Java as part of a morphological analyzer and generator.
### Transcription and transliteration
* [Arabic Transcription and Transliteration](http://www.kanji.org/cjk/trans/transsum.htm).
An overview of some linguistic issues related to transliteration and transcription, with special focus on our Arabic transcription technology.
* The [ARAN](aran.htm) and [NANA](http://www.kanji.org/cjk/arabic/nana.htm) systems automatically transcribe CJK and Latin names to and from Arabic.
### Numbers to words
* [Tafqit](https://github.com/MohsenAlyafei/tafqit): Tafqeet, the conversion of numbers into their written-out Arabic words (تحويل الأرقام إلى ما يقابلها كتابة باللغة العربية)
### Poetry
* [Al-Faraheedy-Project](https://github.com/MukhtarSayedSaleh/Al-Faraheedy-Project)
## Resources
### Corpora
#### Monolingual corpora
- [Abuelkhair Corpus, 1.5 billion Arabic words corpus](http://www.abuelkhair.net/index.php/en/arabic/abu-el-khair-corpus)
Includes more than 5 million newspaper articles, over 1.5 billion words, and about 3 million unique words. The corpus is encoded in UTF-8 and CP-1256 and marked up as XML and SGML.
- [Tashkeela](http://tashkeela.sf.net): Arabic vocalized (diacritized) text corpus
- [A fully diacritized modern Arabic translation of the Bible](http://www.biblegateway.com/versions/?action=getVersionInfo&vid=28) (by Biblica).
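Diacritized corpora such as the ones listed above are often paired with an unvocalized copy of the same text, for example to build training pairs for a diacritizer. The following is a minimal sketch, assuming plain Unicode text and treating only the code points U+064B to U+0652 and U+0670 as diacritics. The function names `strip_tashkeel` and `diacritization_ratio` are hypothetical and are not part of any tool or corpus in this list.

```python
import re

# Assumption: "diacritics" here means the Arabic marks U+064B (fathatan)
# through U+0652 (sukun), plus U+0670 (superscript alef).
TASHKEEL = re.compile("[\u064b-\u0652\u0670]")


def strip_tashkeel(text: str) -> str:
    """Return the text with the Arabic diacritic marks above removed."""
    return TASHKEEL.sub("", text)


def diacritization_ratio(text: str) -> float:
    """Rough density measure: diacritic marks per base character.

    Base characters are approximated as the range U+0621..U+064A,
    which also includes the tatweel (U+0640).
    """
    letters = sum(1 for ch in text if "\u0621" <= ch <= "\u064a")
    marks = len(TASHKEEL.findall(text))
    return marks / letters if letters else 0.0


if __name__ == "__main__":
    # "kitabun" (a book), fully vocalized, written with escapes for clarity
    sample = "\u0643\u0650\u062a\u064e\u0627\u0628\u064c"
    print(strip_tashkeel(sample))
    print(diacritization_ratio(sample))
```

A ratio near zero suggests unvocalized text, which can be a quick sanity check before using a file as diacritization training data.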
#### Multilingual corpora
### Dictionaries
* [The CJKI Arabic Learner’s Dictionary](http://www.kanji.org/kanji/dictionaries/cald/cald_overview.pdf) (CALD) (_.pdf_).
A new concept dictionary that enables learners to gain a full understanding of MSA core vocabulary. An Arabic summary is available at [القاموس العربي الإنجليزي للمتعلمين](http://www.kanji.org/kanji/dictionaries/cald/cald_overview_a.pdf) (_.pdf_)
#### Wordlists
* [Comprehensive Word Lists for Arabic](http://www.kanji.org/cjk/samples/cjkaword.htm) (CJKAWORD).
Comprehensive monolingual word lists for Arabic covering general vocabulary, proper nouns and technical terms. Includes both a lexical database for canonical forms and a full-form lexicon.
* [Arabic Broken Plurals](http://www.kanji.org/cjk/arabic/plurals8.htm).
A comprehensive database of broken plurals (unpredictable) in Arabic given in three versions -- voweled, unvoweled, and transcription.
#### ROOT LISTS
- [Buckwalter’s list of Arabic roots](http://www.angelfire.com/tx4/lisan/roots1.htm)
- [Project Root List](http://www.studyquran.co.uk/PRLonline.htm)
- [Root list](http://tides.umiacs.umd.edu/software.html) inside the morphological analyzer Sebawai (contact Dr. Kareem Darwish)
### GAZETTEERS
- [ANERCorp](http://users.dsic.upv.es/~ybenajiba/resources/): a corpus of more than 150,000 words annotated for the NER task.
- [ANERGazet](http://users.dsic.upv.es/~ybenajiba/resources/): a collection of three gazetteers: (i) Locations, containing names of continents, countries, cities, etc.; (ii) People, containing names of people collected manually from different Arabic websites; and (iii) Organizations, containing names of organizations such as companies and football teams.
- [FAOTERM](http://www.fao.org/faoterm): the United Nations Food and Agriculture Organization’s terminology reference for country names (six languages including Arabic)
- [Foreignword.com’s country names](http://www.foreignword.com/countries/) in 16 languages including Arabic
- [Geonames.de’s](http://www.geonames.de) multilingual resource for names of geographical entities (and other things)
- [U.S. Board on Geographic Names](http://geonames.usgs.gov/) (including Arab countries) – uses SATTS Arabic transliteration
* [Database of Arab Names](http://www.kanji.org/cjk/arabic/dan.htm) (DAN).
A comprehensive database covering over 6.5 million Arab names and variants, based on authoritative resources and extensively proofread by a team of Arabic native speaker editors.
* [Database of Arab Names in Arabic](http://www.kanji.org/cjk/arabic/dana.htm) (DANA).
A one-of-a-kind resource of Arab personal names and variants, in the original Arabic script. This database covers several hundred thousand Arabic script variants, along with common spelling mistakes.
* [Database of Arabic Business Names](http://www.kanji.org/cjk/arabic/dabna.htm) (DABNA).
Arabic Companies and Organizations. A database of Arabic company and organization names is now under development.
* [Expanded OFAC](http://www.kanji.org/cjk/arabic/xofac.htm) (XOFAC).
To address the shortcomings of OFAC's SDN List, CJKI has developed a comprehensive "Expanded OFAC" database of OFAC full name variants, the vast majority of which are not listed in OFAC.
* [Database of Foreign Names in Arabic](http://www.kanji.org/cjk/arabic/dafna.htm) (DAFNA).
A database of non-Arab names transcribed to Arabic, including Arabic orthographic variants and common orthographic errors.
* [Dictionary of Arabic Place Names](http://www.kanji.org/cjk/arabic/dapna.htm) (DAPNA).
A database of Arabic-English place names including systematic coverage for orthographic variants and common orthographic errors.
### Question answering
- [Documents](http://users.dsic.upv.es/~ybenajiba/resources/): more than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and also the one accepted by the JIRS system).
- [List of Questions](http://users.dsic.upv.es/~ybenajiba/resources/): a list of 200 questions of different types. The proportion of each question type matches the proportion adopted in CLEF.
- [List of Correct Answers](http://users.dsic.upv.es/~ybenajiba/resources/): a list of correct answers for each question in the list of questions. This list is important for automatic evaluation.
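A list of correct answers makes automatic scoring straightforward. The following is a minimal sketch of one way to do it, assuming the gold answers and the system output have already been loaded into Python dictionaries keyed by a question identifier. The actual file format of the resources above is not described here, so the data layout, the `normalize` helper, and the `evaluate` function are all hypothetical.

```python
def normalize(answer: str) -> str:
    """Collapse whitespace and case so trivially different strings match."""
    return " ".join(answer.split()).casefold()


def evaluate(gold: dict, predicted: dict) -> dict:
    """Score ranked answers against lists of acceptable answers.

    gold:      {question_id: [acceptable answer strings]}
    predicted: {question_id: [candidate answers, best first]}
    Returns top-1 accuracy and mean reciprocal rank (MRR).
    """
    correct_at_1 = 0
    reciprocal_rank_sum = 0.0
    for qid, accepted in gold.items():
        accepted_norm = {normalize(a) for a in accepted}
        ranked = [normalize(a) for a in predicted.get(qid, [])]
        if ranked and ranked[0] in accepted_norm:
            correct_at_1 += 1
        # Reciprocal rank of the first correct candidate, 0 if none is correct
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in accepted_norm:
                reciprocal_rank_sum += 1.0 / rank
                break
    total = len(gold)
    if total == 0:
        return {"accuracy": 0.0, "mrr": 0.0}
    return {"accuracy": correct_at_1 / total, "mrr": reciprocal_rank_sum / total}


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the real question list
    gold = {"q1": ["Cairo"], "q2": ["1969", "year 1969"]}
    predicted = {"q1": ["cairo"], "q2": ["1970", "1969"]}
    print(evaluate(gold, predicted))
```

Exact string matching after light normalization is a deliberately simple choice. A real evaluation of Arabic answers would likely also need orthographic normalization, such as unifying alef variants and removing diacritics.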
### Ontologies
#### SEMANTIC ONTOLOGIES
- [Arabic Wordnet](http://www.globalwordnet.org/AWN/)
- [Arabic VerbNet](https://github.com/JaouadMousser/Arabic-Verbnet): a large-scale verb lexicon that classifies Arabic verbs using syntactic alternations, inspired by the work of Kipper Schuler (2005) on English VerbNet.