{"id":17718603,"url":"https://github.com/linuxscout/arabicnlptoolslist","last_synced_at":"2026-01-08T08:14:48.047Z","repository":{"id":136198650,"uuid":"161761938","full_name":"linuxscout/arabicnlptoolslist","owner":"linuxscout","description":"Arabic NLP tools List inventory ","archived":false,"fork":false,"pushed_at":"2022-12-17T20:09:48.000Z","size":42,"stargazers_count":63,"open_issues_count":1,"forks_count":6,"subscribers_count":10,"default_branch":"master","last_synced_at":"2023-03-11T10:12:27.121Z","etag":null,"topics":["arabic","catalogue","nlp","nlp-resources"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linuxscout.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-14T09:27:30.000Z","updated_at":"2023-03-10T18:56:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"86681e5c-4005-4620-b779-f4a1a41c3e74","html_url":"https://github.com/linuxscout/arabicnlptoolslist","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Farabicnlptoolslist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Farabicnlptoolslist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Farabicnlptoolslist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Farabicnlptoolslist/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/linuxscout","download_url":"https://codeload.github.com/linuxscout/arabicnlptoolslist/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246467534,"owners_count":20782325,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arabic","catalogue","nlp","nlp-resources"],"created_at":"2024-10-25T14:55:05.473Z","updated_at":"2026-01-08T08:14:48.018Z","avatar_url":"https://github.com/linuxscout.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Arabic NLP Tools and Resources Lists\n\nAn inventory of Arabic NLP tools and resources.\n\n## Tools\n### STEMMING\n- [Tashaphyne Light Stemmer](https://pypi.org/project/Tashaphyne/)\n- [Khoja Arabic Stemmer](http://zeus.cs.pacificu.edu/shereen/research.htm#stemming)\n- Arabic stemmers [Sebawai and Al-Stem](http://tides.umiacs.umd.edu/software.html) (contact Dr. 
Kareem Darwish)\n- [Larkey’s L-stem](http://www.springerlink.com/content/pr215t0701804h3g/) (contact the authors)\n- [Farasa Segmenter](https://github.com/qcri/FarasaSegmenter)\n- [ARBML/tkseem](https://github.com/ARBML/tkseem/)\n- Alkhalil [Lemmatizer](http://oujda-nlp-team.net/2022/04/27/alkhalil-lemmatizer/)\n- Alkhalil [Stemmer](http://oujda-nlp-team.net/2022/04/27/alkhalil-stemmer/)\n- Alkhalil [Root Extractor](http://oujda-nlp-team.net/2022/04/27/alkhalil-rootextractor/)\n\n### MORPHOLOGICAL ANALYSIS AND GENERATION\n- [Qalsadi](http://github.com/linuxscout/qalsadi): Arabic morphological analyzer library for Python.\n- Buckwalter Arabic Morphological Analyzer ([BAMA](https://catalog.ldc.upenn.edu/LDC2004L02))\n- Standard Arabic Morphological Analyzer ([SAMA](https://catalog.ldc.upenn.edu/LDC2010L01), version 3.0 of BAMA)\n- [ElixirFM](https://sourceforge.net/projects/elixir-fm/): Functional Arabic Morphology\n- [Xerox](http://ftp.xrce.xerox.com/competencies/content-analysis/arabic/input/paste_input.html) Arabic Morphological Analysis and Generation\n- (Deprecated) ~~[NMSU](http://crl.nmsu.edu/Resources/lang_res/arabic.html) NMSU’s Arabic Morphological Analyzer~~\n- ~~MAGEAD: Morphological Analysis and Generation for Arabic and its Dialects~~\n- ~~Almorgeana: Arabic Lexeme-based Morphological Generation and Analysis, distributed as part of the MADA system~~\n- [Alkhalil Morphological Analyzer](http://oujda-nlp-team.net/ar/programms-ar/alkhalil-morphology-2-ar/)\n- [Araflex](http://lexanalysis.com/araflex/araflex.html)\n\n### MORPHOLOGICAL DISAMBIGUATION AND POS TAGGING\n- [Khoja Arabic Tagger](http://zeus.cs.pacificu.edu/shereen/research.htm#tagging)\n- [AMIRA](http://nlp.ldeo.columbia.edu/amira/): Toolkit for Arabic tokenization, POS tagging and base phrase chunking\n- [MADA](http://www1.ccls.columbia.edu/~cadim/MADA.html): 
Morphological Analysis and Disambiguation for Arabic, a tool for tokenization, lemmatization, diacritization and POS tagging\n\n### PARSERS\n- [The Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)\n- [The Bikel Parser](http://www.cis.upenn.edu/~dbikel/software.html#stat-parser)\n- [MaltParser](http://maltparser.org/)\n- [Mohammed Attia](http://www.attiaspace.com/)’s rule-based parser for MSA\n\n### NAMED ENTITY RECOGNITION\n- ~~Yassine Benajiba’s [ANER (Arabic Named Entity Recognition) system](http://www1.ccls.columbia.edu/~ybenajiba/downloads.html)~~\n- ~~[BBN’s Identifinder](http://www.bbn.com/technology/speech/identifinder) (English, Arabic, Chinese)~~\n\n### MACHINE TRANSLATION\n- [Statistical MT public resources](http://www.statmt.org/): Giza alignment, Pharaoh and Moses decoders, etc.\n- [Turjuman](https://github.com/UBC-NLP/turjuman): a neural machine translation toolkit for translating from 20 languages into Modern Standard Arabic. [Demo](https://demos.dlnlp.ai/turjuman/).\n\n### TREE EDITING\n- [TrEd for Arabic](https://ufal.mff.cuni.cz/padt/PADT_1.0/docs/index.html): tree editor with Arabic support\n\n### LEXICOGRAPHY\n- [aConCorde](https://www.andy-roberts.net/coding/aconcorde): a concordance generation program for Arabic\n\n### Verb conjugators\n* [Qutrub](http://qutrub.arabeyes.org): source code on [GitHub](http://github.com/linuxscout/qutrub)\n*   [The CJKI Arabic Verb Conjugator](http://www.kanji.org/cjk/arabic/cave/cave.htm) (CAVE).  \n    An interactive Arabic-English verb conjugation application for iOS devices that provides conjugation paradigms for over 1,600 Arabic verbs.\n* [AraCon](https://github.com/JaouadMousser/Aracon): a verb conjugator for Arabic, implemented in Java as part of a morphological analyzer and generator.\n\n### Transcription and transliteration\n*   [Arabic Transcription and Transliteration](http://www.kanji.org/cjk/trans/transsum.htm).  
\n    An overview of some linguistic issues related to transliteration and transcription, with a special focus on CJKI’s Arabic transcription technology.\n*   The [ARAN](aran.htm) and [NANA](http://www.kanji.org/cjk/arabic/nana.htm) systems automatically transcribe CJK and Latin names to and from Arabic.\n\n### Numbers to words\n* [Tafqit](https://github.com/MohsenAlyafei/tafqit): tafqeet, i.e. converting numbers into their written form in Arabic words\n\n### Poetry\n- [Al-Faraheedy-Project](https://github.com/MukhtarSayedSaleh/Al-Faraheedy-Project)\n\n## Resources\n### Corpora\n#### Monolingual corpora\n- [Abuelkhair Corpus](http://www.abuelkhair.net/index.php/en/arabic/abu-el-khair-corpus): a 1.5-billion-word Arabic corpus that includes more than 5 million newspaper articles and about 3 million unique words. The corpus is encoded in UTF-8 and CP-1256 and marked up in XML and SGML.\n- [Tashkeela](http://tashkeela.sf.net): a corpus of vocalized (diacritized) Arabic texts\n- [A fully diacritized modern Arabic translation of the Bible](http://www.biblegateway.com/versions/?action=getVersionInfo\u0026vid=28) (by Biblica).\n\n#### Multilingual corpora\n\n### Dictionaries\n\n*   [The CJKI Arabic Learner’s Dictionary](http://www.kanji.org/kanji/dictionaries/cald/cald_overview.pdf) (CALD) (_.pdf_).  \n    A new concept dictionary that enables learners to gain a full understanding of MSA core vocabulary. An Arabic summary is available at [القاموس العربي الإنجليزي للمتعلمين](http://www.kanji.org/kanji/dictionaries/cald/cald_overview_a.pdf) (_.pdf_).\n\n#### Wordlists\n*   [Comprehensive Word Lists for Arabic](http://www.kanji.org/cjk/samples/cjkaword.htm) (CJKAWORD).  \n    Comprehensive monolingual word lists for Arabic covering general vocabulary, proper nouns and technical terms. 
Includes both a lexical database for canonical forms and a full-form lexicon.\n\n*   [Arabic Broken Plurals](http://www.kanji.org/cjk/arabic/plurals8.htm).  \n    A comprehensive database of Arabic broken (unpredictable) plurals, given in three versions: voweled, unvoweled, and transcribed.\n\n#### ROOT LISTS\n- [Buckwalter’s list of Arabic roots](http://www.angelfire.com/tx4/lisan/roots1.htm)\n- [Project Root List](http://www.studyquran.co.uk/PRLonline.htm)\n- [Root list](http://tides.umiacs.umd.edu/software.html) included in the Sebawai morphological analyzer (contact Dr. Kareem Darwish)\n\n### GAZETTEERS\n- [ANERCorp](http://users.dsic.upv.es/~ybenajiba/resources/): a corpus of more than 150,000 words annotated for the NER task.\n- [ANERGazet](http://users.dsic.upv.es/~ybenajiba/resources/): a collection of three gazetteers: (i) Locations, containing names of continents, countries, cities, etc.; (ii) People, containing names of people collected manually from various Arabic websites; and (iii) Organizations, containing names of organizations such as companies, football teams, etc.\n- [FAOTERM](http://www.fao.org/faoterm): the terminology reference of the United Nations’ Food and Agriculture Organization, covering country names (six languages, including Arabic)\n- [Foreignword.com’s country names](http://www.foreignword.com/countries/) in 16 languages, including Arabic\n- [Geonames.de’s](http://www.geonames.de) multilingual resource for names of geographical entities (and other things)\n- [U.S. Board on Geographic Names](http://geonames.usgs.gov/) (including Arab countries), using SATTS Arabic transliteration\n*   [Database of Arab Names](http://www.kanji.org/cjk/arabic/dan.htm) (DAN).  
\n    A comprehensive database covering over 6.5 million Arab names and variants, based on authoritative resources and extensively proofread by a team of native Arabic-speaking editors.\n\n*   [Database of Arab Names in Arabic](http://www.kanji.org/cjk/arabic/dana.htm) (DANA).  \n    A resource of Arab personal names and variants in the original Arabic script. This database covers several hundred thousand Arabic-script variants, along with common spelling mistakes.\n\n*   [Database of Arabic Business Names](http://www.kanji.org/cjk/arabic/dabna.htm) (DABNA).  \n    A database of Arabic company and organization names (under development).\n\n*   [Expanded OFAC](http://www.kanji.org/cjk/arabic/xofac.htm) (XOFAC).  \n    To address the shortcomings of OFAC's SDN List, CJKI has developed a comprehensive \"Expanded OFAC\" database of full-name variants of OFAC entries, the vast majority of which are not listed by OFAC itself.\n\n*   [Database of Foreign Names in Arabic](http://www.kanji.org/cjk/arabic/dafna.htm) (DAFNA).  \n    A database of non-Arab names transcribed into Arabic, including Arabic orthographic variants and common orthographic errors.\n\n*   [Dictionary of Arabic Place Names](http://www.kanji.org/cjk/arabic/dapna.htm) (DAPNA).  \n    A database of Arabic-English place names with systematic coverage of orthographic variants and common orthographic errors.\n\n### Question answering\n- [Documents](http://users.dsic.upv.es/~ybenajiba/resources/): more than 11,000 Arabic Wikipedia articles in SGML format (the format adopted by CLEF and accepted by the JIRS system).\n- [List of Questions](http://users.dsic.upv.es/~ybenajiba/resources/): 200 questions of different types, with each question type in the same proportion as in CLEF.
\n- [List of Correct Answers](http://users.dsic.upv.es/~ybenajiba/resources/): the correct answers for each question in the List of Questions, intended for automatic evaluation.\n\n### Ontologies\n#### SEMANTIC ONTOLOGIES\n- [Arabic WordNet](http://www.globalwordnet.org/AWN/)\n- [Arabic VerbNet](https://github.com/JaouadMousser/Arabic-Verbnet): a large-scale verb lexicon that classifies Arabic verbs using syntactic alternations, inspired by Kipper Schuler's (2005) work on English VerbNet.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Farabicnlptoolslist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinuxscout%2Farabicnlptoolslist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Farabicnlptoolslist/lists"}