{"id":41847129,"url":"https://github.com/camel-lab/camel_parser","last_synced_at":"2026-01-25T10:03:25.861Z","repository":{"id":202739789,"uuid":"708038513","full_name":"CAMeL-Lab/camel_parser","owner":"CAMeL-Lab","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-06T11:57:13.000Z","size":106,"stargazers_count":14,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-09-09T22:06:15.786Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CAMeL-Lab.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-10-21T10:39:28.000Z","updated_at":"2025-08-06T11:57:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"420f4759-48c1-4e5c-be15-c55b0384ca2c","html_url":"https://github.com/CAMeL-Lab/camel_parser","commit_stats":null,"previous_names":["camel-lab/camel_parser"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CAMeL-Lab/camel_parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Fcamel_parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Fcamel_parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Fcamel_parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Fcamel_parser/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CAMeL-Lab","download_url":"https://codeload.github.com/CAMeL-Lab/camel_parser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Fcamel_parser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28751076,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-25T09:58:17.166Z","status":"ssl_error","status_checked_at":"2026-01-25T09:55:56.104Z","response_time":113,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-25T10:03:11.118Z","updated_at":"2026-01-25T10:03:25.855Z","avatar_url":"https://github.com/CAMeL-Lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"CamelParser\n=============\n\n.. image:: https://img.shields.io/pypi/l/camel-tools.svg\n   :target: https://opensource.org/licenses/MIT\n   :alt: MIT License\n\nIntroduction\n------------\n\nCamelParser is an open-source Python-based Arabic dependency parser targeting two popular \nArabic dependency formalisms, the Columbia Arabic Treebank (CATiB), and Universal Dependencies (UD).\n\nThe CamelParser pipeline handles the processing of raw text and produces tokenization, \npart-of-speech and rich morphological features. 
For disambiguation, users can choose between \nthe BERT unfactored disambiguator, or a lighter Maximum Likelihood Estimation (MLE) disambiguator, \nboth of which are included in CAMeL Tools. For dependency parsing, we use the SuPar Biaffine Dependency Parser.\n\n\nInstallation\n------------\n1. Clone this repo\n\n2. Set up a virtual environment using Python 3.11.13 (you can follow the tutorial `here \u003chttps://www.youtube.com/watch?si=g2zx-D0zdkL50knb\u0026v=31WU0Dhw4sk\u0026feature=youtu.be\u003e`_).\n\nCurrently, CamelParser does not work with later versions of Python due to issues with some dependencies.\n\n3. Install the required packages:\n\n.. code-block:: bash\n\n    pip install -r requirements.txt\n\n4. Download dependency parsing models:\n\n.. code-block:: bash\n\n    python download_models.py\n\nCurrently, two Arabic script models, CATiB and UD, will be downloaded from the CAMeL Lab's \n`parser models collection \u003chttps://huggingface.co/collections/CAMeL-Lab/camelparser-654a3df21f70b3b5e72f95d9\u003e`_ \non Hugging Face. More models will be added soon!\n\nExamples\n--------\nThe CamelParser allows users to pass either a string or a file containing **one or more** sentences.\nBelow are examples using the different *string* inputs that CamelParser accepts. \nWe pass each example as a string using -s. \nHowever, we do recommend using the file method (-i) along with the path to the file\nwhen passing multiple sentences.\n\nYou can also refer to the **sample_starting_point.py** to use the parser in your code, or for more advanced usage:\n\n.. code-block:: bash\n\n    text_to_conll_cli.py\n\n    handle_multiple_texts.py\n\n    handle_multiple_conll_files.py\n\nPassing text\n\n.. code-block:: bash\n    \n    python text_to_conll_cli.py -f text -s \"جامعة نيويورك أبو ظبي تنشر أول أطلس لكوكب المريخ باللغة العربية.\"\n\nThe verbose version of the above example (default values are shown)\n\n.. 
code-block:: bash\n\n    python text_to_conll_cli.py -f text -b r13 -d bert -m catib -s \"جامعة نيويورك أبو ظبي تنشر أول أطلس لكوكب المريخ باللغة العربية.\"\n\nPassing preprocessed text (cleaned and whitespace tokenized)\n\n.. code-block:: bash\n    \n    python text_to_conll_cli.py -f preprocessed_text -s \"جامعة نيويورك أبو ظبي تنشر أول أطلس لكوكب المريخ باللغة العربية .\"\n\n*Note that the difference between the -f text and preprocessed_text parser input settings is*\n*that for text we use different utilities from CAMeL Tools to* \n`normalize unicode \u003chttps://camel-tools.readthedocs.io/en/latest/api/utils/normalize.html?highlight=normalize_unicode#camel_tools.utils.normalize.normalize_unicode\u003e`_, \n`dediacritize \u003chttps://camel-tools.readthedocs.io/en/latest/api/utils/dediac.html?highlight=dediac_ar\u003e`_, \n*clean the text using* \n`arclean \u003chttps://camel-tools.readthedocs.io/en/latest/api/utils/charmap.html?highlight=arclean#utility\u003e`_, \n*and perform* \n`whitespace tokenization \u003chttps://camel-tools.readthedocs.io/en/latest/api/tokenizers/word.html?highlight=simple_word_tokenize#camel_tools.tokenizers.word.simple_word_tokenize\u003e`_.\n\ntokenized is used when 1) the text has already been tokenized, and 2) only dependency relations are needed; \nthe POS tags and features will not be generated.\n\n.. code-block:: bash\n    \n    python text_to_conll_cli.py -f tokenized -s \"جامعة نيويورك أبو ظبي تنشر أول أطلس ل+ كوكب المريخ ب+ اللغة العربية .\"\n\ntokenized_tagged is used when the user has the tokens and POS tags. They should be passed as tuples.\n\n.. code-block:: bash\n    \n    python text_to_conll_cli.py -f tokenized_tagged -s \"(جامعة, NOM) (نيويورك, PROP) (أبو, PROP) (ظبي, PROP) (تنشر, VRB) (أول, NOM) (أطلس, NOM) (ل+, PRT) (كوكب, NOM) (المريخ, PROP) (ب+, PRT) (اللغة, NOM) (العربية, NOM) (., PNX)\"\n\n\nUsing a custom model\n--------------------\nYou can use your own dependency parser models by\n\n1. 
placing the model in the models directory (this directory is created when you run download_models.py, but you can create it yourself)\n\n2. adding -m [model_name] when running one of the scripts. Type only the model name, WITHOUT the path.\n\nExtending the code\n------------------\n\nYou can also use different parts of the code to create your own pipeline. \nThe handle_multiple_texts.py is an example of that. It can be used to parse a directory of text files, \nsaving the resulting CoNLL-X files to a given output directory.\n\nUsing another morphology database\n---------------------------------\n\nCurrently, the CamelParser uses CAMeL Tools' default morphology database, morphology-db-msa-r13.\n\nFor our paper, we used the calima-msa-s31 database. To use this database, \nfollow these steps (note that you need an account with the LDC):\n\n\n1. Install camel_tools v1.5.6 or later (you can check this using camel_data -v)\n\n2. Download the camel data for the BERT unfactored (MSA) model, as well as the morphology database:\n\n.. code-block:: bash\n\n    camel_data -i morphology-db-msa-s31 \n    camel_data -i disambig-bert-unfactored-msa\n\n3. Download LDC2010L01 from the LDC downloads page:\n    - go to https://catalog.ldc.upenn.edu/organization/downloads\n    - search for LDC2010L01.tgz and download it\n\n4. DO NOT EXTRACT LDC2010L01.tgz! We'll use the following command from CAMeL Tools to install the database:\n\n.. code-block:: bash\n\n    camel_data -p morphology-db-msa-s31 /path/to/LDC2010L01.tgz\n\n5. When running the main script, use -b and pass calima-msa-s31.\n\nReproducing paper results\n-------------------------\n\nTo reproduce the results in our paper *CamelParser2.0: A State-of-the-Art Dependency Parser for Arabic*, please use the code found\nin the *paper_version* branch.\n\nCitation\n--------\n\nIf you find the CamelParser useful in your research, please cite\n\n.. 
code-block:: bibtex\n\n    @inproceedings{Elshabrawy:2023:camelparser,\n        title = \"{CamelParser2.0: A State-of-the-Art Dependency Parser for Arabic}\",\n        author = {Ahmed Elshabrawy and \n    Muhammed AbuOdeh and\n    Go Inoue and\n    Nizar Habash} ,\n        booktitle = {Proceedings of The First Arabic Natural Language Processing Conference (ArabicNLP 2023)},\n        year = \"2023\"\n    }","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamel-lab%2Fcamel_parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcamel-lab%2Fcamel_parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamel-lab%2Fcamel_parser/lists"}