{"id":41847761,"url":"https://github.com/camel-lab/camelbert_morphosyntactic_tagger","last_synced_at":"2026-01-25T10:04:49.096Z","repository":{"id":93673539,"uuid":"475463690","full_name":"CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger","owner":"CAMeL-Lab","description":"Code, models, and data for \"Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects\". Findings of ACL, 2022.","archived":false,"fork":false,"pushed_at":"2022-06-03T09:06:50.000Z","size":35,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-09-09T22:06:38.556Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CAMeL-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-03-29T13:48:35.000Z","updated_at":"2024-11-16T13:26:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"f52e49a5-3182-4aaf-a771-013b041609ba","html_url":"https://github.com/CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2FCAMeLBERT_morphosyntactic_tagger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2FCAMeLBERT_morphosyntactic_tagger
/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2FCAMeLBERT_morphosyntactic_tagger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2FCAMeLBERT_morphosyntactic_tagger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CAMeL-Lab","download_url":"https://codeload.github.com/CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2FCAMeLBERT_morphosyntactic_tagger/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28751108,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-25T09:58:17.166Z","status":"ssl_error","status_checked_at":"2026-01-25T09:55:56.104Z","response_time":113,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-25T10:04:48.947Z","updated_at":"2026-01-25T10:04:49.090Z","avatar_url":"https://github.com/CAMeL-Lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CAMeLBERT_morphosyntactic_tagger\nCodebase for \"[Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects](https://aclanthology.org/2022.findings-acl.135/)\". 
Findings of ACL, 2022.\n\nSome of the models are already part of the newer version of [CAMeL Tools](https://github.com/CAMeL-Lab/camel_tools). Please check out the repository if you want to try out our tagger! Currently, unfactored MSA, EGY, GLF, and LEV models are available through CAMeL Tools.\n\n## Requirements\n```bash\ngit clone https://github.com/CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger.git\ncd CAMeLBERT_morphosyntactic_tagger\n\nconda create -n CAMeLBERT_morphosyntactic_tagger python=3.7\nconda activate CAMeLBERT_morphosyntactic_tagger\n\npip install -r requirements.txt\n\n# install the latest camel tools\ngit clone https://github.com/CAMeL-Lab/camel_tools.git\ncd camel_tools\n# Install from source\npip install -e .\n# download models\ncamel_data -i disambig-bert-unfactored-all\n```\n\n## Example: How to tag a sentence\n```python\nfrom camel_tools.tokenizers.word import simple_word_tokenize\nfrom camel_tools.disambig.bert import BERTUnfactoredDisambiguator\n\n# MSA\nunfactored = BERTUnfactoredDisambiguator.pretrained(model_name='msa')\n\ntext = simple_word_tokenize('كيف حالك ؟')\n\n# tag with the analyzer\nunfactored.tag_sentence(text)\n\n# without the analyzer\nunfactored.tag_sentence(text, use_analyzer=False)\n```\n* **Important Note**: The morphological analyzer used in the example is not the same as the one in the paper, which is licensed by LDC. You can download the same morphological analyzer [here](https://github.com/CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger/releases/tag/v0.0.1). 
To use this analyzer in CAMeL-Tools, you will need to initialize the model as follows:\n  ```python\n  from camel_tools.disambig.bert import BERTUnfactoredDisambiguator\n  from camel_tools.morphology.database import MorphologyDB\n  from camel_tools.morphology.analyzer import Analyzer\n\n  # MSA\n  db = MorphologyDB(\"/PATH/TO/DB\", 'a')\n  analyzer = Analyzer(db, 'ADD_PROP', cache_size=100000)\n\n  # Make sure to set pretrained_cache=False if you're not using the default analyzer\n  unfactored = BERTUnfactoredDisambiguator.pretrained(model_name='msa', pretrained_cache=False)\n  # Use the specified analyzer instead of the default one in CAMeL-Tools\n  unfactored._analyzer = analyzer\n  ```\n\n## Experiments\nThis repo is organized as follows:\n- [data](https://github.com/CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger/releases/tag/v0.0.1): models and preprocessed datasets used in our experiments.\n- [scripts](https://github.com/CAMeL-Lab/CAMeLBERT_morphosyntactic_tagger/tree/main/scripts): scripts used to fine-tune [CAMeLBERT-MSA](https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa) and [CAMeLBERT-Mix](https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-mix) for the morphosyntactic tagging task.\n \n\n## Citation\n\n```bibtex\n@inproceedings{inoue-etal-2022-morphosyntactic,\n    title = \"Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects\",\n    author = \"Inoue, Go  and\n      Khalifa, Salam  and\n      Habash, Nizar\",\n    booktitle = \"Proceedings of the Findings of the Association for Computational Linguistics: ACL2022\",\n    month = may,\n    year = \"2022\",\n    address = \"Dublin, Ireland\",\n    publisher = \"Association for Computational Linguistics\",\n    abstract = \"We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models. 
Our models consistently outperform existing systems in Modern Standard Arabic and all the Arabic dialects we study, achieving 2.6% absolute improvement over the previous state-of-the-art in Modern Standard Arabic, 2.8% in Gulf, 1.6% in Egyptian, and 8.3% in Levantine. We explore different training setups for fine-tuning pre-trained transformer language models, including training data size, the use of external linguistic resources, and the use of annotated data from other dialects in a low-resource scenario. Our results show that strategic fine-tuning using datasets from other high-resource dialects is beneficial for a low-resource dialect. Additionally, we show that high-quality morphological analyzers as external linguistic resources are beneficial especially in low-resource settings.\"\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamel-lab%2Fcamelbert_morphosyntactic_tagger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcamel-lab%2Fcamelbert_morphosyntactic_tagger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamel-lab%2Fcamelbert_morphosyntactic_tagger/lists"}