{"id":41848001,"url":"https://github.com/camel-lab/arabic_error_type_annotation","last_synced_at":"2026-01-25T10:05:41.235Z","repository":{"id":43379256,"uuid":"379480159","full_name":"CAMeL-Lab/arabic_error_type_annotation","owner":"CAMeL-Lab","description":"The Arabic Error Type Annotation tool aims to annotate Arabic error types following the ALC tagset annotation.","archived":false,"fork":false,"pushed_at":"2022-10-28T07:22:26.000Z","size":5307,"stargazers_count":11,"open_issues_count":1,"forks_count":5,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-09-09T22:06:39.060Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CAMeL-Lab.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-23T04:38:59.000Z","updated_at":"2025-08-19T10:34:50.000Z","dependencies_parsed_at":"2023-01-20T07:34:11.027Z","dependency_job_id":null,"html_url":"https://github.com/CAMeL-Lab/arabic_error_type_annotation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CAMeL-Lab/arabic_error_type_annotation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Farabic_error_type_annotation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Farabic_error_type_annotation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Farabic_error_type_annotation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Farabic_error_type_annotation/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CAMeL-Lab","download_url":"https://codeload.github.com/CAMeL-Lab/arabic_error_type_annotation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CAMeL-Lab%2Farabic_error_type_annotation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28751112,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-25T09:58:17.166Z","status":"ssl_error","status_checked_at":"2026-01-25T09:55:56.104Z","response_time":113,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-25T10:05:40.401Z","updated_at":"2026-01-25T10:05:41.213Z","avatar_url":"https://github.com/CAMeL-Lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Arabic Error Type Annotation\n\n## Description\nThe Arabic Error Type Annotation tool (ARETA) aims to annotate Arabic error types following the Arabic Learner Corpus ([ALC](https://www.arabiclearnercorpus.com/)) tagset annotation. ARETA is described in [Automatic Error Type Annotation for Arabic (Belkebir and Habash, 2021)](https://aclanthology.org/2021.conll-1.47.pdf).\n## Installation\nYou will need Python 3.7 and above (64-bit).\n\n1. 
Install [CamelTools](https://github.com/CAMeL-Lab/camel_tools#install-using-pip).\n2. Install requirements.\n```\npip install -r requirements.txt\n```\n\n## Usage\n### Error Type Annotation:\n```\nUsage: annotate_err_type_ar.py [OPTIONS] --sys_path system --ref_path reference \nwhere\n    system - the system output\n    reference - the reference file\nOPTIONS\n    --show_edit_paths - whether to show the shortest edit paths or not; defaults to false.\n    --output_path - output file directory; defaults to standard output.\n    \n```\n\nExample:\n\n```\npython annotate_err_type_ar.py --sys_path sample/sys_sample.txt --ref_path sample/ref_sample.txt\n```\n\nThe output lists triplets of system, reference and error types. For the complete list of error types, see [table of error types](#table-of-error-types) below.\n\nExample:\n\nSystem sentence:  إن أمتحان الاستاذة صعبة\n\nReference sentence: إن إمتحان الأستاذ صعب\n\nAnnotation result:\n\n|      System    |   Reference      |  Error type     | \n|----------|---------|-------| \n| إن       | إن      | UC    | \n| أمتحان   | إمتحان  | OH    | \n| الاستاذة | الأستاذ | OH+XG | \n| صعبة     | صعب     | XG    | \n\n \n**Note**: It is important to note that every sentence in the system/reference outputs must start with an `s`. Refer to [sample/sys_sample.txt](https://github.com/CAMeL-Lab/arabic_error_type_annotation/tree/main/sample/sys_sample.txt) and [sample/ref_sample.txt](https://github.com/CAMeL-Lab/arabic_error_type_annotation/tree/main/sample/ref_sample.txt) for examples on how these files should look like.\n\n\n### Annotation and evaluation using m2 files (Command Line):\n```\nUsage: annotate_eval_ar.py [OPTIONS] system source_reference \nwhere\n    system - the system output\n    source_reference - source sentences with gold token edits (.m2 file)\nOPTIONS\n```\n\nExample:\n\n```\npython annotate_eval_ar.py sample/CLMB-1 sample/QALB-Test2014.m2\n```\n\nThis generates:\n1. 
```annot_input_ref.tsv``` file in the ```output``` folder that contains the error type annotations between the input and the reference.\n2. ```annot_input_sys.tsv``` file in the ```output``` folder that contains the error type annotations between the input and the system.\n3. ```subclasses_results_CLMB-1.tsv``` file in the ```results``` folder. This file contains the results of evaluating the system's output against the reference.\n\n## Utilities\n### Generate source from m2 file\n```\nUsage: utilities/generate-m2-source.py m2-file\nwhere\n    m2-file -   the m2 file\n```\n\nExample:\n\n```python utilities/generate-m2-source.py sample/QALB-Test2014.m2  \u003e sample/QALB-Test2014.source.sent```\n\n### Generate reference from m2 file\n```\nUsage: utilities/generate-m2-reference.py m2-file\nwhere\n    m2-file -   the m2 file\n```\n\nExample:\n\n```python utilities/generate-m2-reference.py sample/QALB-Test2014.m2 \u003e sample/QALB-Test2014.reference.cor```\n\n\n### Adjust the alignment\n\nThis realignment tool realigns files produced by Ossama's basic aligner by shifting the word in each null -\u003e word pair to the word before or after it, according to minimum edit distance.  \n```\nUsage: utilities/adjust_align_tool.py file_to_adjust_align\nwhere\n    file_to_adjust_align -   File to be realigned (should follow Ossama's basic alignment file format)\n```\n\nExample:\n\n```python utilities/adjust_align_tool.py sample/align.basic  \u003e sample/align.adjust```\n\n## Configuration\nIn the configuration file ```config.json```, the user should specify the mode of the morphological analyser. The default value is ```analyser```, in which all analyses are considered. The second option is ```mle```, in which case the ```mle_top``` parameter must also be set to the maximum number of analyses to consider. The ```uc``` parameter takes the value 0 or 1: 0 means the unchanged (UC) error type is not considered, 1 means it is.  
\n\n```\n{\n  \"mode\": \"analyser\",\n  \"mle_top\": \"\"\n}\n```\n\n## Table of Error Types\n\n| Class         | Sub-class | Description                                         | Arabic Example             | Buckwalter Transliteration | \n|---------------|-----------|-----------------------------------------------------|----------------------------|----------------------------| \n| **Orthographic**  | OH        | Hamza error                                         | اكثر← أكثر                 | Akvr → \u003ekvr                | \n|               | OT        | Confusion in Ha and Ta Mutadarrifatin               | مشاركه ← مشاركة            | m$Arkh → m$Arkp            | \n|               | OA        | Confusion in Alif and Ya Mutadarrifatin             | علي ← على                  | Ely → ElY                  | \n|               | OW        | Confusion in Alif Fariqa                            | وكانو ←  وكانوا            | wkAnw→ wkAnwA              | \n|               | ON        | Confusion Between Nun and Tanwin                    | ثوبن ← ثوبٌ                | vwbn → vwbN                | \n|               | OS        | Shortening the long vowels                          | أوقت ← أوقات               | \u003ewqt → \u003ewqAt               | \n|               | OG        | Lengthening the short vowels                        | نقيمو ← نقيم               | nqymw → nqym               | \n|               | OC        | Wrong order of word characters                      | تبرينا ← تربينا            | tbrynA → trbynA            | \n|               | OR        | Replacement in word character(s)                    | مصلنا ← وصلنا              | mSlnA → wSlnA              | \n|               | OD        | Additional character(s)                             | يعدوم ← يدوم               | yEdwm → ydwm           
    | \n|               | OM        | Missing character(s)                                | سالين ← سائلين             | sAlyn → sA}lyn             | \n|               | OO        | Other orthographic errors                           | -                          | -                          | \n| **Morphological** | MI        | Word inflection                                     | معروف ← عارف               | mErwf → EArf               | \n|               | MT        | Verb tense                                          | تفرحني ← أفرحتني           | tfrHny → \u003efrHtny           | \n|               | MO        | Other morphological errors                          | -                          | -                          | \n| **Syntax**        | XC        | Case                                                | رائع ← رائعاً              | rA}E → rA}EAF              | \n|               | XF        | Definiteness                                        | السن ← سن                  | Alsn → sn                  | \n|               | XG        | Gender                                              | الغربي ← الغربية           | Algrby → Algrbyp           | \n|               | XN        | Number                                              | فكرتي ← أفكاري             | fkrty → \u003efkAry             | \n|               | XT        | Unnecessary word                                    | على ←Null                  | ElY →Null                  | \n|               | XM        | Missing word                                        | Null ← على                 | Null → ElY                 | \n|               | XO        | Other syntactic errors                              | -                          | -                          | \n| **Semantic**      | SW        | Word selection error                                | من ← عن                    | mn → En                    | \n|               | SF        | Fasl wa wasl (confusion in conjunction use/non-use) | سبحان ← 
فسبحان             | sbHAn → fsbHAn             | \n|               | SO        | Other semantic errors                               | -                          | -                          | \n| **Punctuation**   | PC        | Punctuation confusion                               | المتوسط. ← المتوسط،        | AlmtwsT. → AlmtwsT،        | \n|               | PT        | Unnecessary punctuation                             | العام,  ← العام            | AlEAm,  → AlEAm            | \n|               | PM        | Missing punctuation                                 | العظيم ←  العظيم،          | AlEZym →  AlEZym،          | \n|               | PO        | Other errors in punctuation                         | -                          | -                          | \n|               |           |                                                     |                            |                            | \n| **Merge**         | MG        | Words are merged                                    | ذهبتالبارحة ← ذهبت البارحة | *hbtAlbArHp → *hbt AlbArHp | \n| **Split**         | SP        | Words are split                                     | المحا دثات ← المحادثات     | AlmHA dvAt → AlmHAdvAt     | \n\n## Citation\nIf you find ARETA useful in your research, please cite\n**Automatic Error Type Annotation for Arabic (Belkebir and Habash, 2021)** ([PDF](https://aclanthology.org/2021.conll-1.47.pdf)) ([BIB](https://aclanthology.org/2021.conll-1.47.bib)).\n\n## License\nThis tool is available under the MIT license. 
See the [LICENSE file](https://github.com/CAMeL-Lab/arabic_error_type_annotation/blob/main/LICENSE) for more info.\n\n## Contributors\n* [Riadh Belkebir](https://github.com/riadhb88)\n* [Nizar Habash](https://github.com/nizarhabash1)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamel-lab%2Farabic_error_type_annotation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcamel-lab%2Farabic_error_type_annotation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamel-lab%2Farabic_error_type_annotation/lists"}