{"id":17718595,"url":"https://github.com/linuxscout/adawat","last_synced_at":"2025-09-23T10:31:21.730Z","repository":{"id":138754528,"uuid":"164723301","full_name":"linuxscout/adawat","owner":"linuxscout","description":"Adawat: Arabic Text tools","archived":false,"fork":false,"pushed_at":"2020-08-27T19:07:49.000Z","size":82,"stargazers_count":25,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2023-03-11T10:12:22.911Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linuxscout.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null},"funding":{"patreon":"linuxscout"}},"created_at":"2019-01-08T20:00:53.000Z","updated_at":"2023-03-01T11:28:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"4dd52441-ced8-469a-a71d-899f786ba598","html_url":"https://github.com/linuxscout/adawat","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fadawat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fadawat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fadawat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fadawat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linuxscout","download_url":"https://codeload.githu
b.com/linuxscout/adawat/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233966463,"owners_count":18758473,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-25T14:55:00.770Z","updated_at":"2025-09-23T10:31:16.350Z","avatar_url":"https://github.com/linuxscout.png","language":"Python","funding_links":["https://patreon.com/linuxscout"],"categories":[],"sub_categories":[],"readme":"# Adawat: Arabic Language Toolkit\n\n# مكتبة أدوات اللغة العربية\nAdawat: Arabic Language Toolkit\n\n![adawat logo](doc/adawat_header.png  \"adawat logo\")\n\n![PyPI - Downloads](https://img.shields.io/pypi/dm/adawat)\n\n\n  Developers:  Taha Zerrouki: http://tahadz.com\n    taha dot zerrouki at gmail dot com\n\n  \nFeatures |   value\n---------|---------------------------------------------------------------------------------\nAuthors  | [Authors.md](https://github.com/linuxscout/adawat/master/AUTHORS.md)\nRelease  | 0.1\nLicense  |[GPL](https://github.com/linuxscout/adawat/master/LICENSE)\nTracker  |[linuxscout/adawat/Issues](https://github.com/linuxscout/adawat/issues)\nSource  |[GitHub](http://github.com/linuxscout/adawat)\nFeedback  |[Comments](https://github.com/linuxscout/adawat/)\nAccounts  |[@Twitter](https://twitter.com/linuxscout)\n\n## Description\n\nAdawat: Arabic Language Toolkit\n\n\n###  مزايا:\n تجمع هذه المكتبة كل الأدوات المستعملة في معالجة النص العربي\n مثل:\n \n* التشكيل\n  * تشكيل النص العربي، يستحسن استعمال مكتبة مشكال، أو برنامج مشكال\n\n  * تشكيل مع اقتراحات تشكيلات أخرى لكل كلمة\n  * 
اختزال الحركات من النص المشكول\n  * إزالة التشكيل\n  * مقارنة جملة مشكولة يدويا مع ما ينتج عن برنامج التشكيل\n* وظائف التحويل\n  * نقحرة النص العربي بحروف لاتينية\n  * تعريب نص مكتوب بحروف لاتينية\n  * قلب نص\n  * تفقيط: تحويل عدد إلى نص\n  * تنميط النص: توحيد الهمزات والألفات\n  * فك تشابك الحروف العربية\n* التحليل والتوليد\n  * تحليل صرفي للنص\n  * تفريق النص إلى كلمات وعلامات\n  * تصنيف الكلمات إلى اسم وفعل وحرف\n  * توليد كل الأشكال المختلفة للكلمة\n* استخلاص\n  * استخلاص المتلازمات اللفظية\n  * كشف اللغات المختلفة\n  * استخلاص المسميات\n  * استخلاص العبارات العددية\n* متفرقات\n  * ضبط قصيدة شعرية عمودية\n  * توليد نص عشوائي\n## Features\n\n* Tashkeel\n  * tashkeel     : vocalize text; we recommend using mishkal-console instead.\n  * tashkeel     : vocalize text with suggestions for every word.\n  * reduce       : strip unnecessary tashkeel from a vocalized text\n  * strip        : remove all harakat and shadda\n  * compare      : compare tashkeel between the input text and the automatically vocalized text\n* Transformation and Conversion\n  * romanize     : convert Arabic-script text to a Latin representation\n  * arabize      : convert transliterated (Latin-script) Arabic text to Arabic script\n  * inverse      : reverse a text\n  * numbers to words     : convert a numeric value to words\n  * normalize    : normalize letters in Arabic text\n  * unshape      : unshape Arabic letters\n* Analysis and generation\n  * stem         : morphological analysis of the given text\n  * tokenize     : tokenize a text into words\n  * wordtag      : classify words into (nouns, verbs, stopwords)\n  * affixate     : generate all word forms by affixation\n* Extraction\n  * collocation  : extract collocations from text\n  * language     : detect Arabic and Latin clauses in text\n  * named        : extract named entities from text\n  * numbered     : extract numeric clauses from text\n* Miscellaneous\n  * affixate     : generate all word forms by affixation\n  * poetry       : format poetry text into columns\n  * random       
: get a random text\n\n## Citation\n\n```bibtex\n@thesis{zerrouki2020adawat,\nauthor = {Taha Zerrouki},\ntitle = {Towards An Open Platform For Arabic Language Processing},\ntype = {PhD thesis},\ninstitution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},\ndate = {2020},\n}\n```\n\n## Usage\n\n### Install\n```shell\npip install adawat\n```\n\n### Import\n```python\n\u003e\u003e\u003e import adawat.adaat\n```\n## Examples\n\nDetailed examples and features are listed in [Features](doc/features.md).\n\n### Tashkeel\n* tashkeel     : vocalize text; we recommend using mishkal-console instead.\n* tashkeel     : vocalize text with suggestions for every word.\n* reduce       : strip unnecessary tashkeel from a vocalized text\n* strip        : remove all harakat and shadda\n* compare      : compare tashkeel between the input text and the automatically vocalized text\n\n```python\n\u003e\u003e\u003e lastmark = True\n\u003e\u003e\u003e text = u\"تطلع الشمس صباحا\"\n\u003e\u003e\u003e adawat.adaat.tashkeel_text(text, lastmark)\n' تَطْلُعُ الشَّمْسُ صَبَاحًا'\n\n```\n\n## Requirements\n```\nasmai\u003e=0.1\nmishkal\u003e=0.3\nnaftawayh\u003e=0.4\npyarabic\u003e=0.6.8\nqalsadi\u003e=0.3.6\nrepr\u003e=0.3.1\nsylajone\u003e=0.2\ntashaphyne\u003e=0.3.4.1\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fadawat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinuxscout%2Fadawat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fadawat/lists"}