{"id":17718559,"url":"https://github.com/linuxscout/qalsadi","last_synced_at":"2025-03-02T13:01:43.500Z","repository":{"id":59944368,"uuid":"82103527","full_name":"linuxscout/qalsadi","owner":"linuxscout","description":"Qalsadi: Arabic mophological analyzer Library for python.","archived":false,"fork":false,"pushed_at":"2024-09-02T10:45:47.000Z","size":5764,"stargazers_count":34,"open_issues_count":3,"forks_count":8,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-14T04:08:01.551Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linuxscout.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"patreon":"linuxscout"}},"created_at":"2017-02-15T20:30:57.000Z","updated_at":"2024-09-02T10:45:52.000Z","dependencies_parsed_at":"2023-02-12T11:15:48.177Z","dependency_job_id":"23f27ea4-2f96-48f8-bb97-568a4de85cd5","html_url":"https://github.com/linuxscout/qalsadi","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fqalsadi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fqalsadi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fqalsadi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fqalsadi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linu
xscout","download_url":"https://codeload.github.com/linuxscout/qalsadi/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240286511,"owners_count":19777353,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-25T14:54:51.391Z","updated_at":"2025-02-23T08:11:07.404Z","avatar_url":"https://github.com/linuxscout.png","language":"Python","funding_links":["https://patreon.com/linuxscout"],"categories":[],"sub_categories":[],"readme":"# Qalsadi Arabic Morphological Analyzer and Lemmatizer for Python\n\nالمكتبة البرمجية [القلصادي](https://github.com/linuxscout/qalsadi)  أداة متخصصة في التحليل الصرفي للنصوص العربية. تعتمد على قاعدة بيانات معجمية لتحليل النصوص سواء كانت مشكولة جزئياً أو كلياً. تقدم هذه المكتبة تشكيل الكلمات وتحليلها الصرفي، بالإضافة إلى تقييم درجة شيوع الكلمة في اللغة العربية المعاصرة.\n\nمتوفرة للتجربة على موقع [مشكال](http://tahadz.com/mishkal)، قسم  أدوات/تحليل\n\n[Qalsadi](https://github.com/linuxscout/qalsadi) library is a specialized tool for morphological analysis of Arabic texts. It uses a lexical database to analyze fully or partially vocalized texts, providing both morphological analysis and diacritics. 
Additionally, it evaluates the frequency of word usage in contemporary Arabic and uses the \"Qutrub\" tool for verb conjugation.\n\nThe demo is available on [Mishkal](http://Tahadz.com/mishkal/) \u003e Tools/Analysis\n\n  Developers:  Taha Zerrouki: http://tahadz.com\n    taha dot zerrouki at gmail dot com\n\nFeatures  |   value\n----------|---------------------------------------------------------------------------------\nAuthors   | [Authors.md](https://github.com/linuxscout/qalsadi/master/AUTHORS.md)\nRelease   | 0.5 \nLicense   |[GPL](https://github.com/linuxscout/qalsadi/master/LICENSE)\nTracker   |[linuxscout/qalsadi/Issues](https://github.com/linuxscout/qalsadi/issues)\nWebsite   |[https://pypi.python.org/pypi/qalsadi](https://pypi.python.org/pypi/qalsadi)\nDoc       |[package Documentation](https://qalsadi.readthedocs.io/)\nSource    |[Github](http://github.com/linuxscout/qalsadi)\nDownload  |[sourceforge](http://qalsadi.sourceforge.net)\nFeedbacks |[Comments](http://tahadz.com/qalsadi/contact)\nAccounts  |[@Twitter](https://twitter.com/linuxscout)  [@Sourceforge](http://sourceforge.net/projects/qalsadi/)\n\n\n\n## Citation\nIf you cite Qalsadi in academic work, please use this citation:\n```\nT. Zerrouki, Qalsadi, Arabic mophological analyzer Library for python.,  https://pypi.python.org/pypi/qalsadi/\n```\nAnother Citation:\n```\nZerrouki, Taha. 
\"Towards An Open Platform For Arabic Language Processing.\" (2020).\n```\nor in bibtex format\n\n```bibtex\n@misc{zerrouki2012qalsadi,\n  title={qalsadi, Arabic mophological analyzer Library for python.},\n  author={Zerrouki, Taha},\n  url={https://pypi.python.org/pypi/qalsadi},\n  year={2012}\n}\n```\n\n```bibtex\n@thesis{zerrouki2020towards,\n  title={Towards An Open Platform For Arabic Language Processing},\n  author={Zerrouki, Taha},\n  year={2020}\n}\n\n```\n\n\n## Features\n - Lemmatization\n - Analysis of vocalized text\n - Uses the Qutrub library to analyze verbs\n - Reports word frequency in modern Arabic usage\n\n### Applications\n\n* Stemming texts\n* Text Classification and categorization\n* Sentiment Analysis\n* Named Entity Recognition\n\n### Installation\n\n```\npip install qalsadi\n```\n#### Requirements\n\n```\npip install -r requirements.txt\n```\n\n## Usage\n### Demo\nThe demo is available on [Tahadz.com](http://tahadz.com/mishkal) \u003e Tools/Analysis\n### Example\n#### Lemmatization\n```python\n\u003e\u003e\u003e import qalsadi.lemmatizer\n\u003e\u003e\u003e text = u\"\"\"هل تحتاج إلى ترجمة كي تفهم خطاب الملك؟ اللغة \"الكلاسيكية\" (الفصحى) موجودة في كل اللغات وكذلك اللغة \"الدارجة\" .. الفرنسية التي ندرس في المدرسة ليست الفرنسية التي يستخدمها الناس في شوارع باريس .. وملكة بريطانيا لا تخطب بلغة شوارع لندن .. لكل مقام مقال\"\"\"\n\u003e\u003e\u003e lemmer = qalsadi.lemmatizer.Lemmatizer()\n\u003e\u003e\u003e # lemmatize a word\n... 
lemmer.lemmatize(\"يحتاج\")\n'احتاج'\n\u003e\u003e\u003e # lemmatize a word with a specific pos\n\u003e\u003e\u003e lemmer.lemmatize(\"وفي\")\n'في'\n\u003e\u003e\u003e lemmer.lemmatize(\"وفي\", pos=\"v\")\n'وفى'\n\n\u003e\u003e\u003e lemmas = lemmer.lemmatize_text(text)\n\u003e\u003e\u003e print(lemmas)\n['هل', 'احتاج', 'إلى', 'ترجمة', 'كي', 'تفهم', 'خطاب', 'ملك', '؟', 'لغة', '\"', 'كلاسيكي', '\"(', 'فصحى', ')', 'موجود', 'في', 'كل', 'لغة', 'ذلك', 'لغة', '\"', 'دارج', '\"..', 'فرنسي', 'التي', 'درس', 'في', 'مدرسة', 'ليست', 'فرنسي', 'التي', 'استخدم', 'ناس', 'في', 'شوارع', 'باريس', '..', 'ملك', 'بريطانيا', 'لا', 'خطب', 'بلغة', 'شوارع', 'دنو', '..', 'كل', 'مقام', 'مقالي']\n\u003e\u003e\u003e # lemmatize a text and return lemma pos\n... lemmas = lemmer.lemmatize_text(text, return_pos=True)\n\u003e\u003e\u003e print(lemmas)\n[('هل', 'stopword'), ('احتاج', 'verb'), ('إلى', 'stopword'), ('ترجمة', 'noun'), ('كي', 'stopword'), ('تفهم', 'noun'), ('خطاب', 'noun'), ('ملك', 'noun'), '؟', ('لغة', 'noun'), '\"', ('كلاسيكي', 'noun'), '\"(', ('فصحى', 'noun'), ')', ('موجود', 'noun'), ('في', 'stopword'), ('كل', 'stopword'), ('لغة', 'noun'), ('ذلك', 'stopword'), ('لغة', 'noun'), '\"', ('دارج', 'noun'), '\"..', ('فرنسي', 'noun'), ('التي', 'stopword'), ('درس', 'verb'), ('في', 'stopword'), ('مدرسة', 'noun'), ('ليست', 'stopword'), ('فرنسي', 'noun'), ('التي', 'stopword'), ('استخدم', 'verb'), ('ناس', 'noun'), ('في', 'stopword'), ('شوارع', 'noun'), ('باريس', 'all'), '..', ('ملك', 'noun'), ('بريطانيا', 'noun'), ('لا', 'stopword'), ('خطب', 'verb'), ('بلغة', 'noun'), ('شوارع', 'noun'), ('دنو', 'verb'), '..', ('كل', 'stopword'), ('مقام', 'noun'), ('مقالي', 'noun')]\n\n\u003e\u003e\u003e # Get vocalized output lemmas\n\u003e\u003e\u003e lemmer.set_vocalized_lemma()\n\u003e\u003e\u003e lemmas = lemmer.lemmatize_text(text)\n\u003e\u003e\u003e print(lemmas)\n['هَلْ', 'اِحْتَاجَ', 'إِلَى', 'تَرْجَمَةٌ', 'كَيْ', 'تَفَهُّمٌ', 'خَطَّابٌ', 'مَلَكٌ', '؟', 'لُغَةٌ', '\"', 'كِلاَسِيكِيٌّ', '\"(', 'فُصْحَى', 
')', 'مَوْجُودٌ', 'فِي', 'كُلَّ', 'لُغَةٌ', 'ذَلِكَ', 'لُغَةٌ', '\"', 'دَارِجٌ', '\"..', 'فَرَنْسِيّ', 'الَّتِي', 'دَرَسَ', 'فِي', 'مَدْرَسَةٌ', 'لَيْسَتْ', 'فَرَنْسِيّ', 'الَّتِي', 'اِسْتَخْدَمَ', 'نَاسٌ', 'فِي', 'شَوَارِعٌ', 'باريس', '..', 'مَلَكٌ', 'برِيطانِيا', 'لَا', 'خَطَبَ', 'بَلَغَةٌ', 'شَوَارِعٌ', 'أَدَانَ', '..', 'كُلَّ', 'مَقَامٌ', 'مَقَالٌ']\n\u003e\u003e\u003e \n```\n\n#### Morphology analysis\n``` python\nimport qalsadi.analex as qa\n\nfilename = \"samples/text.txt\"\ntry:\n    # read the sample file as UTF-8 text\n    with open(filename, encoding=\"utf8\") as myfile:\n        text = myfile.read()\n    if not text:\n        text = \"السلام عليكم\"\nexcept OSError:\n    # fall back to a default word when the file cannot be read\n    text = \"أسلم\"\n    print(\"default text used\")\n\ndebug = False\nanalyzer = qa.Analex()\nanalyzer.set_debug(debug)\nresult = analyzer.check_text(text)\nprint(\"----------------python format result-------\")\nprint(result)\n# each item of result holds the candidate analyses of one word\nfor word_cases in result:\n    print(\"-------------One word detailed case------\")\n    for analyzed in word_cases:\n        print(\"-------------one case for word------\")\n        print(repr(analyzed))\n```\n\n\n\n#### Output description\nCategory   | Applied on | feature              | example         a|شرح\n-----------|------------|----------------------|------------------|---\naffix      | all        | affix_key            | ال--َاتُ-       a|مفتاح الزوائد\naffix      | all        | affix                |                 a|الزوائد\ninput      | all        | word                 | البيانات        a|الكلمة المدخلة\ninput      | all        | unvocalized          |                 a|غير مشكول\nmorphology | noun       | tag_mamnou3          |0                a|ممنوع من الصرف\nmorphology | verb       | tag_confirmed        |0                a|خاصية الفعل المؤكد\nmorphology | verb       | tag_mood             |0                a|حالة الفعل المضارع (منصوب، مجزوم، مرفوع)\nmorphology | verb       | tag_pronoun          |0                a|الضمير\nmorphology 
| verb       | tag_transitive       |0                a|التعدي اللزوم\nmorphology | verb       | tag_voice            |0                a|البناء للمعلوم/ البناء للمجهول\nmorphology | noun       | tag_regular          |1                a|قياسي/ سماعي\nmorphology | noun/verb  | tag_gender           |3                a|النوع ( مؤنث مذكر)\nmorphology | verb       | tag_person           |4                a|الشخص (المتكلم الغائب المخاطب)\nmorphology | noun       | tag_number           |21               a|العدد(فرد/مثنى/جمع)\noriginal   | noun/verb  | freq                 |694644           a|درجة شيوع الكلمة\noriginal   | all        | original_tags        | (u              a|خصائص الكلمة الأصلية\noriginal   | all        | original             | بَيَانٌ         a|الكلمة الأصلية\noriginal   | all        | root                 | بين             a|الجذر\noriginal   | all        | tag_original_gender  | مذكر            a|جنس الكلمة الأصلية\noriginal   | noun       | tag_original_number  | مفرد            a|عدد الكلمة الأصلية\noutput     | all        | type                 | Noun:مصدر       a|نوع الكلمة\noutput     | all        | semivocalized        | الْبَيَانَات    a|الكلمة مشكولة بدون علامة الإعراب\noutput     | all        | vocalized            | الْبَيَانَاتُ   a|الكلمةمشكولة\noutput     | all        | stem                 | بيان            a|الجذع\nsyntax     | all        | tag_break            |0                a|الكلمة منفصلة عمّا قبلها\nsyntax     | all        | tag_initial          |0                a|خاصية نحوية، الكلمة في بداية الجملة\nsyntax     | all        | tag_transparent      |0                a|البدل\nsyntax     | noun       | tag_added            |0                a|خاصية نحوية، الكلمة مضاف\nsyntax     | all        | need                 |                 a|الكلمة تحتاج إلى كلمة أخرى (المتعدي، العوامل) غير منجزة\nsyntax     | tool       | action               |                 a|العمل\nsyntax     | tool       | object_type          |                 a|نوع 
المعمول، بالنسبة للعامل، مثلا اسم لحرف الجر\n\n#### Using Cache\nQalsadi can use a cache to speed up analysis. Four kinds of cache are available:\n\n* Memory cache\n* Pickle cache\n* Pickledb cache\n* CodernityDB cache\n\nTo use one of them, see the following examples:\n* Using a factory method\n```python\n\u003e\u003e\u003e import qalsadi.analex\n\u003e\u003e\u003e from qalsadi.cache_factory import Cache_Factory\n\u003e\u003e\u003e analyzer = qalsadi.analex.Analex()\n\u003e\u003e\u003e # list available cache names\n\u003e\u003e\u003e Cache_Factory.list()\n['', 'memory', 'pickle', 'pickledb', 'codernity']\n\u003e\u003e\u003e # configure cacher\n\u003e\u003e\u003e # configure path used to store the cache\n\u003e\u003e\u003e path = 'cache/qalsasicache.pickledb'\n\u003e\u003e\u003e cacher = Cache_Factory.factory(\"pickledb\", path)\n\u003e\u003e\u003e analyzer.set_cacher(cacher)\n\u003e\u003e\u003e # to enable the use of cacher\n\u003e\u003e\u003e analyzer.enable_allow_cache_use()\n```\n* Memory cache\n\n```python\n\u003e\u003e\u003e import qalsadi.analex\n\u003e\u003e\u003e analyzer = qalsadi.analex.Analex()\n\u003e\u003e\u003e # configure cacher\n\u003e\u003e\u003e import qalsadi.cache\n\u003e\u003e\u003e cacher = qalsadi.cache.Cache()\n\u003e\u003e\u003e analyzer.set_cacher(cacher)\n\u003e\u003e\u003e # to enable the use of cacher\n\u003e\u003e\u003e analyzer.enable_allow_cache_use()\n\u003e\u003e\u003e # to disable the use of cacher\n\u003e\u003e\u003e analyzer.disable_allow_cache_use()\n```\n* Pickle cache\n\n```python\n\u003e\u003e\u003e import qalsadi.analex\n\u003e\u003e\u003e from qalsadi.cache_pickle import Cache\n\u003e\u003e\u003e analyzer = qalsadi.analex.Analex()\n\u003e\u003e\u003e # configure cacher\n\u003e\u003e\u003e # configure path used to store the cache\n\u003e\u003e\u003e path = 'cache/qalsadiCache.pickle'\n\u003e\u003e\u003e cacher = Cache(path)\n\u003e\u003e\u003e analyzer.set_cacher(cacher)\n\u003e\u003e\u003e # to enable the use of 
cacher\n\u003e\u003e\u003e analyzer.enable_allow_cache_use()\n\n```\n* Pickledb cache\n\n```python\n\u003e\u003e\u003e import qalsadi.analex\n\u003e\u003e\u003e from qalsadi.cache_pickledb import Cache\n\u003e\u003e\u003e analyzer = qalsadi.analex.Analex()\n\u003e\u003e\u003e # configure cacher\n\u003e\u003e\u003e # configure path used to store the cache\n\u003e\u003e\u003e path = 'cache/qalsadiCache.pickledb'\n\u003e\u003e\u003e cacher = Cache(path)\n\u003e\u003e\u003e analyzer.set_cacher(cacher)\n\u003e\u003e\u003e # to enable the use of cacher\n\u003e\u003e\u003e analyzer.enable_allow_cache_use()\n\n```\n* CodernityDB cache\n\n\n```python\n\u003e\u003e\u003e import qalsadi.analex\n\u003e\u003e\u003e from qalsadi.cache_codernity import Cache\n\u003e\u003e\u003e analyzer = qalsadi.analex.Analex()\n\u003e\u003e\u003e # configure cacher\n\u003e\u003e\u003e # configure path used to store the cache\n\u003e\u003e\u003e path = 'cache'\n\u003e\u003e\u003e cacher = Cache(path)\n\u003e\u003e\u003e analyzer.set_cacher(cacher)\n\u003e\u003e\u003e # to enable the use of cacher\n\u003e\u003e\u003e analyzer.enable_allow_cache_use()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fqalsadi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinuxscout%2Fqalsadi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fqalsadi/lists"}