{"id":17718550,"url":"https://github.com/linuxscout/tashaphyne","last_synced_at":"2025-10-08T09:09:56.811Z","repository":{"id":46116724,"uuid":"81855321","full_name":"linuxscout/tashaphyne","owner":"linuxscout","description":"Tashaphyne: Arabic Light Stemmer","archived":false,"fork":false,"pushed_at":"2024-09-02T10:23:46.000Z","size":862,"stargazers_count":100,"open_issues_count":0,"forks_count":22,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-09-19T04:04:14.739Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linuxscout.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"support/requirements.txt","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"patreon":"linuxscout"}},"created_at":"2017-02-13T18:09:43.000Z","updated_at":"2025-08-13T20:52:49.000Z","dependencies_parsed_at":"2022-07-21T22:33:37.413Z","dependency_job_id":"967491c1-db1c-4560-8cff-2c3af6cba439","html_url":"https://github.com/linuxscout/tashaphyne","commit_stats":{"total_commits":33,"total_committers":4,"mean_commits":8.25,"dds":0.09090909090909094,"last_synced_commit":"e7b34444691e9b6ee5147af6c51aaa0c1fd947c0"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/linuxscout/tashaphyne","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Ftashaphyne","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Ftashaphyne/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/linuxscout%2Ftashaphyne/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Ftashaphyne/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linuxscout","download_url":"https://codeload.github.com/linuxscout/tashaphyne/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Ftashaphyne/sbom","scorecard":{"id":591568,"data":{"date":"2025-08-11","repo":{"name":"github.com/linuxscout/tashaphyne","commit":"56cafad8fa4458610f47558005d150d9e76d1a22"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.2,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/draft-pdf.yml:1","Info: no 
jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/draft-pdf.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/linuxscout/tashaphyne/draft-pdf.yml/master?enable=pin","Warn: third-party 
GitHubAction not pinned by hash: .github/workflows/draft-pdf.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/linuxscout/tashaphyne/draft-pdf.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/draft-pdf.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/linuxscout/tashaphyne/draft-pdf.yml/master?enable=pin","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: GNU General Public License v3.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases 
found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":7,"reason":"3 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2025-49 / GHSA-5rjg-fvgr-3xxf","Warn: Project is vulnerable to: GHSA-cx63-2mw6-8hw5","Warn: Project is vulnerable to: PYSEC-2022-43012 / GHSA-r9hx-vwmv-q579"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-20T22:04:20.455Z","repository_id":46116724,"created_at":"2025-08-20T22:04:20.455Z","updated_at":"2025-08-20T22:04:20.455Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276754043,"owners_count":25698829,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-24T02:00:09.776Z","response_time":97,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-25T14:54:48.890Z","updated_at":"2025-10-08T09:09:56.762Z","avatar_url":"https://github.com/linuxscout.png","language":"Python","funding_links":["https://patreon.com/linuxscout"],"categories":[],"sub_categories":[],"readme":"# Tashaphyne\n![downloads](https://img.shields.io/pypi/dm/tashaphyne?style=plastic)\n\n\n**Tashaphyne**: Arabic Light Stemmer تاشفين: التجذيع الخفيف للنصوص العربية\n\n[تاشفين](https://github.com/linuxscout/tashaphyne) برنامج تجذيع عربي خفيف ومقطع للكلمات. يدعم بشكل أساسي التجذيع الخفيف (إزالة السوابق واللواحق) ويعطي الجذوع  الممكنة. 
يستخدم آلة ذات وضعيات محدودة معدّلة، مما يسمح له باستخلاص كل الجذوع الممكنة.\n\nيوفر تاشفين استخلاص الجذع والجذر من الكلمة في نفس الوقت، على عكس برامج التجذيع مثل Khoja وISRI وAssem وFarasa.\n\n**تاشفين** يأتي بقائمة افتراضية للسوابق واللواحق، ويقبل استخدام قوائم مخصصة للزوائد، مما يسمح له بالتعامل مع المزيد من الجوانب الصرفية، وإنشاء زوائد مخصصة دون تغيير الكود.\n\n**تاشفين** هي مكتبة بايثون، وهي متاحة للتجربة في برنامج مشكال على [Mishkal](http://tahadz.com/mishkal)، اختر أدوات/تحليل والمصدر مفتوح على [Github](http://github.com/linuxscout/tashaphyne)\n\n**Tashaphyne** is an Arabic light stemmer and segmenter. It mainly supports light stemming (removing prefixes and suffixes) and gives all possible segmentations. It uses a modified finite state automaton, which allows it to generate all segmentations.\n\nIt offers stemming and root extraction at the same time, unlike the Khoja stemmer, ISRI stemmer, Assem stemmer, and Farasa stemmer.\n\n**Tashaphyne** comes with default prefix and suffix lists and accepts custom ones, which lets it handle more morphological aspects and build customized stemmers without changing the code.\n\n**Tashaphyne** is a Python library. It is available as a demo on [Mishkal](http://tahadz.com/mishkal) (choose Tools/Analysis) and as source code on [GitHub](http://github.com/linuxscout/tashaphyne).\n\n  Developers:  Taha Zerrouki: http://tahadz.com\n    taha dot zerrouki at gmail dot com\n\nFeatures |   Value\n---------|---------------------------------------------------------------------------------\nAuthors  | [Authors.md](https://github.com/linuxscout/tashaphyne/master/AUTHORS.md)\nRelease  | 0.3.7 \nLicense  |[GPL](https://github.com/linuxscout/tashaphyne/master/LICENSE)\nTracker  |[linuxscout/tashaphyne/Issues](https://github.com/linuxscout/tashaphyne/issues)\nWebsite  
|[https://pypi.python.org/pypi/Tashaphyne](https://pypi.python.org/pypi/Tashaphyne)\nDoc      |[Package documentation](https://tashaphyne.readthedocs.io/)\nSource  |[GitHub](http://github.com/linuxscout/tashaphyne)\nDownload  |[SourceForge](http://tashaphyne.sourceforge.net)\nFeedback  |[Comments](http://tahadz.com/contact.html)\nAccounts  |[@Twitter](https://twitter.com/linuxscout)  [@Sourceforge](http://sourceforge.net/projects/tashaphyne/)\n\n\n\n## Citation\nIf you cite Tashaphyne in academic work, please use one of the following citations:\n\n* T. Zerrouki, **Tashaphyne, Arabic light stemmer**, https://pypi.python.org/pypi/Tashaphyne/0.2\n* Zerrouki, T. (2024). **Tashaphyne: A Python package for Arabic light stemming**. Journal of Open Source Software, 9(93), 6063. doi: http://doi.org/10.21105/joss.06063\n* Alkhatib, R. M., Zerrouki, T., Shquier, M. M. A., \u0026 Balla, A. (2023). **Tashaphyne0.4: A new Arabic light stemmer based on rhyzome modeling approach**. Information Retrieval Journal, 26(14). doi: https://doi.org/10.1007/s10791-023-09429-y\n* Alkhatib, R. M., Zerrouki, T., Shquier, M. M. A., Balla, A., \u0026 Al-Khateeb, A. (2021). **A new enhanced Arabic light stemmer for IR in medical documents**. CMC-COMPUTERS MATERIALS \u0026 CONTINUA, 68(1), 1255–1269. 
\n\nor in BibTeX format:\n```bibtex\n@misc{zerrouki2012tashaphyne,\ntitle={Tashaphyne, Arabic light stemmer},\nauthor={Zerrouki, Taha},\nurl={https://pypi.python.org/pypi/Tashaphyne/0.2},\nyear={2012}\n}\n```\n\n```bibtex\n@article{Zerrouki2024,\n\ttitle        = {Tashaphyne: A Python package for Arabic Light Stemming},\n\tauthor       = {Taha Zerrouki},\n\tyear         = 2024,\n\tjournal      = {Journal of Open Source Software},\n\tpublisher    = {The Open Journal},\n\tvolume       = 9,\n\tnumber       = 93,\n\tpages        = 6063,\n\tdoi          = {10.21105/joss.06063},\n\turl          = {https://doi.org/10.21105/joss.06063}\n}\n```\n\n```bibtex\n@article{raed20223,\n  title={Tashaphyne0.4: a new arabic light stemmer based on rhyzome modeling approach},\n  author={Alkhatib, Raed M and Zerrouki, Taha and Shquier, Mohammed M Abu and Balla, Amar},\n  journal={Information Retrieval Journal},\n  year={2023},\n  pages={},\n  volume={26},\n  number={14},\n  doi={10.1007/s10791-023-09429-y}\n}\n\n@article{raed2021,\n  title={A New Enhanced Arabic Light Stemmer for IR in Medical Documents},\n  author={Alkhatib, Raed M and Zerrouki, Taha and Shquier, Mohammed M Abu and Balla, Amar and Al-Khateeb, Asef},\n  journal={CMC-COMPUTERS MATERIALS \\\u0026 CONTINUA},\n  year={2021},\n  pages={1255-1269},\n  volume={68},\n  number={1}\n}\n```\n\n\n##   مزايا\n - تجذيع الكلمة العربية إلى أبسط جذع ممكن\n - إمكانية استخراج الجذر\n - تقطيع الكلمة إلى جميع الحالات الممكنة.\n - تنميط الكلمة (توحيد الحروف ذات الأشكال المختلفة).\n - قائمة مسبقة للزوائد العربية، وحروف الزيادة\n - إمكانية ضبط إعدادات المجذع والمقطع، من خلال تعديل قوائم الزوائد.\n \n## Features\n - Arabic word light stemming.\n - Root extraction.\n - Word segmentation.\n - Word normalization.\n - Default Arabic affix lists.\n - A customizable light stemmer: stemmer options and data can be changed.\n - Data-independent stemmer.\n\n\n## Applications\n* Stemming texts\n* Text Classification and 
categorization\n* Sentiment Analysis\n* Named Entity Recognition\n\n## Installation\n\n```\npip install tashaphyne\n```\n\nUsage\n=====\n\n\nTashaphyne is a stemmer based on a finite state automaton; it extracts affixes (prefixes and suffixes) using a predefined affix list.\n\nIt finds all possible affixations of a word and lists every possible segmentation of that word.\n\n\n\n### Functions الدوال \n\n\n* تجذيع الكلمة\n\nتجذيع الكلمة واستخلاص كل المعلومات منها بواسطة الدوال المناسبة\n\nStemming function: `light_stem()` stems an Arabic word and returns a stem. It also stores the stemming positions (left, right) in the instance, so other derived attributes such as the stem, prefix, suffix, and root can be retrieved afterwards.\n\n```python\n\u003e\u003e\u003e from tashaphyne.stemming import ArabicLightStemmer\n\u003e\u003e\u003e ArListem = ArabicLightStemmer()\n\u003e\u003e\u003e word = 'أفتضاربانني'\n\u003e\u003e\u003e # stem the word\n... stem = ArListem.light_stem(word)\n\u003e\u003e\u003e # extract stem\n... print(ArListem.get_stem())\nضارب\n\u003e\u003e\u003e # extract root\n... print(ArListem.get_root())\nضرب\n\u003e\u003e\u003e \n\u003e\u003e\u003e # get prefix position index\n... print(ArListem.get_left())\n3\n\u003e\u003e\u003e # get prefix\n... print(ArListem.get_prefix())\nأفت\n\u003e\u003e\u003e # get prefix with a specific index\n... print(ArListem.get_prefix(2))\nأف\n\u003e\u003e\u003e \n\u003e\u003e\u003e # get suffix position index\n... print(ArListem.get_right())\n7\n\u003e\u003e\u003e # get suffix\n... print(ArListem.get_suffix())\nانني\n\u003e\u003e\u003e # get suffix with a specific index\n... print(ArListem.get_suffix(10))\nي\n\u003e\u003e\u003e # get affix\n... print(ArListem.get_affix())\nأفت-انني\n\u003e\u003e\u003e # get affix tuple\n... print(ArListem.get_affix_tuple())\n{'prefix': 'أفت', 'root': '', 'stem': '', 'suffix': 'أفتضاربانني'}\n\u003e\u003e\u003e # star words\n... 
print(ArListem.get_starword())\nأفت*ا**انني\n\u003e\u003e\u003e # get star stem\n... print(ArListem.get_starstem())\n*ا**\n\u003e\u003e\u003e \n\u003e\u003e\u003e # get unvocalized word\n... print(ArListem.get_unvocalized())\nأفتضاربانني\n```\n\nFunction | Description | وصف|\n---------|-------------|----|\nget_root()|Get the root of the word processed by the stemmer.|استخلاص الجذر|\nget_stem()|Get the stem of the word processed by the stemmer.|استخلاص الجذع يمكن استخلاص الجذع التلقائي مباشرة، عند الرغبة في الحصول على جذع معين، نحدد دليل السابق، ودليل اللاحق.|\nget_left()|Get the prefix end position.|موضع نهاية السابقة|\nget_right()|Get the suffix start position.|موضع بداية اللاحقة|\nget_prefix()|Get the default prefix, or a specific prefix by position.|استرجاع السابقة التلقائية أو سابقة معينة بموضع|\nget_suffix()|Get the default suffix, or a specific suffix by suffix index.|استرجاع اللاحقة التلقائية أو بواسطة دليل اللاحقة|\nget_affix()|Get the default affix, or a specific one by left and right indexes.|استرجاع الزائدة التلقائية أو المعينة بدليلي السابق واللاحق|\nget_affix_tuple()|Get the affix tuple with its details.|استرجاع الزائدة بتفاصيلها|\nget_starword()|Get the starred word: radical letters are replaced by \"*\".|استرجاع الكلمة المنجمة، الحروف الأصلية مخفية بنجوم|\nget_starstem()|Get the starred stem: radical letters are replaced by \"*\".|استرجاع الجذع المنجم، الحروف الأصلية مخفية بنجوم|\nget_unvocalized()|Get the unvocalized form of the word processed by the stemmer. Harakat are stripped.|استرجاع الكلمة غير مشكولة|\n\n\n* استخلاص كل التقسيمات المحتملة\n* تقسيم الكلمة إلى كل الزوائد المحتملة\n\nGenerate all possible segmentation positions (left, right) for the word processed by the stemmer.\n\n```python\n\n\u003e\u003e\u003e word = 'أفتضاربانني'\n\n\u003e\u003e\u003e # Detect all possible segmentations\n... 
print(ArListem.segment(word))\nset([(2, 7), (3, 8), (0, 8), (2, 9), (2, 8), (3, 10), (2, 11), (1, 8), (0, 7), (2, 10), (3, 11), (1, 10), (0, 11), (3, 9), (0, 10), (1, 7), (0, 9), (3, 7), (1, 11), (1, 9)])\n\n\u003e\u003e\u003e # Get all segments\n... print(ArListem.get_segment_list())\nset([(2, 7), (3, 8), (0, 8), (2, 9), (2, 8), (3, 10), (2, 11), (1, 8), (0, 7), (2, 10), (3, 11), (1, 10), (0, 11), (3, 9), (0, 10), (1, 7), (0, 9), (3, 7), (1, 11), (1, 9)])\n\n\u003e\u003e\u003e # get affix list\n... print(ArListem.get_affix_list())\n[{'prefix': 'أف', 'root': 'ضرب', 'stem': 'تضارب', 'suffix': 'انني'},\n {'prefix': 'أفت', 'root': 'ضرب', 'stem': 'ضاربا', 'suffix': 'نني'},\n {'prefix': '', 'root': 'أفضرب', 'stem': 'أفتضاربا', 'suffix': 'نني'}, \n {'prefix': 'أف', 'root': 'ضربن', 'stem': 'تضاربان', 'suffix': 'ني'}, \n {'prefix': 'أف', 'root': 'ضرب', 'stem': 'تضاربا', 'suffix': 'نني'}, \n {'prefix': 'أفت', 'root': 'ضربنن', 'stem': 'ضاربانن', 'suffix': 'ي'}, ...]\n\u003e\u003e\u003e \n```\n\n* segment() / get_segment_list()\n\nاستخلاص قائمة مواضع كل التقسيمات المحتملة على شكل أعداد\n\nReturn the segmentation positions (left, right) of the word processed by the stemmer.\n\n* get_affix_list()\n\nاستخلاص قائمة كل الزوائد المحتملة\n\nReturn the list of affix tuples of the word processed by the stemmer.\n\n### Customized Affix list\nتخصيص قوائم الزوائد\n\nيمكن تخصيص قوائم السوابق واللواحق للحصول على نتائج أفضل حسب السياق\n\nفي المثال الموالي، سنستعمل مجذع تاشفين حسب قوائمه التلقائية، ثم نصنع مجذعا آخر يعطي نتائج مختلفة بتخصيص قوائم السوابق واللواحق\n\nYou can customize the default prefix and suffix lists to get better results for your context. The following example first uses a stemmer with the default lists, then builds a second stemmer with custom prefix and suffix lists that gives different results:\n\n```python\n\u003e\u003e\u003e import tashaphyne.stemming\n\n\u003e\u003e\u003e CUSTOM_PREFIX_LIST = [u'كال', 'أفبال', 'أفك', 'فك', 'أولل', '', 'أف', 'ول', 'أوال', 'ف', 'و', 'أو', 'ولل', 'فب', 'أول', 'ألل', 'لل', 'ب', 'وكال', 'أوب', 'بال', 'أكال', 'ال', 'أب', 'وب', 'أوبال', 'أ', 'وبال', 'أك', 'فكال', 'أوك', 'فلل', 'وك', 'ك', 'أل', 'فال', 'وال', 'أوكال', 'أفلل', 
'أفل', 'فل', 'أال', 'أفكال', 'ل', 'أبال', 'أفال', 'أفب', 'فبال']\n\u003e\u003e\u003e CUSTOM_SUFFIX_LIST = [u'كما', 'ك', 'هن', 'ي', 'ها', '', 'ه', 'كم', 'كن', 'هم', 'هما', 'نا']\n\n\u003e\u003e\u003e # simple stemmer with the default affix lists\n... simple_stemmer = tashaphyne.stemming.ArabicLightStemmer()\n\n\u003e\u003e\u003e # create a customized stemmer object for stemming enclitics and proclitics\n... custom_stemmer = tashaphyne.stemming.ArabicLightStemmer()\n\u003e\u003e\u003e # configure the stemmer object\n... custom_stemmer.set_prefix_list(CUSTOM_PREFIX_LIST)\n\u003e\u003e\u003e custom_stemmer.set_suffix_list(CUSTOM_SUFFIX_LIST)\n\u003e\u003e\u003e \n\u003e\u003e\u003e word = \"بالمدرستين\"\n\u003e\u003e\u003e # segment the word with the default affix lists\n... simple_stemmer.segment(word)\nset([(4, 10), (4, 7), (4, 9), (4, 8), (3, 10), (0, 7), (3, 8), (1, 10), (1, 8), (3, 9), (0, 10), (1, 7), (0, 9), (3, 7), (0, 8), (1, 9)])\n\u003e\u003e\u003e print(simple_stemmer.get_affix_list())\n[{'prefix': 'بالم', 'root': 'درستين', 'stem': 'درستين', 'suffix': ''}, {'prefix': 'بالم', 'root': 'درس', 'stem': 'درس', 'suffix': 'تين'}, {'prefix': 'بالم', 'root': 'درستي', 'stem': 'درستي', 'suffix': 'ن'}, {'prefix': 'بالم', 'root': 'درست', 'stem': 'درست', 'suffix': 'ين'}, {'prefix': 'بال', 'root': 'مدرستين', 'stem': 'مدرستين', 'suffix': ''}, {'prefix': '', 'root': 'بالمدرس', 'stem': 'بالمدرس', 'suffix': 'تين'}, ...]\n\u003e\u003e\u003e \n\u003e\u003e\u003e # segment the same word with the custom affix lists\n... custom_stemmer.segment(word)\nset([(1, 10), (3, 10), (0, 10)])\n\u003e\u003e\u003e \n\u003e\u003e\u003e print(custom_stemmer.get_affix_list())\n[{'prefix': 'ب', 'root': 'المدرستين', 'stem': 'المدرستين', 'suffix': ''}, {'prefix': 'بال', 'root': 'مدرستين', 'stem': 'مدرستين', 'suffix': ''}, {'prefix': '', 'root': 'بالمدرستين', 'stem': 'بالمدرستين', 'suffix': ''}]\n\u003e\u003e\u003e \n\n```\n\nThe *set_prefix_list* and *set_suffix_list* methods rebuild the finite state automaton so that it uses the new affix lists.\n\n### Stemming a text\n\nTo stem all 
words in a text, tokenize it first, then stem each token:\n```python\n\u003e\u003e\u003e import pyarabic.araby as araby\n\u003e\u003e\u003e from tashaphyne.stemming import ArabicLightStemmer\n\u003e\u003e\u003e stemmer = ArabicLightStemmer()\n\u003e\u003e\u003e text = \"الأطفال يستريحون في المكتبة للمطالعة\"\n\u003e\u003e\u003e tokens = araby.tokenize(text)\n\u003e\u003e\u003e tokens\n['الأطفال', 'يستريحون', 'في', 'المكتبة', 'للمطالعة']\n\u003e\u003e\u003e for tok in tokens:\n...     stem = stemmer.light_stem(tok)\n...     print(tok, stem)\n... \nالأطفال أطفال\nيستريحون يستريح\nفي في\nالمكتبة مكتب\nللمطالعة مطالع\n\u003e\u003e\u003e \n\n```\n\nPackage Documentation\n=====\n\nThe package documentation is hosted at https://tashaphyne.readthedocs.io/.\n\nFiles\n=====\n* file/directory    category    description \n\n* [docs]\n    docs/   docs    documentation\n\n* [support]\n    - pyarabic  : basic Arabic library\n\n* [test]\n    - output/   test    test output\n    - samples/  test    sample files\n    - tools/    test    scripts to use Tashaphyne\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Ftashaphyne","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinuxscout%2Ftashaphyne","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Ftashaphyne/lists"}