{"id":13824982,"url":"https://github.com/AsoSoft/AsoSoft-Library-py","last_synced_at":"2025-07-08T21:30:38.646Z","repository":{"id":217090723,"uuid":"740345940","full_name":"AsoSoft/AsoSoft-Library-py","owner":"AsoSoft","description":"AsoSoft's Library for Kurdish language processing tasks in python","archived":false,"fork":false,"pushed_at":"2024-07-11T01:35:19.000Z","size":51,"stargazers_count":15,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-16T17:20:40.816Z","etag":null,"topics":["central-kurdish","kurdish","kurdish-language-processing","natural-language-processing","normalization","sorani","unicode-normalization"],"latest_commit_sha":null,"homepage":"https://asosoft.com/en/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AsoSoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-08T06:45:33.000Z","updated_at":"2025-04-27T19:39:54.000Z","dependencies_parsed_at":"2024-01-18T01:57:49.089Z","dependency_job_id":"2e089ca9-b1b7-4064-a8e5-9eb0f8d63283","html_url":"https://github.com/AsoSoft/AsoSoft-Library-py","commit_stats":null,"previous_names":["asosoft/asosoft-library-py"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AsoSoft/AsoSoft-Library-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsoSoft%2FAsoSoft-Library-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsoSoft%2FAsoSoft-Library-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/AsoSoft%2FAsoSoft-Library-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsoSoft%2FAsoSoft-Library-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AsoSoft","download_url":"https://codeload.github.com/AsoSoft/AsoSoft-Library-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AsoSoft%2FAsoSoft-Library-py/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264352577,"owners_count":23594925,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["central-kurdish","kurdish","kurdish-language-processing","natural-language-processing","normalization","sorani","unicode-normalization"],"created_at":"2024-08-04T09:01:13.042Z","updated_at":"2025-07-08T21:30:38.400Z","avatar_url":"https://github.com/AsoSoft.png","language":"Python","readme":"# AsoSoft Library (Python)\nAsoSoft Library offers the following natural language processing (NLP) algorithms for the Kurdish Language (ckb: Central branch of Kurdish):\n- **Grapheme-to-Phoneme (G2P) converter and Transliterator**: converts Kurdish text into syllabified phoneme string. 
Also transliterates Kurdish texts from Arabic script into Latin script and vice versa.\n- **Normalizer**: normalizes the Kurdish text and punctuation marks, unifies numerals, replaces HTML entities, extracts and replaces URLs and emails, and more.\n- **Numeral Converter**: converts any type of number into Kurdish words.\n- **Sort**: sorts a list in correct Kurdish alphabet order.\n- **Poem Meter Classifier**: classifies the meter of the input Kurdish poem.\n\nAsoSoft Library was originally written in C# by [Aso Mahmudi](https://github.com/aso-mehmudi), and this library is its Python port.\n\n## How to use?\n- **Requirements**: Python 3.8+\n- **Install the package using pip**: [pip install asosoft](https://pypi.org/project/asosoft/)\n- **Import the package in your Python file**: \n```python\nimport asosoft\n```\n\n## Grapheme-to-Phoneme (G2P) converter and Transliteration\nThis function is based on the study \"[Automated Grapheme-to-Phoneme Conversion for Central Kurdish based on Optimality Theory](https://www.sciencedirect.com/science/article/abs/pii/S0885230821000292)\". \n\n### Kurdish G2P converter\nConverts Central Kurdish text in standard Arabic script into **syllabified phonemic** Latin script (i.e. graphemes to phonemes).\n\nGeneral format:\n```python\nasosoft.KurdishG2P(text, convertNumbersToWord = False, backMergeConjunction = True, singleOutputPerWord = True)\n```\nAn example:\n```python\n\u003e\u003e\u003e print(asosoft.KurdishG2P(\"شەو و ڕۆژ بووین بە گرفت. درێژیی دیوارەکەی گرتن\"))\nˈşeˈwû ˈřoj ˈbûyn ˈbe ˈgiˈrift. ˈdiˈrêˈjîy ˈdîˈwaˈreˈkey ˈgirˈtin\n```\n### Transliteration\n\nArabic script into Hawar Latin script (ح‌غ‌ڕڵ→ḧẍřł):\n```python\n\u003e\u003e\u003e print(asosoft.Ar2La(\"گیرۆدەی خاڵی ڕەشتە؛ گوێت لە نەغمەی تویوورە؟\"))\ngîrodey xałî řeşte; gwêt le neẍmey tuyûre?\n```\n\nArabic script into the Latin script suggested by Dr. 
Feryad Fazil Omar:\n```python\n\u003e\u003e\u003e print(asosoft.Ar2LaFeryad(\"گیرۆدەی خاڵی ڕەشتە؛ گوێت لە نەغمەی تویوورە؟\"))\ngîrodey xaḻî ṟeşte; gwêt le nex̱mey tuyûre?\n```\n\nArabic script into simplified (ح‌غ‌ڕڵ→hxrl) Hawar Latin script:\n```python\n\u003e\u003e\u003e print(asosoft.Ar2LaSimple(\"گیرۆدەی خاڵی ڕەشتە؛ گوێت لە نەغمەی تویوورە؟\"))\ngîrodey xalî reşte; gwêt le nexmey tuyûre?\n```\n\nLatin script (Hawar) into Arabic script:\n```python\n\u003e\u003e\u003e print(asosoft.La2Ar(\"Gelî keç û xortên kurdan, hûn hemû bi xêr biçin\"))\nگەلی کەچ و خۆرتێن کوردان، هوون هەموو ب خێر بچن\n```\n\nArabic script into IPA:\n```python\n\u003e\u003e\u003e print(asosoft.Phonemes2IPA(asosoft.KurdishG2P(\"شەو و ڕۆژ بووین بە گرفت. درێژیی دیوارەکە گرتن\")))\nʃa·wu ro̞ʒ bujn ba gɪ·ɾɪft. dɪ·ɾɛ·ʒij di·wä·ɾa·ka gɪɾ·tɪn\n```\n## Kurdish Text Normalizer\nSeveral functions needed for Central Kurdish text normalization:\n\n### Normalize Kurdish\nTwo character replacement lists are provided as resources of the library:\n- Deep Unicode Corrections:\n  - replacing deprecated Arabic Presentation Forms (FB50–FDFF and FE70–FEFF) with corresponding standard characters\n  - replacing different types of dashes and spaces\n  - removing Unicode control characters\n- Additional Unicode Corrections:\n  - replacing special Arabic math signs with corresponding Latin characters\n  - replacing similar but different letters with standard characters (e.g. 
ڪ,ے,ٶ with ک,ی,ؤ)\n\nThe normalization tasks performed by this function:\n- for all Arabic scripts (including Kurdish, Arabic, and Persian):\n  - character-based replacement:\n    - the above-mentioned replacement lists\n    - replacing Private Use Area characters (U+E000 to U+F8FF) with the White Square character\n  - standardizing and removing duplicated or unnecessary Zero-Width characters\n  - removing unnecessary Tatweels (U+0640)\n- only for Central Kurdish:\n  - standardizing Kurdish characters: ە, هـ, ی, and ک \n  - correcting misconverted characters from non-Unicode fonts\n  - replacing word-initial ر with ڕ\n\nThe simple overload:\n```python\n\u003e\u003e\u003e print(asosoft.Normalize(\"دەقے شیَعري خـــۆش. ره‌نگه‌كاني خاك\"))\nدەقی شێعری خۆش. ڕەنگەکانی خاک\n```\n\nor the complete overload:\n```python\n\u003e\u003e\u003e asosoft.Normalize(text, isOnlyKurdish=True, changeInitialR=True, deepUnicodeCorrectios=True, additionalUnicodeCorrections=True, usersReplaceList=usersReplaceList)\n```\n\n### AliK to Unicode\n`AliK2Unicode` converts Kurdish text written in AliK fonts (developed by Abas Majid in 1997) into the Unicode standard. Ali-K fonts: *Alwand, Azzam, Hasan, Jiddah, kanaqen, Khalid, Sahifa, Sahifa Bold, Samik, Sayid, Sharif, Shrif Bold, Sulaimania, Traditional*\n```python\n\u003e\u003e\u003e print(asosoft.AliK2Unicode(\"ئاشناكردنى خويَندكار بة طوَرِانكاريية كوَمةلاَيةتييةكان\"))\nئاشناکردنی خوێندکار بە گۆڕانکارییە کۆمەڵایەتییەکان\n```\n\n### AliWeb to Unicode\n`AliWeb2Unicode` converts Kurdish text written in AliWeb fonts into the Unicode standard. 
Ali-Web fonts: *Malper, Malper Bold, Samik, Traditional, Traditional Bold*\n```python\n\u003e\u003e\u003e print(asosoft.AliWeb2Unicode(\"هةر جةرةيانصکي مصذووُيي کة أوو دةدا\"))\nھەر جەرەیانێکی مێژوویی کە ڕوو دەدا\n```\n\n### Dylan to Unicode\n`Dylan2Unicode` converts Kurdish text written in Dylan fonts (developed by Dylan Saleh at [KurdSoft](https://web.archive.org/web/20020528231610/http://www.kurdsoft.com/) in 2001) into the Unicode standard.\n```python\n\u003e\u003e\u003e print(asosoft.Dylan2Unicode(\"لثكؤلثنةران بؤيان دةركةوتووة كة دةتوانث بؤ لةش بةكةصك بث\"))\nلێکۆلێنەران بۆیان دەرکەوتووە کە دەتوانێ بۆ لەش بەکەڵک بێ\n```\n### Zarnegar to Unicode\n`Zarnegar2Unicode` converts Kurdish text written in the Zarnegar word processor (developed by [SinaSoft](http://www.sinasoft.com/fa/zarnegar.html) with RDF converter by [NoorSoft](https://www.noorsoft.org/fa/software/view/6561)) into the Unicode standard.\n```python\n\u003e\u003e\u003e print(asosoft.Zarnegar2Unicode(\"بلٌيٌين و بگه‌رٍيٌين بوٌ هه‌لاٌلٌه‌ى سىٌيه‌مى فه‌لسه‌فه\"))\nبڵێین و بگەڕێین بۆ هەڵاڵەی سێیەمی فەلسەفە\n```\n### NormalizePunctuations\n`NormalizePunctuations` corrects the spacing before and after punctuation marks. When `seprateAllPunctuations` is true, \n```python\n\u003e\u003e\u003e print(asosoft.NormalizePunctuations(\"دەقی«کوردی » و ڕێنووس ،((خاڵبەندی )) چۆنە ؟\", False))\nدەقی «کوردی» و ڕێنووس، «خاڵبەندی» چۆنە؟\n```\n### Trim Line\n`TrimLine` trims leading and trailing white space (including zero-width spaces) from a line.\n```python\n\u003e\u003e\u003e print(asosoft.TrimLine(\"   دەق\\u200c  \"))\nدەق\n```\n\n### Replace HTML Entities\n`ReplaceHtmlEntity` replaces HTML entities with single Unicode characters (e.g. \"\u0026eacute;\" with \"é\"). 
It is useful in web-crawled corpora.\n```python\n\u003e\u003e\u003e print(asosoft.ReplaceHtmlEntity(\"ئێوە \u0026quot;دەق\u0026quot; بە زمانی \u0026lt;کوردی\u0026gt; دەنووسن\"))\nئێوە \"دەق\" بە زمانی \u003cکوردی\u003e دەنووسن\n```\n### Replace URLs and emails\n`ReplaceUrlEmail` replaces URLs and email addresses with a certain word. This is useful when preparing text for language models.\n\n### Unify Numerals\n`UnifyNumerals` converts all numeral characters to the desired numeral type, either `en` (0123456789) or `ar` (٠١٢٣٤٥٦٧٨٩).\n```python\n\u003e\u003e\u003e print(asosoft.UnifyNumerals(\"ژمارەکانی ٤٥٦ و ۴۵۶ و 456\", \"en\"))\nژمارەکانی 456 و 456 و 456\n```\n\n### Separate Digits from Words\n`SeperateDigits` adds a space between joined numerals and words (e.g. replacing \"12کەس\" with \"12 کەس\"). This is useful when preparing text for language models.\n```python\n\u003e\u003e\u003e print(asosoft.SeperateDigits(\"لە ساڵی1950دا1000دۆلاریان بە 5کەس دا\"))\nلە ساڵی 1950 دا 1000 دۆلاریان بە 5 کەس دا\n```\n\n### Word to Word Replacement\n`Word2WordReplacement` applies a \"string to string\" replacement dictionary to the text. It replaces only whole-word matches, not parts of words.\n```python\n\u003e\u003e\u003e print(asosoft.Word2WordReplacement(\"مال، نووری مالیکی\", {\"مال\": \"ماڵ\", \"سلاو\": \"سڵاو\"}))\nماڵ، نووری مالیکی\n```\n\n### Character to Character Replacement\n`Char2CharReplacment` applies a \"char to char\" replacement dictionary to the text. It is used as the final step needed for some non-Unicode systems.\n\n## Kurdish Numeral converter\nIt converts numerals into Central Kurdish words. 
It is useful in text-to-speech tools and handles:\n- integers (1100 =\u003e )\n- floats (10.11)\n- negatives (-10.11)\n- percentages (100% or %100)\n- currency symbols ($100, £100, and €100)\n\n```python\n\u003e\u003e\u003e print(asosoft.Number2Word(\"لە ساڵی 1999دا بڕی 40% لە پارەکەیان واتە $102.1 یان وەرگرت\"))\nلە ساڵی هەزار و نۆسەد و نەوەد و نۆدا بڕی چل لە سەد لە پارەکەیان واتە سەد و دوو پۆینت یەک دۆلاریان وەرگرت\n```\n\n## Kurdish Sort\nSorts a list of strings in the correct order of the Kurdish alphabet (\"ئءاآأإبپتثجچحخدڎذرڕزژسشصضطظعغفڤقكکگلڵمنوۆۊۉهھەیێ\"):\n```python\n\u003e\u003e\u003e myList = [\"یەک\", \"ڕەنگ\", \"ئەو\", \"ئاو\", \"ڤەژین\", \"فڵان\"]\n\u003e\u003e\u003e print(asosoft.KurdishSort(myList))\n[\"ئاو\", \"ئەو\", \"ڕەنگ\", \"فڵان\", \"ڤەژین\", \"یەک\"]\n```\nor, using a custom order:\n```python\n\u003e\u003e\u003e input_list = [\"یەک\", \"ڕەنگ\", \"ئەو\", \"ئاو\", \"ڤەژین\", \"فڵان\"]\n\u003e\u003e\u003e input_order = list(\"ئءاآأإبپتثجچحخدڎڊذرڕزژسشصضطظعغفڤقكکگڴلڵمنوۆۊۉۋهھەیێ\")\n\u003e\u003e\u003e print(asosoft.CustomSort(input_list, input_order))\n[\"ئاو\", \"ئەو\", \"ڕەنگ\", \"فڵان\", \"ڤەژین\", \"یەک\"]\n```\n## Poem Meter Classifier\nIt classifies the meter of the input Kurdish poem typed in Arabic script. 
The lines of the poem should be separated by newline characters ('\\n').\nYou can find Kurdish poems at https://books.vejin.net/.\n```python\n\u003e\u003e\u003e poem = \"گەرچی تووشی ڕەنجەڕۆیی و حەسرەت و دەردم ئەمن\\nقەت لەدەس ئەم چەرخە سپڵە نابەزم مەردم ئەمن\\nمن لە زنجیر و تەناف و دار و بەند باکم نییە\\nلەت لەتم کەن، بمکوژن، هێشتا دەڵێم کوردم ئەمن\"\n\u003e\u003e\u003e classified = asosoft.ClassifyKurdishPoem(poem)\n\u003e\u003e\u003e print(\"Poem Type= \" + classified.overalMeterType)\nPoem Type= Quantitative/عەرووزی\n\u003e\u003e\u003e print(\"Poem Meter= \" + classified.overalPattern)\nPoem Meter= فاعلاتن فاعلاتن فاعلاتن فاعلن\n```","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAsoSoft%2FAsoSoft-Library-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAsoSoft%2FAsoSoft-Library-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAsoSoft%2FAsoSoft-Library-py/lists"}