{"id":41578500,"url":"https://github.com/iamjazzar/matn","last_synced_at":"2026-01-24T08:26:53.141Z","repository":{"id":38014424,"uuid":"499017799","full_name":"iamjazzar/matn","owner":"iamjazzar","description":"A shared space for Arabic text processors.","archived":false,"fork":false,"pushed_at":"2023-09-05T07:21:09.000Z","size":68,"stargazers_count":1,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-17T07:50:02.687Z","etag":null,"topics":["arabic","jummal","nlp","stemmers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iamjazzar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-02T06:27:26.000Z","updated_at":"2022-08-20T09:57:25.000Z","dependencies_parsed_at":"2023-01-25T22:17:07.927Z","dependency_job_id":null,"html_url":"https://github.com/iamjazzar/matn","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/iamjazzar/matn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamjazzar%2Fmatn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamjazzar%2Fmatn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamjazzar%2Fmatn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamjazzar%2Fmatn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iamjazzar","download_url":"https://codeload.github.com/iamjazzar/matn/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/iamjazzar%2Fmatn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28720543,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T05:53:42.649Z","status":"ssl_error","status_checked_at":"2026-01-24T05:53:41.698Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arabic","jummal","nlp","stemmers"],"created_at":"2026-01-24T08:26:52.468Z","updated_at":"2026-01-24T08:26:53.135Z","avatar_url":"https://github.com/iamjazzar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003ch1 align=\"center\"\u003e\n  Matn | مَتن\n  \u003cbr\u003e\n  \u003ca href=\"https://github.com/iamjazzar/matn/actions/workflows/ci.yml\"\u003e\n    \u003cimg style=\"max-width: 100%;\" alt=\"Tests\" src=\"https://github.com/iamjazzar/matn/actions/workflows/ci.yml/badge.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://badge.fury.io/py/matn\"\u003e\n    \u003cimg style=\"max-width: 100%;\" alt=\"Tests\" src=\"https://badge.fury.io/py/matn.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/iamjazzar/matn/actions/workflows/codeql-analysis.yml\"\u003e\n    \u003cimg src=\"https://github.com/iamjazzar/matn/actions/workflows/codeql-analysis.yml/badge.svg\" /\u003e\n  \u003c/a\u003e\n\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.ahmedjazzar.com/\"\u003e\n  
\u003cpicture\u003e\n      \u003csource srcset=\"https://user-images.githubusercontent.com/11036472/172036047-b60ad299-e30f-4a16-85f7-645d95edd1b8.png\" media=\"(prefers-color-scheme: dark)\" /\u003e\n      \u003cimg width=\"400\" id=\"screenshot\" src=\"https://user-images.githubusercontent.com/11036472/172036055-b0a9c55c-3986-411d-955f-790130c49c27.png\" /\u003e\n    \u003c/picture\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n    A shared space for Arabic text processors.\n  \u003cbr\u003e\n\u003c/p\u003e\n\n\n## 1. Getting started\n\n```bash\npip install matn\n```\n## 2. Counters\n### 2.1. Jummal | حِسَاب ٱلْجُمَّل\nAlso known as Abjad numerals: a decimal alphabetic numeral system in which the 28 letters of the Arabic alphabet are assigned numerical values. They have been used in the Arabic-speaking world since before the eighth century, when positional Arabic numerals were adopted.\n\n#### 2.1.1. Methods\nPeople compute jummal in different ways and with different values.\n1. The normal method, which does not count the hamza.\n1. The method that counts the hamza as a separate character.\n1. The tarkeeb method, used to express numbers from 2,000 to 1,000,000 with a rule based on the letter \"غ\". The rule is fairly simple: the value of any character that comes before \"غ\" is multiplied by 1000 instead of being added to it.\n1. The normalized-hamza method, which treats every hamza form as a regular alef instead of the letter it appears on. Defaults to False.\n\n#### 2.1.2. 
Usage\n##### Python\n```python\n\u003e\u003e\u003e from matn.counters import jummal\n\n\u003e\u003e\u003e text = \"شغل الدموع عن الديار بكاؤنا   لبكاء فاطمــة على أولادها\"\n\n\u003e\u003e\u003e jummal(text)\n2_273  # شغ's value is 1000 + 300 and hamza value is 0\n\n# To include the hamza count\n\u003e\u003e\u003e jummal(text, use_hamza=True)\n2_274  # شغ's value is 1000 + 300 and hamza value is 1\n\n# To normalize hamza forms\n\u003e\u003e\u003e jummal(text, normalize_hamza=True)\n2_268  # شغ's value is 1000 + 300, hamza value is 0, and ؤ value is 1\n\n# To use tarkeeb\n\u003e\u003e\u003e jummal(text, use_tarkeeb=True)\n300_973  # شغ's value is 300 * 1000 and hamza value is 0\n\n# To use hamza and tarkeeb\n\u003e\u003e\u003e jummal(text, use_hamza=True, use_tarkeeb=True)\n300_974  # شغ's value is 300 * 1000 and hamza value is 1\n```\n\n##### CLI\n```shell\nmatn jummal \"شغل الدموع عن الديار بكاؤنا   لبكاء فاطمــة على أولادها\"\n\n# To include the hamza count\nmatn jummal --use-hamza \"شغل الدموع عن الديار بكاؤنا   لبكاء فاطمــة على أولادها\"\n\n# To use tarkeeb\nmatn jummal --use-tarkeeb \"شغل الدموع عن الديار بكاؤنا   لبكاء فاطمــة على أولادها\"\n\n# To normalize hamza\nmatn jummal --normalize-hamza \"شغل الدموع عن الديار بكاؤنا   لبكاء فاطمــة على أولادها\"\n\n# All methods at once\nmatn jummal -z -n -t  \"شغل الدموع عن الديار بكاؤنا   لبكاء فاطمــة على أولادها\"\n```\n\n### 2.2. Word Count\nCounts the number of words in a given string.\n\n#### 2.2.1. Methods\nThe basic method is straightforward. However, some researchers split certain words into multiple parts. The only such word we have handled so far is بعدما. The `word_count` function lets you either split it into two words or count it as one.\n\n#### 2.2.2. 
Usage\n##### Python\n```python\n\u003e\u003e\u003e from matn.counters import word_count\n\n\u003e\u003e\u003e text = \"فَمَنۢ بَدَّلَهُۥ بَعۡدَمَا سَمِعَهُۥ\"\n\n\u003e\u003e\u003e word_count(text)\n4\n\n# To split badama\n\u003e\u003e\u003e word_count(text, split_badama=True)\n5  # بَعۡدَمَا was split into two words\n```\n\n##### CLI\n```shell\nmatn wc \"فَمَنۢ بَدَّلَهُۥ بَعۡدَمَا سَمِعَهُۥ\"\n\n# To split badama\nmatn wc --split-badama \"فَمَنۢ بَدَّلَهُۥ بَعۡدَمَا سَمِعَهُۥ\"\n```\n\n### 2.3. Char Count\nCounts the number of characters in a given string.\n\n#### 2.3.1. Methods\n- In some cases we need to count spaces as separate characters; in others we don't.\n- In some cases we count the hamza-madda (أٓ) character as two characters. This character appears, for example, in the word الأٓخرة.\n\n#### 2.3.2. Usage\n##### Python\n```python\n\u003e\u003e\u003e from matn.counters import char_count\n\n\u003e\u003e\u003e text = \"ٱلدَّارُ ٱلۡأٓخِرَةُ\"\n\n\u003e\u003e\u003e char_count(text)\n11\n\n# To include spaces\n\u003e\u003e\u003e char_count(text, include_spaces=True)\n12\n\n# To count hamza-madda as two characters\n\u003e\u003e\u003e char_count(text, hamza_madda=True)\n12\n\n# To include hamza-madda and spaces\n\u003e\u003e\u003e char_count(text, hamza_madda=True, include_spaces=True)\n13\n```\n\n##### CLI\n```shell\nmatn cc \"ٱلدَّارُ ٱلۡأٓخِرَةُ\"\n\n# To count hamza-madda as two characters\nmatn cc --hamza-madda \"ٱلدَّارُ ٱلۡأٓخِرَةُ\"\n\n# To include spaces\nmatn cc --include-spaces \"ٱلدَّارُ ٱلۡأٓخِرَةُ\"\n\n# To include hamza-madda and spaces\nmatn cc --include-spaces --hamza-madda \"ٱلدَّارُ ٱلۡأٓخِرَةُ\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamjazzar%2Fmatn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiamjazzar%2Fmatn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamjazzar%2Fmatn/lists"}