{"id":17718569,"url":"https://github.com/linuxscout/shellshal","last_synced_at":"2025-09-11T02:12:46.303Z","repository":{"id":138755583,"uuid":"77242673","full_name":"linuxscout/shellshal","owner":"linuxscout","description":"Shell Scripts for Arabic Language","archived":false,"fork":false,"pushed_at":"2023-01-05T08:27:06.000Z","size":90,"stargazers_count":17,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-03-11T10:12:36.286Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linuxscout.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"patreon":"linuxscout"}},"created_at":"2016-12-23T18:06:43.000Z","updated_at":"2023-01-06T16:01:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"f2bbb775-151b-4681-8b86-f94689989182","html_url":"https://github.com/linuxscout/shellshal","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/linuxscout/shellshal","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fshellshal","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fshellshal/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fshellshal/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fshellshal/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/linuxscout","download_url":"https://codeload.github.com/linuxscout/shellshal/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fshellshal/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261768478,"owners_count":23207013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-25T14:54:54.166Z","updated_at":"2025-06-24T22:36:08.973Z","avatar_url":"https://github.com/linuxscout.png","language":"Shell","funding_links":["https://patreon.com/linuxscout"],"categories":[],"sub_categories":[],"readme":"# Shellshal\nShell Scripts for Arabic Language processing\nسكريبتات سطر الأوامر للتعامل مع النصوص العربية\n\n![shellshel logo](logo.png  \"shellshel logo\")\n\n## Description\nThis project is a collection of small shell scripts used to process arabic texts, like:\n\n- Tokenize a file text into tokens (duplicate, unique)\n- Strip Tashkeel from text ( all diacritics, keep shadda, last haraka )\n- Strip only the last mark from every word in text.\n- Convert Alef_Wasla into Alef\n- Build a list from csv file\n\nهذا المشروع يجمع سكريبتات بسيطة لمعالجة الملفات النصية العربية مثل:\n\n- تفريق نص إلى كلمات\n- تفريق النص، وحذف المكررات\n- حذف التشكيل، حذف الحركات وحفظ الشدة، حذف آخر حركة\n- تحويل ألف الوصلة إلى ألف عادية\n- تحويل ملف نصي csv إلى قائمة\n\n## التسمية\nشَلْشَلَ\n    [ ش ل ش ل ]. ( فعل : رباعي لازم متعد ). :- شَلْشَلْتُ ، أُشَلْشِلُ ، شَلْشِلْ ، مصدر شَلْشَلَةٌ .\n    1 . :- شَلْشَلَ الْمَاءَ :- : صَبَّهُ مُتَتَابِعاً .\n    2 . 
:- شَلْشَلَ الْمَاءُ :- : قَطَرَ وَسَالَ مُتَتَابِعاً .\n    3 . :- شَلْشَلَ السَّيْفُ الدَّمَ :- : صَبَّهُ .\n    \nالاسم مأخوذ من شبهه بكلمة shell التي تعني سطر الأوامر، \n\nوالمعنى في الشلشلة هي التتابع\n\n## Usage\n\n### Install\n\n```shell\nmake install\n```\n### Test\n```shell\nmake test\n```\n### Scripts\n\nDisplay all available commands by running:\n\n```shell\nshellshal\n```\n\n### Commands\n\n#### Tokenize\n\n1- Tokenize a text file with the following script.\n```\ntokenize.sh filename\n```\n###### source\n```shell\nsed 's/[[:punct:][:space:]×،؛]/\\n/g'  \u003c $1 |sed '/^\\s*$/d'\n```\n\n2- Tokenize, sort, remove duplicates, and count word frequencies in a file. The result is written to filename.unq.\n```\ntokenize_uniq.sh filename\n```\n###### source\n```shell\nsed 's/[[:punct:][:space:]×،؛]/\\n/g'  \u003c $1 |sed '/^\\s*$/d' | sort | uniq -c | sort -nr \u003e$1.unq\n```\n\n##### Tashkeel Removal\n1- Remove Harakat (diacritics), Tatweel and Shadda from text\n\n```\nstrip_tashkeel.sh filename\n```\nsource\n\n```shell\nCHARS=$(python -c 'print u\"\\u064b\\u064c\\u064d\\u064e\\u064f\\u0651\\u0652\".encode(\"utf8\")')\nsed 's/['\"$CHARS\"']//g' \u003c $1\n```\n\n2- Remove Harakat (diacritics) and Tatweel from text, but keep Shadda\n\n```\nstrip_harakat.sh filename\n```\n\nsource\n\n```shell\nCHARS=$(python -c 'print (u\"\\u064b\\u064c\\u064d\\u064e\\u064f\\u0650\\u0652\\u0670\".encode(\"utf8\"))')\nsed 's/['\"$CHARS\"']//g' \u003c $1\n```\n\n\n\n3- Remove the last Haraka (diacritic) from the end of words in the text\n\n```\nstrip_lastmark.sh filename\n```\nsource\n\n```shell\nCHARS=$(python -c 'print u\"\\u064b\\u064c\\u064d\\u064e\\u064f\\u0651\\u0652\".encode(\"utf8\")')\nsed 's/['\"$CHARS\"']$//g' \u003c $1\n```\n\n4- Replace Alef Wasla with plain Alef in words in the text\n\n\n\n```shell\nreplace_wasla.sh filename\n```\n\nsource\n\n```shell\nCHARS=$(python -c 'print (u\"\\u0671\".encode(\"utf8\"))')\nTO=$(python -c 'print (u\"\\u0627\".encode(\"utf8\"))')\nsed 
's/['\"$CHARS\"']/'\"$TO\"'/g' \u003c $1\n```\n\n##### Build lists and dictionary\n\n1- Makelist: convert a file (CSV or one word per line) into a list\n\n```\nshellshal/makelist.sh testfile.csv\n```\nsource\n\n```\nawk 'BEGIN{print \"MyList=[\"};/^[^#]/{printf \"u\\\"%s\\\",\\n\",$1};END{print \"]\"}' $1\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fshellshal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinuxscout%2Fshellshal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fshellshal/lists"}