{"id":17718552,"url":"https://github.com/linuxscout/maskouk-pysqlite","last_synced_at":"2025-09-23T10:30:58.333Z","repository":{"id":46108811,"uuid":"285549886","full_name":"linuxscout/maskouk-pysqlite","owner":"linuxscout","description":"Arabic collocations library and data for Python","archived":false,"fork":false,"pushed_at":"2021-11-14T06:28:22.000Z","size":6826,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-06T07:48:20.302Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linuxscout.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"patreon":"linuxscout"}},"created_at":"2020-08-06T11:12:17.000Z","updated_at":"2024-03-04T05:49:48.000Z","dependencies_parsed_at":"2022-09-02T14:41:58.185Z","dependency_job_id":null,"html_url":"https://github.com/linuxscout/maskouk-pysqlite","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmaskouk-pysqlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmaskouk-pysqlite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmaskouk-pysqlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmaskouk-pysqlite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linuxscout","download_url":"https://codeload.github.com/linuxscout/
maskouk-pysqlite/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233965886,"owners_count":18758368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-25T14:54:50.229Z","updated_at":"2025-09-23T10:30:52.062Z","avatar_url":"https://github.com/linuxscout.png","language":"Python","funding_links":["https://patreon.com/linuxscout"],"categories":[],"sub_categories":[],"readme":"# Maskouk-pysqlite مكتبة مسكوك\n\n\nArabic collocations library and data for Python +SQLite API\n![maskouk logo](doc/maskouk_header.png  \"maskouk logo\")\n\n[![downloads]( https://img.shields.io/sourceforge/dt/maskouk.svg)](http://sourceforge.org/projects/maskouk)\n[![downloads]( https://img.shields.io/sourceforge/dm/maskouk.svg)](http://sourceforge.org/projects/maskouk)\n\n\n  Developer:  Taha Zerrouki: http://tahadz.com\n    taha dot zerrouki at gmail dot com\n\n  \nFeatures |   value\n---------|---------------------------------------------------------------------------------\nAuthors  | [Authors.md](https://github.com/linuxscout/maskouk-pysqlite/master/AUTHORS.md)\nRelease  | 0.1\nLicense  |[GPL](https://github.com/linuxscout/maskouk-pysqlite/master/LICENSE)\nTracker  |[linuxscout/maskouk/Issues](https://github.com/linuxscout/maskouk-pysqlite/issues)\nWebsite  |[http://maskouk.sourceforge.net](http://maskouk-pysqlite.sourceforge.net)\nSource  |[Github](http://github.com/linuxscout/maskouk-pysqlite)\nDownload  |[sourceforge](http://maskouk.sourceforge.net)\nFeedbacks  
|[Comments](https://github.com/linuxscout/maskouk-pysqlite/)\nAccounts  |[@Twitter](https://twitter.com/linuxscout)  [@Sourceforge](http://sourceforge.net/projects/maskouk/)\n\n## Description\n\nMaskouk is a database of Arabic collocations extracted from Wikipedia.\n\nSource data: Arabic Wikipedia database of 2011-Jun-21.\n\n### Features:\n\n-   Check whether two words together form a collocation that exists in the dictionary\n-   Extract all collocations found in a given text\n-   Tag the collocations in a given text with begin and end markers\n-   Look up long collocations such as “بسم الله الرحمن الرحيم” and “السلام عليكم ورحمة الله تعالى وبركاته”\n-   Search for candidate words that may form collocations, that is, extract new collocations.\n\n### Install\n```shell\npip install maskouk-pysqlite\n```\n\n### Usage\n\n#### Import\n```python\n\u003e\u003e\u003e import pyarabic.araby as araby\n\u003e\u003e\u003e import maskouk.collocations as msk\n\u003e\u003e\u003e mydict = msk.CollocationClass()\n```\n#### Test if collocation exists in database\n```python\n\u003e\u003e\u003e wlist = [u\"كرة\", u\"القدم\"]\n\u003e\u003e\u003e # test if collocation exists\n\u003e\u003e\u003e results = mydict.is_collocated(wlist)\n\u003e\u003e\u003e print(\"input:\", wlist)\n\u003e\u003e\u003e print(\"output:\",results)\ninput: ['كرة', 'القدم']\noutput: كرة القدم\n\u003e\u003e\u003e wlist = [u\"شمس\", u\"النهار\"]\n\u003e\u003e\u003e results = mydict.is_collocated(wlist)\n\u003e\u003e\u003e print(\"input:\", wlist)\n\u003e\u003e\u003e print(\"output:\",results)\ninput: ['شمس', 'النهار']\noutput: False\n```\n#### Test if a word has collocations in database\n```python\n\u003e\u003e\u003e # get all collocations for a specific word\n\u003e\u003e\u003e word1 = u\"كرة\"\n\u003e\u003e\u003e results  = mydict.is_collocated_word(word1)\n\u003e\u003e\u003e print(\"input:\", word1)\n\u003e\u003e\u003e print(\"output:\",results)\ninput: كرة\noutput: {'القدم': 'كُرَة الْقَدَمِ'}\n\u003e\u003e\u003e\n\u003e\u003e\u003e 
word = u\"بيت\"\n\u003e\u003e\u003e # get all collocations for a specific word\n\u003e\u003e\u003e results  = mydict.is_collocated_word(word)\n\u003e\u003e\u003e print(\"input:\", word)\n\u003e\u003e\u003e print(\"output:\",results)\ninput: بيت\noutput: {'العدة': 'بَيْت الْعِدَّةِ', 'المستأجر': 'بَيْت الْمُسْتَأْجِرِ', 'المشتري': 'بَيْتِ الْمُشْتَرِي', 'الرجل': 'بَيْت الرَّجُلِ', 'البناء': 'بَيْت الْبِنَاءِ', 'الزوج': 'بَيْت الزَّوْجِ', 'المال': 'بيت المال', 'المقدس': 'بَيْت الْمَقْدِسِ', 'البائع': 'بَيْت الْبَائِعِ', 'الخلاء': 'بَيْت الْخَلَاءِ', 'الأب': 'بَيْت الْأَبِ', 'الله': 'بَيْت اللّهِ'}\n```\n#### Detect collocation in a phrase\nThe result can be returned as a list of tokens (`ngramfinder`) or in tagged form (`lookup`).\n\n```python\n\u003e\u003e\u003e # detect collocations in phrase    \n\u003e\u003e\u003e text = u\"لعبنا مباراة كرة القدم في بيت المقدس\"\n\u003e\u003e\u003e wordlist = araby.tokenize(text)\n\u003e\u003e\u003e results  = mydict.ngramfinder(2, wordlist)\n\u003e\u003e\u003e print(\"input:\", text)\n\u003e\u003e\u003e print(\"output:\",results)\ninput: لعبنا مباراة كرة القدم في بيت المقدس\noutput: ['لعبنا', 'مباراة', 'كرة القدم', 'في', 'بيت المقدس']\n\u003e\u003e\u003e # detect collocations in phrase    \n\u003e\u003e\u003e text = u\"لعبنا مباراة كرة القدم في بيت المقدس\"\n\u003e\u003e\u003e wordlist = araby.tokenize(text)\n\u003e\u003e\u003e results   = mydict.lookup(wordlist)\n\u003e\u003e\u003e print(\"input:\", text)\n\u003e\u003e\u003e print(\"output:\",results)\ninput: لعبنا مباراة كرة القدم في بيت المقدس\noutput: (['لعبنا', 'مباراة', 'كُرَة', 'الْقَدَمِ', 'في', 'بَيْت', 'الْمَقْدِسِ'], ['CO', 'CO', 'CB', 'CI', 'CO', 'CB', 'CI'])\n\u003e\u003e\u003e \n```\n#### Detect long collocations in a phrase\nSome collocations are too long to be stored in a bigram database, such as:\n\"بسم الله الرحمن الرحيم\"\n\"السلام عليكم ورحمة الله وبركاته\"\n\"أهلا وسهلا بكم\"\n```python\n\u003e\u003e\u003e # get long collocations\n... 
text = u\" قلت لهم السلام عليكم ورحمة الله تعالى وبركاته ثم رجعت\"\n\u003e\u003e\u003e results  = mydict.lookup4long_collocations(text)\n\u003e\u003e\u003e print(\"input:\", text)\ninput:  قلت لهم السلام عليكم ورحمة الله تعالى وبركاته ثم رجعت\n\u003e\u003e\u003e print(\"output:\",results)   \noutput:  قلت لهم السّلامُ عَلَيكُمْ وَرَحْمَةُ اللهِ تَعَالَى وبركاته ثم رجعت\n```\n#### Detect candidate collocations in phrase\nA candidate collocation does not exist in the database; this feature is used to extract new collocations based on rules.\nIt returns a rule code; the default, 100, means no collocation.\n```python\n\u003e\u003e\u003e text = u\"ظهر رئيس الوزراء السيد عبد الملك بن عامر ومعه أمير دولة غرناطة ونهر النيل انطلاق السباق\"\n\u003e\u003e\u003e wordlist = araby.tokenize(text)\n\u003e\u003e\u003e previous = \"__\"\n\u003e\u003e\u003e for wrd in wordlist:\n...     wlist = [previous, wrd]\n...     results  = mydict.is_possible_collocation(wlist, lenght = 2)\n...     print(\"input:\", wlist)\n...     print(\"output:\", results)   \n...     previous  = wrd\n... \ninput: ['__', 'ظهر']\noutput: 100\ninput: ['ظهر', 'رئيس']\noutput: 100\ninput: ['رئيس', 'الوزراء']\noutput: 100\ninput: ['الوزراء', 'السيد']\noutput: 20\ninput: ['السيد', 'عبد']\noutput: 100\ninput: ['عبد', 'الملك']\noutput: 15\ninput: ['الملك', 'بن']\noutput: 100\ninput: ['بن', 'عامر']\noutput: 15\ninput: ['عامر', 'ومعه']\noutput: 100\ninput: ['ومعه', 'أمير']\noutput: 100\ninput: ['أمير', 'دولة']\noutput: 100\ninput: ['دولة', 'غرناطة']\noutput: 10\ninput: ['غرناطة', 'ونهر']\noutput: 100\ninput: ['ونهر', 'النيل']\noutput: 100\ninput: ['النيل', 'انطلاق']\noutput: 100\ninput: ['انطلاق', 'السباق']\noutput: 100\n\u003e\u003e\u003e \n\n\n```\n#### Requirements\n  \n    1. pyarabic \n    2. 
sqlite\n\n\n## Data Structure:\n### Collocations database\n```sql\nCREATE TABLE \"collocations\" (\n    \"id\" INTEGER PRIMARY KEY  NOT NULL , \n    \"vocalized\" VARCHAR,\n    \"unvocalized\" VARCHAR,\n    \"rule\" VARCHAR, \n    \"category\" VARCHAR, \n    \"note\" VARCHAR,\n    \"first\" VARCHAR,\n    \"second\" VARCHAR\n    );\n```\n\nCSV Structure:\n\n1.   id             : id unique in the database\n2.  vocalized   : vocalized collocation\n3.  unvocalized : unvocalized collocation\n4.  rule        : the extraction rule number\n5.  category    : collocation category\n6.  note        : \n7. first: first word\n8. second: second word\n\n\u003c!--\n### Semantic database\n```sql\nCREATE TABLE sqlite_sequence(name,seq);\nCREATE TABLE \"derivations\" (\n    \"id\" INTEGER PRIMARY KEY  AUTOINCREMENT  NOT NULL  UNIQUE ,\n    \"verb\" varchar NOT NULL ,\n    \"transitive\" BOOL NOT NULL  DEFAULT 1,\n    \"derived\" VARCHAR NOT NULL ,\n    \"type\" VARCHAR NOT NULL \n );\n\n```\n\nCSV Structure:\n\n * Derivation\n1.   id             : id unique in the database\n2.  verb    : vocalized collocation\n3.  transitive : if the verb is transitive\n4.  derived         :  derived word from verb number\n5.  type    : type \n\n* semantic relations\n\nCREATE TABLE \"relations\" (\n    \"id\" INTEGER PRIMARY KEY  NOT NULL ,\n    \"first\" VARCHAR NOT NULL  DEFAULT ('') ,\n    \"second\" VARCHAR NOT NULL  DEFAULT ('') ,\n    \"rule\" VARCHAR NOT NULL  DEFAULT (0) \n );\n \n \nCSV Structure:\n\n1.   id             : id unique in the database\n2. first: first word\n3. second: second word\n4.  rule        : the extraction rule number\n        : \n--\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fmaskouk-pysqlite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinuxscout%2Fmaskouk-pysqlite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fmaskouk-pysqlite/lists"}