{"id":17718584,"url":"https://github.com/linuxscout/mishtar","last_synced_at":"2025-08-03T03:07:58.786Z","repository":{"id":138755287,"uuid":"131498624","full_name":"linuxscout/mishtar","owner":"linuxscout","description":"Mishtar: Named and temporal entities chunker","archived":false,"fork":false,"pushed_at":"2020-08-19T21:20:08.000Z","size":190,"stargazers_count":13,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-03-11T10:12:33.097Z","etag":null,"topics":["arabic-language","arabic-nlp","chunking","named-entity-recognition","nlp","temporal-entities-chunker"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linuxscout.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-29T13:54:19.000Z","updated_at":"2022-12-23T00:45:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"ae61571a-5c70-454e-98d7-44abd5bf4ee4","html_url":"https://github.com/linuxscout/mishtar","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmishtar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmishtar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmishtar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linuxscout%2Fmishtar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/owners/linuxscout","download_url":"https://codeload.github.com/linuxscout/mishtar/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233965887,"owners_count":18758368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arabic-language","arabic-nlp","chunking","named-entity-recognition","nlp","temporal-entities-chunker"],"created_at":"2024-10-25T14:54:55.856Z","updated_at":"2025-01-14T22:14:59.248Z","avatar_url":"https://github.com/linuxscout.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# مشطار: استخلاص المسميات والعبارات الزمنية Mishtar: Named and temporal entities chunker\n\n\nاستخلاص العبارات الاسمية والزمنية من النص مفيدة للتحليل النحوي،\nChunking extracts named entities and temporal expressions from text, which is useful for syntactic analysis.\n\n\n  Developers:  Taha Zerrouki: http://tahadz.com\n    taha dot zerrouki at gmail dot com\n\nFeature |   Value\n------------|-----------\nAuthors  | Taha Zerrouki: http://tahadz.com,  taha dot zerrouki at gmail dot com\nRelease  | 0.3\nLicense  |[GPL](https://github.com/linuxscout/mishtar/master/LICENSE)\nTracker  |[linuxscout/mishtar/Issues](https://github.com/linuxscout/mishtar/issues)\nWebsite  |[https://pypi.python.org/pypi/mishtar](https://pypi.python.org/pypi/mishtar)\nSource  |[GitHub](http://github.com/linuxscout/mishtar)\nFeedback  |[Comments](https://github.com/linuxscout/mishtar/issues)\nAccounts  |[@Twitter](https://twitter.com/linuxscout)  [@Sourceforge](http://sourceforge.net/projects/mishtar/)\n\n\u003c!--Doc  |[package 
Documentation](http://pythonhosted.org/mishtar/)--\u003e\n\u003c!--Download  |[pypi.python.org](https://pypi.python.org/pypi/mishtar)--\u003e\n\n\n\n\u003c!--\n## Citation\nIf you cite it in academic work, you can use this citation:\n```\nT. Zerrouki, mishtar, Arabic Word Tagger,\n  https://pypi.python.org/pypi/mishtar/, 2018\n```\nor in BibTeX format:\n\n```bibtex\n@misc{zerrouki2012mishtar,\n  title={mishtar : Arabic Word Tagger},\n  author={Zerrouki, Taha},\n  url={https://pypi.python.org/pypi/mishtar},\n  year={2010}\n}\n```\n--\u003e\n\n## مزايا\n* استخلاص المسميات\n* استخلاص العبارات الزمنية (تواريخ ميلادية وهجرية، ونسبية)\n\n## Features\n* Extract named entities\n* Extract temporal expressions (Gregorian and Hijri dates, and relative expressions)\n\nApplications\n====\n* Text mining.\n* Text summarization.\n* Sentence identification.\n* Grammar analysis.\n* Morphological analysis acceleration.\n* Extraction of n-grams.\n\nتطبيقات\n====\n* التنقيب عن المعلومات.\n* تلخيص النص.\n* التعرف على الجمل.\n* التحليل النحوي.\n* تسريع التحليل الصرفي.\n* استخراج المصطلحات والمسكوكات والمتلازمات.\n\n\n\nDemo جرّب\n====\nيمكن التجربة على [موقع مشكال](http://tahadz.com/mishkal)\n، اختر أدوات، ثم استخلاص ثم المكونات\n\nYou can test it on the [Mishkal site](http://tahadz.com/mishkal): choose Tool \u003e extraction \u003e Entities.\n\n![mishtar Demo](doc/images/mishtar_demo.png \"mishtar Demo\")\n\n\n\n### Installation\n\n```\npip install mishtar\n```\n\n### Usage\n```python\nimport mishtar.mynamed as mynamed\n```\n* Example **Test named entities**\n\n```python\nimport mishtar.mynamed\nimport pyarabic.araby as araby\n\n# Sample sentences that contain personal names\nTEXTS = [\n    u\"جاء  خالد بن الوليد وقاتل مسيلمة بن حذام الكذاب في موقعة الحديقة\",\n    u'''روى أحمد بن عقيل الشامي عن أبي طلحة\n المغربي أنّ عقابا بن مسعود بن أبي سعاد قال''',\n    u\"صرّح الأمير تشارلز الأول\",\n]\nchunker = mishtar.mynamed.myNamed()\nfor text1 in TEXTS:\n    # Split the sentence into tokens\n    word_list = araby.tokenize(text1)\n    # One tag per token; in the output below, NB opens a chunk, NI continues it, 0 is outside\n    tag_list2 = chunker.detect_chunks(word_list)\n    # Tokens with tashkeel (diacritics) added by the chunker\n    result = chunker.pretashkeel(word_list)\n    print(\"tashkeel\", u' '.join(result))\n    # Print each (tag, token) pair\n    for tup in zip(tag_list2, word_list):\n        print(tup)\n```\n\n**Result**\n\n```\n المغربي أنّ عقابا بْنَ مسعود بْنِ أبي سعاد قال\n(u'0', u'روى')\n('NB', u'أحمد')\n('NI', u'بن')\n('NI', u'عقيل')\n('NI', u'الشامي')\n(u'0', u'عن')\n('NB', u'أبي')\n('NI', u'طلحة')\n(u'0', u'')\n(u'0', u'المغربي')\n(u'0', u'أنّ')\n('NB', u'عقابا')\n('NI', u'بن')\n('NI', u'مسعود')\n('NI', u'بن')\n('NI', u'أبي')\n('NI', u'سعاد')\n
(u'0', u'قال')\ntashkeel صرّح الأمير تشارلز الأول\n(u'0', u'صرّح')\n(u'0', u'الأمير')\n(u'0', u'تشارلز')\n(u'0', u'الأول')\n```\n\n* Example **Test temporal expressions**\n\n\n```python\nimport pyarabic.araby as araby\nimport mishtar.mytemped as mytemped\n\n# Sample texts containing dates (Gregorian and Hijri)\ntexts = [\n    u'* قسم واحد فقط: شهر نوفمبر سنة 2015، ',\n    u'* قسمين : شهر أكتوبر 1973، الخامس من نوفمبر، ',\n    u'* ثلاثة اقسام: يوم الجمعة الخامس عشر من شهر رمضان سنة 1435 هجرية.',\n]\nchunker = mytemped.myTemped()\nfor text1 in texts:\n    # Split the text into tokens\n    word_list = araby.tokenize(text1)\n    # One tag per token; in the output below, NB opens a chunk, NI continues it, 0 is outside\n    tag_list2 = chunker.detect_chunks(word_list)\n    print(text1)\n\n    # Print each (tag, token) pair\n    for tup in zip(tag_list2, word_list):\n        print(tup)\n```\n\n**Result**\n\n```\n* قسم واحد فقط: شهر نوفمبر سنة 2015، \n(u'0', u'*')\n(u'0', u'قسم')\n(u'0', u'واحد')\n(u'0', u'فقط')\n(u'0', u':')\n(u'NB', u'شهر')\n(u'NI', u'نوفمبر')\n(u'NB', u'سنة')\n(u'NI', u'2015')\n(u'0', u'،')\n* قسمين : شهر أكتوبر 1973، الخامس من نوفمبر، \n(u'0', u'*')\n(u'0', u'قسمين')\n(u'0', u':')\n(u'NB', u'شهر')\n(u'NI', u'أكتوبر')\n(u'NI', u'1973')\n(u'0', u'،')\n(u'NB', u'الخامس')\n(u'NI', u'من')\n(u'NI', u'نوفمبر')\n(u'0', u'،')\n* ثلاثة اقسام: يوم الجمعة الخامس عشر من شهر رمضان سنة 1435 هجرية.\n(u'0', u'*')\n(u'0', u'ثلاثة')\n(u'0', u'اقسام')\n(u'0', u':')\n(u'NB', u'يوم')\n(u'NI', u'الجمعة')\n(u'NI', u'الخامس')\n(u'NI', u'عشر')\n(u'NI', u'من')\n(u'NI', u'شهر')\n(u'NI', u'رمضان')\n(u'NB', u'سنة')\n(u'NI', u'1435')\n(u'NI', u'هجرية')\n(u'0', 
u'.')\n\n```\n\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fmishtar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinuxscout%2Fmishtar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinuxscout%2Fmishtar/lists"}