{"id":13826898,"url":"https://github.com/kargaranamir/parstdex","last_synced_at":"2025-07-09T02:32:40.061Z","repository":{"id":37451634,"uuid":"426235970","full_name":"kargaranamir/parstdex","owner":"kargaranamir","description":"A package that extracts Persian time and date markers by applying regexes -- AACL 2022","archived":false,"fork":false,"pushed_at":"2022-11-29T02:47:16.000Z","size":690,"stargazers_count":24,"open_issues_count":2,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-11T16:36:38.253Z","etag":null,"topics":["datetime","event-extract","event-extraction","hengam","hengamtagger","information-extraction","nlp","parstdex","persian","persian-calendar","persian-datetime","persian-time","regex-pattern","time-date"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/kargaranamir/parstdex","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kargaranamir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-09T13:16:57.000Z","updated_at":"2024-08-26T14:14:00.000Z","dependencies_parsed_at":"2023-01-21T12:00:16.281Z","dependency_job_id":null,"html_url":"https://github.com/kargaranamir/parstdex","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kargaranamir%2Fparstdex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kargaranamir%2Fparstdex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kargaranamir%2Fparstdex/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/kargaranamir%2Fparstdex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kargaranamir","download_url":"https://codeload.github.com/kargaranamir/parstdex/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225481044,"owners_count":17481141,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datetime","event-extract","event-extraction","hengam","hengamtagger","information-extraction","nlp","parstdex","persian","persian-calendar","persian-datetime","persian-time","regex-pattern","time-date"],"created_at":"2024-08-04T09:01:46.278Z","updated_at":"2024-11-20T06:30:34.494Z","avatar_url":"https://github.com/kargaranamir.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# HengamTagger or Parstdex (Persian time and date extractor)\n\n[![Pypi Package](https://badgen.net/pypi/v/parstdex)](https://pypi.org/project/parstdex/)\n[![Documentation Status](https://readthedocs.org/projects/parstdex/badge/?version=latest)](https://parstdex.readthedocs.io)\n[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/kargaranamir/parstdex)\n[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kargaranamir/parstdex/blob/main/performance_test.ipynb)\n\n## Description \n**Parstdex** (known as **HengamTagger** in our paper at [AACL 2022](https://aclanthology.org/2022.aacl-main.74/)) is a rule-based Persian 
temporal extractor built on top of regular expressions specifying pattern units and patterns that can match temporal expressions. \n\n## How to Install parstdex\n\n```bash\npip install parstdex\n```\n\n## How to use\n\n```python\nfrom parstdex import Parstdex\n\nmodel = Parstdex()\n\nsentence = \"\"\"ماریا شنبه عصر راس ساعت ۱۷ و بیست و سه دقیقه به نادیا زنگ زد اما تا سه روز بعد در تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش. خبری از نادیا نشد\"\"\"\n```\n### Extract spans\n```python\nmodel.extract_span(sentence)\n```\noutput :\n```json\n{\"datetime\": [[6, 47], [68, 78], [82, 111]], \"date\": [[6, 10], [68, 78], [82, 111]], \"time\": [[11, 47]]}\n```\n\n### Extract markers\n```python\nmodel.extract_marker(sentence)\n```\n\n```json\n{\n   \"datetime\":{\n      \"[6, 47]\":\"شنبه عصر راس ساعت ۱۷ و بیست و سه دقیقه به\",\n      \"[68, 78]\":\"سه روز بعد\",\n      \"[82, 111]\":\"تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش.\"\n   },\n   \"date\":{\n      \"[6, 10]\":\"شنبه\",\n      \"[68, 78]\":\"سه روز بعد\",\n      \"[82, 111]\":\"تاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش.\"\n   },\n   \"time\":{\n      \"[11, 47]\":\"عصر راس ساعت ۱۷ و بیست و سه دقیقه به\"\n   }\n}\n```\n\n### Extract TimeML scheme\n```python\nmodel.extract_time_ml(sentence)\n```\noutput :\n```html\nماریا \n\u003cTIMEX3 type='DATE'\u003e\nشنبه\n\u003c/TIMEX3\u003e\n\u003cTIMEX3 type='TIME'\u003e\nعصر راس ساعت ۱۷ و بیست و سه دقیقه به\n\u003c/TIMEX3\u003e\n نادیا زنگ زد اما \n\u003cTIMEX3 type='DURATION'\u003e\nتا سه روز بعد\n\u003c/TIMEX3\u003e\n در \n\u003cTIMEX3 type='DATE'\u003e\nتاریخ ۱۸ شهریور سال ۱۳۷۸ ه.ش.\n\u003c/TIMEX3\u003e\nخبری از نادیا نشد\n```\n\n\n### Extract markers' NER tags\n#### DATTIM mode (Default):\n```python\nmodel.extract_ner(sentence, mode=\"dattim\")\n```\noutput :\n```\n[\n    (\"ماریا\", \"O\"),\n    (\"شنبه\", \"B-DAT\"),\n    (\"عصر\", \"B-TIM\"),\n    (\"راس\", \"I-TIM\"),\n    (\"ساعت\", \"I-TIM\"),\n    (\"۱۷\", \"I-TIM\"),\n    (\"و\", \"I-TIM\"),\n    (\"بیست\", \"I-TIM\"),\n    (\"و\", \"I-TIM\"),\n    
(\"سه\", \"I-TIM\"),\n    (\"دقیقه\", \"I-TIM\"),\n    (\"به\", \"I-TIM\"),\n    (\"نادیا\", \"O\"),\n    (\"زنگ\", \"O\"),\n    (\"زد\", \"O\"),\n    (\"اما\", \"O\"),\n    (\"تا\", \"B-DAT\"),\n    (\"سه\", \"I-DAT\"),\n    (\"روز\", \"I-DAT\"),\n    (\"بعد\", \"I-DAT\"),\n    (\"در\", \"I-DAT\"),\n    (\"تاریخ\", \"I-DAT\"),\n    (\"۱۸\", \"I-DAT\"),\n    (\"شهریور\", \"I-DAT\"),\n    (\"سال\", \"I-DAT\"),\n    (\"۱۳۷۸\", \"I-DAT\"),\n    (\"ه\", \"I-DAT\"),\n    (\".\", \"I-DAT\"),\n    (\"ش\", \"I-DAT\"),\n    (\".\", \"I-DAT\"),\n    (\"خبری\", \"O\"),\n    (\"از\", \"O\"),\n    (\"نادیا\", \"O\"),\n    (\"نشد\", \"O\"),\n]\n\n```\n\n#### TMP mode:\n```python\nmodel.extract_ner(sentence, mode=\"tmp\")\n```\noutput :\n```\n[\n    (\"ماریا\", \"O\"),\n    (\"شنبه\", \"B-TMP\"),\n    (\"عصر\", \"I-TMP\"),\n    (\"راس\", \"I-TMP\"),\n    (\"ساعت\", \"I-TMP\"),\n    (\"۱۷\", \"I-TMP\"),\n    (\"و\", \"I-TMP\"),\n    (\"بیست\", \"I-TMP\"),\n    (\"و\", \"I-TMP\"),\n    (\"سه\", \"I-TMP\"),\n    (\"دقیقه\", \"I-TMP\"),\n    (\"به\", \"I-TMP\"),\n    (\"نادیا\", \"O\"),\n    (\"زنگ\", \"O\"),\n    (\"زد\", \"O\"),\n    (\"اما\", \"O\"),\n    (\"تا\", \"B-TMP\"),\n    (\"سه\", \"I-TMP\"),\n    (\"روز\", \"I-TMP\"),\n    (\"بعد\", \"I-TMP\"),\n    (\"در\", \"I-TMP\"),\n    (\"تاریخ\", \"I-TMP\"),\n    (\"۱۸\", \"I-TMP\"),\n    (\"شهریور\", \"I-TMP\"),\n    (\"سال\", \"I-TMP\"),\n    (\"۱۳۷۸\", \"I-TMP\"),\n    (\"ه\", \"I-TMP\"),\n    (\".\", \"I-TMP\"),\n    (\"ش\", \"I-TMP\"),\n    (\".\", \"I-TMP\"),\n    (\"خبری\", \"O\"),\n    (\"از\", \"O\"),\n    (\"نادیا\", \"O\"),\n    (\"نشد\", \"O\"),\n]\n\n\n```\n\n## File Structure:\nParstdex architecture is very flexible and scalable and therefore suggests an easy solution to adapt to new patterns which haven't been considered yet.\n```\n├── parstdex                 \n│   └── utils\n|   |   └── annotation\n|   |   |   └── ...\n|   |   └── pattern\n|   |   |   └── ...\n|   |   └── special_words\n|   |   |   └── 
words.txt\n|   |   └── const.py\n|   |   └── normalizer.py\n|   |   └── pattern_to_regex.py\n|   |   └── deprecation.py\n|   |   └── regex_tool.py\n|   |   └── spans.py\n|   |   └── tokenizer.py\n|   └── marker_extractor.py\n|   └── settings.py\n└── Test           \n│   └── data.json\n|   └── test_parstdex.py\n|      \n└── examples.py\n└── performance_test.ipynb\n└── requirement.txt\n└── setup.py\n```\n\n## Performance Test \nExecutable code and performance test results are available on [Google Colab](https://colab.research.google.com/github/kargaranamir/parstdex/blob/main/performance_test.ipynb).\n\nThe average time required to extract temporal expressions is `6 ms`. This test was conducted on 264 sentences with an average length of 50 characters that together cover all of the patterns.\n\n## How to contribute\n\nPlease feel free to provide us with any feedback or suggestions. You can find more information on how to contribute to Parstdex in the \n[contribution document](https://github.com/kargaranamir/parstdex/blob/main/contributing.md).\n\n## Citation\n\nIf you use any part of this repository in your research, please cite it using the following BibTeX entry.\n```bibtex\n@inproceedings{mirzababaei-etal-2022-hengam,\n\ttitle        = {Hengam: An Adversarially Trained Transformer for {P}ersian Temporal Tagging},\n\tauthor       = {Mirzababaei, Sajad  and Kargaran, Amir Hossein  and Sch{\\\"u}tze, Hinrich  and Asgari, Ehsaneddin},\n\tyear         = 2022,\n\tbooktitle    = {Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing},\n\tpublisher    = {Association for Computational Linguistics},\n\taddress      = {Online only},\n\tpages        = {1013--1024},\n\turl          = 
{https://aclanthology.org/2022.aacl-main.74}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkargaranamir%2Fparstdex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkargaranamir%2Fparstdex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkargaranamir%2Fparstdex/lists"}