{"id":17168887,"url":"https://github.com/alirezatheh/perke","last_synced_at":"2025-08-20T22:32:05.430Z","repository":{"id":44372109,"uuid":"237978237","full_name":"alirezatheh/perke","owner":"alirezatheh","description":"A keyphrase extractor for Persian ","archived":false,"fork":false,"pushed_at":"2025-08-11T18:53:42.000Z","size":146,"stargazers_count":69,"open_issues_count":2,"forks_count":8,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-16T19:33:24.139Z","etag":null,"topics":["data-mining","data-processing","information-retrieval","keyphrase","keyphrase-extraction","keyphrase-extractor","keyword","keyword-extraction","keyword-extractor","machine-learning","ml","natural-language-processing","nlp","persian","persian-language","python","text-mining","text-processing","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alirezatheh.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-02-03T14:00:01.000Z","updated_at":"2025-07-03T13:24:31.000Z","dependencies_parsed_at":"2022-07-12T18:19:53.250Z","dependency_job_id":"a883443c-2fe8-4b70-afd9-aae4f751a167","html_url":"https://github.com/alirezatheh/perke","commit_stats":{"total_commits":87,"total_committers":4,"mean_commits":21.75,"dds":0.4482758620689655,"last_synced_commit":"988e334744fe4c2f4490dc45ed974d0a1c9b56d2"},"previous_names":["alirezah320/perke"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/
alirezatheh/perke","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alirezatheh%2Fperke","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alirezatheh%2Fperke/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alirezatheh%2Fperke/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alirezatheh%2Fperke/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alirezatheh","download_url":"https://codeload.github.com/alirezatheh/perke/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alirezatheh%2Fperke/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271397960,"owners_count":24752641,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-mining","data-processing","information-retrieval","keyphrase","keyphrase-extraction","keyphrase-extractor","keyword","keyword-extraction","keyword-extractor","machine-learning","ml","natural-language-processing","nlp","persian","persian-language","python","text-mining","text-processing","unsupervised-learning"],"created_at":"2024-10-14T23:13:11.187Z","updated_at":"2025-08-20T22:32:03.437Z","avatar_url":"https://github.com/alirezatheh.png","language"
:"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Perke\n[![tests](https://github.com/alirezatheh/perke/workflows/tests/badge.svg)](https://github.com/alirezatheh/perke/actions/workflows/tests.yaml)\n[![pre-commit.ci](https://results.pre-commit.ci/badge/github/AlirezaTheH/perke/main.svg)](https://results.pre-commit.ci/latest/github/alirezatheh/perke/main)\n[![PyPI Version](https://img.shields.io/pypi/v/perke)](https://pypi.python.org/pypi/perke)\n[![Python Versions](https://img.shields.io/pypi/pyversions/perke)](https://pypi.org/project/perke)\n[![Documentation Status](https://readthedocs.org/projects/perke/badge/?version=stable)](https://perke.readthedocs.io/en/stable/?badge=stable)\n\nPerke is a Python keyphrase extraction package for the Persian language. It\nprovides an end-to-end keyphrase extraction pipeline in which each component\ncan be easily modified or extended to develop new models.\n\n## Installation\n- The easiest way to install is from PyPI:\n  ```bash\n  pip install perke\n  ```\n  Alternatively, you can install directly from GitHub:\n  ```bash\n  pip install git+https://github.com/alirezatheh/perke.git\n  ```\n- Perke also requires a trained POS tagger model. We use\n  [Hazm's](https://github.com/roshan-research/hazm) POS tagger model. You can\n  download the latest [Hazm](https://github.com/roshan-research/hazm) POS\n  tagger using the following command:\n  ```bash\n  python -m perke download\n  ```\n  Alternatively, you can use another model with the same tag names and\n  structure, and put it in the\n  [`resources`](https://github.com/alirezatheh/perke/tree/main/perke/resources)\n  directory.\n\n## Simple Example\nPerke provides a standardized API for extracting keyphrases from a text. Start\nby typing the 4 lines below to use the `TextRank` keyphrase extractor.\n\n\n```python\nfrom perke.unsupervised.graph_based import TextRank\n\n# 1. Create a TextRank extractor.\nextractor = TextRank()\n\n# 2. 
Load the text.\nextractor.load_text(input='text or path/to/input_file')\n\n# 3. Build the graph representation of the text and weight the\n#    words. Keyphrase candidates are composed of the 33 percent\n#    highest weighted words.\nextractor.weight_candidates(top_t_percent=0.33)\n\n# 4. Get the 10 highest weighted candidates as keyphrases.\nkeyphrases = extractor.get_n_best(n=10)\n```\n\nFor more in-depth examples, see the\n[`examples`](https://github.com/alirezatheh/perke/tree/main/examples)\ndirectory.\n\n## Documentation\nDocumentation and references are available at\n[Read the Docs](https://perke.readthedocs.io).\n\n## Implemented Models\nPerke currently implements the following keyphrase extraction models:\n\n- Unsupervised models\n    - Graph-based models\n        - TextRank: [article](http://www.aclweb.org/anthology/W04-3252.pdf)\n          by Mihalcea and Tarau, 2004\n        - SingleRank: [article](https://www.aaai.org/Papers/AAAI/2008/AAAI08-136.pdf)\n          by Wan and Xiao, 2008\n        - TopicRank: [article](http://aclweb.org/anthology/I13-1062.pdf)\n          by Bougouin, Boudin and Daille, 2013\n        - PositionRank: [article](http://www.aclweb.org/anthology/P17-1102.pdf)\n          by Florescu and Caragea, 2017\n        - MultipartiteRank: [article](https://www.aclweb.org/anthology/N18-2105.pdf)\n          by Boudin, 2018\n\n## Acknowledgements\nPerke is inspired by [pke](https://github.com/boudinfl/pke).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falirezatheh%2Fperke","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falirezatheh%2Fperke","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falirezatheh%2Fperke/lists"}