{"id":13564222,"url":"https://github.com/boudinfl/pke","last_synced_at":"2025-05-15T02:10:19.965Z","repository":{"id":38711670,"uuid":"46108782","full_name":"boudinfl/pke","owner":"boudinfl","description":"Python Keyphrase Extraction module","archived":false,"fork":false,"pushed_at":"2023-07-12T16:18:04.000Z","size":86590,"stargazers_count":1583,"open_issues_count":4,"forks_count":290,"subscribers_count":30,"default_branch":"master","last_synced_at":"2025-05-14T21:53:10.085Z","etag":null,"topics":["computational-linguistics","information-retrieval","keyphrase","keyphrase-extraction","keyword","keyword-extraction","natural-language-processing","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/boudinfl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-13T08:11:45.000Z","updated_at":"2025-05-12T14:04:21.000Z","dependencies_parsed_at":"2024-06-21T04:29:11.942Z","dependency_job_id":null,"html_url":"https://github.com/boudinfl/pke","commit_stats":{"total_commits":334,"total_committers":17,"mean_commits":"19.647058823529413","dds":"0.20059880239520955","last_synced_commit":"f2d4f5d2252c64d23defccd32fdac8809cfd7ce0"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boudinfl%2Fpke","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boudinfl%2Fpke/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boudinfl%2Fpke/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boudinfl%2Fpke/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/boudinfl","download_url":"https://codeload.github.com/boudinfl/pke/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254259387,"owners_count":22040821,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-linguistics","information-retrieval","keyphrase","keyphrase-extraction","keyword","keyword-extraction","natural-language-processing","python"],"created_at":"2024-08-01T13:01:28.256Z","updated_at":"2025-05-15T02:10:14.943Z","avatar_url":"https://github.com/boudinfl.png","language":"Python","readme":"# `pke` - python keyphrase extraction\n\n`pke` is an **open source** python-based **keyphrase extraction** toolkit. It\nprovides an end-to-end keyphrase extraction pipeline in which each component can\nbe easily modified or extended to develop new models. 
`pke` also allows for \neasy benchmarking of state-of-the-art keyphrase extraction models, and \nships with supervised models trained on the\n[SemEval-2010 dataset](http://aclweb.org/anthology/S10-1004).\n\n![python-package workflow](https://github.com/boudinfl/pke/actions/workflows/python-package.yml/badge.svg)\n\n## Table of Contents\n\n* [Installation](#installation)\n* [Minimal example](#minimal-example)\n* [Getting started](#getting-started)\n* [Implemented models](#implemented-models)\n* [Model performances](#model-performances)\n* [Citing pke](#citing-pke)\n\n## Installation\n\nTo pip install `pke` from GitHub:\n\n```bash\npip install git+https://github.com/boudinfl/pke.git\n```\n\n`pke` relies on `spacy` (\u003e= 3.2.3) for text processing and requires [models](https://spacy.io/usage/models) to be installed: \n\n```bash\n# download the English model\npython -m spacy download en_core_web_sm\n```\n\n## Minimal example\n\n`pke` provides a standardized API for extracting keyphrases from a document.\nStart by typing the 5 lines below. To use another model, simply replace\n`pke.unsupervised.TopicRank` with another model ([list of implemented models](#implemented-models)).\n\n```python\nimport pke\n\n# initialize keyphrase extraction model, here TopicRank\nextractor = pke.unsupervised.TopicRank()\n\n# load the content of the document, here document is expected to be a simple \n# test string and preprocessing is carried out using spacy\nextractor.load_document(input='text', language='en')\n\n# keyphrase candidate selection, in the case of TopicRank: sequences of nouns\n# and adjectives (i.e. 
`(Noun|Adj)*`)\nextractor.candidate_selection()\n\n# candidate weighting, in the case of TopicRank: using a random walk algorithm\nextractor.candidate_weighting()\n\n# N-best selection, keyphrases contains the 10 highest scored candidates as\n# (keyphrase, score) tuples\nkeyphrases = extractor.get_n_best(n=10)\n```\n\nA detailed example is provided in the [`examples/`](examples/) directory.\n\n## Getting started\n\nTo get your hands dirty with `pke`, we invite you to try our tutorials out.\n\n|                          Name                   |     Link     |\n| ----------------------------------------------  |  ----------  |\n| Getting started with `pke` and keyphrase extraction | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-1-graph-based-keyphrase-extraction.ipynb) |\n| Model parameterization                          | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-2-parameterization.ipynb) |\n| Benchmarking models                             | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-3-benchmarking-models.ipynb) |\n\n## Implemented models\n\n`pke` currently implements the following keyphrase extraction models:\n\n* Unsupervised models\n  * Statistical models\n    * FirstPhrases\n    * TfIdf\n    * KPMiner [(El-Beltagy and Rafea, 2010)](http://www.aclweb.org/anthology/S10-1041.pdf)\n    * YAKE [(Campos et al., 2020)](https://doi.org/10.1016/j.ins.2019.09.013)\n  * Graph-based models\n    * TextRank [(Mihalcea and Tarau, 2004)](http://www.aclweb.org/anthology/W04-3252.pdf)\n    * SingleRank  [(Wan and Xiao, 2008)](http://www.aclweb.org/anthology/C08-1122.pdf)\n    * TopicRank [(Bougouin et al., 
2013)](http://aclweb.org/anthology/I13-1062.pdf)\n    * TopicalPageRank [(Sterckx et al., 2015)](http://users.intec.ugent.be/cdvelder/papers/2015/sterckx2015wwwb.pdf)\n    * PositionRank [(Florescu and Caragea, 2017)](http://www.aclweb.org/anthology/P17-1102.pdf)\n    * MultipartiteRank [(Boudin, 2018)](https://arxiv.org/abs/1803.08721)\n* Supervised models\n  * Feature-based models\n    * Kea [(Witten et al., 2005)](https://www.cs.waikato.ac.nz/ml/publications/2005/chap_Witten-et-al_Windows.pdf)\n\n## Model performances \n\nFor comparison purposes, overall results of implemented models on commonly used benchmark datasets are available in [results](results.md).\nCode for reproducing these experiments is in the [benchmarking](examples/benchmarking-models.ipynb) notebook\n(also available on [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/boudinfl/pke/blob/main/examples/benchmarking-models.ipynb)).\n\n## Citing pke\n\nIf you use `pke`, please cite the following paper:\n\n```\n@InProceedings{boudin:2016:COLINGDEMO,\n  author    = {Boudin, Florian},\n  title     = {pke: an open source python-based keyphrase extraction toolkit},\n  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},\n  month     = {December},\n  year      = {2016},\n  address   = {Osaka, Japan},\n  pages     = {69--73},\n  url       = {http://aclweb.org/anthology/C16-2015}\n}\n```\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboudinfl%2Fpke","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboudinfl%2Fpke","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboudinfl%2Fpke/lists"}