https://github.com/boudinfl/pke

Python Keyphrase Extraction module

computational-linguistics information-retrieval keyphrase keyphrase-extraction keyword keyword-extraction natural-language-processing python


Host: GitHub
URL: https://github.com/boudinfl/pke
Owner: boudinfl
License: gpl-3.0
Created: 2015-11-13T08:11:45.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2023-07-12T16:18:04.000Z (about 3 years ago)
Last Synced: 2025-05-14T21:53:10.085Z (about 1 year ago)
Topics: computational-linguistics, information-retrieval, keyphrase, keyphrase-extraction, keyword, keyword-extraction, natural-language-processing, python
Language: Python
Homepage:
Size: 82.6 MB
Stars: 1,583
Watchers: 30
Forks: 290
Open Issues: 4
Metadata Files:
- Readme: README.md


# `pke` - python keyphrase extraction

`pke` is an **open source** Python-based **keyphrase extraction** toolkit. It
provides an end-to-end keyphrase extraction pipeline in which each component can
be easily modified or extended to develop new models. `pke` also allows for
easy benchmarking of state-of-the-art keyphrase extraction models, and
ships with supervised models trained on the
[SemEval-2010 dataset](http://aclweb.org/anthology/S10-1004).

![python-package workflow](https://github.com/boudinfl/pke/actions/workflows/python-package.yml/badge.svg)

## Table of Contents

* [Installation](#installation)

* [Minimal example](#minimal-example)

* [Getting started](#getting-started)

* [Implemented models](#implemented-models)

* [Model performances](#model-performances)

* [Citing pke](#citing-pke)

## Installation

To install `pke` from GitHub with pip:

```bash
pip install git+https://github.com/boudinfl/pke.git
```

`pke` relies on `spacy` (>= 3.2.3) for text processing and requires [models](https://spacy.io/usage/models) to be installed: 

```bash
# download the English model
python -m spacy download en_core_web_sm
```

## Minimal example

`pke` provides a standardized API for extracting keyphrases from a document.
Start with the lines below. To use another model, simply replace
`pke.unsupervised.TopicRank` with another model ([list of implemented models](#implemented-models)).

```python
import pke

# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document; `input` is expected to be a plain text
# string (replace the placeholder 'text' with your own document), and
# preprocessing is carried out using spaCy
extractor.load_document(input='text', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()

# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()

# N-best selection, keyphrases contains the 10 highest scored candidates as
# (keyphrase, score) tuples
keyphrases = extractor.get_n_best(n=10)
```

A detailed example is provided in the [`examples/`](examples/) directory.
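
Because every model exposes the same four steps, the pipeline above can be wrapped in a small helper that takes the model class as a parameter. The sketch below is only an illustration: `extract_keyphrases` is a hypothetical name, not part of `pke`, and it assumes the model class can be instantiated without arguments and run with default settings. Models that need extra resources or parameters may require additional arguments.

```python
def extract_keyphrases(text, model_cls, n=10, language="en"):
    """Run the four pipeline steps with any model class.

    Hypothetical helper (not part of pke). Assumes `model_cls` takes no
    constructor arguments and exposes the methods used in the minimal example.
    Returns the n best (keyphrase, score) tuples.
    """
    extractor = model_cls()
    extractor.load_document(input=text, language=language)
    extractor.candidate_selection()
    extractor.candidate_weighting()
    return extractor.get_n_best(n=n)


# Usage, assuming pke and the spaCy English model are installed:
#
#   import pke
#   for model_cls in (pke.unsupervised.TopicRank,
#                     pke.unsupervised.MultipartiteRank):
#       print(model_cls.__name__,
#             extract_keyphrases("your document text", model_cls, n=5))
```

Passing the class rather than an instance keeps each call independent, so the same document can be run through several models in a loop.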

## Getting started

To get hands-on experience with `pke`, we invite you to try out our tutorials.

| Name | Link |
| ---- | ---- |
| Getting started with `pke` and keyphrase extraction | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-1-graph-based-keyphrase-extraction.ipynb) |
| Model parameterization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-2-parameterization.ipynb) |
| Benchmarking models | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-3-benchmarking-models.ipynb) |

## Implemented models

`pke` currently implements the following keyphrase extraction models:

* Unsupervised models

  * Statistical models

    * FirstPhrases

    * TfIdf

    * KPMiner [(El-Beltagy and Rafea, 2010)](http://www.aclweb.org/anthology/S10-1041.pdf)

    * YAKE [(Campos et al., 2020)](https://doi.org/10.1016/j.ins.2019.09.013)

  * Graph-based models

    * TextRank [(Mihalcea and Tarau, 2004)](http://www.aclweb.org/anthology/W04-3252.pdf)

    * SingleRank [(Wan and Xiao, 2008)](http://www.aclweb.org/anthology/C08-1122.pdf)

    * TopicRank [(Bougouin et al., 2013)](http://aclweb.org/anthology/I13-1062.pdf)

    * TopicalPageRank [(Sterckx et al., 2015)](http://users.intec.ugent.be/cdvelder/papers/2015/sterckx2015wwwb.pdf)

    * PositionRank [(Florescu and Caragea, 2017)](http://www.aclweb.org/anthology/P17-1102.pdf)

    * MultipartiteRank [(Boudin, 2018)](https://arxiv.org/abs/1803.08721)

* Supervised models

  * Feature-based models

    * Kea [(Witten et al., 2005)](https://www.cs.waikato.ac.nz/ml/publications/2005/chap_Witten-et-al_Windows.pdf)
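
The graph-based models listed above rank words or phrases by running a random walk over a graph built from the document. As a rough illustration only, the pure-Python sketch below builds a word co-occurrence graph and scores its nodes with PageRank by power iteration, in the spirit of TextRank. The function names, the window size, and the damping factor are assumptions made for this example; it does not reflect how `pke` implements these models.

```python
def cooccurrence_graph(tokens, window=2):
    """Link distinct tokens that co-occur in a sliding window of `window` tokens."""
    graph = {}
    for i, token in enumerate(tokens):
        graph.setdefault(token, set())
        for other in tokens[i + 1 : i + window]:
            if other != token:
                graph[token].add(other)
                graph.setdefault(other, set()).add(token)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Score the nodes of an undirected graph by power iteration."""
    nodes = list(graph)
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            # each neighbour passes on an equal share of its current score
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) / len(nodes) + damping * incoming
        scores = new_scores
    return scores


# toy input, already tokenized; a real pipeline would tokenize and filter by POS
tokens = "graph ranking graph model graph walk".split()
scores = pagerank(cooccurrence_graph(tokens))
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:.3f}")
```

In this toy input, "graph" is adjacent to every other word, so it receives the highest score. The listed models differ mainly in what the nodes are (words, candidates, or topics) and in how edges and walks are weighted.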

## Model performances 

For comparison purposes, overall results of implemented models on commonly-used benchmark datasets are available in [results](results.md).
Code for reproducing these experiments is in the [benchmarking](examples/benchmarking-models.ipynb) notebook
(also available on [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/boudinfl/pke/blob/main/examples/benchmarking-models.ipynb)).

## Citing pke

If you use `pke`, please cite the following paper:

```
@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
```
