https://github.com/DerwenAI/pytextrank

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction

graph-algorithms machine-learning natural-language natural-language-processing nlp python spacy spacy-extension summarization textgraphs textrank

Last synced: over 1 year ago

Host: GitHub
URL: https://github.com/DerwenAI/pytextrank
Owner: DerwenAI
License: mit
Created: 2016-10-02T18:39:12.000Z (almost 10 years ago)
Default Branch: main
Last Pushed: 2024-07-16T08:39:07.000Z (about 2 years ago)
Last Synced: 2025-03-26T00:12:48.134Z (over 1 year ago)
Topics: graph-algorithms, machine-learning, natural-language, natural-language-processing, nlp, python, spacy, spacy-extension, summarization, textgraphs, textrank
Language: Python
Homepage: https://derwen.ai/docs/ptr/
Size: 1.61 MB
Stars: 2,172
Watchers: 64
Forks: 333
Open Issues: 17
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: code_of_conduct.md
- Citation: CITATION
- Security: SECURITY.md

Awesome Lists containing this project

awesome-list - PyTextRank - A Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work. (Natural Language Processing / General Purpose NLP)
awesome-python-machine-learning-resources - GitHub - 19% open · ⏱️ 27.07.2022 (Text Data and NLP)

README

# PyTextRank

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4637885.svg)](https://doi.org/10.5281/zenodo.4637885)

![Licence](https://img.shields.io/github/license/DerwenAI/pytextrank)

![Repo size](https://img.shields.io/github/repo-size/DerwenAI/pytextrank)

![GitHub commit activity](https://img.shields.io/github/commit-activity/w/DerwenAI/pytextrank?style=plastic)

[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)

[![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)

![CI](https://github.com/DerwenAI/pytextrank/workflows/CI/badge.svg)

![downloads](https://img.shields.io/pypi/dm/pytextrank)

![sponsor](https://img.shields.io/github/sponsors/ceteri)

**PyTextRank** is a Python implementation of *TextRank* as a
[spaCy pipeline extension](https://spacy.io/universe/project/spacy-pytextrank),
for graph-based natural language work -- and related knowledge graph practices.
This includes the family of
[*textgraph*](https://derwen.ai/docs/ptr/glossary/#textgraphs) algorithms:

  - *TextRank* by [[mihalcea04textrank]](https://derwen.ai/docs/ptr/biblio/#mihalcea04textrank)

  - *PositionRank* by [[florescuc17]](https://derwen.ai/docs/ptr/biblio/#florescuc17)

  - *Biased TextRank* by [[kazemi-etal-2020-biased]](https://derwen.ai/docs/ptr/biblio/#kazemi-etal-2020-biased)

  - *TopicRank* by [[bougouin-etal-2013-topicrank]](https://derwen.ai/docs/ptr/biblio/#bougouin-etal-2013-topicrank)
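
To give a feel for what this family of algorithms does, here is a minimal sketch of the core TextRank idea: build a co-occurrence graph over words, then score the nodes with PageRank. This is not PyTextRank's implementation. The function name `textrank_keywords`, its parameters, and the token list are assumptions made for illustration, and the sketch uses only the standard library.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens with PageRank over an undirected co-occurrence graph."""
    # Link every pair of distinct tokens that occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration: each node shares its score equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1.0 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, pre-filtered tokens; a real pipeline would lemmatize the text
# and keep only selected parts of speech before building the graph.
tokens = (
    "linear constraints natural numbers linear equations "
    "linear inequations minimal solutions"
).split()
for word, score in textrank_keywords(tokens)[:3]:
    print(f"{word}\t{score:.3f}")
```

Words that co-occur with many different neighbors, such as `linear` in this toy input, end up with the highest scores.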

Popular use cases for this library include:

  - *phrase extraction*: get the top-ranked phrases from a text document

  - low-cost *extractive summarization* of a text document

  - help infer concepts from unstructured text into more structured representation
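
As a rough illustration of how ranked phrases can drive low-cost extractive summarization, the sketch below scores each sentence by the total rank of the phrases it contains and keeps the best ones in document order. It is a simplified stand-in, and the library's own summarization may work differently. The function `summarize`, the phrase ranks, and the sentences are hypothetical.

```python
def summarize(sentences, phrase_ranks, limit=2):
    """Pick the `limit` sentences whose phrases carry the most total rank."""
    scored = []
    for index, sentence in enumerate(sentences):
        lowered = sentence.lower()
        score = sum(rank for phrase, rank in phrase_ranks.items() if phrase in lowered)
        scored.append((score, index))
    # Keep the best-scoring sentences, then restore document order.
    best = sorted(scored, key=lambda pair: pair[0], reverse=True)[:limit]
    return [sentences[index] for _, index in sorted(best, key=lambda pair: pair[1])]


# Made-up phrase ranks; in practice they would come from a phrase-extraction step.
phrase_ranks = {"minimal set": 0.18, "linear constraints": 0.15, "algorithms": 0.09}
sentences = [
    "Compatibility of systems of linear constraints over the set of natural numbers.",
    "Criteria of compatibility are considered.",
    "Upper bounds for components of a minimal set of solutions and algorithms are given.",
]
for sentence in summarize(sentences, phrase_ranks):
    print(sentence)
```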

See our full documentation at: <https://derwen.ai/docs/ptr/>

## Getting Started

See the ["Getting Started"](https://derwen.ai/docs/ptr/start/)
section of the online documentation.

To install from [PyPI](https://pypi.python.org/pypi/pytextrank):

```
python3 -m pip install pytextrank
python3 -m spacy download en_core_web_sm
```

If you work directly from this Git repo, be sure to install the
dependencies as well:

```
python3 -m pip install -r requirements.txt
```

Alternatively, to install dependencies using `conda`:

```
conda env create -f environment.yml
conda activate pytextrank
```

Then to use the library with a simple use case:

```python
import spacy
import pytextrank

# example text
text = "Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types."

# load a spaCy model, depending on language, scale, etc.
nlp = spacy.load("en_core_web_sm")

# add PyTextRank to the spaCy pipeline
nlp.add_pipe("textrank")
doc = nlp(text)

# examine the top-ranked phrases in the document
for phrase in doc._.phrases:
    print(phrase.text)
    print(phrase.rank, phrase.count)
    print(phrase.chunks)
```
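
The loop above prints every phrase. For a shorter report, a small helper along the following lines can keep only the top few. This is a sketch that assumes each phrase exposes `text`, `rank`, and `count`, as used in the loop above. The helper name `top_phrases`, its parameters, and the stand-in `FakePhrase` objects with made-up values are hypothetical, and are there so the snippet runs without spaCy installed.

```python
from collections import namedtuple


def top_phrases(phrases, limit=5, min_rank=0.0):
    """Return up to `limit` (text, rank, count) tuples with rank >= min_rank."""
    kept = [p for p in phrases if p.rank >= min_rank]
    kept.sort(key=lambda p: p.rank, reverse=True)
    return [(p.text, p.rank, p.count) for p in kept[:limit]]


# Stand-in objects with made-up values, mimicking the attributes used above.
FakePhrase = namedtuple("FakePhrase", ["text", "rank", "count"])
sample = [
    FakePhrase("minimal set", 0.18, 2),
    FakePhrase("types", 0.03, 2),
    FakePhrase("linear constraints", 0.15, 1),
]
for text, rank, count in top_phrases(sample, limit=2, min_rank=0.05):
    print(f"{rank:.2f}  {count}  {text}")
```

With the pipeline from the example above, calling `top_phrases(doc._.phrases)` should work the same way, since the helper only reads those three attributes.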

See the **tutorial notebooks** in the `examples` subdirectory for
sample code and patterns to use in integrating **PyTextRank** with
related libraries in Python.

## Contributing Code

We welcome people getting involved as contributors to this open source
project!

For detailed instructions please see:
[CONTRIBUTING.md](https://github.com/DerwenAI/pytextrank/blob/main/CONTRIBUTING.md)

## Build Instructions

Note: unless you are contributing code and updates,
in most use cases you won't need to build this package locally.

Instead, simply install from
[PyPI](https://pypi.python.org/pypi/pytextrank)
or use [Conda](https://docs.conda.io/).

To set up the build environment locally, see the
["Build Instructions"](https://derwen.ai/docs/ptr/build/)
section of the online documentation.

## Semantic Versioning

Generally speaking, the major release number of PyTextRank
tracks the major release number of the associated spaCy
version.

See:
[CHANGELOG.md](https://github.com/DerwenAI/pytextrank/blob/main/CHANGELOG.md)
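
One way to check this convention in a local environment is to compare the major numbers of the installed packages. The sketch below uses only the standard library. The helper names `major` and `installed_major` are hypothetical, and a mismatch is only a hint to consult the changelog, not necessarily an error.

```python
from importlib.metadata import PackageNotFoundError, version


def major(version_string):
    """Return the leading integer of a dotted version string such as '3.3.0'."""
    return int(version_string.split(".")[0])


def installed_major(package):
    """Return the major version of an installed package, or None if absent."""
    try:
        return major(version(package))
    except PackageNotFoundError:
        return None


ptr_major = installed_major("pytextrank")
spacy_major = installed_major("spacy")
if ptr_major is None or spacy_major is None:
    print("pytextrank and/or spaCy is not installed in this environment")
elif ptr_major == spacy_major:
    print(f"major versions match: {ptr_major}")
else:
    print(f"major versions differ: pytextrank {ptr_major}, spaCy {spacy_major}")
```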



## License and Copyright

Source code for **PyTextRank** plus its logo, documentation, and examples
have an [MIT license](https://spdx.org/licenses/MIT.html) which is
succinct and simplifies use in commercial applications.

All materials herein are Copyright © 2016-2024 Derwen, Inc.

## Attribution

Please use the following BibTeX entry for citing **PyTextRank** if you
use it in your research or software:

```bibtex
@software{PyTextRank,
  author = {Paco Nathan},
  title = {{PyTextRank, a Python implementation of TextRank for phrase extraction and summarization of text documents}},
  year = 2016,
  publisher = {Derwen},
  doi = {10.5281/zenodo.4637885},
  url = {https://github.com/DerwenAI/pytextrank}
}
```

Citations are helpful for the continued development and maintenance of
this library.
For example, see our citations listed on
[Google Scholar](https://scholar.google.com/scholar?q=related:5tl6J4xZlCIJ:scholar.google.com/&scioq=&hl=en&as_sdt=0,5).

## Kudos

Many thanks to our open source [sponsors](https://github.com/sponsors/ceteri);
and to our contributors:
[@ceteri](https://github.com/ceteri),
[@louisguitton](https://github.com/louisguitton),
[@Ankush-Chander](https://github.com/Ankush-Chander),
[@tomaarsen](https://github.com/tomaarsen),
[@CaptXiong](https://github.com/CaptXiong),
[@Lord-V15](https://github.com/Lord-V15),
[@anna-droid-beep](https://github.com/anna-droid-beep),
[@dvsrepo](https://github.com/dvsrepo),
[@clabornd](https://github.com/clabornd),
[@dayalstrub-cma](https://github.com/dayalstrub-cma),
[@kavorite](https://github.com/kavorite),
[@0dB](https://github.com/0dB),
[@htmartin](https://github.com/htmartin),
[@williamsmj](https://github.com/williamsmj/),
[@mattkohl](https://github.com/mattkohl),
[@vanita5](https://github.com/vanita5),
[@HarshGrandeur](https://github.com/HarshGrandeur),
[@mnowotka](https://github.com/mnowotka),
[@kjam](https://github.com/kjam),
[@SaiThejeshwar](https://github.com/SaiThejeshwar),
[@laxatives](https://github.com/laxatives),
[@dimmu](https://github.com/dimmu),
[@JasonZhangzy1757](https://github.com/JasonZhangzy1757),
[@jake-aft](https://github.com/jake-aft),
[@junchen1992](https://github.com/junchen1992),
[@shyamcody](https://github.com/shyamcody),
[@chikubee](https://github.com/chikubee);
also to [@mihalcea](https://github.com/mihalcea) who leads outstanding NLP research work,
encouragement from the wonderful folks at Explosion who develop [spaCy](https://github.com/explosion/spaCy),
plus general support from [Derwen, Inc.](https://derwen.ai/)

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=derwenai/pytextrank&type=Date)](https://star-history.com/#derwenai/pytextrank&Date)
