https://github.com/summanlp/textrank

TextRank implementation for Python 3.
https://github.com/summanlp/textrank

keywords natural-language-processing nlp python summarization text-summarization textrank

Last synced: 28 days ago
JSON representation

TextRank implementation for Python 3.

Host: GitHub
URL: https://github.com/summanlp/textrank
Owner: summanlp
License: mit
Created: 2014-09-23T15:54:07.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2023-03-28T13:38:58.000Z (almost 3 years ago)
Last Synced: 2025-12-09T08:29:01.266Z (3 months ago)
Topics: keywords, natural-language-processing, nlp, python, summarization, text-summarization, textrank
Language: Python
Homepage: https://pypi.org/project/summa/
Size: 9.79 MB
Stars: 1,268
Watchers: 27
Forks: 258
Open Issues: 23
Metadata Files:
- Readme: README
- License: LICENSE

Awesome Lists containing this project

README

          ================

summa – textrank

================

TextRank implementation for text summarization and keyword extraction in Python 3,

with `optimizations on the similarity function `_.

Features

--------

* Text summarization

* Keyword extraction

Examples

--------

Text summarization::

    >>> text = """Automatic summarization is the process of reducing a text document with a \

    computer program in order to create a summary that retains the most important points \

    of the original document. As the problem of information overload has grown, and as \

    the quantity of data has increased, so has interest in automatic summarization. \

    Technologies that can make a coherent summary take into account variables such as \

    length, writing style and syntax. An example of the use of summarization technology \

    is search engines such as Google. Document summarization is another."""

    >>> from summa import summarizer

    >>> print(summarizer.summarize(text))

    'Automatic summarization is the process of reducing a text document with a computer

    program in order to create a summary that retains the most important points of the

    original document.'

Keyword extraction::

    >>> from summa import keywords

    >>> print(keywords.keywords(text))

    document

    summarization

    writing

    account

Note that line breaks in the input will be used as sentence separators, so be sure

to preprocess your text accordingly.

Installation

------------

This software is `available in PyPI `_.

It depends on `NumPy `_ and `Scipy `_,

two Python libraries for scientific computing.

Pip will automatically install them along with `summa`::

    pip install summa

For a better performance of keyword extraction, install `Pattern `_.

More examples

-------------

- Command-line usage::

    textrank -t FILE

- Define length of the summary as a proportion of the text (also available in :code:`keywords`)::

    >>> from summa.summarizer import summarize

    >>> summarize(text, ratio=0.2)

- Define length of the summary by aproximate number of words (also available in :code:`keywords`)::

    >>> summarize(text, words=50)

- Define input text language (also available in :code:`keywords`).

  The available languages are arabic, danish, dutch, english, finnish, french, german,

  hungarian, italian, norwegian, polish, porter, portuguese, romanian, russian,

  spanish and swedish::

    >>> summarize(text, language='spanish')

- Get results as a list (also available in :code:`keywords`)::

    >>> summarize(text, split=True)

    ['Automatic summarization is the process of reducing a text document with a

    computer program in order to create a summary that retains the most important

    points of the original document.']

References

-------------

- Mihalcea, R., Tarau, P.:

  `"Textrank: Bringing order into texts" `__.

  In: Lin, D., Wu, D. (eds.)

  Proceedings of EMNLP 2004. pp. 404–411. Association for Computational Linguistics,

  Barcelona, Spain. July 2004.

- Barrios, F., López, F., Argerich, L., Wachenchauzer, R.:

  `"Variations of the Similarity Function of TextRank for Automated Summarization" `__.

  Anales de las 44JAIIO.

  Jornadas Argentinas de Informática, Argentine Symposium on Artificial Intelligence, 2015.

To cite this work::

    @article{DBLP:journals/corr/BarriosLAW16,

      author    = {Federico Barrios and

                 Federico L{\'{o}}pez and

                 Luis Argerich and

                 Rosa Wachenchauzer},

      title     = {Variations of the Similarity Function of TextRank for Automated Summarization},

      journal   = {CoRR},

      volume    = {abs/1602.03606},

      year      = {2016},

      url       = {http://arxiv.org/abs/1602.03606},

      archivePrefix = {arXiv},

      eprint    = {1602.03606},

      timestamp = {Wed, 07 Jun 2017 14:40:43 +0200},

      biburl    = {https://dblp.org/rec/bib/journals/corr/BarriosLAW16},

      bibsource = {dblp computer science bibliography, https://dblp.org}

    }

-------------

Summa is open source software released under the `The MIT License (MIT) `_.

Copyright (c) 2014 – now Summa NLP.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/summanlp/textrank

Awesome Lists containing this project

README