https://github.com/summanlp/textrank


================
summa – textrank
================

TextRank implementation for text summarization and keyword extraction in Python 3,
with optimizations on the similarity function (see References).
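
As a rough orientation, the ranking idea behind TextRank can be sketched in plain Python.
The sketch below uses the sentence similarity from the original paper by Mihalcea and Tarau
(shared words divided by the sum of the log sentence lengths), not summa's optimized variant,
and runs a fixed number of PageRank iterations instead of testing for convergence.
The names ``split_sentences``, ``similarity`` and ``textrank_scores`` are hypothetical helpers
made up for illustration; they are not part of summa's API.

.. code-block:: python

    import math
    import re


    def split_sentences(text):
        # Naive splitter for illustration: break after ., ! or ? followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


    def similarity(words_a, words_b):
        # Number of shared words, normalised by the log of both sentence lengths.
        if not words_a or not words_b:
            return 0.0
        denom = math.log(len(words_a)) + math.log(len(words_b))
        if denom <= 0:  # both sentences consist of a single word
            return 0.0
        return len(set(words_a) & set(words_b)) / denom


    def textrank_scores(sentences, damping=0.85, iterations=50):
        # Build a weighted, undirected sentence graph as an adjacency matrix.
        tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
        n = len(sentences)
        weights = [
            [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)
        ]
        out_sum = [sum(row) for row in weights]

        # Weighted PageRank: each sentence passes its score to its neighbours
        # in proportion to the edge weights.
        scores = [1.0] * n
        for _ in range(iterations):
            scores = [
                (1 - damping) + damping * sum(
                    weights[j][i] / out_sum[j] * scores[j]
                    for j in range(n) if out_sum[j] > 0
                )
                for i in range(n)
            ]
        return scores

Sentences with the highest scores are the natural candidates for a summary. A sentence that
shares no words with any other sentence keeps only the base score of ``1 - damping``.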

Features
--------

* Text summarization
* Keyword extraction

Examples
--------

Text summarization::

    >>> text = """Automatic summarization is the process of reducing a text document with a \
    computer program in order to create a summary that retains the most important points \
    of the original document. As the problem of information overload has grown, and as \
    the quantity of data has increased, so has interest in automatic summarization. \
    Technologies that can make a coherent summary take into account variables such as \
    length, writing style and syntax. An example of the use of summarization technology \
    is search engines such as Google. Document summarization is another."""

    >>> from summa import summarizer
    >>> print(summarizer.summarize(text))
    'Automatic summarization is the process of reducing a text document with a computer
    program in order to create a summary that retains the most important points of the
    original document.'

Keyword extraction::

    >>> from summa import keywords
    >>> print(keywords.keywords(text))
    document
    summarization
    writing
    account

Note that line breaks in the input will be used as sentence separators, so be sure
to preprocess your text accordingly.
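
One possible way to do that preprocessing for hard-wrapped text is sketched below. It assumes
that blank lines mark paragraph boundaries and that single line breaks inside a paragraph are
only wrapping; ``unwrap_paragraphs`` is a hypothetical helper, not something summa provides.

.. code-block:: python

    import re


    def unwrap_paragraphs(raw):
        """Join hard-wrapped lines so that each paragraph sits on a single line."""
        # Blank lines (possibly containing spaces) are taken as paragraph boundaries.
        paragraphs = re.split(r"\n\s*\n", raw.strip())
        # Inside a paragraph, collapse every run of whitespace, including
        # single newlines, into one space.
        return "\n".join(" ".join(p.split()) for p in paragraphs)

The result keeps one line break per paragraph, so only paragraph ends are treated as sentence
separators. It can then be passed to ``summarizer.summarize`` or ``keywords.keywords``.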

Installation
------------

This software is available on PyPI.
It depends on NumPy and SciPy,
two Python libraries for scientific computing.
Pip will automatically install them along with ``summa``::

    pip install summa

For better keyword extraction performance, install Pattern.

More examples
-------------

- Command-line usage::

      textrank -t FILE

- Define length of the summary as a proportion of the text (also available in :code:`keywords`)::

      >>> from summa.summarizer import summarize
      >>> summarize(text, ratio=0.2)

- Define length of the summary by approximate number of words (also available in :code:`keywords`)::

      >>> summarize(text, words=50)

- Define input text language (also available in :code:`keywords`).

  The available languages are arabic, danish, dutch, english, finnish, french, german,
  hungarian, italian, norwegian, polish, porter, portuguese, romanian, russian,
  spanish and swedish::

      >>> summarize(text, language='spanish')

- Get results as a list (also available in :code:`keywords`)::

      >>> summarize(text, split=True)
      ['Automatic summarization is the process of reducing a text document with a
      computer program in order to create a summary that retains the most important
      points of the original document.']
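
To make the difference between a proportional length and a word budget concrete, here is a
minimal sketch of how ranked sentences could be cut down either way. It is an illustration
under stated assumptions, not summa's actual selection logic: ``pick_sentences`` is a
hypothetical helper, the input is assumed to be ``(sentence, score)`` pairs in document order,
and the default ``ratio`` simply mirrors the value used in the example above.

.. code-block:: python

    def pick_sentences(scored, ratio=0.2, words=None):
        """Select top-scoring sentences by proportion or by a word budget."""
        # Sentence indices, best score first.
        ranked = sorted(range(len(scored)), key=lambda i: scored[i][1], reverse=True)

        if words is not None:
            # Take sentences in score order until the next one would exceed the
            # budget; the best sentence is always kept.
            chosen, total = [], 0
            for i in ranked:
                n = len(scored[i][0].split())
                if chosen and total + n > words:
                    break
                chosen.append(i)
                total += n
        else:
            # Keep a fixed share of the sentences, but at least one.
            keep = max(1, round(len(scored) * ratio))
            chosen = ranked[:keep]

        # Return the selected sentences in their original document order.
        return [scored[i][0] for i in sorted(chosen)]

Because whole sentences are selected, a word budget can only be met approximately, which is
why the ``words`` option above is described as an approximate number of words.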

References
-------------
- Mihalcea, R., Tarau, P.:
  "TextRank: Bringing Order into Texts".
  In: Lin, D., Wu, D. (eds.)
  Proceedings of EMNLP 2004. pp. 404–411. Association for Computational Linguistics,
  Barcelona, Spain. July 2004.

- Barrios, F., López, F., Argerich, L., Wachenchauzer, R.:
  "Variations of the Similarity Function of TextRank for Automated Summarization".
  Anales de las 44JAIIO.
  Jornadas Argentinas de Informática, Argentine Symposium on Artificial Intelligence, 2015.

To cite this work::

    @article{DBLP:journals/corr/BarriosLAW16,
      author        = {Federico Barrios and
                       Federico L{\'{o}}pez and
                       Luis Argerich and
                       Rosa Wachenchauzer},
      title         = {Variations of the Similarity Function of TextRank for Automated Summarization},
      journal       = {CoRR},
      volume        = {abs/1602.03606},
      year          = {2016},
      url           = {http://arxiv.org/abs/1602.03606},
      archivePrefix = {arXiv},
      eprint        = {1602.03606},
      timestamp     = {Wed, 07 Jun 2017 14:40:43 +0200},
      biburl        = {https://dblp.org/rec/bib/journals/corr/BarriosLAW16},
      bibsource     = {dblp computer science bibliography, https://dblp.org}
    }

-------------

Summa is open source software released under the MIT License (MIT).

Copyright (c) 2014 – now Summa NLP.