https://github.com/summanlp/textrank
TextRank implementation for Python 3.
keywords natural-language-processing nlp python summarization text-summarization textrank
- Host: GitHub
- URL: https://github.com/summanlp/textrank
- Owner: summanlp
- License: mit
- Created: 2014-09-23T15:54:07.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2023-03-28T13:38:58.000Z (almost 3 years ago)
- Last Synced: 2025-12-09T08:29:01.266Z (about 2 months ago)
- Topics: keywords, natural-language-processing, nlp, python, summarization, text-summarization, textrank
- Language: Python
- Homepage: https://pypi.org/project/summa/
- Size: 9.79 MB
- Stars: 1,268
- Watchers: 27
- Forks: 258
- Open Issues: 23
Metadata Files:
- Readme: README
- License: LICENSE
README
================
summa – textrank
================
TextRank implementation for text summarization and keyword extraction in Python 3,
with `optimizations on the similarity function <http://arxiv.org/abs/1602.03606>`_.
Features
--------
* Text summarization
* Keyword extraction
Examples
--------
Text summarization::

    >>> text = """Automatic summarization is the process of reducing a text document with a \
    computer program in order to create a summary that retains the most important points \
    of the original document. As the problem of information overload has grown, and as \
    the quantity of data has increased, so has interest in automatic summarization. \
    Technologies that can make a coherent summary take into account variables such as \
    length, writing style and syntax. An example of the use of summarization technology \
    is search engines such as Google. Document summarization is another."""
    >>> from summa import summarizer
    >>> print(summarizer.summarize(text))
    Automatic summarization is the process of reducing a text document with a computer
    program in order to create a summary that retains the most important points of the
    original document.

Keyword extraction::

    >>> from summa import keywords
    >>> print(keywords.keywords(text))
    document
    summarization
    writing
    account

Note that line breaks in the input will be used as sentence separators, so be sure
to preprocess your text accordingly.
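
Hard-wrapped text (for example, copied from an e-mail or a PDF) has a line break at the end of every visual line, so each line would be treated as a separate sentence. A minimal preprocessing sketch, assuming blank lines mark paragraph breaks; ``unwrap_hard_wrapped`` is a hypothetical helper and not part of summa:

```python
import re


def unwrap_hard_wrapped(text):
    """Join the lines of each paragraph into a single line.

    Assumption: paragraphs are separated by one or more blank lines.
    Each paragraph ends up on its own line, so the only remaining
    line breaks are paragraph boundaries.
    """
    # Split on blank lines (optionally containing whitespace).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace inside a paragraph to one space.
    return "\n".join(" ".join(p.split()) for p in paragraphs)


wrapped = (
    "TextRank builds a graph\n"
    "of sentences. It then\n"
    "ranks them.\n"
    "\n"
    "A second paragraph\n"
    "starts here."
)
clean = unwrap_hard_wrapped(wrapped)
```

The cleaned string can then be passed to ``summarizer.summarize`` or ``keywords.keywords`` in place of the raw text.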
Installation
------------
This software is `available on PyPI <https://pypi.org/project/summa/>`_.
It depends on NumPy and SciPy,
two Python libraries for scientific computing.
Pip will automatically install them along with `summa`::

    pip install summa

For better keyword extraction performance, install Pattern.
More examples
-------------
- Command-line usage::

    textrank -t FILE

- Define length of the summary as a proportion of the text (also available in :code:`keywords`)::

    >>> from summa.summarizer import summarize
    >>> summarize(text, ratio=0.2)

- Define length of the summary by approximate number of words (also available in :code:`keywords`)::

    >>> summarize(text, words=50)

- Define input text language (also available in :code:`keywords`).

  The available languages are arabic, danish, dutch, english, finnish, french, german,
  hungarian, italian, norwegian, polish, porter, portuguese, romanian, russian,
  spanish and swedish::

    >>> summarize(text, language='spanish')

- Get results as a list (also available in :code:`keywords`)::

    >>> summarize(text, split=True)
    ['Automatic summarization is the process of reducing a text document with a
    computer program in order to create a summary that retains the most important
    points of the original document.']

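
The options above can be combined in a single call. The sketch below uses only the parameters listed in this section (``words``, ``language``, ``split``) and assumes summa is installed; ``summary_and_keywords`` is a hypothetical wrapper and not part of summa:

```python
def summary_and_keywords(text, words=50, language="english"):
    """Return (sentences, terms) as two Python lists.

    Hypothetical convenience wrapper for illustration only.
    """
    # Imported lazily so this module can be loaded even if summa is missing.
    from summa import keywords, summarizer

    # split=True asks for a list instead of a newline-joined string.
    sentences = summarizer.summarize(
        text, words=words, language=language, split=True
    )
    terms = keywords.keywords(text, language=language, split=True)
    return sentences, terms
```

Called as ``summary_and_keywords(text, words=30)``, this would return the summary sentences and the keywords for the same input, both as lists.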
References
-------------
- Mihalcea, R., Tarau, P.:
  "TextRank: Bringing Order into Texts".
  In: Lin, D., Wu, D. (eds.)
  Proceedings of EMNLP 2004. pp. 404–411. Association for Computational Linguistics,
  Barcelona, Spain. July 2004.
- Barrios, F., López, F., Argerich, L., Wachenchauzer, R.:
  `"Variations of the Similarity Function of TextRank for Automated Summarization" <http://arxiv.org/abs/1602.03606>`__.
  Anales de las 44JAIIO.
  Jornadas Argentinas de Informática, Argentine Symposium on Artificial Intelligence, 2015.
To cite this work::

    @article{DBLP:journals/corr/BarriosLAW16,
      author        = {Federico Barrios and
                       Federico L{\'{o}}pez and
                       Luis Argerich and
                       Rosa Wachenchauzer},
      title         = {Variations of the Similarity Function of TextRank for Automated Summarization},
      journal       = {CoRR},
      volume        = {abs/1602.03606},
      year          = {2016},
      url           = {http://arxiv.org/abs/1602.03606},
      archivePrefix = {arXiv},
      eprint        = {1602.03606},
      timestamp     = {Wed, 07 Jun 2017 14:40:43 +0200},
      biburl        = {https://dblp.org/rec/bib/journals/corr/BarriosLAW16},
      bibsource     = {dblp computer science bibliography, https://dblp.org}
    }

-------------
Summa is open source software released under the MIT License (MIT).
Copyright (c) 2014 – now Summa NLP.