Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/davidadamojr/TextRank
Python implementation of TextRank algorithm for automatic keyword extraction and summarization using Levenshtein distance as relation between text units. This project is based on the paper "TextRank: Bringing Order into Text" by Rada Mihalcea and Paul Tarau. https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
- Host: GitHub
- URL: https://github.com/davidadamojr/TextRank
- Owner: davidadamojr
- Created: 2013-11-24T16:56:40.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2022-05-05T15:12:55.000Z (almost 3 years ago)
- Last Synced: 2024-08-02T16:30:32.366Z (6 months ago)
- Language: Python
- Size: 40 KB
- Stars: 758
- Watchers: 40
- Forks: 224
- Open Issues: 7
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-pdf - TextRank
README
## TextRank
This is a Python implementation of TextRank for automatic keyword and sentence extraction (summarization), as described in https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf. However, this implementation uses Levenshtein distance as the relation between text units.

Using 10 articles taken from http://theonion.com, this implementation carries out automatic keyword and sentence extraction as follows:
- 100 word summary
- Number of keywords extracted is relative to the size of the text (a third of the number of nodes in the graph)
- Adjacent keywords in the text are concatenated into keyphrases

### Usage
To install the library, run the `setup.py` module located in the repository's root directory. Alternatively, if you have access to pip, you may install the library directly from GitHub:
```
pip install git+https://github.com/davidadamojr/TextRank.git
```
Use of the library requires downloading NLTK resources. Use the `textrank initialize` command to fetch the required data. Once the data has finished downloading, you may execute the following commands against the library:
```
textrank extract_summary
textrank extract_phrases
```

### Contributing
Install the library as "editable" within a virtual environment.
```
pip install -e .
```

### Dependencies
Dependencies are installed automatically with pip but can be installed separately.

* NetworkX - https://pypi.python.org/pypi/networkx/
* NLTK 3.0 - https://pypi.python.org/pypi/nltk/3.2.2
* NumPy - https://pypi.python.org/pypi/numpy
* Click - https://pypi.python.org/pypi/click
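
The feature list near the top of this README (Levenshtein distance as the relation between text units, a keyword count equal to a third of the graph's nodes, adjacent keywords merged into keyphrases, a 100 word summary) can be sketched in plain Python. This is an illustrative sketch, not the code in this repository. It replaces NetworkX with a hand-written weighted PageRank so that it runs without dependencies, and it assumes the text has already been split into words or sentences. Every function name (`levenshtein`, `build_graph`, `rank`, `top_keywords`, `merge_adjacent`, `summarize`) is hypothetical. It also assumes the raw distance is used directly as the edge weight; the library may filter, normalize, or order its results differently.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_graph(units):
    """Fully connected undirected graph; edge weight = Levenshtein distance."""
    nodes = list(dict.fromkeys(units))  # drop duplicates, keep first-seen order
    weights = {node: {} for node in nodes}
    for u, v in combinations(nodes, 2):
        distance = levenshtein(u, v)
        weights[u][v] = distance
        weights[v][u] = distance
    return weights


def rank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (assumed stand-in for NetworkX)."""
    nodes = list(weights)
    if not nodes:
        return {}
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    # Total edge weight per node; positive for every node that has a neighbour,
    # because distinct strings are at least one edit apart.
    strength = {node: sum(weights[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            incoming = sum(scores[other] * w / strength[other]
                           for other, w in weights[node].items())
            updated[node] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


def top_keywords(words):
    """Keep the best-ranked third of the graph's nodes (at least one)."""
    scores = rank(build_graph(words))
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered[:max(1, len(ordered) // 3)]


def merge_adjacent(words, keywords):
    """Join keywords that are neighbours in the original text into keyphrases."""
    keywords = set(keywords)
    phrases, current = [], []
    for word in words:
        if word in keywords:
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return list(dict.fromkeys(phrases))  # drop repeated phrases


def summarize(sentences, word_limit=100):
    """Concatenate sentences in rank order and cut at word_limit words."""
    scores = rank(build_graph(sentences))
    ordered = sorted(scores, key=scores.get, reverse=True)
    return " ".join(" ".join(ordered).split()[:word_limit])
```

For example, `top_keywords(words)` on a list of six distinct words returns two of them, and `merge_adjacent(words, keywords)` joins keywords that sit next to each other in the original word order. The sketch assumes tokenization and any part-of-speech filtering happen before `top_keywords` is called; presumably that is what the NLTK resources are needed for.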