https://github.com/naivehobo/textrank
Implementation of TextRank with the option of using pre-trained Word2Vec embeddings as the similarity metric
- Host: GitHub
- URL: https://github.com/naivehobo/textrank
- Owner: naiveHobo
- License: mit
- Created: 2018-08-07T21:43:16.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-08-29T17:42:17.000Z (over 6 years ago)
- Last Synced: 2025-02-25T07:45:27.137Z (3 months ago)
- Topics: cosine, cosine-distance, cosine-similarity, cosine-similarity-scores, cosinesimilarity, keyword, keyword-extraction, keyword-extractor, keywords, keywords-extraction, pagerank, pagerank-algorithm, pagerank-python, similarity, textrank, textrank-algorithm, textrank-python, word2vec
- Language: Python
- Size: 11.7 KB
- Stars: 57
- Watchers: 4
- Forks: 11
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# TextRank
Implementation of TextRank with the option of using cosine similarity of word vectors from pre-trained Word2Vec embeddings as the similarity metric.

## Instructions:
Store the text from which keywords are to be extracted in sample.txt, then extract the keywords by running main.py:
```
python3 main.py --data sample.txt
```

## Usage:
```
from keyword_extractor import KeywordExtractor

text = "sample text goes here"
# Use None if pre-trained embeddings are not available
word2vec = "path to pre-trained Word2Vec embeddings"

extractor = KeywordExtractor(word2vec=word2vec)
keywords = extractor.extract(text, ratio=0.2, split=True, scores=True)
for keyword in keywords:
    print(keyword)
```

## Dependencies:
```
gensim
nltk
```

Use python3.

## Reference:
- Mihalcea, Rada, 1974- & Tarau, Paul. TextRank: Bringing Order into Texts, paper, July 2004; [Stroudsburg, Pennsylvania]. (digital.library.unt.edu/ark:/67531/metadc30962/: accessed August 7, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Engineering.
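
As a rough illustration of the ranking idea in the paper cited above, combined with the cosine-similarity option this README describes, here is a minimal pure-Python sketch. It is not the code in this repository: `cosine_similarity`, `rank_words`, and the toy 2-d vectors are hypothetical names and data invented for the example. It assumes a fully connected word graph whose edge weights are cosine similarities clipped at zero, and it uses the weighted score update from the paper with a damping factor of 0.85.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_words(vectors, damping=0.85, iterations=50):
    """Score words with a weighted PageRank-style update over a similarity graph.

    `vectors` maps each word to its embedding. Returns (word, score) pairs,
    highest score first.
    """
    words = list(vectors)
    # Assumption: every pair of distinct words is connected, and the edge
    # weight is their cosine similarity, clipped at 0 to stay non-negative.
    weights = {
        w: {o: max(0.0, cosine_similarity(vectors[w], vectors[o]))
            for o in words if o != w}
        for w in words
    }
    out_sum = {w: sum(weights[w].values()) for w in words}
    scores = {w: 1.0 for w in words}
    for _ in range(iterations):
        new_scores = {}
        for w in words:
            # Each neighbour passes on a share of its score proportional
            # to the weight of its edge to w.
            incoming = sum(
                weights[o][w] / out_sum[o] * scores[o]
                for o in words
                if o != w and out_sum[o] > 0.0
            )
            new_scores[w] = (1.0 - damping) + damping * incoming
        scores = new_scores
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical 2-d "embeddings", invented for illustration only.
toy_vectors = {
    "graph": [1.0, 0.1],
    "network": [0.9, 0.2],
    "node": [0.8, 0.3],
    "banana": [0.0, 1.0],
}
ranking = rank_words(toy_vectors)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In this toy setup the word whose vector points away from the others ends up with the lowest score, because it receives only a small share of each neighbour's weight. With real embeddings, the vectors would come from a pre-trained Word2Vec model rather than a hand-written dictionary.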