Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/adamspannbauer/lexrankr
Extractive Text Summarization with lexRankr (an R package implementing the LexRank algorithm)
lexrank lexrank-algorithm nlp r r-package rstat
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/adamspannbauer/lexrankr
- Owner: AdamSpannbauer
- License: other
- Created: 2016-07-28T14:40:34.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-12-05T17:02:47.000Z (about 2 years ago)
- Last Synced: 2024-10-12T16:26:44.497Z (4 months ago)
- Topics: lexrank, lexrank-algorithm, nlp, r, r-package, rstat
- Language: R
- Size: 745 KB
- Stars: 21
- Watchers: 5
- Forks: 4
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# lexRankr: Extractive Text Summarization in R
[Build Status](https://travis-ci.org/AdamSpannbauer/lexRankr) [AppVeyor Build Status](https://ci.appveyor.com/project/AdamSpannbauer/lexRankr) [Coverage Status](https://codecov.io/github/AdamSpannbauer/lexRankr?branch=master) [CRAN Status Badge](https://CRAN.R-project.org/package=lexRankr) [Last Commit](https://github.com/AdamSpannbauer/lexRankr/commits/master)
## Installation
```r
## install from CRAN
install.packages("lexRankr")

## install from this github repo
devtools::install_github("AdamSpannbauer/lexRankr")
```

## Overview
lexRankr is an R implementation of the LexRank algorithm described by Güneş Erkan & Dragomir R. Radev in [LexRank: Graph-based Lexical Centrality as Salience in Text Summarization](http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html). LexRank is designed to summarize a cluster of documents by identifying the sentences that subsume the most information in that set of documents. The algorithm may not perform well on a set of unclustered or unrelated documents. As the paper's title suggests, sentences are ranked by their centrality in a graph. The graph is built from the pairwise similarities of the sentences, where similarity is measured with an idf-modified cosine similarity function. The paper describes multiple ways to calculate centrality, and these options are available in the R package. Sentences can be ranked by degree centrality or with the PageRank algorithm; both methods require a minimum similarity threshold for a sentence pair to be included in the graph. A third variation, Continuous LexRank, does not require a minimum similarity threshold and instead uses a weighted graph of sentences as the input to PageRank.

*note: the LexRank algorithm is designed to work on a cluster of documents. LexRank is built on the idea that a cluster of docs will focus on similar topics*

*note: pairwise sentence similarity is calculated for the entire set of documents passed to the function. This can be a computationally intensive process (especially with a large set of documents)*
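
The three centrality options described above can be sketched in base R. This is an illustrative toy, not lexRankr's internal code: the whitespace tokenizer, treating each sentence as its own document when computing idf, dropping self-similarities, the hypothetical `pagerank()` helper, and the 0.1 threshold are all assumptions made for the example.

```r
# Toy corpus; each sentence is treated as its own "document" for idf
# (an assumption made to keep the example small).
sentences <- c(
  "the cat sat on the mat",
  "the cat lay on the rug",
  "a cat sat on a mat",
  "stock prices fell sharply today"
)

# Whitespace tokenization only; stemming and stop-word removal are omitted.
tokens <- strsplit(tolower(sentences), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per sentence, one column per word.
tf <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))

# idf_w = log(N / n_w), where n_w is the number of sentences containing w.
idf <- log(length(sentences) / colSums(tf > 0))

# idf-modified cosine: cosine similarity between the tf * idf vectors.
w     <- sweep(tf, 2, idf, `*`)
norms <- sqrt(rowSums(w^2))
sim   <- (w %*% t(w)) / (norms %o% norms)
diag(sim) <- 0  # drop self-similarity (a choice made for this sketch)

# Hypothetical helper: PageRank by power iteration on a weight matrix.
pagerank <- function(adj, damping = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(adj)
  P <- adj / rowSums(adj)      # row-normalize into transition probabilities
  P[!is.finite(P)] <- 1 / n    # sentences with no edges jump uniformly
  p <- rep(1 / n, n)
  for (i in seq_len(max_iter)) {
    p_new <- (1 - damping) / n + damping * as.vector(t(P) %*% p)
    delta <- sum(abs(p_new - p))
    p <- p_new
    if (delta < tol) break
  }
  p
}

threshold <- 0.1
adj <- (sim >= threshold) * 1  # unweighted graph: keep pairs above threshold

degree_centrality  <- rowSums(adj)   # 1. degree centrality
lexrank_threshold  <- pagerank(adj)  # 2. PageRank on the thresholded graph
lexrank_continuous <- pagerank(sim)  # 3. Continuous LexRank: similarity weights

round(cbind(degree_centrality, lexrank_threshold, lexrank_continuous), 3)
```

In this toy corpus the first sentence shares vocabulary with both the second and third, so it should come out as the most central under all three scorings, while the unrelated fourth sentence should score lowest. The similarity matrix has one entry per sentence pair, which is why the cost grows quickly with the number of sentences.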
## Basic Usage
```r
library(lexRankr)
library(dplyr)

# toy data: one row per document
df <- tibble(doc_id = 1:3,
             text = c("Testing the system. Second sentence for you.",
                      "System testing the tidy documents df.",
                      "Documents will be parsed and lexranked."))

# split each document into sentences, score them with LexRank,
# and sort from most to least central
df %>%
  unnest_sentences(sents, text) %>%
  bind_lexrank(sents, doc_id, level = 'sentences') %>%
  arrange(desc(lexrank))
```

## More Examples
* [Vignette](https://CRAN.R-project.org/package=lexRankr/vignettes/Analyzing_Twitter_with_LexRankr.html)
* [Summarizing Web Articles with R using lexRankr](https://adamspannbauer.github.io/2017/12/17/summarizing-web-articles-with-r/)
* [lexRankr & Twitter: find a user's most representative tweets](https://adamspannbauer.github.io/2017/03/09/lexrankr--twitter-find-a-users-most-representative-tweets/)