https://github.com/trinker/textcorpus

Last synced: 11 months ago
JSON representation

Host: GitHub
URL: https://github.com/trinker/textcorpus
Owner: trinker
Created: 2017-03-04T03:55:48.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2017-04-16T14:23:19.000Z (about 9 years ago)
Last Synced: 2025-04-08T16:13:59.289Z (about 1 year ago)
Language: R
Size: 7.97 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS

Awesome Lists containing this project

README

          ---

title: "textcorpus"

date: "`r format(Sys.time(), '%d %B, %Y')`"

output:

  md_document:

    toc: true      

---

```{r, echo=FALSE}

desc <- suppressWarnings(readLines("DESCRIPTION"))

regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"

loc <- grep(regex, desc)

ver <- gsub(regex, "\\2", desc[loc])

verbadge <- sprintf('', ver, ver)

````

```{r, echo=FALSE, message=FALSE, warning=FALSE}

library(knitr)

knit_hooks$set(htmlcap = function(before, options, envir) {

  if(!before) {

    paste('
',options$htmlcap,"",sep="")

    }

    })

knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)

knitr::opts_chunk$set(fig.path = "tools/figure/")

```

[![Build Status](https://travis-ci.org/trinker/textcorpus.svg?branch=master)](https://travis-ci.org/trinker/textcorpus)

[![Coverage Status](https://coveralls.io/repos/trinker/textcorpus/badge.svg?branch=master)](https://coveralls.io/r/trinker/textcorpus?branch=master)

`r verbadge`

**textcorpus** is collection of text courpus datasets.  The package also contains tools to enable easy community contributions to the package.  The underying premise is that the speech level data is stored with meta data as a list of two tibble data frames with a common key column.

# Installation

To download the development version of **textcorpus**:

Download the [zip ball](https://github.com/trinker/textcorpus/zipball/master) or [tar ball](https://github.com/trinker/textcorpus/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:

```r

if (!require("pacman")) install.packages("pacman")

pacman::p_load_gh("trinker/textcorpus")

```

# Data

```{r, echo=FALSE}

pacman::p_load(pander)

pander(description[-4], style = "grid", split.table = Inf, justify = c(rep('left', 4), 'right'))

```

# Demonstration

## Joining Corpus and Meta Data

**dplyr** akes joining the corpus and meta data easy.

```{r}

pacman::p_load(tidyverse, sentimentr, formality, readability)

pacman::p_load_current_gh('trinker/textcorpus')

nixon_tapes

dat <- nixon_tapes$corpus %>%

    dplyr::left_join(nixon_tapes$meta, by = 'id')

dat

```

## Text Scores

Here we calculate formality, sentiment, and readability measures.  An additional call to **dplyr**'s `left_jon` with a `Reduce` makes it easy to merge the various score frames into one frame.

```{r}

n_formality <- dat %>%

    filter(author == "Nixon") %>%

    with(formality(text, list(author, id, date)))

n_sentiment <- dat %>%

    filter(author == "Nixon") %>%

    with(sentiment_by(text, list(author, id, date)))

n_readability <- dat %>%

    filter(author == "Nixon") %>%

    with(readability(text, list(author, id, date)))

stats_dat <- list(n_formality, n_sentiment, n_readability) %>%

    Reduce(function(x, y) left_join(x, y, by=c("author", "id", "date")), .)

```

## Plotting the Text Scores Across Time

```{r}

stats_dat %>%

    select(date, F, ave_sentiment, Average_Grade_Level) %>%

    rename(Formality = F, Sentiment = ave_sentiment, Readbiltiy = Average_Grade_Level) %>%

    gather(Measure, Score, -date) %>%

    mutate(Date = as.factor(date), Date2 = as.numeric(Date)) %>%

    ggplot(aes(x = Date2, y = Score)) +

        geom_point() +

        geom_smooth(span = 0.4, fill = NA) +

        facet_wrap(~Measure, ncol = 1, scales = 'free_y') 

```

# Contact

You are welcome to:    

- submit suggestions and bug-reports at:     

- send a pull request on:     

- compose a friendly e-mail to:

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/trinker/textcorpus

Awesome Lists containing this project

README