An open API service indexing awesome lists of open source software.

https://github.com/hrbrmstr/orangetext

🍊📄 : An #rstats project to keep track of The 🍊 One's speeches
https://github.com/hrbrmstr/orangetext

rstats speeches transcripts

Last synced: 11 months ago
JSON representation

🍊📄 : An #rstats project to keep track of The 🍊 One's speeches

Awesome Lists containing this project

README

          

---
output: rmarkdown::github_document
---
`orangetext` is an #rstats project to keep track of The 🍊 One's speeches and include some code snippets for text analysis on them.

Gladly accepting PRs for legit new transcripts and more analysis scripts.

### Transcripts

```{r echo=FALSE, results='asis'}
cat(sprintf("- `%s`\n", list.files("data/speeches")))
```

### Sample code

```{r message=FALSE}
library(ngram)
library(tidyverse)
library(magrittr)
library(ggalt)
library(hrbrmisc)
library(stringi)
library(rprojroot)
```

Read all the speeches in:

```{r message=FALSE, warning=FALSE}
rprojroot::find_rstudio_root_file() %>%
file.path("data", "speeches") %>%
list.files("*.txt", full.names=TRUE) %>%
map(read_lines) %>%
flatten_chr() %>%
stri_enc_toascii() %>%
stri_trim_both() %>%
discard(equals, "") %>%
paste0(collapse=" ") %>%
stri_replace_all_regex("[[:space:]]+", " ") %>%
preprocess(case="lower", remove.punct=TRUE,
remove.numbers=TRUE, fix.spacing=TRUE) -> texts
```

What have we got:

```{r}
string.summary(texts)
```

The 1-grams are kinda useless but this makes a big tibble for 1:8-grams.

```{r}
map_df(1:8, ~ngram(texts, n=.x) %>%
get.phrasetable() %>%
tbl_df() %>%
rename(words=ngrams) %>%
mutate(words=stri_trim_both(words)) %>%
mutate(ngram=sprintf("ngrams: %s", .x))) %>%
mutate(ngram=factor(ngram, levels=unique(ngram))) %>%
select(ngram, freq, prop, words) -> grams

```

```{r}
glimpse(grams)
```

```{r}
filter(grams, ngram=="ngrams: 3")
```

```{r}
filter(grams, ngram=="ngrams: 4")
```

```{r}
filter(grams, ngram=="ngrams: 5")
```

```{r}
filter(grams, ngram=="ngrams: 6")
```