https://github.com/hrbrmstr/orangetext
🍊📄 : An #rstats project to keep track of The 🍊 One's speeches
rstats speeches transcripts
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/hrbrmstr/orangetext
- Owner: hrbrmstr
- Created: 2017-01-22T01:26:59.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-10-18T15:17:04.000Z (over 8 years ago)
- Last Synced: 2025-03-18T01:11:13.709Z (11 months ago)
- Topics: rstats, speeches, transcripts
- Language: R
- Size: 354 KB
- Stars: 52
- Watchers: 9
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
README
---
output: rmarkdown::github_document
---
`orangetext` is an #rstats project that keeps track of The 🍊 One's speeches and includes some code snippets for text analysis on them.
Gladly accepting PRs for legit new transcripts and more analysis scripts.
### Transcripts
```{r echo=FALSE, results='asis'}
cat(sprintf("- `%s`\n", list.files("data/speeches")))
```
### Sample code
```{r message=FALSE}
library(ngram)
library(tidyverse)
library(magrittr)
library(ggalt)
library(hrbrmisc)
library(stringi)
library(rprojroot)
```
Read all the speeches in:
```{r message=FALSE, warning=FALSE}
# Find data/speeches from the project root, then read and clean every transcript
rprojroot::find_rstudio_root_file() %>%
  file.path("data", "speeches") %>%
  list.files(pattern="\\.txt$", full.names=TRUE) %>%  # `pattern` is a regex, not a glob
  map(read_lines) %>%
  flatten_chr() %>%
  stri_enc_toascii() %>%
  stri_trim_both() %>%
  discard(equals, "") %>%  # drop empty lines
  paste0(collapse=" ") %>%
  stri_replace_all_regex("[[:space:]]+", " ") %>%
  preprocess(case="lower", remove.punct=TRUE,
             remove.numbers=TRUE, fix.spacing=TRUE) -> texts
```
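If you want to see what that cleaning does without touching the real transcripts, here's a rough base-R sketch on a made-up vector. `toy_lines` and `clean_lines()` are hypothetical names, and the base functions only approximate what `stri_enc_toascii()` and `ngram::preprocess()` do; this is not the pipeline itself.

```r
# Hypothetical stand-in for lines read from the transcript files
toy_lines <- c("  We will WIN, believe me!  ", "", "We will win 100 times.  ")

# Rough base-R approximation of the cleaning steps (an assumption, not the
# exact behavior of the stringi/ngram functions used in the real chunk)
clean_lines <- function(lines) {
  lines <- trimws(lines)                 # roughly stri_trim_both()
  lines <- lines[lines != ""]            # roughly discard(equals, "")
  txt <- paste0(lines, collapse = " ")   # one long string
  txt <- tolower(txt)                    # case = "lower"
  txt <- gsub("[[:punct:]]+", "", txt)   # remove.punct = TRUE
  txt <- gsub("[[:digit:]]+", "", txt)   # remove.numbers = TRUE
  txt <- gsub("[[:space:]]+", " ", txt)  # collapse runs of whitespace
  trimws(txt)
}

toy_text <- clean_lines(toy_lines)
print(toy_text)
```

The point is just that everything ends up as one lowercase, punctuation-free string, which is the shape `ngram()` wants.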
What have we got:
```{r}
string.summary(texts)
```
The 1-grams are kinda useless, but this makes one big tibble of 1- through 8-grams.
```{r}
# Build one phrase table per n-gram size (1 through 8) and row-bind them
map_df(1:8, ~ngram(texts, n=.x) %>%
         get.phrasetable() %>%
         as_tibble() %>%  # tbl_df() is deprecated
         rename(words=ngrams) %>%
         mutate(words=stri_trim_both(words)) %>%
         mutate(ngram=sprintf("ngrams: %s", .x))) %>%
  mutate(ngram=factor(ngram, levels=unique(ngram))) %>%
  select(ngram, freq, prop, words) -> grams
```
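For anyone unsure what `freq` and `prop` mean in that tibble, here's a minimal base-R sketch that counts n-grams by hand on a toy string. `count_ngrams()` is a hypothetical helper written for illustration; it is not how the `ngram` package does it internally, and the row order for tied counts may differ from `get.phrasetable()`.

```r
# Hypothetical helper: count n-grams in a single space-separated string
count_ngrams <- function(txt, n) {
  words <- strsplit(txt, " ", fixed = TRUE)[[1]]
  if (length(words) < n) {
    # Not enough words to form even one n-gram
    return(data.frame(words = character(), freq = integer(), prop = numeric(),
                      stringsAsFactors = FALSE))
  }
  # Slide a window of width n across the words
  starts <- seq_len(length(words) - n + 1)
  grams <- vapply(starts,
                  function(i) paste(words[i:(i + n - 1)], collapse = " "),
                  character(1))
  tab <- sort(table(grams), decreasing = TRUE)
  data.frame(words = names(tab),
             freq = as.integer(tab),                   # times the n-gram occurs
             prop = as.numeric(tab) / length(grams),   # share of all n-grams
             stringsAsFactors = FALSE)
}

toy <- count_ngrams("we will win believe me we will win times", 2)
print(toy)
```

So `freq` is a raw count and `prop` is that count divided by the total number of n-grams of that size.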
```{r}
glimpse(grams)
```
```{r}
filter(grams, ngram=="ngrams: 3")
```
```{r}
filter(grams, ngram=="ngrams: 4")
```
```{r}
filter(grams, ngram=="ngrams: 5")
```
```{r}
filter(grams, ngram=="ngrams: 6")
```
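The per-size `filter()` calls above could also be rolled into one step. Here's a hedged base-R sketch of that idea on a made-up data frame shaped like `grams`; `toy_grams` and `top_per_size()` are hypothetical, and the values are invented, not taken from the transcripts.

```r
# Made-up data frame with the same kind of columns as `grams` (values invented)
toy_grams <- data.frame(
  ngram = factor(c("ngrams: 3", "ngrams: 3", "ngrams: 3", "ngrams: 4", "ngrams: 4")),
  freq = c(5L, 9L, 2L, 4L, 7L),
  words = c("a b c", "d e f", "g h i", "a b c d", "e f g h"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top `k` rows by freq within each n-gram size
top_per_size <- function(df, k = 2) {
  pieces <- lapply(split(df, df$ngram), function(piece) {
    head(piece[order(-piece$freq), ], k)  # sort descending, keep the first k
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top_toy <- top_per_size(toy_grams, k = 2)
print(top_toy)
```

Swapping `toy_grams` for the real `grams` should give the top phrases for every size in one table, assuming the column names match.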