https://github.com/hrbrmstr/orangetext

🍊📄 : An #rstats project to keep track of The 🍊 One's speeches

Host: GitHub
URL: https://github.com/hrbrmstr/orangetext
Owner: hrbrmstr
Created: 2017-01-22T01:26:59.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2017-10-18T15:17:04.000Z (over 8 years ago)
Last Synced: 2025-03-18T01:11:13.709Z (over 1 year ago)
Topics: rstats, speeches, transcripts
Language: R
Size: 354 KB
Stars: 52
Watchers: 9
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md

README

---
output: rmarkdown::github_document
---

`orangetext` is an #rstats project that keeps track of The 🍊 One's speeches and includes some code snippets for text analysis on them.

Gladly accepting PRs for legit new transcripts and more analysis scripts. 

### Transcripts

```{r echo=FALSE, results='asis'}
# sep="" keeps cat() from putting a space in front of every bullet after the first
cat(sprintf("- `%s`\n", list.files("data/speeches")), sep="")
```

### Sample code

```{r message=FALSE}
library(ngram)
library(tidyverse)
library(magrittr)
library(ggalt)
library(hrbrmisc)
library(stringi)
library(rprojroot)
```

Read all the speeches in:

```{r message=FALSE, warning=FALSE}
rprojroot::find_rstudio_root_file() %>%
  file.path("data", "speeches") %>%
  list.files(pattern="\\.txt$", full.names=TRUE) %>%  # pattern is a regex, not a glob
  map(read_lines) %>%
  flatten_chr() %>%
  stri_enc_toascii() %>%
  stri_trim_both() %>%
  discard(equals, "") %>%                              # drop empty lines
  paste0(collapse=" ") %>%
  stri_replace_all_regex("[[:space:]]+", " ") %>%
  preprocess(case="lower", remove.punct=TRUE,
             remove.numbers=TRUE, fix.spacing=TRUE) -> texts
```

What have we got:

```{r}
string.summary(texts)
```

The 1-grams are kinda useless but this makes a big tibble for 1:8-grams.

```{r}
map_df(1:8, ~ngram(texts, n=.x) %>%
         get.phrasetable() %>%
         as_tibble() %>%                      # tbl_df() is deprecated
         rename(words=ngrams) %>%
         mutate(words=stri_trim_both(words)) %>%
         mutate(ngram=sprintf("ngrams: %s", .x))) %>%
  mutate(ngram=factor(ngram, levels=unique(ngram))) %>%
  select(ngram, freq, prop, words) -> grams
```
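
To make the `freq` and `prop` columns a bit more concrete, here is a minimal base R sketch of word n-gram counting on a made-up sentence. `toy` and `count_ngrams()` are hypothetical names used only for illustration. This is not how the `ngram` package works internally, and it assumes `prop` is an n-gram's share of all n-grams of that size.

```r
# Toy input; not taken from the transcripts.
toy <- "we will win we will win big"

# Hypothetical helper: count word n-grams in one space-separated string.
count_ngrams <- function(txt, n) {
  words <- strsplit(txt, " ", fixed = TRUE)[[1]]
  empty <- data.frame(words = character(), freq = integer(), prop = numeric(),
                      stringsAsFactors = FALSE)
  if (length(words) < n) return(empty)

  # Slide a window of n words across the text.
  starts <- seq_len(length(words) - n + 1)
  found <- vapply(starts,
                  function(i) paste(words[i:(i + n - 1)], collapse = " "),
                  character(1))

  # Tally each distinct n-gram, most frequent first.
  tab <- sort(table(found), decreasing = TRUE)
  data.frame(words = names(tab),
             freq  = as.integer(tab),
             prop  = as.numeric(tab) / length(found),  # share of all n-grams
             stringsAsFactors = FALSE)
}

toy_bigrams <- count_ngrams(toy, 2)
print(toy_bigrams)
```

The same idea, applied once per `n` from 1 to 8 and stacked, gives a table shaped like `grams`.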

```{r}
glimpse(grams)
```

```{r}
filter(grams, ngram=="ngrams: 3")
```

```{r}
filter(grams, ngram=="ngrams: 4")
```

```{r}
filter(grams, ngram=="ngrams: 5")
```

```{r}
filter(grams, ngram=="ngrams: 6")
```
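
If scrolling through one table per n-gram size gets tedious, a grouped "top k" may be handier. The sketch below is one possible way to do that with `dplyr` (1.0.0 or later, for `slice_max()`). `toy_grams` is a made-up stand-in with the same columns as `grams`, so the block runs on its own; the values in it are invented.

```r
library(dplyr)

# Made-up stand-in for `grams`; these counts are invented for illustration.
toy_grams <- tibble(
  ngram = factor(c("ngrams: 3", "ngrams: 3", "ngrams: 3",
                   "ngrams: 4", "ngrams: 4")),
  freq  = c(5L, 9L, 2L, 4L, 7L),
  words = c("a b c", "b c d", "c d e", "a b c d", "b c d e")
)

# Keep the two most frequent phrases for each n-gram size.
top_grams <- toy_grams %>%
  group_by(ngram) %>%
  slice_max(freq, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top_grams)
```

Swapping `toy_grams` for `grams` and raising `n` should give the most frequent phrases for every size in one tibble.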
