https://github.com/mpadge/texttimetravel
Tools for analysing temporally structured text collections
r text-mining time-series topic-modeling
- Host: GitHub
- URL: https://github.com/mpadge/texttimetravel
- Owner: mpadge
- Created: 2018-12-11T10:02:13.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-11-10T21:01:55.000Z (over 4 years ago)
- Last Synced: 2025-02-14T13:23:49.896Z (3 months ago)
- Topics: r, text-mining, time-series, topic-modeling
- Language: R
- Size: 56.6 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.Rmd
# texttimetravel

Tools for analysing temporally structured text collections, including tools for
reading large sets of texts in (via
[`pdftools`](https://github.com/ropensci/pdftools)), and for time series
analysis of qualitative statistics such as word associations and topic models
(primarily via [`quanteda`](https://quanteda.io) and
[`topicmodels`](https://cran.r-project.org/package=topicmodels)).

## Installation

System requirements (for Linux):

- libpoppler-cpp-dev
- libgsl-dev

For other systems, see the respective documentation for
[`pdftools`](https://github.com/ropensci/pdftools) and
[`topicmodels`](https://cran.r-project.org/package=topicmodels).

```{r, eval=FALSE}
devtools::install_github ('mpadge/texttimetravel')
```

---------------

## Usage
Load packages and a temporally-structured corpus to work with:
```{r load-real, echo = FALSE, message = FALSE}
devtools::load_all (".", export_all = FALSE)
library (quanteda)
dat <- data_corpus_inaugural
```
```{r load-print, eval = FALSE}
library (texttimetravel)
library (quanteda)
dat <- data_corpus_inaugural
#dat <- corpus_reshape (dat, to = "sentences") # if desired
```
([`data_corpus_inaugural`](https://quanteda.io/reference/data_corpus_inaugural.html)
is a sample corpus from [`quanteda`](https://quanteda.io) of inaugural speeches
of US presidents.) Then use [`quanteda`](https://quanteda.io) functions to
convert to desired tokenized form:
```{r tokenize}
tok <- tokens (dat,
               remove_numbers = TRUE,
               remove_punct = TRUE,
               remove_separators = TRUE)
tok <- tokens_remove (tok, stopwords ("english"))
```

## keywords

Keyword associations can be extracted with the `ttt_keyness` function, which
relies on the `quanteda::textstat_keyness` function but simplifies the interface
so that keyness statistics can be extracted with a single function call.

```{r keywords}
x <- ttt_keyness (tok, "politic*")
head (x, n = 10) %>% knitr::kable()
x <- ttt_keyness (tok, "school*")
head (x, n = 10) %>% knitr::kable()
```

## topics

The function `ttt_fit_topics` provides a convenient wrapper around the functions
provided by the
[`topicmodels`](https://cran.r-project.org/package=topicmodels) package, and
extends functionality via two additional parameters:

1. `years`, allowing topic models to be fitted only to those portions of a
   corpus corresponding to the specified years;
2. `topic`, allowing models to be fitted around a specified topic phrase.

```{r topics1}
x <- ttt_fit_topics (tok, ntopics = 5)
topicmodels::get_terms(x, 10) %>% knitr::kable()
```
```{r topics2}
x <- ttt_fit_topics (tok, years = 1789:1900, ntopics = 5)
topicmodels::get_terms(x, 10) %>% knitr::kable()
```
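
For readers curious what restricting a fit to particular years might involve,
the following is a minimal sketch using `quanteda` and `topicmodels` directly.
It is an assumption about the general approach, not the code inside
`ttt_fit_topics`. It relies on the `Year` document variable that ships with
`data_corpus_inaugural`, and the seed is an arbitrary choice that only makes
the run repeatable.

```r
library (quanteda)

# Tokenize the inaugural corpus as in the steps above
tok <- tokens (data_corpus_inaugural,
               remove_numbers = TRUE,
               remove_punct = TRUE)
tok <- tokens_remove (tok, stopwords ("english"))

# Keep only speeches from the chosen years; "Year" is a document variable
# of data_corpus_inaugural
tok_early <- tokens_subset (tok, Year %in% 1789:1900)

# Build a document-feature matrix and convert it to the format that
# topicmodels expects
dtm <- convert (dfm (tok_early), to = "topicmodels")

# Fit a 5-topic LDA model (seed chosen arbitrarily for repeatability)
mod <- topicmodels::LDA (dtm, k = 5, control = list (seed = 1))
topicmodels::get_terms (mod, 10)
```

Subsetting the tokens before building the document-feature matrix means the
vocabulary of the fitted model contains only terms used in the selected years.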
```{r topics3}
x <- ttt_fit_topics (tok, topic = "nation", ntopics = 5)
topicmodels::get_terms(x, 10) %>% knitr::kable()
```
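
One plausible reading of fitting a model "around" a topic phrase is to keep
only the tokens that occur within a window of each match and fit the model to
that reduced set. The sketch below illustrates that idea with `quanteda` and
`topicmodels` directly. Both the windowing approach and the window size of 10
are assumptions made for illustration, and `ttt_fit_topics` may work
differently.

```r
library (quanteda)

# Tokenize the inaugural corpus as in the steps above
tok <- tokens (data_corpus_inaugural,
               remove_numbers = TRUE,
               remove_punct = TRUE)
tok <- tokens_remove (tok, stopwords ("english"))

# Keep each match of "nation" plus the 10 tokens on either side
# (hypothetical window size); all other tokens are dropped
tok_near <- tokens_keep (tok, pattern = "nation", window = 10)

# Documents without any match end up empty and are omitted by convert ()
dtm <- convert (dfm (tok_near), to = "topicmodels")

# Fit a 5-topic LDA model to the windowed tokens (arbitrary seed)
mod <- topicmodels::LDA (dtm, k = 5, control = list (seed = 1))
topicmodels::get_terms (mod, 10)
```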