https://github.com/mpadge/texttimetravel

Tools for analysing temporally structured text collections
https://github.com/mpadge/texttimetravel

r text-mining time-series topic-modeling

Last synced: about 2 months ago
JSON representation

Tools for analysing temporally structured text collections

Host: GitHub
URL: https://github.com/mpadge/texttimetravel
Owner: mpadge
Created: 2018-12-11T10:02:13.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2020-11-10T21:01:55.000Z (over 4 years ago)
Last Synced: 2025-02-14T13:23:49.896Z (3 months ago)
Topics: r, text-mining, time-series, topic-modeling
Language: R
Size: 56.6 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

README

        [![Build Status](https://travis-ci.org/mpadge/texttimetravel.svg?branch=master)](https://travis-ci.org/mpadge/texttimetravel)

[![Project Status: Active](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)

[![codecov](https://codecov.io/gh/mpadge/texttimetravel/branch/master/graph/badge.svg)](https://codecov.io/gh/mpadge/texttimetravel)

# texttimetravel

Tools for analysing temporally structured text collections, including tools for

reading large sets of texts in (via

[`pdftools`](https://github.com/ropensci/pdftools)), and for time series

analysis of qualitative statistics such as word associations and topic models

(primarily via [`quanteda`](https://quanteda.io) and

[`topicmodels`](https://cran.r-project.org/package=topicmodels).

## Installation

System requirements (for linux):

- libpoppler-cpp-dev

- libgsl-dev 

For other systems, see respective documentation for 

[`pdftools`](https://github.com/ropensci/pdftools)) and

[`topicmodels`](https://cran.r-project.org/package=topicmodels).

```{r, eval=FALSE}

devtools::install_github ('mpadge/texttimetravel')

```

---------------

## Usage

Load packages and a temporally-structured corpus to work with:

```{r load-real, echo = FALSE, message = FALSE}

devtools::load_all (".", export_all = FALSE)

library (quanteda)

dat <- data_corpus_inaugural

```

```{r load-print, eval = FALSE}

library (texttimetravel)

library (quanteda)

dat <- data_corpus_inaugural

#dat <- corpus_reshape (dat, to = "sentences") # if desired

```

([`data_corpus_inaugural`](https://quanteda.io/reference/data_corpus_inaugural.html)

is a sample corpus from [`quanteda`](https://quanteda.io) of inaugural speeches

of US presidents.) Then use [`quanteda`](https://quanteda.io) functions to

convert to desired tokenized form:

```{r tokenize}

tok <- tokens (dat,

               remove_numbers = TRUE,

               remove_punct = TRUE,

               remove_separators = TRUE)

tok <- tokens_remove (tok, stopwords("english"))

```

## keywords

Keyword associations can be extracted with the `ttt_keyness` function, which

relies on the `quanteda::keyness` function, yet simplifies the interface by

allowing keyness statistics to be extracted with a single function call.

```{r keywords}

x <- ttt_keyness (tok, "politic*")

head (x, n = 10) %>% knitr::kable()

x <- ttt_keyness (tok, "school*")

head (x, n = 10) %>% knitr::kable()

```

## topics

The function `ttt_fit_topics` provides a convenient wrapper around the functions

provided by the

[`topicmodels`](https://cran.r-project.org/package=topicmodels) package, and

extends functionality via two additional parameters:

1. `years`, allowing topic models to be fitted only to those portions of a

   corpus corresponding to the specified years;

2. `topic`, allowing models to be fitted around a specified topic phrase.

```{r topics1}

x <- ttt_fit_topics (tok, ntopics = 5)

topicmodels::get_terms(x, 10) %>% knitr::kable()

```

```{r topics2}

x <- ttt_fit_topics (tok, years = 1789:1900, ntopics = 5)

topicmodels::get_terms(x, 10) %>% knitr::kable()

```

```{r topics3}

x <- ttt_fit_topics (tok, topic = "nation", ntopics = 5)

topicmodels::get_terms(x, 10) %>% knitr::kable()

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mpadge/texttimetravel

Awesome Lists containing this project

README