https://github.com/tylerlittlefield/startrek
Tidy Star Trek Transcripts (TNG & DS9)
- Host: GitHub
- URL: https://github.com/tylerlittlefield/startrek
- Owner: tylerlittlefield
- License: other
- Created: 2019-05-31T01:07:27.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-06-02T01:53:03.000Z (over 6 years ago)
- Last Synced: 2025-04-04T01:32:09.530Z (10 months ago)
- Topics: r, rstats, startrek, transcripts
- Language: R
- Homepage:
- Size: 27.5 MB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

# Total size on disk (in MB) of an installed package; used inline in the
# Installation section.
pkg_size <- function(package) {
  root <- find.package(package)
  rel_paths <- list.files(root, all.files = TRUE, recursive = TRUE)
  abs_paths <- file.path(root, rel_paths)
  paste0(round(sum(file.info(abs_paths)$size) / 1e6, 2), " MB")
}
```
# startrek 
[Travis build status](https://travis-ci.org/tyluRp/startrek)
[AppVeyor build status](https://ci.appveyor.com/project/tyluRp/startrek)
The goal of startrek is to make Star Trek transcripts available as [`data.frame`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html) objects for easy analysis. All transcripts have been parsed from raw text files into a [tidy data](http://vita.had.co.nz/papers/tidy-data.html) format.
```{r, echo=FALSE, dpi=300, message=FALSE, warning=FALSE}
library(startrek)
library(tibble)
library(dplyr)
library(tidyr)
library(tidytext)
library(ggplot2)
set.seed(42)
bind_rows(sample(tng, 4), .id = "episode") %>%
  unnest_tokens(word, line) %>%
  anti_join(get_stopwords(), by = "word") %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(episode, index = id %/% 40, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(
    sentiment = positive - negative,
    color = ifelse(sentiment <= 0, "a", "b")
  ) %>%
  ggplot(aes(index, sentiment, fill = color)) +
  geom_col(show.legend = FALSE) +
  geom_hline(yintercept = 0) +
  facet_wrap(~episode, ncol = 2, scales = "free_x") +
  theme_bw() +
  theme(
    text = element_text(family = "SFProText-Regular"),
    panel.grid = element_blank()
  )
```
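The figure above splits each of four randomly sampled TNG episodes into blocks of 40 lines (`index = id %/% 40`) and plots the number of positive minus negative words in each block, using the Bing lexicon. The binning step is easy to check in isolation. Below is a base R sketch on a made-up episode; it assumes `id` is the line number within an episode (as the `id %/% 40` step suggests), skips stop-word removal, and uses a tiny hand-written `toy_lexicon` in place of `get_sentiments("bing")`.

```r
# Hypothetical toy episode: `id` is assumed to be the line number within the
# episode, and `line` holds the dialogue text.
episode <- data.frame(
  id   = 1:100,
  line = rep(c("a good plan", "a bad plan", "we are ready"), length.out = 100),
  stringsAsFactors = FALSE
)

# Tiny stand-in for tidytext::get_sentiments("bing") (hypothetical lexicon).
toy_lexicon <- data.frame(
  word      = c("good", "ready", "bad"),
  sentiment = c("positive", "positive", "negative"),
  stringsAsFactors = FALSE
)

# One row per word, keeping the line id (a rough stand-in for unnest_tokens()).
words <- strsplit(tolower(episode$line), "[^a-z']+")
tokens <- data.frame(
  id   = rep(episode$id, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Keep only words found in the lexicon (like inner_join(..., by = "word")).
scored <- merge(tokens, toy_lexicon, by = "word")

# Integer division puts ids 0-39 in block 0, 40-79 in block 1, and so on.
scored$index <- scored$id %/% 40

# Net sentiment per block: positive words count +1, negative words count -1.
net <- tapply(ifelse(scored$sentiment == "positive", 1, -1), scored$index, sum)
net
```

Here `net` has one value per 40-line block, which is what each bar in the figure represents.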
## Installation
Keep in mind that this is a data package: the transcripts are stored in the package itself, and there are no functions that scrape them from an online source. As a result, the package is fairly large (currently ~`r pkg_size("startrek")`).
If the size isn't a concern, you can install the development version from GitHub:
``` r
devtools::install_github("tylerlittlefield/startrek")
```
Alternatively, download the data files directly from the `data` folder of this repository.
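If you go the download route, note that R packages usually store datasets in `data/` as `.rda` files, which base R's `load()` can read back. The sketch below assumes a file named `tng.rda` holding an object called `tng` (the usual convention, not confirmed for this repository). To stay self-contained, it first writes a toy stand-in to a temporary directory instead of downloading anything.

```r
# Self-contained stand-in for a downloaded file: save a toy `tng` list to a
# temporary directory. With a real download, point `path` at that file instead.
# The file name "tng.rda" is an assumption about the repository's data folder.
path <- file.path(tempdir(), "tng.rda")
tng <- list(`The Inner Light` = data.frame(id = 1:2, character = c("PICARD", "RIKER")))
save(tng, file = path)
rm(tng)

# load() into a fresh environment so nothing in the workspace is overwritten;
# it returns the names of the objects it restored.
env <- new.env()
loaded <- load(path, envir = env)
loaded
tng <- env[[loaded[1]]]
names(tng)
```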
## Example
To access an episode transcript from *The Next Generation*, index the `tng` list by episode title (it is a named list with one data frame per episode):
```{r example, message=FALSE}
library(startrek)
library(tibble)
library(dplyr)
library(tidyr)
tng$`The Inner Light`
```
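Because `tng` is a named list keyed by episode title, ordinary list tools work on it. A minimal base R sketch, using a made-up three-episode list in place of the real data, for finding titles by keyword and pulling one out with `[[` (handy when a title contains spaces or punctuation):

```r
# Toy stand-in for `tng`: a named list with one data frame per episode.
# The titles are real episode names; the rows are invented for illustration.
toy_tng <- list(
  `The Inner Light`         = data.frame(id = 1:3, character = c("PICARD", "ELINE", "PICARD")),
  `The Best of Both Worlds` = data.frame(id = 1:2, character = c("RIKER", "PICARD")),
  `Darmok`                  = data.frame(id = 1:2, character = c("DATHON", "PICARD"))
)

# All episode titles.
names(toy_tng)

# Case-insensitive search for titles containing a keyword.
hits <- grep("light", names(toy_tng), ignore.case = TRUE, value = TRUE)
hits

# [[ with a string is equivalent to toy_tng$`The Inner Light`.
episode <- toy_tng[[hits[1]]]
nrow(episode)
```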
Or work with the entire series at once. For example, we might identify character-centric episodes by counting the number of lines each character has in each episode:
```{r}
tng %>%
  bind_rows(.id = "episode") %>%
  select(episode, everything()) %>%
  group_by(episode) %>%
  count(character, sort = TRUE)
```
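To turn those counts into a guess at each episode's focus, one option is to keep only the character with the most lines in each episode (in dplyr, `slice_max()` can do this after the `count()` step). A base R sketch on toy data follows; the `episode` and `character` columns mirror the ones used above, but the rows are made up, and ties are broken by taking the first character.

```r
# Toy stand-in for bind_rows(tng, .id = "episode"): one row per line of dialogue.
lines <- data.frame(
  episode   = c(rep("Episode A", 5), rep("Episode B", 4)),
  character = c("DATA", "DATA", "DATA", "PICARD", "RIKER",
                "WORF", "WORF", "PICARD", "DATA"),
  stringsAsFactors = FALSE
)

# Lines per character within each episode (drop combinations that never occur).
counts <- as.data.frame(table(episode = lines$episode, character = lines$character),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# For each episode, keep the character with the most lines (first one wins on ties).
top <- do.call(rbind, lapply(split(counts, counts$episode), function(d) {
  d[which.max(d$Freq), ]
}))
top
```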
The Deep Space Nine series is also available:
```{r}
ds9$Chimera
```
If you want both datasets together, one approach is to create a nested data frame:
```{r}
all_episodes <- function(.data, series_name) {
  .data %>%
    bind_rows(.id = "episode") %>%
    mutate(series = series_name) %>%
    select(series, everything())
}

tng_all <- all_episodes(tng, "TNG")
ds9_all <- all_episodes(ds9, "DS9")

bind_rows(tng_all, ds9_all) %>%
  group_by(series, episode) %>%
  nest()
```
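The same combine-then-group idea also works without dplyr or tidyr. In the base R sketch below, `stack_series()` is a hypothetical stand-in for `all_episodes()`, and `split()` plays the role of `nest()`, returning a plain list with one data frame per series and episode rather than a list-column. The toy lists replace `tng` and `ds9`, and their rows are invented.

```r
# Hypothetical base R counterpart of all_episodes(): stack a named list of
# episode data frames and tag each row with its series and episode title.
stack_series <- function(episodes, series_name) {
  rows <- Map(function(df, title) {
    data.frame(series = series_name, episode = title, df, stringsAsFactors = FALSE)
  }, episodes, names(episodes))
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

# Toy stand-ins for tng and ds9 (invented rows).
toy_tng <- list(`The Inner Light` = data.frame(id = 1:2, character = c("PICARD", "PICARD")))
toy_ds9 <- list(Chimera  = data.frame(id = 1:3, character = c("ODO", "LAAS", "ODO")),
                Emissary = data.frame(id = 1, character = "SISKO"))

both <- rbind(stack_series(toy_tng, "TNG"), stack_series(toy_ds9, "DS9"))

# split() groups rows by series and episode, similar to group_by() + nest().
nested <- split(both, list(both$series, both$episode), drop = TRUE, sep = " / ")
names(nested)
```

The tidyverse version keeps everything in one data frame, which is usually easier to feed into further `dplyr` verbs; the list returned by `split()` is closer to the original `tng`/`ds9` structure.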
The columns are arranged so that a row reads naturally from left to right or, with `glimpse()`, from top to bottom. For example:
```{r}
ds9$Chimera %>%
  slice(5) %>%
  glimpse()
```
The raw text files were parsed with the scripts in the `data-raw` folder of this repository. Below is a visual explanation, using row 26 of the DS9 episode *Emissary*:
```{r parse_visual}
ds9$Emissary %>%
  slice(26) %>%
  glimpse()
```
```{r echo=FALSE, out.width="550px"}
knitr::include_graphics("man/figures/parse-diagram.png")
```
## Acknowledgements
* Transcripts were taken from [Star Trek Minutiae](http://www.st-minutiae.com/resources/scripts/)