Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/batpigandme/pivot-to-video
📺 OK, not *really* this is more about `tidyr::pivot_*()` functions…
https://github.com/batpigandme/pivot-to-video
Last synced: 7 days ago
JSON representation
📺 OK, not *really* this is more about `tidyr::pivot_*()` functions…
- Host: GitHub
- URL: https://github.com/batpigandme/pivot-to-video
- Owner: batpigandme
- Created: 2019-09-16T15:22:56.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-10-30T13:35:36.000Z (about 5 years ago)
- Last Synced: 2024-11-09T21:43:32.313Z (2 months ago)
- Language: HTML
- Size: 6.04 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
README
---
author: Mara Averick
date: '`r format(Sys.Date())`'
title: '`r emo::ji("tv")` Pivot to Video'
output:
html_document:
keep_md: true
---```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.path = 'fig/', dev = 'png', dpi = 144, fig.retina = 2)
````r emo::ji("television")` OK, not _really_!
This is actually about `tidyr::pivot_*()` functions, which you can learn all
about in the new tidyr version 1.0.0 [Pivoting
vignette](https://tidyr.tidyverse.org/articles/pivot.html)!But, I *will* be using some nifty TV-related data. So, I'm sticking
with the name.```{r message=FALSE, warning=FALSE}
library(tidyverse)
```## `r emo::ji("burger")` Bob's Burgers: A Belcher Family survey
```{r message=FALSE, warning=FALSE}
sheet <- googlesheets::gs_title("bobs_burgers_survey_results")bobs_ws <- googlesheets::gs_ws_ls(sheet)
``````{r}
raw_dat <- sheet %>%
googlesheets::gs_read(ws = glue::glue("{bobs_ws}"))
``````{r prelim-results}
belcher_results <- tibble::rowid_to_column(raw_dat, "resp_id") %>%
dplyr::rename("response" = `Members of the Belcher family with whom I identify (select all that apply)`) %>%
dplyr::select(-Timestamp)belcher_results
```Because I used a [Google Form](https://docs.google.com/forms/d/e/1FAIpQLScYLAkzDzEOWvXrn601Vt79S6DsKLkv12NUsqbo72j1fmapWA/viewform) to collect this data, I don't have to worry about
order of names, since they come out the same every time.```{r agg-results}
agg_results <- belcher_results %>%
dplyr::group_by(response) %>%
dplyr::summarise(total = n()) %>%
dplyr::arrange(desc(total))agg_results
```### `tidyr::separate_rows()`
Let's make them long with
[`tidyr::separate_rows()`](https://tidyr.tidyverse.org/dev/reference/separate_rows.html).
I'm also adding a numeric variable, `identify` (short for "character(s) with
whom I identify"), and ensuring that respondent IDs don't get erroneously
treated as numeric, by converting them to characters.```{r results-prep}
belcher_results <- belcher_results %>%
tidyr::separate_rows(response) %>%
dplyr::mutate(identify = 1,
resp_id = as.character(resp_id))head(belcher_results)
```### `tidyr::pivot_wider()`
Now we'll take one of the "new" tidyr verbs for a spin, [`pivot_wider()`](https://tidyr.tidyverse.org/dev/reference/pivot_wider.html).
Rather than fill things out with a bunch of `NA`s, we'll prepare our data
to use with the `UpSetR` package by turning it into binaries, and ditch the
respondent ID in the end.```{r}
binary_tib <- belcher_results %>%
tidyr::pivot_wider(
names_from = response,
values_from = identify,
values_fill = list(identify = 0)
) %>%
dplyr::select(-resp_id)head(binary_tib)
```### `UpSetR::upset()`
I *highly* recommend Paul Campbell's
[code-through](https://www.cultureofinsight.com/blog/2018/01/25/2018-01-25-visualising-twitter-follower-overlap/)
using [UpSetR](https://github.com/hms-dbmi/UpSetR), which gave me (among other things) the pro tip that `upset()` does
*not* like tibbles (hence the `as.data.frame()` at the end).```{r}
binary_df <- as.data.frame(binary_tib)
``````{r basic-upset}
UpSetR::upset(binary_df, nsets = 5, order.by = "freq")
```## `r emo::ji("movie")` I have seen this movie...
Let's take a look at another dataset I collected with a quick survey, this one asking people whether they had or had not seen a given movie.
```{r sheet-chunk-2}
sheet <- googlesheets::gs_title("seen_this_movie")movie_ws <- googlesheets::gs_ws_ls(sheet)
raw_dat <- sheet %>%
googlesheets::gs_read(ws = glue::glue("{movie_ws}"))
```First I'll use [`tibble::rowid_to_column()`](https://tibble.tidyverse.org/reference/rownames.html) to keep track of which user said what (and, again, make sure those are stored as characters rather than numeric, so I don't accidentally analyze it the wrong way).
I still don't care about the timestamp, so I'll get everything _but_ that using `dplyr::select(-Timestamp)`.
The variable names here have a whole bunch of annoying characters (e.g. spaces, exclamation points). So, I'm going to use Sam Firke's [janitor](https://sfirke.github.io/janitor/) package — specifically the [`janitor::clean_names()`](https://sfirke.github.io/janitor/reference/clean_names.html) function — to take care of some of the grunt work for me there.
I'm also going to rename `my_age_is` to `age`, just because it's annoying.
```{r wide-movie-results}
movie_results <- tibble::rowid_to_column(raw_dat, "resp_id") %>%
dplyr::mutate(resp_id = as.character(resp_id)) %>%
dplyr::select(-Timestamp) %>%
janitor::clean_names() %>%
dplyr::rename("age" = my_age_is)head(movie_results)
```Since this dataset is wide, let's also take a gander at it using `glimpse()`:
```{r glimpse-wide-movies}
glimpse(movie_results)
```### [`tidyr::pivot_longer()`](https://tidyr.tidyverse.org/reference/pivot_longer.html)
Since I only want to elongate the movies, and all of those columns are logical, I'll first select which columns I want by using `select_if()` and `is.logical()` as the predicate function.
```{r}
logicols <- select_if(movie_results, is.logical) %>%
colnames()movie_long <- movie_results %>%
pivot_longer(
cols = one_of(logicols),
names_to = "movie",
values_to = "seen"
)head(movie_long)
```
Aside: A nice little trick for recoding `TRUE`/`FALSE` as `0` and `1`, just use `as.numeric()`.```{r}
movie_bin <- movie_long %>%
mutate(seen_num = as.numeric(seen))movie_bin
``````{r}
lil_seen <- movie_bin %>%
select(resp_id, movie, seen_num)movie_bin %>%
group_by(movie) %>%
summarise(pct_seen = sum(seen_num) / n()) %>%
arrange(desc(pct_seen))
``````{r}
library(corrr)lil_seen_wide <- lil_seen %>%
pivot_wider(names_from = movie, values_from = seen_num) %>%
select(-resp_id)correlate(lil_seen_wide) %>%
fashion()
``````{r movies-rplot}
correlate(lil_seen_wide) %>%
rplot()
``````{r movies-upset}
lil_seen_wide <- as.data.frame(lil_seen_wide)
UpSetR::upset(lil_seen_wide, nsets = 13, order.by = "freq")
```Let's briefly pretend we're looking at just three movies: Bridesmaids, Anchorman, and Airplane.
```{r}
three_movies <- c("airplane", "anchorman", "bridesmaids")
``````{r}
movie_long %>%
filter(movie %in% three_movies)
```For the wide version, we'll add the respondent id and age to the list of variables we want to look at.
```{r}
vars_wanted <- c("resp_id", "age", three_movies)movie_results %>%
select(vars_wanted)
```Note the change in "shape" of our data, though the contents remain the same:
![](https://i.imgur.com/JcIlr51.png)
---