Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/batpigandme/pivot-to-video

📺 OK, not *really* this is more about `tidyr::pivot_*()` functions…
https://github.com/batpigandme/pivot-to-video

Last synced: 7 days ago
JSON representation

📺 OK, not *really* this is more about `tidyr::pivot_*()` functions…

Host: GitHub
URL: https://github.com/batpigandme/pivot-to-video
Owner: batpigandme
Created: 2019-09-16T15:22:56.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2019-10-30T13:35:36.000Z (about 5 years ago)
Last Synced: 2024-11-09T21:43:32.313Z (2 months ago)
Language: HTML
Size: 6.04 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

README

        ---

author: Mara Averick

date: '`r format(Sys.Date())`'

title: '`r emo::ji("tv")` Pivot to Video'

output:

  html_document:

    keep_md: true

---

```{r setup, include = FALSE}

knitr::opts_chunk$set(echo = TRUE, fig.path = 'fig/', dev = 'png', dpi = 144, fig.retina = 2)

```

`r emo::ji("television")` OK, not _really_! 

This is actually about `tidyr::pivot_*()` functions, which you can learn all

about in the new tidyr version 1.0.0 [Pivoting

vignette](https://tidyr.tidyverse.org/articles/pivot.html)! 

But, I *will* be using some nifty TV-related data. So, I'm sticking

with the name.

```{r message=FALSE, warning=FALSE}

library(tidyverse)

```

## `r emo::ji("burger")` Bob's Burgers: A Belcher Family survey

```{r message=FALSE, warning=FALSE}

sheet <- googlesheets::gs_title("bobs_burgers_survey_results")

bobs_ws <- googlesheets::gs_ws_ls(sheet)

```

```{r}

raw_dat <- sheet %>% 

  googlesheets::gs_read(ws = glue::glue("{bobs_ws}"))

```

```{r prelim-results}

belcher_results <- tibble::rowid_to_column(raw_dat, "resp_id") %>%

  dplyr::rename("response" = `Members of the Belcher family with whom I identify (select all that apply)`) %>%

  dplyr::select(-Timestamp)

belcher_results

```

Because I used a [Google Form](https://docs.google.com/forms/d/e/1FAIpQLScYLAkzDzEOWvXrn601Vt79S6DsKLkv12NUsqbo72j1fmapWA/viewform) to collect this data, I don't have to worry about

order of names, since they come out the same every time.

```{r agg-results}

agg_results <- belcher_results %>%

  dplyr::group_by(response) %>%

  dplyr::summarise(total = n()) %>%

  dplyr::arrange(desc(total))

agg_results

```

### `tidyr::separate_rows()`

Let's make them long with

[`tidyr::separate_rows()`](https://tidyr.tidyverse.org/dev/reference/separate_rows.html).

I'm also adding a numeric variable, `identify` (short for "character(s) with

whom I identify"), and ensuring that respondent IDs don't get erroneously

treated as numeric, by converting them to characters.

```{r results-prep}

belcher_results <- belcher_results %>%

  tidyr::separate_rows(response) %>%

  dplyr::mutate(identify = 1,

                resp_id = as.character(resp_id))

head(belcher_results)

```

### `tidyr::pivot_wider()`

Now we'll take one of the "new" tidyr verbs for a spin, [`pivot_wider()`](https://tidyr.tidyverse.org/dev/reference/pivot_wider.html). 

Rather than fill things out with a bunch of `NA`s, we'll prepare our data 

to use with the `UpSetR` package by turning it into binaries, and ditch the 

respondent ID in the end.

```{r}

binary_tib <- belcher_results %>%

  tidyr::pivot_wider(

    names_from = response,

    values_from = identify,

    values_fill = list(identify = 0)

  ) %>%

  dplyr::select(-resp_id)

head(binary_tib)

```

### `UpSetR::upset()`

I *highly* recommend Paul Campbell's

[code-through](https://www.cultureofinsight.com/blog/2018/01/25/2018-01-25-visualising-twitter-follower-overlap/)

using [UpSetR](https://github.com/hms-dbmi/UpSetR), which gave me (among other things) the pro tip that `upset()` does

*not* like tibbles (hence the `as.data.frame()` at the end).

```{r}

binary_df <- as.data.frame(binary_tib)

```

```{r basic-upset}

UpSetR::upset(binary_df, nsets = 5, order.by = "freq")

```

## `r emo::ji("movie")` I have seen this movie...

Let's take a look at another dataset I collected with a quick survey, this one asking people whether they had or had not seen a given movie.

```{r sheet-chunk-2}

sheet <- googlesheets::gs_title("seen_this_movie")

movie_ws <- googlesheets::gs_ws_ls(sheet)

raw_dat <- sheet %>% 

  googlesheets::gs_read(ws = glue::glue("{movie_ws}"))

```

First I'll use [`tibble::rowid_to_column()`](https://tibble.tidyverse.org/reference/rownames.html) to keep track of which user said what (and, again, make sure those are stored as characters rather than numeric, so I don't accidentally analyze it the wrong way).

I still don't care about the timestamp, so I'll get everything _but_ that using `dplyr::select(-Timestamp)`.

The variable names here have a whole bunch of annoying characters (e.g. spaces, exclamation points). So, I'm going to use Sam Firke's [janitor](https://sfirke.github.io/janitor/) package — specifically the  [`janitor::clean_names()`](https://sfirke.github.io/janitor/reference/clean_names.html) function — to take care of some of the grunt work for me there.

I'm also going to rename `my_age_is` to `age`, just because it's annoying.

```{r wide-movie-results}

movie_results <- tibble::rowid_to_column(raw_dat, "resp_id") %>%

  dplyr::mutate(resp_id = as.character(resp_id)) %>%

  dplyr::select(-Timestamp) %>%

  janitor::clean_names() %>%

  dplyr::rename("age" = my_age_is)

head(movie_results)

```

Since this dataset is wide, let's also take a gander at it using `glimpse()`:

```{r glimpse-wide-movies}

glimpse(movie_results)

```

### [`tidyr::pivot_longer()`](https://tidyr.tidyverse.org/reference/pivot_longer.html)

Since I only want to elongate the movies, and all of those columns are logical, I'll first select which columns I want by using `select_if()` and `is.logical()` as the predicate function.

```{r}

logicols <- select_if(movie_results, is.logical) %>%

  colnames()

movie_long <- movie_results %>%

  pivot_longer(

    cols = one_of(logicols),

    names_to = "movie",

    values_to = "seen"

  )

head(movie_long)

```

Aside: A nice little trick for recoding `TRUE`/`FALSE` as `0` and `1`, just use `as.numeric()`.

```{r}

movie_bin <- movie_long %>%

  mutate(seen_num = as.numeric(seen))

movie_bin

```

```{r}

lil_seen <- movie_bin %>%

  select(resp_id, movie, seen_num)

movie_bin %>%

  group_by(movie) %>%

  summarise(pct_seen = sum(seen_num) / n()) %>%

  arrange(desc(pct_seen))

```

```{r}

library(corrr)

lil_seen_wide <- lil_seen %>%

  pivot_wider(names_from = movie, values_from = seen_num) %>%

  select(-resp_id)

correlate(lil_seen_wide) %>%

  fashion()

```

```{r movies-rplot}

correlate(lil_seen_wide) %>%

  rplot()

```

```{r movies-upset}

lil_seen_wide <- as.data.frame(lil_seen_wide)

UpSetR::upset(lil_seen_wide, nsets = 13, order.by = "freq")

```

Let's briefly pretend we're looking at just three movies: Bridesmaids, Anchorman, and Airplane.

```{r}

three_movies <- c("airplane", "anchorman", "bridesmaids")

```

```{r}

movie_long %>%

  filter(movie %in% three_movies)

```

For the wide version, we'll add the respondent id and age to the list of variables we want to look at.

```{r}

vars_wanted <- c("resp_id", "age", three_movies)

movie_results %>%

  select(vars_wanted)

```

Note the change in "shape" of our data, though the contents remain the same:

![](https://i.imgur.com/JcIlr51.png)

---