https://github.com/gadenbuie/tidyexplain

🤹‍♀ Animations of tidyverse verbs using R, the tidyverse, and gganimate
Host: GitHub
URL: https://github.com/gadenbuie/tidyexplain
Owner: gadenbuie
License: cc0-1.0
Created: 2018-08-15T18:29:46.000Z (almost 7 years ago)
Default Branch: main
Last Pushed: 2022-01-11T02:50:28.000Z (over 3 years ago)
Last Synced: 2024-10-14T22:29:57.585Z (8 months ago)
Topics: dplyr, gganimate, ggplot2, joins, rstats, sql
Language: R
Homepage: https://garrickadenbuie.com/project/tidyexplain
Size: 82.9 MB
Stars: 762
Watchers: 31
Forks: 181
Open Issues: 14
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project

jimsghstars - gadenbuie/tidyexplain - 🤹‍♀ Animations of tidyverse verbs using R, the tidyverse, and gganimate (R)
README

---

output: github_document

---

```{r setup, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  echo = FALSE,

  warning = FALSE,

  message = FALSE,

  cache = TRUE

)

```

[gganimate]: https://github.com/thomasp85/gganimate#README

[dplyr-two-table]: https://dplyr.tidyverse.org/articles/two-table.html

[r4ds]: http://r4ds.had.co.nz/

[r4ds-relational]: http://r4ds.had.co.nz/relational-data.html

[r4ds-set-ops]: http://r4ds.had.co.nz/relational-data.html#set-operations

[r4ds-tidy-data]: http://r4ds.had.co.nz/tidy-data.html#tidy-data-1

[tidyverse]: https://tidyverse.org

[tidyr]: https://tidyr.tidyverse.org

# Tidy Animated Verbs

[![CC0](https://img.shields.io/badge/license_(images)_-CC0-green.svg)](https://creativecommons.org/publicdomain/zero/1.0/)

[![MIT](https://img.shields.io/badge/license_(code)_-MIT-green.svg)](https://opensource.org/licenses/MIT)

Garrick Aden-Buie -- [&commat;grrrck](https://twitter.com/grrrck) -- [garrickadenbuie.com](https://www.garrickadenbuie.com)

**Thanks to contributions from ...**

- [Tyler Grant Smith](https://github.com/TylerGrantSmith) contributed set operations animations.

- [Lukas Wallrich](https://github.com/LukasWallrich) and [Kelsey Gonzalez](https://github.com/kelseygonzalez) helped create animations of tidyr's pivoting functions.

**Animations:**

- [**Mutating Joins**](#mutating-joins) — [`inner_join()`](#inner-join), [`left_join()`](#left-join),

  [`right_join()`](#right-join), [`full_join()`](#full-join)

  

- [**Filtering Joins**](#filtering-joins) — [`semi_join()`](#semi-join), [`anti_join()`](#anti-join)

- [**Set Operations**](#set-operations) — [`union()`](#union), [`union_all()`](#union-all), [`intersect()`](#intersection), [`setdiff()`](#set-difference)

- [**Tidy Data**](#tidy-data) — [`pivot_wider()` and `pivot_longer()`](#pivot-wider-and-longer), [`spread()` and `gather()`](#spread-and-gather)

- Learn more about

    - [Using the animations and images](#usage)

    - [Relational Data](#relational-data)

    - [gganimate](#gganimate)

    

## Background

### Usage

Please feel free to use these images for teaching or learning about action verbs from the [tidyverse](https://tidyverse.org).

You can directly download the [original animations](images/) or static images in [svg](images/static/svg/) or [png](images/static/png/) formats, or you can use the [scripts](R/) to recreate the images locally.

Currently, the animations cover the [dplyr two-table verbs][dplyr-two-table] and I'd like to expand the animations to include more verbs from the tidyverse.

[Suggestions are welcome!](https://github.com/gadenbuie/tidy-animated-verbs/issues)

### Relational Data

The [Relational Data][r4ds-relational] chapter of the [R for Data Science][r4ds] book by Garrett Grolemund and Hadley Wickham is an excellent resource for learning more about relational data.

The [dplyr two-table verbs vignette][dplyr-two-table] and Jenny Bryan's [Cheatsheet for dplyr join functions](http://stat545.com/bit001_dplyr-cheatsheet.html) are also great resources.

### gganimate

The animations were made possible by the newly re-written [gganimate] package by [Thomas Lin Pedersen](https://github.com/thomasp85) (original by [Dave Robinson](https://github.com/dgrtwo)).

The [package readme][gganimate] provides an excellent (and quick) introduction to gganimate.

### Dynamic Animations

Thanks to an initial push by [David Zimmermann](https://github.com/DavZim), we have begun work towards functions that generate dynamic animations from users' actual data.

Please visit the [pkg branch](https://github.com/gadenbuie/tidyexplain/tree/pkg) of the tidyexplain repository for more information (or to contribute!).

## Mutating Joins

> A mutating join allows you to combine variables from two tables. It first matches observations by their keys, then copies across variables from one table to the other.  

> [R for Data Science: Mutating joins](http://r4ds.had.co.nz/relational-data.html#mutating-joins)

```{r initial-dfs}

source("R/00_base_join.R")

df_names <- tibble(

  .x = c(1.5, 4.5), .y = 0.25,

  value = c("x", "y"),

  size = 12,

  color = "black"

)

g <- plot_data(initial_join_dfs) +

  geom_text(data = df_names, family = "Fira Mono", size = 24)

save_static_plot(g, "original-dfs")

```



```{r echo=TRUE}

x

y

```
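
If you'd like to try the joins below without sourcing the scripts in this repository, here is a minimal sketch of two small tables. The values are assumptions chosen for illustration and are not read from `R/00_base_join.R`; the only thing that matters is that both tables share an `id` key and that ids 1 and 2 appear in both.

```r
library(dplyr)

# Hypothetical stand-ins for the `x` and `y` tables shown in the animations
x <- tibble(id = c(1, 2, 3), x = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), y = c("y1", "y2", "y4"))

# Mutating joins first match rows on `id`, then copy columns across
inner <- inner_join(x, y, by = "id")  # ids 1 and 2 only
left  <- left_join(x, y, by = "id")   # every row of x; `y` is NA for id 3
right <- right_join(x, y, by = "id")  # every row of y; `x` is NA for id 4
full  <- full_join(x, y, by = "id")   # ids 1, 2, 3 and 4

nrow(inner)  # 2
nrow(full)   # 4
```

Comparing the row counts of the four results is a quick way to check which rows each join keeps.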

### Inner Join

> All rows from `x` where there are matching values in `y`, and all columns from `x` and `y`.

```{r inner-join}

source("R/inner_join.R")

```

![](images/inner-join.gif)

```{r echo=TRUE}

inner_join(x, y, by = "id")

```

### Left Join

> All rows from `x`, and all columns from `x` and `y`. Rows in `x` with no match in `y` will have `NA` values in the new columns.

```{r left-join}

source("R/left_join.R")

```

![](images/left-join.gif)

```{r echo=TRUE}

left_join(x, y, by = "id")

```

### Left Join (Extra Rows in y)

> ... If there are multiple matches between `x` and `y`, all combinations of the matches are returned.

```{r left-join-extra}

source("R/left_join_extra.R")

```

![](images/left-join-extra.gif)

```{r echo=TRUE}

y_extra # has multiple rows with the key from `x`

left_join(x, y_extra, by = "id")

```
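
To see the "all combinations" behaviour in numbers, here is a small sketch. The tables are hypothetical (the real `y_extra` is built by the repository scripts); the point is only that id 2 appears twice on the right-hand side.

```r
library(dplyr)

# Hypothetical data: id 2 occurs twice in `y_extra`
x <- tibble(id = c(1, 2, 3), x = c("x1", "x2", "x3"))
y_extra <- tibble(id = c(1, 2, 2, 4), y = c("y1", "y2", "y5", "y4"))

res <- left_join(x, y_extra, by = "id")

nrow(x)             # 3
nrow(res)           # 4: the x row with id 2 is repeated once per match
sum(res$id == 2)    # 2
```

A join result with more rows than `x` is usually the first hint that the key is not unique in `y`.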

### Right Join

> All rows from `y`, and all columns from `x` and `y`. Rows in `y` with no match in `x` will have `NA` values in the new columns.

```{r right-join}

source("R/right_join.R")

```

![](images/right-join.gif)

```{r echo=TRUE}

right_join(x, y, by = "id")

```

### Full Join

> All rows and all columns from both `x` and `y`. Where there are not matching values, returns `NA` for the one missing.

```{r full-join}

source("R/full_join.R")

```

![](images/full-join.gif)

```{r echo=TRUE}

full_join(x, y, by = "id")

```

## Filtering Joins

> Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables.

> ... Semi-joins are useful for matching filtered summary tables back to the original rows.

> ... Anti-joins are useful for diagnosing join mismatches.  

> [R for Data Science: Filtering Joins](http://r4ds.had.co.nz/relational-data.html#filtering-joins)
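
The two use cases in the quote can be sketched roughly as follows. The `orders` and `customers` tables are made up for this example and are not part of the repository.

```r
library(dplyr)

# Hypothetical order data
orders <- tibble(
  customer = c("a", "a", "b", "c", "c", "c"),
  amount   = c(10, 20, 5, 7, 8, 9)
)

# A filtered summary table: customers with more than one order
repeat_customers <- orders %>%
  count(customer) %>%
  filter(n > 1)

# semi_join() maps the summary back to the original rows
# and does not add the `n` column
repeat_orders <- semi_join(orders, repeat_customers, by = "customer")

# anti_join() lists the rows whose key has no match,
# which helps when diagnosing a join that drops or blanks rows
customers <- tibble(customer = c("a", "b"), region = c("north", "south"))
unmatched <- anti_join(orders, customers, by = "customer")
```

Here `repeat_orders` keeps the five rows belonging to customers `a` and `c`, and `unmatched` holds the three orders from customer `c`, who is missing from `customers`.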

### Semi Join

> All rows from `x` where there are matching values in `y`, keeping just columns from `x`.

```{r semi-join}

source("R/semi_join.R")

```

![](images/semi-join.gif)

```{r echo=TRUE}

semi_join(x, y, by = "id")

```

### Anti Join

> All rows from `x` where there are not matching values in `y`, keeping just columns from `x`.

```{r anti-join}

source("R/anti_join.R")

```

![](images/anti-join.gif)

```{r echo=TRUE}

anti_join(x, y, by = "id")

```

## Set Operations

> Set operations are occasionally useful when you want to break a single complex filter into simpler pieces. 

> All these operations work with a complete row, comparing the values of every variable. 

> These expect the x and y inputs to have the same variables, and treat the observations like sets.  

> [R for Data Science: Set operations](http://r4ds.had.co.nz/relational-data.html#set-operations)
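
As a rough sketch of "breaking a single complex filter into simpler pieces", the example below builds two simple filters and combines them with set operations. The data frame `df` and its columns are invented for illustration.

```r
library(dplyr)

# Hypothetical data
df <- tibble(
  id = 1:6,
  g  = c("a", "a", "b", "b", "c", "c"),
  v  = c(1, 5, 2, 6, 3, 7)
)

# Two simple filters ...
in_a_or_b <- filter(df, g %in% c("a", "b"))
large     <- filter(df, v > 4)

# ... combined with set operations instead of one compound condition
both       <- intersect(in_a_or_b, large)  # rows meeting both conditions
either     <- union(in_a_or_b, large)      # rows meeting at least one
only_small <- setdiff(in_a_or_b, large)    # in groups a/b but not large

sort(both$id)        # 2 4
sort(either$id)      # 1 2 3 4 6
sort(only_small$id)  # 1 3
```

Because both inputs come from the same data frame, they have the same variables, which is what the set operations expect.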

```{r initial-dfs-so}

source("R/00_base_set.R")

df_names <- tibble(

  .x = c(2.5, 5.5), .y = 0.25,

  value = c("x", "y"),

  size = 12,

  color = "black"

)

g <- plot_data_set(initial_set_dfs, "", NULL, NULL) +

  geom_text(data = df_names, family = "Fira Mono", size = 24)

save_static_plot(g, "original-dfs-set-ops")

```

```{r remove-set-ops-ids}

x <- x %>% select(-id)

y <- y %>% select(-id)

```



```{r echo=TRUE}

x

y 

```

### Union

> All unique rows from `x` and `y`.

```{r union}

source("R/union.R")

<>

```

![](images/union.gif)

```{r echo=TRUE}

union(x, y)

```

![](images/union-rev.gif)

```{r echo=TRUE}

union(y, x)

```

### Union All

> All rows from `x` and `y`, keeping duplicates.

```{r union-all}

source("R/union_all.R")

<>

```

![](images/union-all.gif)

```{r echo=TRUE}

union_all(x, y)

```
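
The difference between `union()` and `union_all()` shows up in the row counts. This is a small sketch with made-up tables that share exactly one identical row, `(1, "a")`.

```r
library(dplyr)

# Hypothetical tables; the row (1, "a") is present in both
x <- tibble(x = c(1, 1, 2), y = c("a", "b", "a"))
y <- tibble(x = c(1, 2), y = c("a", "b"))

u  <- union(x, y)       # duplicates removed
ua <- union_all(x, y)   # rows stacked as they are

nrow(u)             # 4
nrow(ua)            # 5
nrow(distinct(ua))  # 4, the same rows that union() returns
```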

### Intersection

> Common rows in both `x` and `y`, keeping just unique rows.

```{r intersect}

source("R/intersect.R")

<>

```

![](images/intersect.gif)

```{r echo=TRUE}

intersect(x, y)

```

### Set Difference

> All rows from `x` which are not also rows in `y`, keeping just unique rows.

```{r setdiff}

source("R/setdiff.R")

<>

```

![](images/setdiff.gif)

```{r echo=TRUE}

setdiff(x, y)

```

![](images/setdiff-rev.gif)

```{r echo=TRUE}

setdiff(y, x)

```
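
Unlike `union()` and `intersect()`, `setdiff()` returns a different set of rows when the arguments are swapped. Here is a short sketch with hypothetical tables:

```r
library(dplyr)

# Hypothetical tables; only the row (1, "a") is shared
x <- tibble(x = c(1, 1, 2), y = c("a", "b", "a"))
y <- tibble(x = c(1, 2), y = c("a", "b"))

x_only <- setdiff(x, y)  # rows of x that are not in y
y_only <- setdiff(y, x)  # rows of y that are not in x

nrow(x_only)  # 2: (1, "b") and (2, "a")
nrow(y_only)  # 1: (2, "b")
```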

## Tidy Data

[Tidy data][r4ds-tidy-data] follows three rules:

1. Each variable has its own column.

1. Each observation has its own row.

1. Each value has its own cell.

Many of the tools in the [tidyverse] expect data to be formatted as a tidy dataset and the [tidyr] package provides functions to help you organize your data into tidy data.
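
As an illustration of why the tidy layout matters, here is a sketch with an invented table in which the year is stored in the column names, so the "each variable has its own column" rule is broken. The table and its column names are assumptions made for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical untidy table: `2019` and `2020` are values of a "year" variable
untidy <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 22)
)

# One row per country-year observation, one column per variable
tidy <- pivot_longer(
  untidy,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "cases"
)

# With tidy data, grouped summaries can refer to variables by name
totals <- tidy %>%
  group_by(year) %>%
  summarise(total = sum(cases))

totals$total  # 30 33
```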

```{r tidyr-wide-long, fig.width = 6, fig.height = 10}

source("R/tidyr_pivoting.R")

source("R/tidyr_spread_gather.R")

tidy_plots <- list()

tidy_plots$wide <- bind_rows(sg_wide, sg_wide_labels)

tidy_plots$long <- bind_rows(sg_long, sg_long_labels)

tidy_plots <- map(tidy_plots, ~ mutate(., 

  .text_color = ifelse(grepl("id|key|val", value), "black", "white"),

  .text_size  = ifelse(grepl("id|key|val", value), 6, 10)

)) %>% 

  imap(~ plot_data(.x, .y))

tidy_plots$wide <- tidy_plots$wide + ylim(-6.5, 0.5)

save_static_plot(

  patchwork::wrap_plots(tidy_plots, widths = 4, heights = 8),

  width = 8,

  height = 8,

  "original-dfs-tidy"

)

```

![](images/static/png/original-dfs-tidy.png)

```{r echo=TRUE}

wide

long

```

### Pivot Wider and Longer

`pivot_wider()` and `pivot_longer()` were introduced in [tidyr version 1.0](https://www.tidyverse.org/blog/2019/09/tidyr-1-0-0/#pivoting) (released in September 2019).

They provide a more consistent and more powerful approach to changing the fundamental shape of the data and are "modern alternatives to `spread()` and `gather()`".

Here we show the very basic mechanics of pivoting, but there's much more that the pivot functions can do.

You can learn more about them in the [Pivoting vignette in tidyr](https://tidyr.tidyverse.org/articles/pivot.html).

```r

pivot_wider(data, names_from = key, values_from = val)

```

> `pivot_wider()` "widens" data, increasing the number of columns and decreasing the number of rows.

```r

pivot_longer(data, cols = x:y, names_to = "key", values_to = "val")

```

> `pivot_longer()` "lengthens" data, increasing the number of rows and decreasing the number of columns.

![](images/tidyr-pivoting.gif)
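
For a version you can run, here is a sketch of a round trip between the two shapes. The `wide` table below is a hypothetical stand-in with an `id` column and three value columns; the repository scripts build their own.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table
wide <- tibble(id = 1:2, x = c("a", "b"), y = c("c", "d"), z = c("e", "f"))

# Lengthen: the column names x, y, z move into `key`, their cells into `val`
long <- pivot_longer(wide, cols = x:z, names_to = "key", values_to = "val")

dim(wide)  # 2 4
dim(long)  # 6 3

# Widen again: `key` supplies the column names, `val` the cells
back <- pivot_wider(long, names_from = key, values_from = val)
all.equal(as.data.frame(back), as.data.frame(wide))  # TRUE
```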

### Spread and Gather

```r

spread(data, key, value)

```

> Spread a key-value pair across multiple columns. 

> Use it when a column contains observations from multiple variables.

```r

gather(data, key = "key", value = "value", ...)

```

> Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. 

> You use `gather()` when you notice that your column names are not names of variables, but *values* of a variable.

![](images/tidyr-spread-gather.gif)

```{r echo=TRUE}

gather(wide, key, val, x:z)

spread(long, key, val)

```
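
If you are moving older code to the pivot functions, the sketch below compares `gather()` with `pivot_longer()` on the same hypothetical `wide` table. The data are made up for this example.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table
wide <- tibble(id = 1:2, x = c("a", "b"), y = c("c", "d"), z = c("e", "f"))

old <- gather(wide, key, val, x:z)
new <- pivot_longer(wide, cols = x:z, names_to = "key", values_to = "val")

# Same rows in a different order: gather() stacks one column after another,
# pivot_longer() works through the table row by row
old_sorted <- old[order(old$id, old$key), ]
all.equal(as.data.frame(old_sorted), as.data.frame(new), check.attributes = FALSE)

# spread() reverses gather()
all.equal(as.data.frame(spread(old, key, val)), as.data.frame(wide))
```

Code that relies on row order should sort explicitly after switching from `gather()` to `pivot_longer()`.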