https://github.com/gadenbuie/tidyexplain
🤹‍♀️ Animations of tidyverse verbs using R, the tidyverse, and gganimate
dplyr gganimate ggplot2 joins rstats sql
- Host: GitHub
- URL: https://github.com/gadenbuie/tidyexplain
- Owner: gadenbuie
- License: cc0-1.0
- Created: 2018-08-15T18:29:46.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2022-01-11T02:50:28.000Z (about 3 years ago)
- Last Synced: 2024-10-14T22:29:57.585Z (4 months ago)
- Topics: dplyr, gganimate, ggplot2, joins, rstats, sql
- Language: R
- Homepage: https://garrickadenbuie.com/project/tidyexplain
- Size: 82.9 MB
- Stars: 762
- Watchers: 31
- Forks: 181
- Open Issues: 14
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- jimsghstars - gadenbuie/tidyexplain - 🤹‍♀️ Animations of tidyverse verbs using R, the tidyverse, and gganimate (R)
README
---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
echo = FALSE,
warning = FALSE,
message = FALSE,
cache = TRUE
)
```

[gganimate]: https://github.com/thomasp85/gganimate#README
[dplyr-two-table]: https://dplyr.tidyverse.org/articles/two-table.html
[r4ds]: http://r4ds.had.co.nz/
[r4ds-relational]: http://r4ds.had.co.nz/relational-data.html
[r4ds-set-ops]: http://r4ds.had.co.nz/relational-data.html#set-operations
[r4ds-tidy-data]: http://r4ds.had.co.nz/tidy-data.html#tidy-data-1
[tidyverse]: https://tidyverse.org
[tidyr]: https://tidyr.tidyverse.org

# Tidy Animated Verbs

[CC0](https://creativecommons.org/publicdomain/zero/1.0/)
[MIT](https://opensource.org/licenses/MIT)

Garrick Aden-Buie -- [@grrrck](https://twitter.com/grrrck) -- [garrickadenbuie.com](https://www.garrickadenbuie.com)

**Thanks to contributions from ...**
- [Tyler Grant Smith](https://github.com/TylerGrantSmith) contributed set operations animations.
- [Lukas Wallrich](https://github.com/LukasWallrich) and [Kelsey Gonzalez](https://github.com/kelseygonzalez) helped create animations of tidyr's pivoting functions.

**Animations:**

- [**Mutating Joins**](#mutating-joins): [`inner_join()`](#inner-join), [`left_join()`](#left-join),
[`right_join()`](#right-join), [`full_join()`](#full-join)
- [**Filtering Joins**](#filtering-joins): [`semi_join()`](#semi-join), [`anti_join()`](#anti-join)
- [**Set Operations**](#set-operations): [`union()`](#union), [`union_all()`](#union-all), [`intersect()`](#intersection), [`setdiff()`](#set-difference)
- [**Tidy Data**](#tidy-data): [`pivot_wider()` and `pivot_longer()`](#pivot-wider-and-longer), [`spread()` and `gather()`](#spread-and-gather)
- Learn more about
  - [Using the animations and images](#usage)
  - [Relational Data](#relational-data)
  - [gganimate](#gganimate)

## Background

### Usage

Please feel free to use these images for teaching or learning about action verbs from the [tidyverse](https://tidyverse.org).
You can directly download the [original animations](images/) or static images in [svg](images/static/svg/) or [png](images/static/png/) formats, or you can use the [scripts](R/) to recreate the images locally.

Currently, the animations cover the [dplyr two-table verbs][dplyr-two-table] and I'd like to expand the animations to include more verbs from the tidyverse.
[Suggestions are welcome!](https://github.com/gadenbuie/tidy-animated-verbs/issues)

### Relational Data

The [Relational Data][r4ds-relational] chapter of the
[R for Data Science][r4ds] book by Garrett Grolemund and Hadley Wickham
is an excellent resource for learning more about relational data.

The [dplyr two-table verbs vignette][dplyr-two-table]
and Jenny Bryan's [Cheatsheet for dplyr join functions](http://stat545.com/bit001_dplyr-cheatsheet.html)
are also great resources.

### gganimate

The animations were made possible by the newly re-written [gganimate] package by
[Thomas Lin Pedersen](https://github.com/thomasp85)
(original by [Dave Robinson](https://github.com/dgrtwo)).
The [package readme][gganimate] provides an excellent (and quick) introduction to gganimate.

### Dynamic Animations

Thanks to an initial push by [David Zimmermann](https://github.com/DavZim), we have begun work towards functions that generate dynamic animations from users' actual data.
Please visit the [pkg branch](https://github.com/gadenbuie/tidyexplain/tree/pkg) of the tidyexplain repository for more information (or to contribute!).

## Mutating Joins

> A mutating join allows you to combine variables from two tables. It first matches observations by their keys, then copies across variables from one table to the other.
> [R for Data Science: Mutating joins](http://r4ds.had.co.nz/relational-data.html#mutating-joins)

```{r intial-dfs}
source("R/00_base_join.R")
df_names <- tibble(
.x = c(1.5, 4.5), .y = 0.25,
value = c("x", "y"),
size = 12,
color = "black"
)

g <- plot_data(initial_join_dfs) +
  geom_text(data = df_names, family = "Fira Mono", size = 24)

save_static_plot(g, "original-dfs")
```
```{r echo=TRUE}
x
y
```

### Inner Join

> All rows from `x` where there are matching values in `y`, and all columns from `x` and `y`.
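
As a minimal sketch of the behaviour described above, the following builds two small tables and joins them. The contents of `x` and `y` here are assumptions made for illustration; the README's own tables are created by the scripts in `R/` and may differ.

```r
library(dplyr)

# Hypothetical example tables (not taken from the repository scripts).
x <- tibble(id = c(1, 2, 3), x = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), y = c("y1", "y2", "y4"))

# Only the ids found in both tables (1 and 2) survive an inner join,
# and the result carries the columns of both inputs.
inner <- inner_join(x, y, by = "id")
inner
```

With these assumed inputs, the row for `id = 3` (only in `x`) and the row for `id = 4` (only in `y`) are both dropped.
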
```{r inner-join}
source("R/inner_join.R")
```

(Animation: inner join)

```{r echo=TRUE}
inner_join(x, y, by = "id")
```

### Left Join

> All rows from `x`, and all columns from `x` and `y`. Rows in `x` with no match in `y` will have `NA` values in the new columns.
```{r left-join}
source("R/left_join.R")
```

(Animation: left join)

```{r echo=TRUE}
left_join(x, y, by = "id")
```

### Left Join (Extra Rows in y)

> ... If there are multiple matches between `x` and `y`, all combinations of the matches are returned.
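
The row multiplication described above can be checked with a short sketch. The tables below are hypothetical stand-ins, assuming `y_extra` repeats one key from `x`; the repository's actual `y_extra` may contain different values.

```r
library(dplyr)

# Hypothetical example tables; id 2 appears twice in y_extra.
x <- tibble(id = c(1, 2, 3), x = c("x1", "x2", "x3"))
y_extra <- tibble(id = c(1, 2, 2, 4), y = c("y1", "y2", "y5", "y4"))

joined <- left_join(x, y_extra, by = "id")

# The x row with id 2 is repeated once per matching row in y_extra,
# while id 3 has no match and gets NA in the y column.
joined
nrow(joined)
```

Under these assumptions the result has more rows than `x`, which is worth remembering when a join unexpectedly inflates a table.
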
```{r left-join-extra}
source("R/left_join_extra.R")
```

(Animation: left join with extra rows in `y`)

```{r echo=TRUE}
y_extra # has multiple rows with the key from `x`
left_join(x, y_extra, by = "id")
```

### Right Join

> All rows from `y`, and all columns from `x` and `y`. Rows in `y` with no match in `x` will have `NA` values in the new columns.
```{r right-join}
source("R/right_join.R")
```

(Animation: right join)

```{r echo=TRUE}
right_join(x, y, by = "id")
```

### Full Join

> All rows and all columns from both `x` and `y`. Where there are not matching values, returns `NA` for the one missing.
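
A small sketch of where the `NA` values land in a full join, using hypothetical tables (assumed for illustration, not copied from the repository scripts):

```r
library(dplyr)

# Hypothetical example tables.
x <- tibble(id = c(1, 2, 3), x = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), y = c("y1", "y2", "y4"))

full <- full_join(x, y, by = "id")

# Every id from either table is kept; the side that lacks a match
# is filled with NA.
full
```

Here `id = 3` exists only in `x`, so its `y` value is `NA`, and `id = 4` exists only in `y`, so its `x` value is `NA`.
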
```{r full-join}
source("R/full_join.R")
```

(Animation: full join)

```{r echo=TRUE}
full_join(x, y, by = "id")
```

## Filtering Joins

> Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables.
> ... Semi-joins are useful for matching filtered summary tables back to the original rows.
> ... Anti-joins are useful for diagnosing join mismatches.
> [R for Data Science: Filtering Joins](http://r4ds.had.co.nz/relational-data.html#filtering-joins)

### Semi Join

> All rows from `x` where there are matching values in `y`, keeping just columns from `x`.
```{r semi-join}
source("R/semi_join.R")
```

(Animation: semi join)

```{r echo=TRUE}
semi_join(x, y, by = "id")
```

### Anti Join

> All rows from `x` where there are not matching values in `y`, keeping just columns from `x`.
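
One way to see how the two filtering joins relate is that, for the same `by` key, they split `x` into two non-overlapping pieces. The sketch below illustrates this with hypothetical tables (assumed for the example, not the repository's data):

```r
library(dplyr)

# Hypothetical example tables.
x <- tibble(id = c(1, 2, 3), x = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), y = c("y1", "y2", "y4"))

kept <- semi_join(x, y, by = "id")     # rows of x with a match in y
dropped <- anti_join(x, y, by = "id")  # rows of x without a match in y

# Neither result gains columns from y, and together they account
# for every row of x.
names(kept)
nrow(kept) + nrow(dropped) == nrow(x)
```
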
```{r anti-join}
source("R/anti_join.R")
```

(Animation: anti join)

```{r echo=TRUE}
anti_join(x, y, by = "id")
```

## Set Operations

> Set operations are occasionally useful when you want to break a single complex filter into simpler pieces.
> All these operations work with a complete row, comparing the values of every variable.
> These expect the x and y inputs to have the same variables, and treat the observations like sets.
> [R for Data Science: Set operations](http://r4ds.had.co.nz/relational-data.html#set-operations)

```{r intial-dfs-so}
source("R/00_base_set.R")
df_names <- tibble(
.x = c(2.5, 5.5), .y = 0.25,
value = c("x", "y"),
size = 12,
color = "black"
)

g <- plot_data_set(initial_set_dfs, "", NULL, NULL) +
  geom_text(data = df_names, family = "Fira Mono", size = 24)

save_static_plot(g, "original-dfs-set-ops")
```

```{r remove-set-ops-ids}
x <- x %>% select(-id)
y <- y %>% select(-id)
```
```{r echo=TRUE}
x
y
```

### Union

> All unique rows from `x` and `y`.
```{r union}
source("R/union.R")
<>
```

(Animation: union of `x` and `y`)

```{r echo=TRUE}
union(x, y)
```

(Animation: union of `y` and `x`)

```{r echo=TRUE}
union(y, x)
```

### Union All

> All rows from `x` and `y`, keeping duplicates.
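
The difference between `union()` and `union_all()` comes down to whether duplicate rows are removed. The following sketch uses hypothetical tables with assumed column names `v1` and `v2`; the repository's set-operation tables may look different.

```r
library(dplyr)

# Hypothetical example tables sharing the same columns.
# The row (1, "a") appears in both.
x <- tibble(v1 = c(1, 1, 2), v2 = c("a", "b", "a"))
y <- tibble(v1 = c(1, 2), v2 = c("a", "b"))

deduplicated <- union(x, y)   # duplicate rows removed
stacked <- union_all(x, y)    # rows simply stacked, duplicates kept

nrow(deduplicated)
nrow(stacked)
```

With these inputs `union_all()` returns one row per input row, while `union()` returns one row fewer because the shared row is kept only once.
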
```{r union-all}
source("R/union_all.R")
<>
```

(Animation: union all)

```{r echo=TRUE}
union_all(x, y)
```

### Intersection

> Common rows in both `x` and `y`, keeping just unique rows.
```{r intersect}
source("R/intersect.R")
<>
```

(Animation: intersection)

```{r echo=TRUE}
intersect(x, y)
```

### Set Difference

> All rows from `x` which are not also rows in `y`, keeping just unique rows.
```{r setdiff}
source("R/setdiff.R")
<>
```

(Animation: set difference of `x` and `y`)

```{r echo=TRUE}
setdiff(x, y)
```

(Animation: set difference of `y` and `x`)

```{r echo=TRUE}
setdiff(y, x)
```

## Tidy Data

[Tidy data][r4ds-tidy-data] follows three rules:

1. Each variable has its own column.
1. Each observation has its own row.
1. Each value has its own cell.

Many of the tools in the [tidyverse] expect data to be formatted as a tidy dataset and the [tidyr] package provides functions to help you organize your data into tidy data.

```{r tidyr-wide-long, fig.width = 6, fig.height = 10}
source("R/tidyr_pivoting.R")
source("R/tidyr_spread_gather.R")

tidy_plots <- list()
tidy_plots$wide <- bind_rows(sg_wide, sg_wide_labels)
tidy_plots$long <- bind_rows(sg_long, sg_long_labels)

tidy_plots <- map(tidy_plots, ~ mutate(.,
  .text_color = ifelse(grepl("id|key|val", value), "black", "white"),
  .text_size = ifelse(grepl("id|key|val", value), 6, 10)
)) %>%
  imap(~ plot_data(.x, .y))

tidy_plots$wide <- tidy_plots$wide + ylim(-6.5, 0.5)
save_static_plot(
patchwork::wrap_plots(tidy_plots, widths = 4, heights = 8),
width = 8,
height = 8,
"original-dfs-tidy"
)
```

(Figure: the `wide` and `long` example data frames)

```{r echo=TRUE}
wide
long
```

### Pivot Wider and Longer

`pivot_wider()` and `pivot_longer()` were introduced in [tidyr version 1.0](https://www.tidyverse.org/blog/2019/09/tidyr-1-0-0/#pivoting) (released in September 2019).
They provide a more consistent and more powerful approach to changing the fundamental shape of the data and are "modern alternatives to `spread()` and `gather()`".

Here we show the very basic mechanics of pivoting, but there's much more that the pivot functions can do.
You can learn more about them in the [Pivoting vignette in tidyr](https://tidyr.tidyverse.org/articles/pivot.html).

```r
pivot_wider(data, names_from = key, values_from = val)
```

> `pivot_wider()` "widens" data, increasing the number of columns and decreasing the number of rows.

```r
pivot_longer(data, cols = x:y, names_to = "key", values_to = "val")
```

> `pivot_longer()` "lengthens" data, increasing the number of rows and decreasing the number of columns.

(Animation: `pivot_wider()` and `pivot_longer()`)

### Spread and Gather
```r
spread(data, key, value)
```

> Spread a key-value pair across multiple columns.
> Use it when a column contains observations from multiple variables.

```r
gather(data, key = "key", value = "value", ...)
```

> Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed.
> You use `gather()` when you notice that your column names are not names of variables, but *values* of a variable.

(Animation: `spread()` and `gather()`)

```{r echo=TRUE}
gather(wide, key, val, x:z)
spread(long, key, val)
```
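
For readers moving from `gather()` and `spread()` to the pivot functions, the round trip can be sketched as below. The `wide` table here is a hypothetical stand-in with assumed values; the README's own `wide` and `long` come from the scripts in `R/`.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per id, one column per key.
wide <- tibble(
  id = 1:2,
  x = c("a", "b"),
  y = c("c", "d"),
  z = c("e", "f")
)

# Lengthen: the column names x, y, z become values of `key`.
long <- pivot_longer(wide, cols = x:z, names_to = "key", values_to = "val")

# Widen again: the values of `key` become column names.
back <- pivot_wider(long, names_from = key, values_from = val)

nrow(long)
all.equal(as.data.frame(back), as.data.frame(wide))
```

In this simple case the two operations are inverses of each other: lengthening yields one row per `id` and key combination, and widening restores the original shape.
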