https://github.com/trinker/dplyr_in_a_nutshell
This is a minimal guide, mostly for myself, to remind me of the most important dplyr functions and how they relate to base R functions that I'm familiar with.
- Host: GitHub
- URL: https://github.com/trinker/dplyr_in_a_nutshell
- Owner: trinker
- Created: 2014-01-24T03:12:38.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2017-09-01T02:00:15.000Z (almost 9 years ago)
- Last Synced: 2025-02-14T13:15:38.201Z (over 1 year ago)
- Homepage:
- Size: 35.2 KB
- Stars: 35
- Watchers: 3
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
dplyr In a Nutshell
===
This is a minimal guide, mostly for myself, to remind me of the most important dplyr functions and how they relate to base R functions that I'm familiar with. Also check out [tidyr In a Nutshell](https://github.com/trinker/tidyr_in_a_nutshell).
```{r setup, include=FALSE, echo=FALSE}
knitr::opts_chunk$set(comment=NA, tidy=FALSE)
```
# 8 dplyr Functions to Rule the World
### Speedy Table
`tbl_df`
### The 5 Guys + 1
1. `filter`
2. `select`
3. `mutate`
4. `group_by`
5. `summarise`
6. `arrange`
### Chaining (pronounced "then")
`%>%`
# Relating the Functions
### Speedy Table
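`tbl_df` works similarly to `data.table` in that it prints sensibly: only the first rows and the columns that fit on screen are shown. (Since dplyr 1.0.0 `tbl_df()` is deprecated in favour of `as_tibble()`.)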
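A minimal sketch of what "prints sensibly" means, assuming a recent dplyr where `as_tibble()` stands in for `tbl_df()`. The names `df` and `tbl` are only for illustration.

```r
library(dplyr)

df  <- mtcars              # plain data frame: print(df) dumps all 32 rows
tbl <- as_tibble(mtcars)   # tibble: print(tbl) shows the first 10 rows and the column types

class(df)
class(tbl)
inherits(tbl, "data.frame")  # a tibble is still a data frame underneath
```

Because the tibble is still a data frame, base R functions keep working on it.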
### Relating the 5 Guys + 1 to base R
List of dplyr functions and the base functions they're related to:
Base Function | dplyr Function(s) | Special Powers
-----------------|-------------------|-----------------------------
subset | filter & select | filter rows & select columns
transform | mutate | operate with columns not yet created
split | group_by | splits without cutting
lapply + do.call | summarise | apply and bind in a single bound
order + with | arrange | "I only have to specify dataframe once?"
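A rough side-by-side of the `subset`, `transform`, and `order` rows of the table, using the built-in `mtcars` data. The object names (`base_sub`, `dplyr_sub`, and so on) are made up for this sketch.

```r
library(dplyr)

# subset() handles rows and columns in one call; dplyr splits the job in two
base_sub  <- subset(mtcars, cyl == 8, select = c(mpg, hp))
dplyr_sub <- select(filter(mtcars, cyl == 8), mpg, hp)

# transform() cannot see a column created in the same call, so it needs two
# calls; mutate() can use displ_l right away
base_mut  <- transform(transform(mtcars, displ_l = disp / 61.0237),
                       displ_l_add1 = displ_l + 1)
dplyr_mut <- mutate(mtcars, displ_l = disp / 61.0237, displ_l_add1 = displ_l + 1)

# order() needs the data frame named twice (hence with()); arrange() names it once
base_ord  <- mtcars[with(mtcars, order(cyl, disp)), ]
dplyr_ord <- arrange(mtcars, cyl, disp)
```

Each pair should hold the same values; the dplyr versions mainly save typing and read left to right.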
### Chaining
`%>%`... Do you know ggplot2's `+`? Same idea.

*Basically, the result of the previous step in the chain is supplied as the first argument to the function on the right-hand side.*
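A small sketch of that rule: the nested call and the piped call below should do exactly the same work. The names `nested` and `piped` are only for illustration.

```r
library(dplyr)

# Nested form: read inside-out
nested <- head(arrange(mtcars, cyl), 3)

# Piped form: read left to right; each result becomes argument 1 of the next call
piped <- mtcars %>% arrange(cyl) %>% head(3)

all.equal(nested, piped)
```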
# Demos
### Speedy Table
```{r, message=FALSE}
library(dplyr)
mtcars2 <- tbl_df(mtcars)  # deprecated since dplyr 1.0.0; as_tibble(mtcars) is the replacement
```
### The 5 Guys
```{r, message=FALSE}
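filter(mtcars2[1:10, ], cyl == 8)         # keep rows where cyl is 8
select(mtcars2[1:10, ], mpg, cyl, hp:vs)  # keep mpg, cyl, and hp through vs
arrange(mtcars2[1:10, ], cyl, disp)       # sort by cyl, then by disp
## displ_l (cubic inches to litres) is used in the same call that creates it
mutate(mtcars2[1:10, ], displ_l = disp / 61.0237, displ_l_add1 = displ_l + 1)
summarise(mtcars2, mean(disp))            # collapse to a single row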
```
### Chaining
```{r}
## Mean displacement and horsepower per cylinder count
mtcars2 %>%
    group_by(cyl) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp))

## Same summary per cyl/gear combination, sorted in descending order
mtcars2 %>%
    group_by(cyl, gear) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
    arrange(-cyl, -gear)

## Use `%>%` with base functions too!!!
mtcars2 %>%
    group_by(cyl, gear) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
    arrange(-cyl, -gear) %>%
    head()

mtcars2 %>%
    group_by(cyl) %>%
    summarise(max(disp), hp[1])

## n() counts rows per group; table() is the base R counterpart
mtcars2 %>%
    group_by(cyl) %>%
    summarise(n = n())
table(mtcars$cyl)
```
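For comparison, here is a sketch of how the first grouped summary above might look with the base R trio from the table (`split`, `lapply`, `do.call`). The names `pieces`, `base_out`, and `dplyr_out` are only for illustration.

```r
library(dplyr)

# base R: cut the data into a list of data frames, summarise each, bind back
pieces   <- split(mtcars, mtcars$cyl)
base_out <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(cyl = d$cyl[1], md = mean(d$disp), mh = mean(d$hp))
}))

# dplyr: the grouping is recorded on the table, then summarised in one step
dplyr_out <- mtcars %>%
  group_by(cyl) %>%
  summarise(md = mean(disp), mh = mean(hp))
```

Both results have one row per value of `cyl`; the dplyr version never materialises the intermediate list, which is the "splits without cutting" part.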