An open API service indexing awesome lists of open source software.

https://github.com/trinker/dplyr_in_a_nutshell

This is a minimal guide, mostly for myself, to remind me of the most import dplyr functions and how they relate to base R functions I'm that familiar with.
https://github.com/trinker/dplyr_in_a_nutshell

Last synced: about 1 year ago
JSON representation

This is a minimal guide, mostly for myself, to remind me of the most import dplyr functions and how they relate to base R functions I'm that familiar with.

Awesome Lists containing this project

README

          

dplyr In a Nutshell
===

This is a minimal guide, mostly for myself, to remind me of the most import dplyr functions and how they relate to base R functions that I'm familiar with. Also check out [tidyr In a Nutshell](https://github.com/trinker/tidyr_in_a_nutshell).

```{r setup, include=FALSE, echo=FALSE}
opts_chunk$set(comment=NA, tidy=FALSE)
```

# 8 dplyr Functions to Rule the World

### Speedy Table

`tbl_df`

### The 5 Guys + 1

1. `filter`
2. `select`
3. `mutate`
4. `group_by`
5. `summarise`
6. `arrange`

### Chaining (pronounced "then")

`%>%`

# Relating the Functions

### Speedy Table

`tbl_df` works similar to `data.table` in that it prints sensibly.

### Relating the 5 Guys + 1 to base R

List of dplyr functions and the base functions they're related to:

Base Function | dplyr Function(s) | Special Powers
-----------------|-------------------|-----------------------------
subset | filter & select | filter rows & select columns
transform | mutate | operate with columns not yet created
split | group_by | splits without cutting
lapply + do.call | summarise | apply and bind in a single bound
order + with | arrange | "I only have to specify dataframe once?"

### Chaining

`%>%`... Do you know ggplot2's `+`? Same idea.

![](chain.png)

*Basically previous input in chain supplied as argument 1 to function on right side.*

# Demos
### Speedy Table
```{r, message=FALSE}
library(dplyr)
mtcars2 <- tbl_df(mtcars)
```

### The 5 Guys
```{r, message=FALSE}
filter(mtcars2[1:10, ], cyl == 8)
select(mtcars2[1:10, ], mpg, cyl, hp:vs)
arrange(mtcars2[1:10, ], cyl, disp)
mutate(mtcars2[1:10, ], displ_l = disp / 61.0237, displ_l_add1 = displ_l + 1)
summarise(mtcars, mean(disp))
```

### Chaining

```{r}
mtcars2 %>%
group_by(cyl) %>%
summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp))
mtcars2 %>%
group_by(cyl, gear) %>%
summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
arrange(-cyl, -gear)
## Use `%>%` with base functions too!!!
mtcars2 %>%
group_by(cyl, gear) %>%
summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
arrange(-cyl, -gear) %>%
head()
mtcars2 %>%
group_by(cyl) %>%
summarise(max(disp), hp[1])
mtcars2 %>%
group_by(cyl) %>%
summarise(n = n())
table(mtcars$cyl)
```