An open API service indexing awesome lists of open source software.

https://github.com/romainfrancois/dance

tibble() dancing 💃
https://github.com/romainfrancois/dance

dance dplyr r rstats tibble

Last synced: 2 days ago
JSON representation

tibble() dancing 💃

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# dance

[![Lifecycle Status](https://img.shields.io/badge/lifecycle-experimental-blue.svg)](https://www.tidyverse.org/lifecycle/)
[![Travis build status](https://travis-ci.org/romainfrancois/dance.svg?branch=master)](https://travis-ci.org/romainfrancois/dance)

![](https://media.giphy.com/media/mnLGTXoWVzAfm/giphy.gif)

Dancing `r emo::ji("woman_dancing")` with the stats, aka `tibble()` dancing `r emo::ji("man_dancing")`.
`dance` is a sort of reinvention of `dplyr` classic verbs, with a more modern stack
underneath, i.e. it leverages a lot from `vctrs` and `rlang`.

# Installation

You can install the development version from GitHub.

```{r, eval=FALSE}
# install.packages("pak")
pak::pkg_install("romainfrancois/dance")
```

# Usage

We'll illustrate tibble dancing with `iris` grouped by `Species`.

```{r example}
library(dance)
g <- iris %>% group_by(Species)

```

### waltz(), polka(), tango(), charleston()

These are in the neighborhood of `dplyr::summarise()`.

`waltz()` takes a grouped tibble and a list of formulas and returns a tibble with:
as many columns as supplied formulas, one row per group. It does not prepend the grouping
variables (see `tango` for that).

```{r}
g %>%
waltz(
Sepal.Length = ~mean(Sepal.Length),
Sepal.Width = ~mean(Sepal.Width)
)
```

`polka()` deals with peeling off one layer of grouping:

```{r}
g %>%
polka()
```

`tango()` binds the results of `polka()` and `waltz()` so is the closest to
`dplyr::summarise()`

```{r}
g %>%
tango(
Sepal.Length = ~mean(Sepal.Length),
Sepal.Width = ~mean(Sepal.Width)
)
```

`charleston()` is like `tango` but it packs the new columns in a tibble:

```{r}
g %>%
charleston(
Sepal.Length = ~mean(Sepal.Length),
Sepal.Width = ~mean(Sepal.Width)
)
```

### swing, twist

There is no `waltz_at()`, `tango_at()`, etc ... but instead we can use
either the same function on a set of columns or a set of functions on the same column.

For this, we need to learn new dance moves:

`swing()` and `twist()` are for applying the same function to a set
of columns:

```{r}
library(tidyselect)

g %>%
tango(swing(mean, starts_with("Petal")))

g %>%
tango(data = twist(mean, starts_with("Petal")))
```

They differ in the type of column is created and how to name them:

- `swing()` makes as many new columns as are selected by the tidy selection, and
the columns are named using a `.name` glue pattern, this way we might `swing()`
several times.

```{r}
g %>%
tango(
swing(mean, starts_with("Petal"), .name = "mean_{var}"),
swing(median, starts_with("Petal"), .name = "median_{var}"),
)
```

- `twist()` instead creates a single data frame column.

```{r}
g %>%
tango(
mean = twist(mean, starts_with("Petal")),
median = twist(median, starts_with("Petal")),
)
```

The first arguments of `swing()` and `twist()` are either a function or a
formula that uses `.` as a placeholder. Subsequent arguments are
tidyselect selections.

You can combine `swing()` and `twist()` in the same `tango()` or `waltz()`:

```{r}
g %>%
tango(
swing(mean, starts_with("Petal"), .name = "mean_{var}"),
median = twist(median, contains("."))
)
```

### rumba, zumba

Similarly `rumba()` can be used to apply several functions to a single column.
`rumba()` creates single columns and `zumba()` packs them into a data frame column.

```{r}
g %>%
tango(
rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"),
Petal = zumba(Petal.Width, mean = mean, median = median)
)
```

### salsa, chacha, samba, madison

Now we enter the realms of `dplyr::mutate()` with:

- `salsa()` : to create new columns
- `chacha()`: to reorganize a grouped tibble so that data for each group is contiguous
- `samba()` : `chacha()` + `salsa()`

```{r}
g %>%
salsa(
Sepal = ~Sepal.Length * Sepal.Width,
Petal = ~Petal.Length * Petal.Width
)
```

You can `swing()`, `twist()`, `rumba()` and `zumba()` here too, and if you
want the original data, you can use `samba()` instead of `salsa()`:

```{r}
g %>%
samba(centered = twist(~ . - mean(.), everything(), -Species))
```

`madison()` packs the columns `salsa()` would have created

```{r}
g %>%
madison(swing(~ . - mean(.), starts_with("Sepal")))
```

### bolero and mambo

`bolero()` is similar to `dplyr::filter()`.
The formulas may be made by `mambo()` if you want to apply the same
predicate to a tidyselection of columns:

```{r}
g %>%
bolero(~Sepal.Width > 4)

g %>%
bolero(mambo(~. > 4, starts_with("Sepal")))

g %>%
bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
```