https://github.com/romainfrancois/dance
tibble() dancing 💃
https://github.com/romainfrancois/dance
dance dplyr r rstats tibble
Last synced: 2 days ago
JSON representation
tibble() dancing 💃
- Host: GitHub
- URL: https://github.com/romainfrancois/dance
- Owner: romainfrancois
- License: other
- Created: 2019-02-22T20:37:36.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-11-20T08:19:25.000Z (over 5 years ago)
- Last Synced: 2025-04-13T00:42:21.223Z (2 days ago)
- Topics: dance, dplyr, r, rstats, tibble
- Language: R
- Homepage:
- Size: 428 KB
- Stars: 45
- Watchers: 4
- Forks: 6
- Open Issues: 17
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- awesome-dataframes - dance - Dancing 💃 with the stats, aka `tibble()` dancing 🕺. dance is a sort of reinvention of dplyr classic verbs, with a more modern stack underneath, i.e. it leverages a lot from vctrs and rlang. (Libraries)
- jimsghstars - romainfrancois/dance - tibble() dancing 💃 (R)
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```# dance
[](https://www.tidyverse.org/lifecycle/)
[](https://travis-ci.org/romainfrancois/dance)
Dancing `r emo::ji("woman_dancing")` with the stats, aka `tibble()` dancing `r emo::ji("man_dancing")`.
`dance` is a sort of reinvention of `dplyr` classic verbs, with a more modern stack
underneath, i.e. it leverages a lot from `vctrs` and `rlang`.# Installation
You can install the development version from GitHub.
```{r, eval=FALSE}
# install.packages("pak")
pak::pkg_install("romainfrancois/dance")
```# Usage
We'll illustrate tibble dancing with `iris` grouped by `Species`.
```{r example}
library(dance)
g <- iris %>% group_by(Species)```
### waltz(), polka(), tango(), charleston()
These are in the neighborhood of `dplyr::summarise()`.
`waltz()` takes a grouped tibble and a list of formulas and returns a tibble with:
as many columns as supplied formulas, one row per group. It does not prepend the grouping
variables (see `tango` for that).
```{r}
g %>%
waltz(
Sepal.Length = ~mean(Sepal.Length),
Sepal.Width = ~mean(Sepal.Width)
)
````polka()` deals with peeling off one layer of grouping:
```{r}
g %>%
polka()
````tango()` binds the results of `polka()` and `waltz()` so is the closest to
`dplyr::summarise()````{r}
g %>%
tango(
Sepal.Length = ~mean(Sepal.Length),
Sepal.Width = ~mean(Sepal.Width)
)
````charleston()` is like `tango` but it packs the new columns in a tibble:
```{r}
g %>%
charleston(
Sepal.Length = ~mean(Sepal.Length),
Sepal.Width = ~mean(Sepal.Width)
)
```### swing, twist
There is no `waltz_at()`, `tango_at()`, etc ... but instead we can use
either the same function on a set of columns or a set of functions on the same column.For this, we need to learn new dance moves:
`swing()` and `twist()` are for applying the same function to a set
of columns:```{r}
library(tidyselect)g %>%
tango(swing(mean, starts_with("Petal")))g %>%
tango(data = twist(mean, starts_with("Petal")))
```They differ in the type of column is created and how to name them:
- `swing()` makes as many new columns as are selected by the tidy selection, and
the columns are named using a `.name` glue pattern, this way we might `swing()`
several times.
```{r}
g %>%
tango(
swing(mean, starts_with("Petal"), .name = "mean_{var}"),
swing(median, starts_with("Petal"), .name = "median_{var}"),
)
```- `twist()` instead creates a single data frame column.
```{r}
g %>%
tango(
mean = twist(mean, starts_with("Petal")),
median = twist(median, starts_with("Petal")),
)
```The first arguments of `swing()` and `twist()` are either a function or a
formula that uses `.` as a placeholder. Subsequent arguments are
tidyselect selections.You can combine `swing()` and `twist()` in the same `tango()` or `waltz()`:
```{r}
g %>%
tango(
swing(mean, starts_with("Petal"), .name = "mean_{var}"),
median = twist(median, contains("."))
)
```### rumba, zumba
Similarly `rumba()` can be used to apply several functions to a single column.
`rumba()` creates single columns and `zumba()` packs them into a data frame column.```{r}
g %>%
tango(
rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"),
Petal = zumba(Petal.Width, mean = mean, median = median)
)
```### salsa, chacha, samba, madison
Now we enter the realms of `dplyr::mutate()` with:
- `salsa()` : to create new columns
- `chacha()`: to reorganize a grouped tibble so that data for each group is contiguous
- `samba()` : `chacha()` + `salsa()````{r}
g %>%
salsa(
Sepal = ~Sepal.Length * Sepal.Width,
Petal = ~Petal.Length * Petal.Width
)
```You can `swing()`, `twist()`, `rumba()` and `zumba()` here too, and if you
want the original data, you can use `samba()` instead of `salsa()`:```{r}
g %>%
samba(centered = twist(~ . - mean(.), everything(), -Species))
````madison()` packs the columns `salsa()` would have created
```{r}
g %>%
madison(swing(~ . - mean(.), starts_with("Sepal")))
```### bolero and mambo
`bolero()` is similar to `dplyr::filter()`.
The formulas may be made by `mambo()` if you want to apply the same
predicate to a tidyselection of columns:```{r}
g %>%
bolero(~Sepal.Width > 4)g %>%
bolero(mambo(~. > 4, starts_with("Sepal")))g %>%
bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
```