https://github.com/markfairbanks/tidytable

Tidy interface to 'data.table'

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
```

# tidytable

[![CRAN status](https://www.r-pkg.org/badges/version/tidytable)](https://cran.r-project.org/package=tidytable)
![r-universe](https://fastverse.r-universe.dev/badges/tidytable)
[![downloads](https://cranlogs.r-pkg.org/badges/grand-total/tidytable?color=blue)](https://r-pkg.org/pkg/tidytable)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/tidytable?color=blue)](https://markfairbanks.github.io/tidytable/)
[![R-CMD-check](https://github.com/markfairbanks/tidytable/workflows/R-CMD-check/badge.svg)](https://github.com/markfairbanks/tidytable/actions)

`tidytable` is a data frame manipulation library for users who need [`data.table` speed](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html) but prefer `tidyverse`-like syntax.

## Installation

Install the released version from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("tidytable")
```

Or install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
```

## General syntax

`tidytable` replicates `tidyverse` syntax but uses `data.table` in the background. In most cases, loading `library(tidytable)` is enough to replace your existing `dplyr` and `tidyr` code with `data.table`-backed equivalents.

A full list of implemented functions can be found [here](https://markfairbanks.github.io/tidytable/reference/index.html).

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
```
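
The same approach is meant to carry over to `tidyr`-style reshaping. As a minimal sketch, assuming a small made-up table (the `id`, `a`, and `b` columns are illustrative only), `pivot_longer()` can be called the same way as its `tidyr` counterpart:

```{r}
library(tidytable)

# Hypothetical wide table used only for illustration
wide_df <- data.table(id = 1:2, a = c(10, 20), b = c(30, 40))

# Stack columns a and b into name/value pairs, keeping id as the identifier
long_df <- wide_df %>%
  pivot_longer(cols = c(a, b), names_to = "name", values_to = "value")

long_df
```

Row order in the reshaped result may differ from what `tidyr` returns, so sort explicitly with `arrange()` if the order matters.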

## Applying functions by group

You can use the normal `tidyverse` `group_by()`/`ungroup()` workflow, or you can use `.by` syntax to reduce typing. Using `.by` in a function is shorthand for `df %>% group_by() %>% some_function() %>% ungroup()`.

* A single column can be passed with `.by = z`
* Multiple columns can be passed with `.by = c(y, z)`

```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
```

All functions that can operate by group, such as `mutate()`, `filter()`, and `summarize()`, have a `.by` argument built in.

The above syntax is equivalent to:

```{r}
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
```

Both options are supported, so you can use whichever syntax you prefer.
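
Since `.by` is not limited to `summarize()`, here is a short sketch of it in `mutate()` and `filter()` with a single grouping column. The result names `with_group_avg` and `group_max_rows` are made up for this example:

```{r}
library(tidytable)

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

# Add the per-group mean of z as a new column; every row is kept
with_group_avg <- df %>%
  mutate(group_avg_z = mean(z), .by = x)

# Keep only the rows where z equals its group's maximum
group_max_rows <- df %>%
  filter(z == max(z), .by = x)

with_group_avg
group_max_rows
```

In both calls the grouping applies only to that one step, so no `ungroup()` is needed afterwards.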

## tidyselect support

`tidytable` allows you to select or drop columns just as you would in the tidyverse by using the [`tidyselect`](https://tidyselect.r-lib.org) package in the background.

Normal selection can be mixed with all `tidyselect` helpers: `everything()`, `starts_with()`, `ends_with()`, `any_of()`, `where()`, etc.

```{r}
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
```

A full overview of selection options can be found [here](https://tidyselect.r-lib.org/reference/language.html).
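
A few more helpers, sketched on the same table. The vector `wanted` and the name `"not_a_column"` are hypothetical and only show that `any_of()` ignores names that are not present:

```{r}
library(tidytable)

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

# Select columns by type
numeric_cols <- df %>%
  select(where(is.numeric))

# Drop columns by negating a helper
without_b <- df %>%
  select(-starts_with("b"))

# Select from a character vector, silently skipping missing names
wanted <- c("a", "c", "not_a_column")
existing_cols <- df %>%
  select(any_of(wanted))

names(numeric_cols)
names(without_b)
names(existing_cols)
```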

### Using tidyselect in `.by`

`tidyselect` helpers also work when using `.by`:

```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
```

## Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions that wrap `tidytable` functions.
The embracing shortcut `{{ }}` works, or you can use `enquo()` with `!!` if you prefer:

```{r}
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
```
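
For comparison, the `enquo()` with `!!` variant might look like the sketch below. The function name `add_one_enquo` is hypothetical, and `enquo()` is called through the `rlang::` namespace on the assumption that `rlang` is installed alongside `tidytable`:

```{r}
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one_enquo <- function(data, add_col) {
  # Capture the column expression supplied by the caller
  add_col <- rlang::enquo(add_col)

  data %>%
    # Unquote the captured expression; the parentheses make the precedence explicit
    mutate(new_col = (!!add_col) + 1)
}

enquo_result <- df %>%
  add_one_enquo(y)

enquo_result
```

This is intended to behave the same as the `{{ }}` version; embracing is simply a shorter way of writing the capture and unquote steps together.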

The `.data` and `.env` pronouns also work within `tidytable` functions:

```{r}
var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
```

A full overview of tidy evaluation can be found [here](https://rlang.r-lib.org/reference/topic-data-mask.html).

## `dt()` helper

The `dt()` function makes regular `data.table` syntax pipeable, so you can easily mix `tidytable` syntax with `data.table` syntax:

```{r}
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
```
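
To illustrate the mixing itself, the sketch below alternates `tidytable` verbs with one `dt()` step in a single pipeline. The result name `mixed` and the `total` column are made up for this example:

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

mixed <- df %>%
  filter(x < 4) %>%                 # tidytable verb
  dt(, double_x := x * 2) %>%       # data.table syntax through dt()
  summarize(total = sum(double_x),  # back to a tidytable verb
            .by = z)

mixed
```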

## Speed comparisons

For those interested in performance, speed comparisons can be found [here](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html).

## Acknowledgements

`tidytable` is only possible because of the great contributions to R by the `data.table` and `tidyverse` teams. `data.table` is used as the main data frame engine in the background, while `tidyverse` packages like `rlang`, `vctrs`, and `tidyselect` are heavily relied upon to give users an experience similar to `dplyr` and `tidyr`.