https://github.com/markfairbanks/tidytable

Tidy interface to 'data.table'

Host: GitHub
URL: https://github.com/markfairbanks/tidytable
Owner: markfairbanks
License: other
Created: 2019-11-15T19:20:49.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2025-01-21T17:45:37.000Z (6 months ago)
Last Synced: 2025-03-30T20:00:40.603Z (3 months ago)
Language: R
Homepage: https://markfairbanks.github.io/tidytable/
Size: 74.5 MB
Stars: 460
Watchers: 13
Forks: 33
Open Issues: 13
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

Awesome Lists containing this project

jimsghstars - markfairbanks/tidytable - Tidy interface to 'data.table' (R)

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
```

# tidytable

[![CRAN status](https://www.r-pkg.org/badges/version/tidytable)](https://cran.r-project.org/package=tidytable)

![r-universe](https://fastverse.r-universe.dev/badges/tidytable)

[![downloads](http://cranlogs.r-pkg.org/badges/grand-total/tidytable?color=blue)](https://r-pkg.org/pkg/tidytable)

[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/tidytable?color=blue)](https://markfairbanks.github.io/tidytable/)

[![R-CMD-check](https://github.com/markfairbanks/tidytable/workflows/R-CMD-check/badge.svg)](https://github.com/markfairbanks/tidytable/actions)

`tidytable` is a data frame manipulation library for users who need [`data.table` speed](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html) but prefer `tidyverse`-like syntax.

## Installation

Install the released version from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("tidytable")
```

Or install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
```

## General syntax

`tidytable` replicates `tidyverse` syntax but uses `data.table` in the background. In general, you can load `library(tidytable)` in place of `dplyr` and `tidyr`, and your existing code will run on `data.table`-backed equivalents.

A full list of implemented functions can be found [here](https://markfairbanks.github.io/tidytable/reference/index.html).

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
```
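
The same applies to `tidyr`-style reshaping. The following is a minimal sketch, assuming a small made-up table (`wide_df` and its columns are hypothetical), that reshapes wide data into long form with `pivot_longer()`:

```{r}
library(tidytable)

# Hypothetical wide data: one row per id, one column per year
wide_df <- data.table(id = 1:2, y2020 = c(10, 20), y2021 = c(30, 40))

# Stack the year columns into name/value pairs
long_df <- wide_df %>%
  pivot_longer(cols = c(y2020, y2021),
               names_to = "year",
               values_to = "value")

long_df
```

The result has one row per `id`/`year` combination. Row order may differ from what `tidyr` returns, so sort explicitly with `arrange()` if order matters.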

## Applying functions by group

You can use the normal `tidyverse` `group_by()`/`ungroup()` workflow, or you can use `.by` syntax to reduce typing. Using `.by` in a function is shorthand for `df %>% group_by() %>% some_function() %>% ungroup()`.

* A single column can be passed with `.by = z`

* Multiple columns can be passed with `.by = c(y, z)`

```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
```

All functions that can operate by group (`mutate()`, `filter()`, `summarize()`, etc.) have a `.by` argument built in.

The above syntax is equivalent to:

```{r}
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
```

Both options are available, so you can use whichever syntax you prefer.
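
Since `.by` is not limited to `summarize()`, here is a short sketch of the same idea with `mutate()` and `filter()`. The result names `with_avg` and `top_rows` are only illustrative:

```{r}
library(tidytable)

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

# Add a per-group mean while keeping every row
with_avg <- df %>%
  mutate(avg_z = mean(z), .by = x)

# Keep only the row with the largest z within each group
top_rows <- df %>%
  filter(z == max(z), .by = x)
```

In both cases the grouping applies only to that single call, so no `ungroup()` is needed afterwards.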

## tidyselect support

`tidytable` allows you to select/drop columns just like you would in the tidyverse by utilizing the [`tidyselect`](https://tidyselect.r-lib.org) package in the background.

Normal selection can be mixed with all `tidyselect` helpers: `everything()`, `starts_with()`, `ends_with()`, `any_of()`, `where()`, etc.

```{r}
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
```
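
As a further sketch on the same data, helpers can select by column type with `where()` or drop columns with a leading `-`. The result names `num_df` and `no_b_df` are only illustrative:

```{r}
library(tidytable)

df <- data.table(a = 1:3, b1 = 4:6, b2 = 7:9, c = c("a", "a", "b"))

# Keep only the numeric columns (a, b1, b2)
num_df <- df %>%
  select(where(is.numeric))

# Drop every column whose name starts with "b" (leaves a and c)
no_b_df <- df %>%
  select(-starts_with("b"))
```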

A full overview of selection options can be found [here](https://tidyselect.r-lib.org/reference/language.html).

### Using tidyselect in `.by`

`tidyselect` helpers also work when using `.by`:

```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
```

## Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions that wrap `tidytable` functions.

The embracing shortcut `{{ }}` works, or you can use `enquo()` with `!!` if you prefer:

```{r}
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
```
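
For comparison, the `enquo()` with `!!` form of the same function might look like the sketch below. The name `add_one_quo` is hypothetical, and `enquo()` is called through the `rlang::` namespace on the assumption that `rlang` is installed (it is a dependency of `tidytable`):

```{r}
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one_quo <- function(data, add_col) {
  # Capture the column expression supplied by the caller
  add_col <- rlang::enquo(add_col)

  data %>%
    # Unquote the captured expression; the parentheses make the precedence explicit
    mutate(new_col = (!!add_col) + 1)
}

result <- df %>%
  add_one_quo(x)
```

This should give the same `new_col` as the `{{ }}` version. The embracing shortcut is usually the shorter choice when the argument is passed straight through.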

The `.data` and `.env` pronouns also work within `tidytable` functions:

```{r}
var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
```

A full overview of tidy evaluation can be found [here](https://rlang.r-lib.org/reference/topic-data-mask.html).

## `dt()` helper

The `dt()` function makes regular `data.table` syntax pipeable, so you can easily mix `tidytable` syntax with `data.table` syntax:

```{r}
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
```
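
To illustrate the mixing itself, here is a minimal sketch of one pipeline that alternates between `tidytable` verbs and `dt()` calls. The result name `mixed` is only illustrative:

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

mixed <- df %>%
  filter(x < 3) %>%              # tidytable verb
  dt(, double_x := x * 2) %>%    # data.table syntax via dt()
  select(x, double_x)            # back to a tidytable verb
```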

## Speed comparisons

For those interested in performance, speed comparisons can be found [here](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html).

## Acknowledgements

`tidytable` is only possible because of the great contributions to R by the `data.table` and `tidyverse` teams. `data.table` is used as the main data frame engine in the background, while `tidyverse` packages like `rlang`, `vctrs`, and `tidyselect` are heavily relied upon to give users an experience similar to `dplyr` and `tidyr`.
