https://github.com/markfairbanks/tidytable
Tidy interface to 'data.table'
- Host: GitHub
- URL: https://github.com/markfairbanks/tidytable
- Owner: markfairbanks
- License: other
- Created: 2019-11-15T19:20:49.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2025-01-21T17:45:37.000Z (3 months ago)
- Last Synced: 2025-03-30T20:00:40.603Z (15 days ago)
- Language: R
- Homepage: https://markfairbanks.github.io/tidytable/
- Size: 74.5 MB
- Stars: 460
- Watchers: 13
- Forks: 33
- Open Issues: 13
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
```

# tidytable
`tidytable` is a data frame manipulation library for users who need [`data.table` speed](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html) but prefer `tidyverse`-like syntax.
## Installation
Install the released version from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("tidytable")
```

Or install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
```

## General syntax
`tidytable` replicates `tidyverse` syntax but uses `data.table` in the background. In general, you can simply use `library(tidytable)` to replace your existing `dplyr` and `tidyr` code with `data.table`-backed equivalents.
A full list of implemented functions can be found [here](https://markfairbanks.github.io/tidytable/reference/index.html).
```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
```

## Applying functions by group
You can use the normal `tidyverse` `group_by()`/`ungroup()` workflow, or you can use `.by` syntax to reduce typing. Using `.by` in a function is shorthand for `df %>% group_by() %>% some_function() %>% ungroup()`.
* A single column can be passed with `.by = z`
* Multiple columns can be passed with `.by = c(y, z)`

```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
```

All functions that can operate by group (`mutate()`, `filter()`, `summarize()`, etc.) have a `.by` argument built in.

The above syntax is equivalent to:
```{r}
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
```

Both options are available for users, so you can use the syntax that you prefer.
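As a further illustration of `.by` outside `summarize()`, here is a minimal sketch using `mutate()` and `filter()`. The data and the names `with_avg` and `largest` are made up for this example; only the `.by` argument itself comes from the text above.

```r
library(tidytable)

# Illustrative data (hypothetical, not taken from the package docs)
df <- data.table(x = c("a", "a", "b"), z = 1:3)

# Grouped mutate: add the mean of z within each x group to every row
with_avg <- df %>%
  mutate(avg_z = mean(z), .by = x)

# Grouped filter: keep the row(s) with the largest z within each x group
largest <- df %>%
  filter(z == max(z), .by = x)
```

In both calls the grouping applies only to that single step, so no `ungroup()` is needed afterwards.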
## tidyselect support
`tidytable` allows you to select/drop columns just like you would in the tidyverse by utilizing the [`tidyselect`](https://tidyselect.r-lib.org) package in the background.
Normal selection can be mixed with all `tidyselect` helpers: `everything()`, `starts_with()`, `ends_with()`, `any_of()`, `where()`, etc.
```{r}
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
```

A full overview of selection options can be found [here](https://tidyselect.r-lib.org/reference/language.html).
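The other helpers listed above follow the same pattern. The sketch below reuses the column layout from the previous example to show `where()`, negative selection, and `any_of()`. The name `"not_a_column"` is hypothetical and is included only to show that `any_of()` skips names that are absent from the data.

```r
library(tidytable)

df <- data.table(a = 1:3, b1 = 4:6, b2 = 7:9, c = c("a", "a", "b"))

# Keep only the columns whose contents satisfy a predicate
numeric_cols <- df %>%
  select(where(is.numeric))

# Drop columns by prefixing a helper with `-`
no_b_cols <- df %>%
  select(-starts_with("b"))

# any_of() ignores names that do not exist in the data
some_cols <- df %>%
  select(any_of(c("a", "not_a_column")))
```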
### Using tidyselect in `.by`
`tidyselect` helpers also work when using `.by`:
```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
```

## Tidy evaluation compatibility
Tidy evaluation can be used to write custom functions that wrap `tidytable` functions.
The embracing shortcut `{{ }}` works, or you can use `enquo()` with `!!` if you prefer:

```{r}
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
```

The `.data` and `.env` pronouns also work within `tidytable` functions:
```{r}
var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
```

A full overview of tidy evaluation can be found [here](https://rlang.r-lib.org/reference/topic-data-mask.html).
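The `enquo()` with `!!` alternative mentioned earlier in this section can be sketched as follows. `add_one_quo()` is a hypothetical name chosen for this example, and `enquo()` is called through `rlang::` on the assumption that `rlang` is installed alongside `tidytable`.

```r
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

# Hypothetical helper: capture the column argument, then unquote it
add_one_quo <- function(data, add_col) {
  add_col <- rlang::enquo(add_col)  # capture the unevaluated column reference
  data %>%
    mutate(new_col = (!!add_col) + 1)  # parentheses make the unquoting explicit
}

result <- df %>%
  add_one_quo(y)
```

This is intended to behave the same as the `{{ }}` version; the two-step form is mainly useful when the captured expression needs to be inspected or reused before it is unquoted.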
## `dt()` helper
The `dt()` function makes regular `data.table` syntax pipeable, so you can easily mix `tidytable` syntax with `data.table` syntax:
```{r}
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
```

## Speed Comparisons
For those interested in performance, speed comparisons can be found [here](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html).
## Acknowledgements
`tidytable` is only possible because of the great contributions to R by the `data.table` and `tidyverse` teams. `data.table` is used as the main data frame engine in the background, while `tidyverse` packages like `rlang`, `vctrs`, and `tidyselect` are heavily relied upon to give users an experience similar to `dplyr` and `tidyr`.