https://github.com/tidyverse/dtplyr

Data table backend for dplyr
datatable dplyr r

Host: GitHub
URL: https://github.com/tidyverse/dtplyr
Owner: tidyverse
License: other
Created: 2016-03-07T23:28:16.000Z (over 9 years ago)
Default Branch: main
Last Pushed: 2025-01-24T17:43:43.000Z (6 months ago)
Last Synced: 2025-04-28T11:52:52.094Z (3 months ago)
Topics: datatable, dplyr, r
Language: R
Homepage: https://dtplyr.tidyverse.org
Size: 10.5 MB
Stars: 672
Watchers: 30
Forks: 58
Open Issues: 33
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Support: .github/SUPPORT.md

Awesome Lists containing this project

jimsghstars - tidyverse/dtplyr - Data table backend for dplyr (R)

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# dtplyr 

[![CRAN status](https://www.r-pkg.org/badges/version/dtplyr)](https://cran.r-project.org/package=dtplyr)

[![R-CMD-check](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml)

[![Codecov test coverage](https://codecov.io/gh/tidyverse/dtplyr/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr)

## Overview

dtplyr provides a [data.table](http://r-datatable.com/) backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.

See `vignette("translation")` for details of the current translations, and [table.express](https://github.com/asardaes/table.express) and [rqdatatable](https://github.com/WinVector/rqdatatable/) for related work.

## Installation

You can install from CRAN with:

```R
install.packages("dtplyr")
```

Or try the development version from GitHub with:

```R
# install.packages("pak")
pak::pak("tidyverse/dtplyr")
```

## Usage

To use dtplyr, you must at least load dtplyr and dplyr. You may also want to load [data.table](http://r-datatable.com/) so you can access the other goodies that it provides:

```{r setup}
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
```

Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.

```{r}
mtcars2 <- lazy_dt(mtcars)
```

You can preview the transformation (including the generated data.table code) by printing the result:

```{r}
mtcars2 %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>%
  summarise(l100k = mean(l100k))
```
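If you want only the generated data.table expression rather than the full preview, dplyr's `show_query()` generic is one way to get it. The following is a minimal sketch, assuming dtplyr, dplyr, and data.table are installed; the variable name `q` is purely illustrative:

```R
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Build a lazy pipeline; nothing is computed at this point.
q <- lazy_dt(mtcars) %>%
  filter(wt < 5) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

# Returns the translated data.table call without evaluating it.
translated <- show_query(q)
translated
```

Inspecting the translated call this way can be handy when checking whether a pipeline maps onto the data.table idiom you expected.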

Printing is best reserved for debugging, though. To indicate that you're done with the transformation and want to access the results, use `as.data.table()`, `as.data.frame()`, or `as_tibble()`:

```{r}
mtcars2 %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>%
  summarise(l100k = mean(l100k)) %>%
  as_tibble()
```
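To make the lazy/eager distinction concrete, here is a small sketch (assuming the same three packages are installed; `lazy` and `res` are illustrative names) that checks the class of the object before and after conversion:

```R
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# A lazy pipeline: this is a description of the work, not the result.
lazy <- lazy_dt(mtcars) %>%
  filter(wt < 5) %>%
  count(cyl)

is_lazy <- inherits(lazy, "dtplyr_step")

# Converting forces evaluation and yields an ordinary data.table.
res <- as.data.table(lazy)
is_dt <- is.data.table(res)
```

Once converted, `res` can be passed to any code that expects a regular data.table.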

## Why is dtplyr slower than data.table?

There are two primary reasons that dtplyr will always be somewhat slower than data.table:

* Each dplyr verb must do some work to convert dplyr syntax to data.table
  syntax. This takes time proportional to the complexity of the input code,
  not the input _data_, so should be a negligible overhead for large datasets.
  [Initial benchmarks][benchmark] suggest that the overhead should be under
  1ms per dplyr call.

* To match dplyr semantics, `mutate()` does not modify in place by default.
  This means that most expressions involving `mutate()` must make a copy
  that would not be necessary if you were using data.table directly.
  (You can opt out of this behaviour in `lazy_dt()` with `immutable = FALSE`).

[benchmark]: https://dtplyr.tidyverse.org/articles/translation.html#performance
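The copy-versus-in-place trade-off described above can be sketched as follows. This assumes the input is already a data.table (here the illustrative `dt`), so that `lazy_dt()` wraps the same object rather than a fresh conversion; the flag names `unchanged` and `modified` are likewise illustrative:

```R
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

dt <- as.data.table(mtcars)

# Default (immutable = TRUE): mutate() works on a copy, so `dt` is untouched.
safe <- lazy_dt(dt) %>%
  mutate(l100k = 235.21 / mpg) %>%
  as.data.table()
unchanged <- !("l100k" %in% names(dt))

# immutable = FALSE: dtplyr is allowed to modify the input by reference,
# so the new column is expected to show up in `dt` itself.
fast <- lazy_dt(dt, immutable = FALSE) %>%
  mutate(l100k = 235.21 / mpg) %>%
  as.data.table()
modified <- "l100k" %in% names(dt)
```

Opting out of immutability avoids the copy but means the original object changes underneath you, so it is probably best kept for cases where `dt` is not needed in its original form.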

## Code of Conduct

Please note that the dtplyr project is released with a [Contributor Code of Conduct](https://dtplyr.tidyverse.org/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
