https://github.com/tidyverse/dtplyr
Data table backend for dplyr
- Host: GitHub
- URL: https://github.com/tidyverse/dtplyr
- Owner: tidyverse
- License: other
- Created: 2016-03-07T23:28:16.000Z (almost 9 years ago)
- Default Branch: main
- Last Pushed: 2024-12-26T19:04:27.000Z (16 days ago)
- Last Synced: 2025-01-03T20:00:07.747Z (8 days ago)
- Topics: datatable, dplyr, r
- Language: R
- Homepage: https://dtplyr.tidyverse.org
- Size: 9.79 MB
- Stars: 670
- Watchers: 31
- Forks: 57
- Open Issues: 33
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Support: .github/SUPPORT.md
Awesome Lists containing this project
- jimsghstars - tidyverse/dtplyr - Data table backend for dplyr (R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

[![CRAN status](https://www.r-pkg.org/badges/version/dtplyr)](https://cran.r-project.org/package=dtplyr)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/dtplyr/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr)

## Overview
dtplyr provides a [data.table](http://r-datatable.com/) backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.
See `vignette("translation")` for details of the current translations, and [table.express](https://github.com/asardaes/table.express) and [rqdatatable](https://github.com/WinVector/rqdatatable/) for related work.
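To check that the translation preserves dplyr semantics, you can run one pipeline against both backends and compare the results. This is a minimal sketch, assuming data.table, dtplyr, and dplyr are installed; `fuel_by_cyl()` is a hypothetical helper defined only for this illustration, and the `arrange(cyl)` step is added here just to pin down the row order for the comparison:

```R
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: one dplyr pipeline that works on either backend.
fuel_by_cyl <- function(df) {
  df %>%
    filter(wt < 5) %>%
    mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
    group_by(cyl) %>%
    summarise(l100k = mean(l100k)) %>%
    arrange(cyl) # fix the row order so the two results line up
}

# Plain dplyr: runs eagerly on the data frame.
via_dplyr <- as_tibble(fuel_by_cyl(mtcars))

# dtplyr: records the verbs lazily and only runs the generated
# data.table code when as_tibble() requests the result.
via_dtplyr <- as_tibble(fuel_by_cyl(lazy_dt(mtcars)))

# Show the data.table code that dtplyr generated for this pipeline.
show_query(fuel_by_cyl(lazy_dt(mtcars)))

# Compare column by column, ignoring any backend-specific attributes.
identical(via_dplyr$cyl, via_dtplyr$cyl)
isTRUE(all.equal(via_dplyr$l100k, via_dtplyr$l100k))
```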
## Installation
You can install from CRAN with:
```R
install.packages("dtplyr")
```

Or try the development version from GitHub with:
```R
# install.packages("pak")
pak::pak("tidyverse/dtplyr")
```

## Usage
To use dtplyr, you must at least load dtplyr and dplyr. You may also want to load [data.table](http://r-datatable.com/) so you can access the other goodies that it provides:
```{r setup}
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
```

Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.
```{r}
mtcars2 <- lazy_dt(mtcars)
```

You can preview the transformation (including the generated data.table code) by printing the result:
```{r}
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
summarise(l100k = mean(l100k))
```

Generally, though, you should reserve printing for debugging, and use `as.data.table()`, `as.data.frame()`, or `as_tibble()` to indicate that you're done with the transformation and want to access the results:
```{r}
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
as_tibble()
```

## Why is dtplyr slower than data.table?
There are two primary reasons that dtplyr will always be somewhat slower than data.table:
* Each dplyr verb must do some work to convert dplyr syntax to data.table
  syntax. This takes time proportional to the complexity of the input code,
  not the input _data_, so it should be a negligible overhead for large datasets.
  [Initial benchmarks][benchmark] suggest that the overhead should be under
  1ms per dplyr call.

* To match dplyr semantics, `mutate()` does not modify in place by default.
  This means that most expressions involving `mutate()` must make a copy
  that would not be necessary if you were using data.table directly.
  (You can opt out of this behaviour in `lazy_dt()` with `immutable = FALSE`.)

[benchmark]: https://dtplyr.tidyverse.org/articles/translation.html#performance
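The copy-by-default behaviour of `mutate()` can be observed directly. This is a minimal sketch, assuming the same three packages; the toy table `dt` and the columns `x`, `y`, and `z` are illustrative only. The last step uses `immutable = FALSE`, which permits dtplyr to modify `dt` by reference, so only use it on data you are happy to have changed:

```R
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Illustrative toy table.
dt <- data.table(x = 1:3)

# Default (immutable = TRUE): dtplyr never modifies `dt`, so the new
# column only appears in the result.
res <- lazy_dt(dt) %>%
  mutate(y = x * 2) %>%
  as.data.table()

"y" %in% names(res) # TRUE: the result has the new column
"y" %in% names(dt)  # FALSE: the original data.table was not modified

# Opting out: with immutable = FALSE dtplyr may skip the copy and
# modify `dt` by reference when the result is computed.
fast <- lazy_dt(dt, immutable = FALSE) %>%
  mutate(z = x + 1L) %>%
  as.data.table()
```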
## Code of Conduct
Please note that the dtplyr project is released with a [Contributor Code of Conduct](https://dtplyr.tidyverse.org/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.