https://github.com/tidyverse/dbplyr

Database (DBI) backend for dplyr

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# dbplyr

[![CRAN status](https://www.r-pkg.org/badges/version/dbplyr)](https://cran.r-project.org/package=dbplyr)
[![R-CMD-check](https://github.com/tidyverse/dbplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dbplyr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/dbplyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dbplyr?branch=main)

## Overview

dbplyr is the database backend for [dplyr](https://dplyr.tidyverse.org). It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL.

To learn more about why you might use dbplyr instead of writing SQL, see `vignette("sql")`. To learn more about the details of the SQL translation, see `vignette("translation-verb")` and `vignette("translation-function")`.
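As a rough illustration of the conversion described above, the sketch below builds a query against a simulated backend and prints the SQL that dbplyr would generate, without touching a real database. The table and its columns (`g`, `x`) are hypothetical and exist only for this example; `lazy_frame()` and `simulate_postgres()` are dbplyr helpers intended for this kind of experimentation.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical table: the columns `g` and `x` exist only for this illustration.
# simulate_postgres() stands in for a real connection, so no query is executed.
lf <- lazy_frame(g = 1, x = 2, con = simulate_postgres())

query <- lf %>%
  filter(x > 1) %>%
  group_by(g) %>%
  summarise(avg = mean(x, na.rm = TRUE))

# Print the generated SQL
show_query(query)

# Or capture it as a character string
sql_text <- as.character(sql_render(query))
```

The exact SQL text depends on the backend and the dbplyr version, so treat the printed query as indicative rather than fixed.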

## Installation

```{r, eval = FALSE}
# The easiest way to get dbplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dbplyr:
install.packages("dbplyr")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/dbplyr")
```

## Usage

dbplyr is designed to work with database tables as if they were local data frames. To demonstrate this, I'll first create an in-memory SQLite database and copy over a dataset:

```{r setup}
library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
```

Note that you don't actually need to load dbplyr with `library(dbplyr)`; dplyr automatically loads it for you when it sees you working with a database. Database connections are coordinated by the DBI package; see the DBI documentation to learn more.

Now you can retrieve a table using `tbl()` (see `?tbl_dbi` for more details). Printing it just retrieves the first few rows:

```{r}
mtcars2 <- tbl(con, "mtcars")
mtcars2
```
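One way to see that the table reference is lazy is to compare its dimensions before and after pulling the data into R. This is a minimal sketch using a separate in-memory connection; the names `demo_con` and `mtcars_demo` are assumptions made for this example and are not part of the setup above.

```r
library(dplyr, warn.conflicts = FALSE)

# Separate in-memory database so this sketch does not depend on earlier chunks
demo_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(demo_con, mtcars, "mtcars_demo")

lazy_cars <- tbl(demo_con, "mtcars_demo")

# The row count of a lazy table is unknown until the query is run
lazy_rows <- nrow(lazy_cars)   # NA
lazy_cols <- ncol(lazy_cars)   # column names are known up front

# collect() runs the query and returns a local tibble
local_cars <- collect(lazy_cars)
local_rows <- nrow(local_cars) # 32

DBI::dbDisconnect(demo_con)
```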

All dplyr calls are evaluated lazily, generating SQL that is only sent to the database when you request the data.

```{r}
# lazily generates query
summary <- mtcars2 %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(desc(mpg))

# see query
summary %>% show_query()

# execute query and retrieve results
summary %>% collect()
```
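To make the lazy evaluation more concrete, the following sketch checks that the pipeline is still an unevaluated query object, captures its SQL with `dbplyr::remote_query()`, and only then retrieves the rows. It uses its own connection and the assumed names `demo_con`, `mtcars_demo`, and `by_cyl` so that it can be run in isolation.

```r
library(dplyr, warn.conflicts = FALSE)

demo_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(demo_con, mtcars, "mtcars_demo")

by_cyl <- tbl(demo_con, "mtcars_demo") %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(desc(mpg))

# Still a lazy query: nothing has been computed in R yet
is_lazy <- inherits(by_cyl, "tbl_lazy")

# The SQL that will be sent to the database
sql_text <- as.character(dbplyr::remote_query(by_cyl))

# collect() sends the query and returns a local tibble
result <- collect(by_cyl)

DBI::dbDisconnect(demo_con)
```

Calling `collect()` once and reusing `result` avoids sending the same query to the database repeatedly.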

## Code of Conduct

Please note that the dbplyr project is released with a [Contributor Code of Conduct](https://dbplyr.tidyverse.org/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.