---
output: github_document
---

# modeldb

```{r setup, include=FALSE}
library(dplyr)
library(modeldb)
```

[![R-CMD-check](https://github.com/tidymodels/modeldb/workflows/R-CMD-check/badge.svg)](https://github.com/tidymodels/modeldb/actions)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/modeldb)](https://CRAN.R-project.org/package=modeldb)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/modeldb/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/modeldb?branch=main)
[![Downloads](http://cranlogs.r-pkg.org/badges/modeldb)](https://CRAN.R-project.org/package=modeldb)

Fit models inside the database! **modeldb works with most database back-ends** because it leverages [dplyr](https://dplyr.tidyverse.org/) and [dbplyr](https://dbplyr.tidyverse.org/) for the final SQL translation of the algorithm. It currently supports:

- K-means clustering

- Linear regression

## Installation

Install the CRAN version with:

```{r, eval = FALSE}
install.packages("modeldb")
```

The development version is available from GitHub using remotes:

```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("tidymodels/modeldb")
```

## Linear regression

An easy way to try out the package is to create a temporary in-memory SQLite database and load `mtcars` into it.

```{r}
# Connect to a temporary in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
# Load RSQLite's extension functions into the connection
RSQLite::initExtension(con)
dplyr::copy_to(con, mtcars)
```

```{r}
library(dplyr)

tbl(con, "mtcars") %>%
  select(wt, mpg, qsec) %>%
  linear_regression_db(wt)
```
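
To give a feel for how a regression can run as SQL, the sketch below fits the one-predictor model `wt ~ mpg` using only aggregate sums, which dbplyr can translate. It illustrates the general idea and is not modeldb's actual implementation; the connection name `demo_con` and the intermediate column names are made up for this example.

```r
library(dplyr)

# Separate in-memory database so this sketch is self-contained
demo_con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
cars_db <- copy_to(demo_con, mtcars, "mtcars")

# Sufficient statistics for wt ~ mpg, written as plain aggregates
stats <- cars_db %>%
  summarise(
    n   = n(),
    sx  = sum(mpg, na.rm = TRUE),
    sy  = sum(wt, na.rm = TRUE),
    sxy = sum(mpg * wt, na.rm = TRUE),
    sxx = sum(mpg * mpg, na.rm = TRUE)
  )

# The SQL that the database will run
show_query(stats)

# Only a single summary row is brought back into R
s <- collect(stats)

# Closed-form least-squares solution for one predictor
slope <- (s$n * s$sxy - s$sx * s$sy) / (s$n * s$sxx - s$sx^2)
intercept <- (s$sy - slope * s$sx) / s$n

c(intercept = intercept, slope = slope)

DBI::dbDisconnect(demo_con)
```

Only the one-row summary leaves the database. The slope and intercept are computed locally from it and should agree with `coef(lm(wt ~ mpg, data = mtcars))`.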

The model output can be parsed by [tidypredict](https://tidypredict.tidymodels.org/) to run the predictions in the database. Please see the "Linear Regression" article to learn more about how to use `linear_regression_db()`.
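
As a quick sanity check, the coefficients can be compared with a local `lm()` fit of the same formula. This sketch assumes that `linear_regression_db()` returns a one-row local table whose columns are named after the coefficients; `check_con` is a hypothetical connection name used only here.

```r
library(dplyr)
library(modeldb)

# Separate in-memory database so this sketch is self-contained
check_con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
RSQLite::initExtension(check_con)
cars_db <- copy_to(check_con, mtcars, "mtcars")

# Fit wt ~ mpg + qsec inside the database
db_fit <- cars_db %>%
  select(wt, mpg, qsec) %>%
  linear_regression_db(wt)

# Fit the same model locally
local_coefs <- coef(lm(wt ~ mpg + qsec, data = mtcars))

# Assumption: db_fit has one column per coefficient, named like lm()'s output
db_coefs <- unlist(db_fit)[names(local_coefs)]

all.equal(db_coefs, local_coefs, tolerance = 1e-6)

DBI::dbDisconnect(check_con)
```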

## K-means clustering

To use `simple_kmeans_db()`, pipe the database back-end table to the function. This returns a list object that contains two items:

- A SQL query table with the final center assignments
- A local table with information about the centers

```{r}
km <- tbl(con, "mtcars") %>%
  simple_kmeans_db(mpg, wt)

colnames(km)
```

The SQL statement from `tbl` can be extracted using dbplyr's `remote_query()`.

```{r}
dbplyr::remote_query(km)
```
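
For intuition about what a clustering step looks like in dplyr verbs, the sketch below runs a single k-means iteration (assign each row to its nearest center, then recompute the centers) for two hand-picked centers. It is a simplified illustration, not the code `simple_kmeans_db()` uses; the starting centers, the `cluster` column, and the `demo_con` connection are all hypothetical.

```r
library(dplyr)

# Separate in-memory database so this sketch is self-contained
demo_con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
cars_db <- copy_to(demo_con, mtcars, "mtcars")

# Hypothetical starting centers, chosen by hand for illustration
c1_mpg <- 15; c1_wt <- 4
c2_mpg <- 25; c2_wt <- 2.5

updated_centers <- cars_db %>%
  # Assignment step: squared Euclidean distance to each center
  mutate(
    d1 = (mpg - !!c1_mpg) * (mpg - !!c1_mpg) + (wt - !!c1_wt) * (wt - !!c1_wt),
    d2 = (mpg - !!c2_mpg) * (mpg - !!c2_mpg) + (wt - !!c2_wt) * (wt - !!c2_wt),
    cluster = if_else(d1 <= d2, "center_1", "center_2")
  ) %>%
  # Update step: each new center is the mean of its assigned rows
  group_by(cluster) %>%
  summarise(
    mpg_center = mean(mpg, na.rm = TRUE),
    wt_center = mean(wt, na.rm = TRUE),
    n = n()
  ) %>%
  collect()

updated_centers

DBI::dbDisconnect(demo_con)
```

Both steps are translated to a single SQL query, so only the small table of updated centers is returned to R. A full k-means fit would repeat this until the centers stop moving.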

## Contributing

This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/modeldb/issues).

- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code. Check out [this helpful article on how to create reprexes](https://dbplyr.tidyverse.org/articles/reprex.html) for problems involving a database.

- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).