Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/tidymodels/tidypredict

Run predictions inside the database
https://github.com/tidymodels/tidypredict

dbplyr dplyr purrr r rlang

Last synced: about 2 months ago
JSON representation

Run predictions inside the database

Awesome Lists containing this project

README

        

---
output: github_document
---

# tidypredict

[![R-CMD-check](https://github.com/tidymodels/tidypredict/workflows/R-CMD-check/badge.svg)](https://github.com/tidymodels/tidypredict/actions)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidypredict)](https://CRAN.r-project.org/package=tidypredict)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/tidypredict/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/tidypredict?branch=main)

```{r pre, include = FALSE}
if (!rlang::is_installed("randomForest")) {
knitr::opts_chunk$set(
eval = FALSE
)
}
```

```{r setup, include=FALSE}
library(dplyr)
library(tidypredict)
library(randomForest)
```

The main goal of `tidypredict` is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL. In other words, it is able to parse a model such as this one:

```{r}
model <- lm(mpg ~ wt + cyl, data = mtcars)
```

`tidypredict` can return a SQL statement that is ready to run inside the database. Because it uses `dplyr`'s database interface, it works with several databases back-ends, such as MS SQL:

```{r}
tidypredict_sql(model, dbplyr::simulate_mssql())
```

## Installation

Install `tidypredict` from CRAN using:
```{r, eval = FALSE}
install.packages("tidypredict")
```

Or install the **development version** using `devtools` as follows:
```{r, eval = FALSE}
install.packages("remotes")
remotes::install_github("tidymodels/tidypredict")
```

## Functions

`tidypredict` has only a few functions, and it is not expected that number to grow much. The main focus at this time is to add more models to support.

| Function | Description
|-----------------------------|--------------------------------------------------------------------------------|
|`tidypredict_fit()` | Returns an R formula that calculates the prediction |
|`tidypredict_sql()` | Returns a SQL query based on the formula from `tidypredict_fit()` |
|`tidypredict_to_column()` | Adds a new column using the formula from `tidypredict_fit()` |
|`tidypredict_test()` | Tests `tidyverse` predictions against the model's native `predict()` function |
|`tidypredict_interval()` | Same as `tidypredict_fit()` but for intervals (only works with `lm` and `glm`) |
|`tidypredict_sql_interval()` | Same as `tidypredict_sql()` but for intervals (only works with `lm` and `glm`) |
|`parse_model()` | Creates a list spec based on the R model |
|`as_parsed_model()` | Prepares an object to be recognized as a parsed model |

## How it works

Instead of translating directly to a SQL statement, `tidypredict` creates an R formula. That formula can then be used inside `dplyr`. The overall workflow would be as illustrated in the image above, and described here:

1. Fit the model using a base R model, or one from the packages listed in [Supported Models](#supported-models)
1. `tidypredict` reads model, and creates a list object with the necessary components to run predictions
1. `tidypredict` builds an R formula based on the list object
1. `dplyr` evaluates the formula created by `tidypredict`
1. `dplyr` translates the formula into a SQL statement, or any other interfaces.
1. The database executes the SQL statement(s) created by `dplyr`

### Parsed model spec

`tidypredict` writes and reads a spec based on a model. Instead of simply writing the R formula directly, splitting the spec from the formula adds the following capabilities:

1. No more saving models as `.rds` - Specifically for cases when the model needs to be used for predictions in a Shiny app.
1. Beyond R models - Technically, anything that can write a proper spec, can be read into `tidypredict`. It also means, that the parsed model spec can become a good alternative to using *PMML.*

## Supported models

The following models are supported by `tidypredict`:

- Linear Regression - `lm()`
- Generalized Linear model - `glm()`
- Random Forest models - `randomForest::randomForest()`
- Random Forest models, via `ranger` - `ranger::ranger()`
- MARS models - `earth::earth()`
- XGBoost models - `xgboost::xgb.Booster.complete()`
- Cubist models - `Cubist::cubist()`
- Tree models, via `partykit` - `partykit::ctree()`

### `parsnip`

`tidypredict` supports models fitted via the `parsnip` interface. The ones confirmed currently work in `tidypredict` are:

- `lm()` - `parsnip`: `linear_reg()` with *"lm"* as the engine.
- `randomForest::randomForest()` - `parsnip`: `rand_forest()` with *"randomForest"* as the engine.
- `ranger::ranger()` - `parsnip`: `rand_forest()` with *"ranger"* as the engine.
- `earth::earth()` - `parsnip`: `mars()` with *"earth"* as the engine.

### `broom`

The `tidy()` function from broom works with linear models parsed via `tidypredict`

```{r}
pm <- parse_model(lm(wt ~ ., mtcars))
tidy(pm)
```

## Contributing

This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/tidypredict/issues).

- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.

- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).