Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/drsimonj/twidlr

data.frame-based API for model and predict functions
https://github.com/drsimonj/twidlr

Last synced: about 2 months ago
JSON representation

data.frame-based API for model and predict functions

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```

# twidlr: consistent data.frame and formula API for models

## Overview

twidlr is an R package that exposes a consistent API for model functions and their corresponding predict methods such that they are specified as:

```{r, eval = F}
fit <- model(data, formula, ...)
predict(fit, data, ...)
```

Where "data" is a **required** data.frame (or able to be coerced to one) and "formula" is a formula (or string able to be coerced to one) that describes the model to be fitted.

twidlr gets its name from the "twiddle" used in R formulas.

## Installation

twidlr is available to install from github by running:

```{r, eval = F}
# install.packages("devtools")
devtools::install_github("drsimonj/twidlr")
```

## Usage

`library(twidlr)` exposes model functions that you're already familiar with, but such that they accept a data.frame first, formula second, and then additional arguments. A robust method to `predict` data is also exposed.

For example, a typical linear model would be `lm(hp ~ mpg * wt, mtcars, ...)`. Once `twidlr` is loaded, the same model would be run via `lm(mtcars, hp ~ mpg * wt, ...)`.

## Motivation

Modelling in R is messy! Some models take formulas and data frames while others require matrices and vectors. The same can be said of corresponding `predict()` methods, which can also be impure, returning unexpected or inconsistent results.

twidlr seeks to overcome these problems be providing:

- **Consistent API** for model functions and their corresponding `predict` methods (helping to improve the generality of tidy modelling packages like [piplearner](https://github.com/drsimonj/pipelearner))
- **Pure and available predictions** by way of `predict` being made available for all methods (including unsupervised algorithms like kmeans) and making "data" a required argument
- **[Tidyverse](http://tidyverse.org/) philosophy** by working with data frames and being pipeable such as `mtcars %>% lm(hp ~ wt)`
- **Leverage formula operators** where they may be valid but not originally available. For example, to specify select variables or include additional terms like interactions and dummy-coded variables with syntax such as `glmnet(iris, Sepal.Width ~ Petal.Width * Petal.Length + Species)`. Formulas created as strings can always be used too!

## twidlr models

Model functions exposed by twidlr:

```{r, echo = F}
x <- data.frame(rbind(
c(Package = "stats", Function = "lm"),

## Add new model functions here as c("Package", "Function") ------ >

c("xgboost", "xgboost"),
c("glmnet", "glmnet"),
c("stats", "glm"),
c("rpart", "rpart"),
c("randomForest", "randomForest"),
c("lme4", "lmer"),
c("lme4", "glmer"),
c("quantreg","rq"),
c("quantreg","nlrq"),
c("quantreg","rqss"),
c("quantreg","crq"),
c("stats", "kmeans"),
c("stats", "t.test (now 'ttest')"),
c("stats", "prcomp"),
c("stats", "aov"),
c("glmnet", "cv.glmnet"),
c("stats", "factanal"),
c("e1071", "svm"),
c("e1071", "naiveBayes"),
c("gamlss","gamlss")

## < ---------------------------------------------------------------
))

x <- x[order(x$Package, x$Function), ]
x <- tapply(x$Function, x$Package, paste, collapse = ", ")
x <- data.frame(Package = names(x), Functions = x, row.names = NULL)
knitr::kable(x[order(x$Package, x$Function), ], row.names = FALSE)
```

## Contributing

For conventions and best-practices when contributing to twidlr, please see [CONTRIBUTING.md](https://github.com/drsimonj/twidlr/blob/master/CONTRIBUTING.md)