Ecosyste.ms: Awesome


https://github.com/drsimonj/twidlr

data.frame-based API for model and predict functions

Host: GitHub
URL: https://github.com/drsimonj/twidlr
Owner: drsimonj
License: other
Created: 2017-04-22T05:53:44.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2017-06-06T05:54:40.000Z (over 7 years ago)
Last Synced: 2024-08-06T03:05:14.335Z (7 months ago)
Language: R
Homepage:
Size: 246 KB
Stars: 59
Watchers: 6
Forks: 9
Open Issues: 9
Metadata Files:
- Readme: README.Rmd
- Contributing: CONTRIBUTING.md
- License: LICENSE


README

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# twidlr: consistent data.frame and formula API for models 

## Overview

twidlr is an R package that exposes a consistent API for model functions and their corresponding `predict` methods, so that they are all specified as:

```{r, eval = F}
# `model` stands for any model function exposed by twidlr
fit <- model(data, formula, ...)
predict(fit, data, ...)
```

Here, "data" is a **required** data.frame (or an object that can be coerced to one) and "formula" is a formula (or a string that can be coerced to one) describing the model to be fitted.
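To make the `model(data, formula, ...)` / `predict(fit, data, ...)` convention concrete, here is a minimal base-R sketch for `lm`. It is not twidlr's implementation: the helper names `lm_df()` and `predict_df()` are hypothetical, and the sketch assumes coercion via `as.data.frame()` and `stats::as.formula()`.

```{r, eval = FALSE}
# Hypothetical data-first wrapper around stats::lm (not part of twidlr)
lm_df <- function(data, formula, ...) {
  data <- as.data.frame(data)            # anything coercible to a data.frame
  formula <- stats::as.formula(formula)  # accepts a formula or a string
  stats::lm(formula, data = data, ...)
}

# Hypothetical predict helper that refuses to run without new data
predict_df <- function(object, data, ...) {
  if (missing(data)) stop("`data` is required", call. = FALSE)
  stats::predict(object, newdata = as.data.frame(data), ...)
}

fit <- lm_df(mtcars, "hp ~ mpg * wt")
head(predict_df(fit, mtcars))
```

Making `data` mandatory means a prediction can never silently fall back to the fitted values of the training data.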

twidlr gets its name from the "twiddle" (the tilde, `~`) used in R formulas.

## Installation

twidlr can be installed from GitHub by running:

```{r, eval = F}
# install.packages("devtools")
devtools::install_github("drsimonj/twidlr")
```

## Usage

`library(twidlr)` exposes model functions that you're already familiar with, but with their arguments reordered so that they accept a data.frame first, a formula second, and any additional arguments after that. A robust `predict` method is also exposed.

For example, a typical linear model would be `lm(hp ~ mpg * wt, mtcars, ...)`. Once `twidlr` is loaded, the same model would be run via `lm(mtcars, hp ~ mpg * wt, ...)`.
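The argument reordering itself can be sketched generically. The hypothetical `data_first()` below is not a twidlr function (twidlr may define each wrapper quite differently); it takes a formula-first model function and returns a version that accepts data first and formula second, assuming the original function names its data argument `data`.

```{r, eval = FALSE}
# Hypothetical adapter: turn f(formula, data, ...) into g(data, formula, ...)
data_first <- function(f) {
  function(data, formula, ...) {
    data <- as.data.frame(data)            # coerce the data argument
    formula <- stats::as.formula(formula)  # allow formulas given as strings
    f(formula, data = data, ...)
  }
}

lm2  <- data_first(stats::lm)
glm2 <- data_first(stats::glm)

fit_lm  <- lm2(mtcars, hp ~ mpg * wt)
fit_glm <- glm2(mtcars, "am ~ wt", family = binomial)
```

Extra arguments such as `family` pass straight through `...` to the wrapped function.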

## Motivation

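Modelling in R is messy! Some models take formulas and data frames, while others require matrices and vectors. The same can be said of the corresponding `predict()` methods, which can also be impure and return unexpected or inconsistent results (for example, calling `predict()` without new data often silently returns the fitted values for the training data).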

twidlr seeks to overcome these problems by providing:

- **Consistent API** for model functions and their corresponding `predict` methods (helping to improve the generality of tidy modelling packages like [pipelearner](https://github.com/drsimonj/pipelearner))

- **Pure and available predictions** by making `predict` available for all models (including unsupervised algorithms such as kmeans) and by making "data" a required argument

- **[Tidyverse](http://tidyverse.org/) philosophy** by working with data frames and being pipeable, as in `mtcars %>% lm(hp ~ wt)`

- **Leverage formula operators** wherever they are valid but were not originally available. For example, select specific variables or include additional terms such as interactions and dummy-coded variables with syntax like `glmnet(iris, Sepal.Width ~ Petal.Width * Petal.Length + Species)`. Formulas created as strings can always be used too!
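Formula support for a matrix-only function such as `glmnet` essentially comes down to expanding a formula into a design matrix and a response vector. A minimal sketch, assuming base R's `model.frame()` and `model.matrix()` do the expansion; the helper `formula_xy()` is hypothetical, and `stats::lm.fit()` stands in for glmnet so the example runs without extra packages.

```{r, eval = FALSE}
# Hypothetical helper: expand a formula into a numeric matrix x and response y
formula_xy <- function(data, formula) {
  formula <- stats::as.formula(formula)
  mf <- stats::model.frame(formula, data = as.data.frame(data))
  x <- stats::model.matrix(attr(mf, "terms"), mf)  # interactions + dummy coding
  y <- stats::model.response(mf)
  # Drop the intercept column: glmnet fits its own intercept by default
  list(x = x[, colnames(x) != "(Intercept)", drop = FALSE], y = y)
}

xy <- formula_xy(iris, "Sepal.Width ~ Petal.Width * Petal.Length + Species")
colnames(xy$x)

# A matrix-only fitter (stats::lm.fit, standing in for glmnet::glmnet)
fit <- stats::lm.fit(cbind("(Intercept)" = 1, xy$x), xy$y)
```

Because the expansion happens before the fitter is called, any operator `model.matrix()` understands (`*`, `:`, factors, `I()`) becomes available to the matrix interface.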

## twidlr models

Model functions exposed by twidlr:

```{r, echo = F}
# One row per model function exposed by twidlr
x <- data.frame(rbind(
  c(Package = "stats", Function = "lm"),

  ## Add new model functions here as c("Package", "Function") ------ >

  c("xgboost", "xgboost"),
  c("glmnet", "glmnet"),
  c("stats", "glm"),
  c("rpart", "rpart"),
  c("randomForest", "randomForest"),
  c("lme4", "lmer"),
  c("lme4", "glmer"),
  c("quantreg", "rq"),
  c("quantreg", "nlrq"),
  c("quantreg", "rqss"),
  c("quantreg", "crq"),
  c("stats", "kmeans"),
  c("stats", "t.test (now 'ttest')"),
  c("stats", "prcomp"),
  c("stats", "aov"),
  c("glmnet", "cv.glmnet"),
  c("stats", "factanal"),
  c("e1071", "svm"),
  c("e1071", "naiveBayes"),
  c("gamlss", "gamlss")

  ## < ---------------------------------------------------------------
), stringsAsFactors = FALSE)

# Collapse to one row per package, listing its functions alphabetically
x <- x[order(x$Package, x$Function), ]
x <- tapply(x$Function, x$Package, paste, collapse = ", ")
x <- data.frame(Package = names(x), Functions = x, row.names = NULL)

# tapply() already returns packages in sorted order, and the column is now
# `Functions` (not `Function`), so the table is printed as-is
knitr::kable(x, row.names = FALSE)
```
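Among the functions above, `kmeans` has no `predict` method in base R. A data-first `predict` for it could assign each new row to its nearest cluster centre. This sketch assumes squared Euclidean distance (the criterion `kmeans` minimises), and the helper name `predict_kmeans()` is hypothetical rather than twidlr's own method.

```{r, eval = FALSE}
# Hypothetical nearest-centre predict for stats::kmeans fits
predict_kmeans <- function(object, data) {
  if (missing(data)) stop("`data` is required", call. = FALSE)
  centers <- object$centers
  # Use the same columns, in the same order, as the fitted centres
  x <- as.matrix(as.data.frame(data)[, colnames(centers), drop = FALSE])
  # Squared distance from every row of x to centre k, for each centre
  d <- vapply(seq_len(nrow(centers)),
              function(k) colSums((t(x) - centers[k, ])^2),
              numeric(nrow(x)))
  d <- matrix(d, nrow = nrow(x))  # keep an n x k matrix even for one row
  apply(d, 1, which.min)          # index of the closest centre per row
}

train <- data.frame(a = c(0, 0.1, 0.2, 10, 10.1, 10.2),
                    b = c(0, 0.1, 0.0, 10, 10.1, 10.0))
km <- kmeans(train, centers = train[c(1, 4), ])
predict_kmeans(km, data.frame(a = c(0.05, 9.9), b = c(0.05, 9.9)))
```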

## Contributing

For conventions and best practices when contributing to twidlr, please see [CONTRIBUTING.md](https://github.com/drsimonj/twidlr/blob/master/CONTRIBUTING.md).