https://github.com/drsimonj/twidlr
data.frame-based API for model and predict functions
- Host: GitHub
- URL: https://github.com/drsimonj/twidlr
- Owner: drsimonj
- License: other
- Created: 2017-04-22T05:53:44.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-06-06T05:54:40.000Z (over 7 years ago)
- Last Synced: 2024-08-06T03:05:14.335Z (3 months ago)
- Language: R
- Homepage:
- Size: 246 KB
- Stars: 59
- Watchers: 6
- Forks: 9
- Open Issues: 9
Metadata Files:
- Readme: README.Rmd
- Contributing: CONTRIBUTING.md
- License: LICENSE
---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# twidlr: consistent data.frame and formula API for models
## Overview
twidlr is an R package that exposes a consistent API for model functions and their corresponding predict methods such that they are specified as:
```{r, eval = F}
fit <- model(data, formula, ...)
predict(fit, data, ...)
```
Here, "data" is a **required** data.frame (or an object that can be coerced to one) and "formula" is a formula (or a string that can be coerced to one) describing the model to be fitted.
twidlr gets its name from the "twiddle" used in R formulas.
## Installation
twidlr can be installed from GitHub by running:
```{r, eval = F}
# install.packages("devtools")
devtools::install_github("drsimonj/twidlr")
```
## Usage
`library(twidlr)` exposes model functions that you're already familiar with, but such that they accept a data.frame first, formula second, and then additional arguments. A robust method to `predict` data is also exposed.
For example, a typical linear model would be `lm(hp ~ mpg * wt, mtcars, ...)`. Once `twidlr` is loaded, the same model would be run via `lm(mtcars, hp ~ mpg * wt, ...)`.
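The argument reordering described above can be sketched in base R. This is only an illustration of the idea, not twidlr's actual source: `lm_df` is a hypothetical name, and the coercion steps (`as.data.frame()` and `as.formula()`) are assumptions based on the description in the Overview.

```r
# Hypothetical data-first wrapper around stats::lm(); illustrative only,
# not twidlr's actual implementation.
lm_df <- function(data, formula, ...) {
  data    <- as.data.frame(data)         # coerce matrices and similar objects
  formula <- stats::as.formula(formula)  # accept a formula or a string
  stats::lm(formula = formula, data = data, ...)
}

# Base R argument order: formula first, data second.
fit_base <- stats::lm(hp ~ mpg * wt, data = mtcars)

# Data-first order, with the formula supplied as a string.
fit_df <- lm_df(mtcars, "hp ~ mpg * wt")

# Both calls estimate the same coefficients.
same_coefs <- isTRUE(all.equal(coef(fit_base), coef(fit_df)))
same_coefs
```

Putting the data frame first is what makes such a function convenient at the end of a pipe, since the piped object fills the first argument.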
## Motivation
Modelling in R is messy! Some models take formulas and data frames while others require matrices and vectors. The same can be said of corresponding `predict()` methods, which can also be impure, returning unexpected or inconsistent results.
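As one possible illustration of the `predict()` point, the base R sketch below shows two behaviours: `predict()` on an `lm` fit falls back to the training data when no new data is given, and `stats::kmeans()` has no `predict()` method at all. The helper `predict_kmeans` is hypothetical and only sketches what a prediction with a required "data" argument could look like; it is not taken from twidlr.

```r
# Base R only; `predict_kmeans` is a hypothetical helper, not twidlr's code.
vars  <- c("mpg", "wt", "hp")
train <- mtcars[1:20, vars]
test  <- mtcars[21:32, vars]

# predict() on an lm fit silently uses the training data
# when no new data is supplied.
fit_lm <- lm(hp ~ mpg + wt, data = train)
n_default <- length(predict(fit_lm))                  # 20, one per training row
n_newdata <- length(predict(fit_lm, newdata = test))  # 12, one per test row

# stats::kmeans() ships without a predict() method, so in a session with
# only base packages loaded this call fails.
set.seed(1)
fit_km <- kmeans(train, centers = 2)
km_err <- try(predict(fit_km, test), silent = TRUE)
no_kmeans_predict <- inherits(km_err, "try-error")

# Sketch of a predict-like helper where `data` is required: assign each row
# to the nearest cluster centre using squared Euclidean distance.
predict_kmeans <- function(fit, data) {
  data <- as.matrix(as.data.frame(data)[, colnames(fit$centers), drop = FALSE])
  dists <- apply(fit$centers, 1, function(centre) {
    rowSums(sweep(data, 2, centre)^2)
  })
  dists <- matrix(dists, nrow = nrow(data))  # keep a matrix for one-row input
  unname(apply(dists, 1, which.min))
}

clusters <- predict_kmeans(fit_km, test)
```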
twidlr seeks to overcome these problems by providing:
- **Consistent API** for model functions and their corresponding `predict` methods (helping to improve the generality of tidy modelling packages like [pipelearner](https://github.com/drsimonj/pipelearner))
- **Pure and available predictions**: `predict` is made available for all model functions (including unsupervised algorithms like kmeans), and "data" is a required argument
- **[Tidyverse](http://tidyverse.org/) philosophy** by working with data frames and being pipeable, for example `mtcars %>% lm(hp ~ wt)`
- **Leverage formula operators** where they may be valid but were not originally available. For example, select specific variables or include additional terms like interactions and dummy-coded variables with syntax such as `glmnet(iris, Sepal.Width ~ Petal.Width * Petal.Length + Species)`. Formulas created as strings can always be used too!
## twidlr models
Model functions exposed by twidlr:
```{r, echo = F}
x <- data.frame(rbind(
c(Package = "stats", Function = "lm"),
## Add new model functions here as c("Package", "Function") ------ >
c("xgboost", "xgboost"),
c("glmnet", "glmnet"),
c("stats", "glm"),
c("rpart", "rpart"),
c("randomForest", "randomForest"),
c("lme4", "lmer"),
c("lme4", "glmer"),
c("quantreg","rq"),
c("quantreg","nlrq"),
c("quantreg","rqss"),
c("quantreg","crq"),
c("stats", "kmeans"),
c("stats", "t.test (now 'ttest')"),
c("stats", "prcomp"),
c("stats", "aov"),
c("glmnet", "cv.glmnet"),
c("stats", "factanal"),
c("e1071", "svm"),
c("e1071", "naiveBayes"),
c("gamlss","gamlss")
## < ---------------------------------------------------------------
))
x <- x[order(x$Package, x$Function), ]
x <- tapply(x$Function, x$Package, paste, collapse = ", ")
x <- data.frame(Package = names(x), Functions = x, row.names = NULL)
# The column is now called `Functions`; name it in full rather than relying on partial matching
knitr::kable(x[order(x$Package, x$Functions), ], row.names = FALSE)
```
## Contributing
For conventions and best practices when contributing to twidlr, please see [CONTRIBUTING.md](https://github.com/drsimonj/twidlr/blob/master/CONTRIBUTING.md).