An open API service indexing awesome lists of open source software.

https://github.com/extendr/mdl

An opinionated and performant reimagining of model matrices using rust
https://github.com/extendr/mdl

extendr machine-learning rstats rust

Last synced: about 1 year ago
JSON representation

An opinionated and performant reimagining of model matrices using rust

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# mdl

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/mdl)](https://CRAN.R-project.org/package=mdl)

mdl implements an opinionated and performant reimagining of model matrices. The package supplies one function, `mdl::mtrx()` (read: "model matrix"), that takes in a formula and data frame and outputs a numeric matrix. Compared to its base R friend `model.matrix()`, it's _really_ fast.

**This package is highly experimental. Interpret results with caution!**

## Installation

You can install the development version of mdl like so:

``` r
# install.packages("mdl")
pak::pak("simonpcouch/mdl")
```

## Example

The output of `mdl::mtrx()` looks a lot like that from `model.matrix()`:

```{r}
# convert to factor to demonstrate dummy variable creations
mtcars$cyl <- as.factor(mtcars$cyl)

head(
mdl::mtrx(mpg ~ ., mtcars)
)
```

Compared to `model.matrix()`, `mdl::mtrx()` is sort of a glorified `as.matrix()` data frame method. More specifically:

* Does not accept formulae with inlined functions (like `-` or `*`).
* Never drops rows (and thus doesn't accept an `na.action`).
* Assumes that factors levels are encoded as they're intended (i.e. `drop.unused.levels` and `xlev` are not accepted).

It's quite a bit faster for smaller data sets:

```{r}
bench::mark(
mdl::mtrx(mpg ~ ., mtcars),
model.matrix(mpg ~ ., mtcars),
check = FALSE
)
```

The factor of speedup isn't so drastic for larger datasets and datasets with more factors, but it is still quite substantial:

```{r}
for (p in c("vs", "am", "gear", "carb")) {
mtcars[[p]] <- as.factor(mtcars[[p]])
}

bench::mark(
mdl::mtrx(mpg ~ ., mtcars[rep(1:32, 1e5), ]),
model.matrix(mpg ~ ., mtcars[rep(1:32, 1e5), ]),
check = FALSE
)
```

Check out [this article](https://github.com/simonpcouch/mdl/blob/main/vignettes/articles/plain-r.Rmd) for more detailed benchmarks.