https://github.com/extendr/mdl
An opinionated and performant reimagining of model matrices using rust
- Host: GitHub
- URL: https://github.com/extendr/mdl
- Owner: extendr
- License: other
- Created: 2024-08-25T19:07:36.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-15T20:57:49.000Z (about 1 year ago)
- Last Synced: 2025-05-12T12:18:46.697Z (about 1 year ago)
- Topics: extendr, machine-learning, rstats, rust
- Language: R
- Homepage: https://extendr.github.io/mdl/
- Size: 1.97 MB
- Stars: 6
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Support: .github/SUPPORT.md
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# mdl
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[CRAN status](https://CRAN.R-project.org/package=mdl)
mdl implements an opinionated and performant reimagining of model matrices. The package supplies one function, `mdl::mtrx()` (read: "model matrix"), that takes a formula and a data frame and returns a numeric matrix. Compared to its base R friend `model.matrix()`, it's _really_ fast.
**This package is highly experimental. Interpret results with caution!**
## Installation
You can install the development version of mdl like so:
``` r
# install.packages("mdl")
pak::pak("extendr/mdl")
```
## Example
The output of `mdl::mtrx()` looks a lot like that from `model.matrix()`:
```{r}
# convert to a factor to demonstrate dummy variable creation
mtcars$cyl <- as.factor(mtcars$cyl)
head(
  mdl::mtrx(mpg ~ ., mtcars)
)
```
Compared to `model.matrix()`, `mdl::mtrx()` is sort of a glorified `as.matrix()` data frame method. More specifically:
* Does not accept formulae with inlined functions (like `-` or `*`).
* Never drops rows (and thus doesn't accept an `na.action`).
* Assumes that factor levels are encoded as intended (i.e., the `drop.unused.levels` and `xlev` arguments are not accepted).
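To make the "glorified `as.matrix()`" comparison concrete, here is a minimal base R sketch of the idea. `naive_mtrx()` is a hypothetical helper written for illustration only. It is not part of mdl and says nothing about how mdl's Rust backend works. It assumes a two-sided `response ~ .` formula and only numeric or factor columns, treatment-codes factors, and, like `mdl::mtrx()`, never drops rows containing `NA`.

```r
# Hypothetical sketch, not part of mdl: a "glorified as.matrix()" that
# supports only `response ~ .`, numeric/factor columns, and never drops rows.
naive_mtrx <- function(formula, data) {
  response <- all.vars(formula[[2]])           # left-hand side, e.g. "mpg"
  predictors <- setdiff(names(data), response) # `.` means "every other column"
  cols <- list(`(Intercept)` = rep(1, nrow(data)))
  for (nm in predictors) {
    x <- data[[nm]]
    if (is.factor(x)) {
      # treatment contrasts: one 0/1 column per non-reference level
      for (lvl in levels(x)[-1]) {
        cols[[paste0(nm, lvl)]] <- as.numeric(x == lvl)
      }
    } else {
      cols[[nm]] <- as.numeric(x)
    }
  }
  do.call(cbind, cols) # named list elements become column names
}

mt <- mtcars
mt$cyl <- as.factor(mt$cyl)
m <- naive_mtrx(mpg ~ ., mt)

# same column names and values as base R, ignoring attributes and row names
identical(colnames(m), colnames(model.matrix(mpg ~ ., mt)))
all.equal(m, model.matrix(mpg ~ ., mt), check.attributes = FALSE)

# the NA row is kept here, while model.matrix() drops it under the
# default na.action ("na.omit")
na_df <- data.frame(y = c(1, 2, 3), x = c(1, NA, 3))
nrow(naive_mtrx(y ~ ., na_df))
nrow(model.matrix(y ~ ., na_df))
```

With default options, both comparisons on `mt` return `TRUE`, and the two `nrow()` calls return 3 and 2 respectively. Everything `model.matrix()` does beyond this (evaluating formula terms, handling `na.action`, re-leveling factors) is exactly what the bullet points above say `mdl::mtrx()` leaves out.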
It's quite a bit faster for smaller datasets:
```{r}
bench::mark(
  mdl::mtrx(mpg ~ ., mtcars),
  model.matrix(mpg ~ ., mtcars),
  check = FALSE
)
```
The speedup is less drastic for larger datasets and for datasets with more factor columns, but it is still substantial:
```{r}
for (p in c("vs", "am", "gear", "carb")) {
  mtcars[[p]] <- as.factor(mtcars[[p]])
}
# build the larger data frame once so that row replication isn't timed
mtcars_big <- mtcars[rep(1:32, 1e5), ]
bench::mark(
  mdl::mtrx(mpg ~ ., mtcars_big),
  model.matrix(mpg ~ ., mtcars_big),
  check = FALSE
)
```
Check out [this article](https://github.com/simonpcouch/mdl/blob/main/vignettes/articles/plain-r.Rmd) for more detailed benchmarks.