https://github.com/bgreenwell/rstratx

An R interface to the stratx Python library

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# rstratx

The **rstratx** package provides an R interface to [stratx](https://github.com/parrt/stratx), a Python library that implements the methods from [A Stratification Approach to Partial Dependence for Codependent Variables](https://arxiv.org/abs/1907.06698). Currently, only the StratPD algorithm is supported, and it applies only to numeric features.
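Since StratPD applies only to numeric features, it can help to check which columns of a data frame qualify before choosing a feature. The following is a minimal base R sketch; `numeric_features()` is a hypothetical helper for illustration and is not part of **rstratx**.

```r
# Hypothetical helper (not part of rstratx): list the numeric columns of a
# data frame, i.e., the candidate features for StratPD
numeric_features <- function(X) {
  stopifnot(is.data.frame(X))
  names(X)[vapply(X, is.numeric, logical(1))]
}

# Toy data frame with one double, one integer, and one factor column
toy <- data.frame(
  age  = c(65.2, 78.9, 61.1),
  rad  = c(1L, 2L, 2L),
  chas = factor(c("0", "1", "0"))
)
numeric_features(toy)
#> [1] "age" "rad"
```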

**WARNING:** This package is under heavy development. The underlying Python code needs to be cleaned up, and imports are not yet handled gracefully on the R side. Use at your own risk.

## Installation

``` r
# You can install the development version from GitHub:
if (!("remotes" %in% installed.packages()[, "Package"])) {
  install.packages("remotes")
}
remotes::install_github("bgreenwell/rstratx")
```
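Because the package talks to Python through **reticulate**, it may be worth confirming that the Python modules you expect are importable before running anything. The sketch below uses `reticulate::py_module_available()`; the helper `missing_py_modules()` is hypothetical, and the module names passed to it are assumptions rather than a documented list of requirements.

```r
library(reticulate)

# Hypothetical helper (not part of rstratx): return the names of the Python
# modules that reticulate cannot import from the active Python environment
missing_py_modules <- function(modules) {
  available <- vapply(modules, py_module_available, logical(1))
  modules[!available]
}

# The module names below are assumptions; adjust them to your setup
missing <- missing_py_modules(c("numpy", "pandas", "sklearn"))
if (length(missing) > 0) {
  message("Missing Python modules: ", paste(missing, collapse = ", "))
}
```

Calling `reticulate::py_config()` afterwards reports which Python installation is actually in use.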

## Example

Here's a basic example using the well-known Boston housing data set:

```{r example, fig.width=6, fig.asp=0.618, out.width="70%"}
# Load required packages
library(pdp) # for ordinary partial dependence
library(ranger) # for random forest algorithm
library(reticulate) # for interfacing with Python
use_python("/Users/b780620/anaconda3/bin/python3", required = TRUE) # FIXME: replace with the path to your own Python
library(rstratx) # for stratified partial dependence

# Load the Boston housing data
data(boston, package = "pdp")

#
# Ordinary partial dependence
#

# Fit a (default) random forest model and construct PDP for age
set.seed(1818) # for reproducibility
rfo <- ranger(cmedv ~ ., data = boston)
partial(rfo, pred.var = "age", plot = TRUE)

#
# Stratified partial dependence
#

# Compute stratified partial dependence for age (a random forest is fit automatically)
spd <- stratpd(
  X = subset(boston, select = -cmedv),
  y = boston[, "cmedv", drop = FALSE], # needs a one-column data frame (for now)
  feature_name = "age"
)

# Plot results
par(mar = c(4, 4, 1, 1) + 0.1)
plot(spd, type = "l", lwd = 2, las = 1, ylim = c(-10, 10))
```
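To make the comparison concrete, here is a minimal base R sketch of what ordinary partial dependence computes. For each grid value it overwrites the feature in every row, predicts, and averages the predictions. The simulated data, the `lm()` fit, and the `pd_by_hand()` helper are all assumptions for illustration; this is not how **pdp** or **rstratx** is implemented.

```r
# Simulate two strongly codependent features; y depends on both
set.seed(101)
n  <- 500
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.05)
d  <- data.frame(x1 = x1, x2 = x2, y = 2 * x1 + 3 * x2 + rnorm(n, sd = 0.1))

# Any fitted model with a predict() method would do; lm() keeps this simple
fit <- lm(y ~ x1 + x2, data = d)

# Hypothetical helper: ordinary partial dependence of a fitted model on one
# feature, averaged over the rows of `data`
pd_by_hand <- function(object, data, feature, grid) {
  vapply(grid, function(value) {
    data[[feature]] <- value               # overwrite the feature in every row
    mean(predict(object, newdata = data))  # average the model's predictions
  }, numeric(1))
}

grid <- seq(0, 1, length.out = 11)
pd <- pd_by_hand(fit, data = d, feature = "x1", grid = grid)
plot(grid, pd, type = "l", lwd = 2, las = 1,
     xlab = "x1", ylab = "Partial dependence")
```

Overwriting `x1` while leaving `x2` untouched produces rows that lie far from the observed data when the two features move together. This is the situation the stratified approach is meant to address.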