https://github.com/bgreenwell/rstratx
An R interface to the stratx Python library
- Host: GitHub
- URL: https://github.com/bgreenwell/rstratx
- Owner: bgreenwell
- Created: 2019-11-04T21:14:07.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-11-05T18:56:55.000Z (over 5 years ago)
- Last Synced: 2025-02-02T00:48:27.914Z (4 months ago)
- Language: Python
- Homepage:
- Size: 194 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# rstratx

The **rstratx** package provides an R interface to [stratx](https://github.com/parrt/stratx), a Python library implementing [A Stratification Approach to Partial Dependence for Codependent Variables](https://arxiv.org/abs/1907.06698). Currently, only the StratPD algorithm is supported, and it applies only to numeric features.

**WARNING:** This package is under heavy development. The underlying Python code needs to be cleaned up, and Python imports are not yet handled gracefully on the R side. Use at your own risk.

## Installation
``` r
# You can install the development version from GitHub:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("bgreenwell/rstratx")
```

## Example

Here's a basic example using the well-known Boston housing data set:
```{r example, fig.width=6, fig.asp=0.618, out.width="70%"}
# Load required packages
library(pdp) # for ordinary partial dependence
library(ranger) # for random forest algorithm
library(reticulate) # for interfacing with Python
use_python("/Users/b780620/anaconda3/bin/python3", required = TRUE) # FIXME
library(rstratx) # for stratified partial dependence

# Load the Boston housing data
data(boston, package = "pdp")

#
# Ordinary partial dependence
#

# Fit a (default) random forest model and construct PDP for age
set.seed(1818) # for reproducibility
rfo <- ranger(cmedv ~ ., data = boston)
partial(rfo, pred.var = "age", plot = TRUE)

#
# Stratified partial dependence
#

# Compute stratified partial dependence for age (auto fits an RF)
spd <- stratpd(
  X = subset(boston, select = -cmedv),
  y = boston[, "cmedv", drop = FALSE], # needs a one-column data frame (for now)
  feature_name = "age"
)

# Plot results
par(mar = c(4, 4, 1, 1) + 0.1)
plot(spd, type = "l", lwd = 2, las = 1, ylim = c(-10, 10))
```
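
Because the example above hard-codes a Python path and the package relies on reticulate to reach stratx, it may help to confirm the Python side before calling `stratpd()`. The snippet below is a minimal sketch, not part of **rstratx**: `stratx_ready()` is a hypothetical helper, and it assumes the Python module is importable under the name `stratx`.

``` r
# Hypothetical helper (not part of rstratx): returns TRUE only if reticulate
# is installed, can initialize a Python interpreter, and can find a module
# with the given name (assumed here to be "stratx").
stratx_ready <- function(module = "stratx") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  if (!reticulate::py_available(initialize = TRUE)) {
    return(FALSE)
  }
  reticulate::py_module_available(module)
}

if (!stratx_ready()) {
  message(
    "Python or the stratx module was not found; ",
    "see ?reticulate::use_python and ?reticulate::py_install."
  )
}
```

If the check fails, pointing reticulate at a different interpreter with `use_python()` before loading **rstratx**, as in the example above, is usually the first thing to try.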