https://github.com/bonstats/eatmyshorts

Visualisation methods and functions for exploring Bayesian Additive Regression Trees

---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

```{r print-comment, results='asis', echo = FALSE}
# included to warn users viewing README.md to edit THIS document.
cat("<!-- README.md is generated from README.Rmd. Please edit that file -->")
```

# `eatmyshorts`

Visualisation methods and functions for exploring Bayesian Additive Regression Trees (BART). See [BART on CRAN](https://CRAN.R-project.org/package=BART), in particular the vignette.

This package is currently under development. Slides from the OzViz 2019 workshop on `eatmyshorts` are available [here](https://bonstats.github.io/ozviz2019/slides.html).

The first occurrence of the famous "eat my shorts" catchphrase was in S01E02 of The Simpsons. In this episode Edna Krabappel also says "visualise it BART" when Bart is having problems with an aptitude test. [This clip](https://youtu.be/6Jq_9ghf-jI) from the episode summarises how I sometimes feel about visualising BART models.

# Example usage of package

Below is a walkthrough of the package's current capabilities. It will be updated as the package develops.

## Boston Housing example (from `BART` vignette)

First we fit the BART model on two variables (for simplicity). Note that we keep the tree draws from only 5 of the 1000 post-burn-in MCMC iterations, and each iteration's sum-of-trees uses only 10 trees (the default is 200).

```{r boston, echo=TRUE, results='hide'}
suppressPackageStartupMessages({
  library(eatmyshorts)
  library(dplyr)
  library(ggplot2)
  library(BART)
  library(tidytree)
})

Boston <- MASS::Boston

# Predictors: columns 6 and 13 of the Boston data; response: medv
X <- Boston[, c(6, 13)]
y <- Boston$medv

set.seed(99)
bart_model <- wbart(
  x.train = X,
  y.train = y,
  nskip = 1000,        # burn-in iterations to discard
  ndpost = 1000,       # post-burn-in iterations
  nkeeptreedraws = 5,  # keep the sum-of-trees from only 5 MCMC iterations
  ntree = 10,          # use only 10 trees in each iteration (default 200)
  printevery = 1000
)
```
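
As a rough check on the settings above, the following base R sketch works out how many trees the fitted object should hold. The variable names are illustrative, and the even spacing of the kept draws across the post-burn-in iterations is an assumption; consult the `BART` documentation for the exact thinning rule.

```r
# Settings copied from the wbart() call above
ndpost <- 1000         # post-burn-in MCMC iterations
nkeeptreedraws <- 5    # iterations whose trees are stored
ntree <- 10            # trees in each sum-of-trees

# Assumption: stored draws are evenly spaced across the post-burn-in run
thin <- ndpost %/% nkeeptreedraws
kept_iterations <- seq(from = thin, to = ndpost, by = thin)

# Each stored iteration contributes `ntree` trees
n_trees_stored <- length(kept_iterations) * ntree

kept_iterations
n_trees_stored
```

Whatever the exact spacing, 5 kept iterations of 10 trees each give 50 trees to extract and plot.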

The current methods for extracting the trees from the `BART` models are summarised below.

```{r tree-extract}
# Current methods for extracting trees:

tidytree_simple <- as_tidytree(bart_model)

tidytree_simple

tidytree_detailed <- as_tidytree(
  bart_model,
  extra_cols = c("var", "cut", "leaf_value", "is_leaf")
)

tidytree_detailed

tidygraph_detailed <- as_tidygraph_list(
  bart_model,
  extra_cols = c("var", "cut", "leaf_value", "is_leaf")
)

tidygraph_detailed[1:2]

```

The `tidytree` method returns a `tbl_tree` (a special `tibble`) grouped by MCMC iteration and tree number, whilst the `tidygraph` method returns a list of `tbl_graph` objects, each of which inherits from the `igraph` structure.

By using the `ggraph` and `tidygraph` packages together, we can do some simple plotting.

```{r boston-plot, fig.height=5, fig.width=5, cache=FALSE}

library(ggraph)

# Plot a single tree (element 31 of the list) as a dendrogram
ggraph(tidygraph_detailed[[31]], 'dendrogram') +
  geom_edge_elbow() +
  geom_node_label(aes(label = label)) +
  theme_graph()

```
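
The index `31` above is a position in a flat list of 5 × 10 = 50 graphs. The sketch below shows one way to translate such a position into a kept iteration and a tree number. It assumes the list is ordered by kept iteration first and tree number second, which is an assumption about `as_tidygraph_list()` and is not confirmed here. `graph_position()` is a hypothetical helper and not part of `eatmyshorts`.

```r
# Hypothetical helper, not part of eatmyshorts.
# Assumes the list is ordered by kept iteration, then by tree within iteration.
graph_position <- function(index, ntree) {
  stopifnot(index >= 1, ntree >= 1)
  list(
    iteration = (index - 1) %/% ntree + 1,  # which kept iteration (1-based)
    tree      = (index - 1) %% ntree + 1    # which tree within that iteration
  )
}

pos <- graph_position(31, ntree = 10)
pos
```

Under that ordering assumption, position 31 would be the first tree of the fourth kept iteration.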

This only plots 1 tree from 1 iteration of the BART MCMC. How do we extend it?