https://github.com/bonstats/eatmyshorts
Visualisation methods and functions for exploring Bayesian Additive Regression Trees
- Host: GitHub
- URL: https://github.com/bonstats/eatmyshorts
- Owner: bonStats
- License: gpl-3.0
- Created: 2019-11-19T05:02:20.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-05-29T00:09:33.000Z (about 6 years ago)
- Last Synced: 2025-09-09T18:31:38.110Z (9 months ago)
- Language: R
- Size: 2.48 MB
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```
```{r print-comment, results='asis', echo = FALSE}
# included to warn users viewing README.md to edit THIS document.
cat("")
```
# `eatmyshorts`
Visualisation methods and functions for exploring Bayesian Additive Regression Trees (BART). See [BART on CRAN](https://CRAN.R-project.org/package=BART), in particular the vignette.
This package is currently under development. Slides from the OzViz 2019 workshop on `eatmyshorts` are available [here](https://bonstats.github.io/ozviz2019/slides.html).
The first occurrence of the famous "eat my shorts" catchphrase was in S01E02 of The Simpsons. In this episode Edna Krabappel also says "visualise it BART" when Bart is having problems with an aptitude test. [This clip](https://youtu.be/6Jq_9ghf-jI) from the episode summarises how I sometimes feel about visualising BART models.
# Example usage of package
Below is a walkthrough of the package's current capabilities. It will be updated as the package develops.
## Boston Housing example (from `BART` vignette)
First we fit the BART model on two variables (for simplicity). Note that we keep the trees from only 5 of the 1000 post-burn-in MCMC iterations, and that each iteration's sum-of-trees uses only 10 trees.
```{r boston, echo=TRUE, results='hide'}
suppressPackageStartupMessages({
  library(eatmyshorts)
  library(dplyr)
  library(ggplot2)
  library(BART)
  library(tidytree)
})
Boston <- MASS::Boston
X <- Boston[, c(6, 13)] # predictors: columns 6 (rm) and 13 (lstat)
y <- Boston$medv        # response: median home value
set.seed(99)
bart_model <- wbart(
  x.train = X,
  y.train = y,
  nskip = 1000,       # burn-in iterations to discard
  ndpost = 1000,      # post-burn-in iterations
  nkeeptreedraws = 5, # keep the sum-of-trees from only 5 MCMC iterations
  ntree = 10,         # use only 10 trees in each iteration (default 200)
  printevery = 1000
)
```
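To make the effect of these settings concrete, the short sketch below works out how many individual trees are stored for later plotting. It only repeats the arithmetic implied by the arguments above. The thinning interval is an assumption about how `wbart` spaces the retained tree draws, not something this README states.

```r
# Settings copied from the wbart() call above.
ndpost         <- 1000 # post-burn-in MCMC iterations
nkeeptreedraws <- 5    # iterations whose trees are stored
ntree          <- 10   # trees in each sum-of-trees

# Total number of individual trees available for extraction and plotting.
n_stored_trees <- nkeeptreedraws * ntree

# Assumption: stored tree draws are evenly spaced over the post-burn-in
# iterations, so ndpost should be a multiple of nkeeptreedraws.
stopifnot(ndpost %% nkeeptreedraws == 0)
thin_interval <- ndpost / nkeeptreedraws

c(stored_trees = n_stored_trees, thin_interval = thin_interval)
```

With these settings, 5 iterations of 10 trees each give 50 stored trees in total.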
The current methods for extracting the trees from the `BART` models are summarised below.
```{r tree-extract}
# Current methods for extracting trees:
tidytree_simple <- as_tidytree(bart_model)
tidytree_simple
tidytree_detailed <- as_tidytree(
  bart_model,
  extra_cols = c("var", "cut", "leaf_value", "is_leaf")
)
tidytree_detailed
tidygraph_detailed <- as_tidygraph_list(
  bart_model,
  extra_cols = c("var", "cut", "leaf_value", "is_leaf")
)
tidygraph_detailed[1:2]
```
The `tidytree` method returns a `tbl_tree` (a special `tibble`) grouped by MCMC iteration and tree number, whilst the `tidygraph` method returns a list of `tbl_graph` objects, which inherit from the `igraph` structure.
By using the `ggraph` and `tidygraph` packages together, we can do some simple plotting.
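As a rough illustration of what the extra columns encode, the sketch below walks a single observation down a hand-made tree stored in a parent/node table. Everything here is hypothetical: `toy_tree`, its values, and `predict_toy_tree()` are made up for illustration, and the conventions that the root is its own parent and that the smaller node id is the "left" child (taken when the value is below the cut) are assumptions, not documented behaviour of `as_tidytree()`.

```r
# Hypothetical toy tree; column names mirror the `extra_cols` used above.
toy_tree <- data.frame(
  parent     = c(1, 1, 1, 3, 3),
  node       = c(1, 2, 3, 4, 5),
  var        = c("rm", NA, "lstat", NA, NA),
  cut        = c(6.5, NA, 10, NA, NA),
  leaf_value = c(NA, -1.2, NA, 2.0, 0.4),
  is_leaf    = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Follow the splitting rules from the root until a leaf is reached.
predict_toy_tree <- function(tree, x) {
  # Assumed convention: the root node is listed as its own parent.
  current <- tree$node[tree$parent == tree$node]
  repeat {
    row <- tree[tree$node == current, ]
    if (row$is_leaf) {
      return(row$leaf_value)
    }
    children <- sort(tree$node[tree$parent == current & tree$node != current])
    # Assumed convention: go to the smaller node id when x[var] < cut.
    current <- if (x[[row$var]] < row$cut) children[1] else children[2]
  }
}

predict_toy_tree(toy_tree, c(rm = 7, lstat = 5))
```

In a BART fit the prediction for an observation is the sum of such leaf values over all trees in one MCMC iteration, which is why each stored iteration carries `ntree` separate trees.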
```{r boston-plot, fig.height=5, fig.width=5, cache = FALSE}
library(ggraph)
ggraph(tidygraph_detailed[[31]], 'dendrogram') +
  geom_edge_elbow() +
  geom_node_label(aes(label = label)) +
  theme_graph()
```
This only plots 1 tree from 1 iteration of the BART MCMC. How do we extend it?
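One possible direction, sketched here with toy graphs rather than real BART output, is to wrap the plotting call in a function and map it over a list of graphs. `plot_tree` and `toy_graphs` are hypothetical names, and `create_tree()` from `tidygraph` merely stands in for elements of `tidygraph_detailed`. This is not necessarily how the package intends to answer the question.

```r
library(tidygraph)
library(ggraph)

# Toy stand-ins for a list of extracted trees (an assumption for illustration).
toy_graphs <- list(
  create_tree(3, children = 2),
  create_tree(7, children = 2)
)

# Hypothetical helper: draw one tree as a dendrogram.
plot_tree <- function(graph) {
  ggraph(graph, "dendrogram") +
    geom_edge_elbow() +
    geom_node_point() +
    theme_graph()
}

# One plot object per tree; print them individually or combine as needed.
tree_plots <- lapply(toy_graphs, plot_tree)
length(tree_plots)
```

Applied to the real output, the same pattern would be `lapply(tidygraph_detailed, plot_tree)`, with `geom_node_label(aes(label = label))` swapped in for the node points.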