Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/chancejohnstone/piRF

implements multiple prediction interval methods for random forests
https://github.com/chancejohnstone/piRF

Last synced: 3 months ago
JSON representation

implements multiple prediction interval methods for random forests

Host: GitHub
URL: https://github.com/chancejohnstone/piRF
Owner: chancejohnstone
Created: 2020-02-04T00:36:19.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2022-08-17T18:08:10.000Z (over 2 years ago)
Last Synced: 2024-02-13T08:04:37.713Z (12 months ago)
Language: R
Homepage:
Size: 294 KB
Stars: 7
Watchers: 2
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

awesome-conformal-prediction - piRF - Prediction Intervals for Random Forests

README

        ---

title: piRF - Prediction Intervals for Random Forests

output: github_document

author: Chancellor Johnstone and Haozhe Zhang

bibliography: REFERENCES.bib

nocite: | 

  @roy2019prediction, @ghosal2018boosting, @zhu2019hdi, @zhang2019random, @meinshausen2006quantile, @romano2019conformalized, @tung2014bias, @breiman2001random, @lopez2008neural

---



```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%"

)

```

```{r, echo = FALSE, results="hide", message=FALSE, warning=FALSE}

library(badger)

```

```{r, echo = FALSE, prompt=FALSE, results='asis', comment=""}

cat(

  "[![](https://www.r-pkg.org/badges/version/piRF?color=orange)](https://cran.r-project.org/package=piRF)",

  "[![](http://cranlogs.r-pkg.org/badges/last-month/piRF?color=blue)](https://cran.r-project.org/package=piRF)"

  #badge_cran_release("piRF", "orange"),

  #badge_cran_download("piRF", "grand-total", "blue")

  

)

```

## Introduction

The goal of *piRF* is to implement multiple state-of-the art random forest prediction interval methodologies in one complete package. Currently, the methods implemented can only be utilized within isolated packages, or the authors have not made a package publicly available. The package itself utilizes the functionality provided by the *ranger* package. If you utilize this package in any publications, please use the following citation:

Johnstone C, Zhang H (2020). piRF: Prediction Intervals for Random Forests. R package version 0.1.0,

.

A BibTeX entry for LaTeX users is

@Manual{,

  title = {piRF: Prediction Intervals for Random Forests},

  author = {Chancellor Johnstone and Haozhe Zhang},

  year = {2020},

  note = {R package version 0.1.0},

  url = {https://CRAN.R-project.org/package=piRF},

}

## Installation

You can install the released version of *piRF* from [CRAN](https://CRAN.R-project.org) with:

``` r

install.packages("piRF")

```

And the development version from [GitHub](https://github.com/) with:

``` r

# install.packages("devtools")

devtools::install_github("chancejohnstone/piRF")

```

## Example

This is a basic example which utilizes the *airfoil* dataset included with *piRF*. The dataset comes from [UCI Archive](https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise#). The NASA data set comprises different size NACA 0012 airfoils at various wind tunnel speeds and angles of attack.

The following functions are not exported by *piRF* but are used for this example.

```{r example}

library(piRF)

## basic example code

data(airfoil)

head(airfoil)

#functions to get average length and average coverage of output

getPILength <- function(x){

#average PI length across each set of predictions

l <- x[,2] - x[,1]

avg_l <- mean(l)

return(avg_l)

}

getCoverage <- function(x, response){

  #output coverage for test data

  coverage <- sum((response >= x[,1]) * (response <= x[,2]))/length(response)

  return(coverage)

}

```

Prediction intervals are generated for each of the methods implemented using train and test data sets constructed from the *airfoil* data.

```{r}

method_vec <- c("quantile", "oob", "bcqrf", "cqrf", "bop", "hdi", "brf")

#generate train and test data

set.seed(2020)

ratio <- .975

nrow <- nrow(airfoil)

n <- floor(nrow*ratio)

samp <- sample(1:nrow, size = n)

train <- airfoil[samp,]

test <- airfoil[-samp,]

#generate fitted models

res <- pirf(pressure ~ . , train_data = train,

            method = method_vec,

            concise= FALSE,

            num_threads = 2,

            alpha = .1)

#generate prediction intervals from fitted models

preds <- predict(res, test_data = test)

```

In this example, the *num_threads* option identifies the use of two cores for parallel processing. The default is to use all available cores. The *concise* option allows for the output of predictions for the test observations.

Below are the coverage rates and average prediction interval lengths using the test dataset. Both are important characteristics of prediction intervals.

```{r}

#empirical coverage, and average prediction interval length for each method

coverage <- sapply(preds$int, FUN = getCoverage, response = test$pressure)

coverage

length <- sapply(preds$int, FUN = getPILength)

length

```

Below are plots of the resulting prediction intervals generated for each method.

```{r}

#plotting intervals and predictions

par(mfrow = c(2,2))

for(i in 1:length(method_vec)){

  #color based on empirical coverage

  col <- ((test$pressure >= preds$int[[i]][,1]) * (test$pressure <= preds$int[[i]][,2])-1)*(-1)+1

   

  plot(x = preds$preds[[i]], y = test$pressure, pch = 20,

      col = "black", ylab = "true", xlab = "predicted", main = method_vec[i])

  abline(a = 0, b = 1)

  segments(x0 = preds$int[[i]][,1], x1 = preds$int[[i]][,2],

           y1 = test$pressure, y0 = test$pressure, lwd = 1, col = col)

}

```

If you find any issues with the package, or have suggestions for improvements, please let us know. 

## References