An R package to create and visualise fast-and-frugal decision trees (FFTs)
https://github.com/ndphillips/FFTrees


---
output: github_document
---

```{r setup, include = FALSE}
# Chunk options:
knitr::opts_chunk$set(collapse = FALSE,
                      comment = "#>",
                      message = FALSE,
                      warning = FALSE,
                      # Default figure options:
                      fig.path = "man/figures/README-",
                      fig.align = 'center', # ignored
                      fig.width = 7.0,
                      fig.height = 6.0,
                      out.width = "650")

# URLs:
url_pkg_CRAN <- "https://CRAN.R-project.org/package=FFTrees"
url_pkg_GitHub <- "https://github.com/ndphillips/FFTrees"
url_pkg_issues <- "https://github.com/ndphillips/FFTrees/issues"
url_doc_GitHub <- "https://ndphillips.github.io/FFTrees/"
url_app_shiny <- "https://econpsychbasel.shinyapps.io/shinyfftrees/"

url_JDM_issue <- "https://journal.sjdm.org/vol12.4.html"
url_JDM_html <- "https://journal.sjdm.org/17/17217/jdm17217.html"
url_JDM_pdf <- "https://journal.sjdm.org/17/17217/jdm17217.pdf"

url_JDM_doi <- "https://doi.org/10.1017/S1930297500006239"
doi_JDM <- "10.1017/S1930297500006239"
```

# FFTrees `r packageVersion("FFTrees")`


[![CRAN status](https://www.r-pkg.org/badges/version/FFTrees)](https://CRAN.R-project.org/package=FFTrees)
[![Downloads/month](https://cranlogs.r-pkg.org/badges/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees)
[![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees)
[![R-CMD-check](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml)

The R package **FFTrees** creates, visualizes and evaluates _fast-and-frugal decision trees_ (FFTs) for solving binary classification tasks, using the algorithms and methods described in Phillips, Neth, Woike & Gaissmaier (2017, [`r doi_JDM`](`r url_JDM_doi`)).

## What are fast-and-frugal trees (FFTs)?

_Fast-and-frugal trees_\ (FFTs) are simple and transparent decision algorithms for solving binary classification problems.
The key feature making FFTs faster and more frugal than other decision trees is that every node allows making a decision.
When predicting novel cases, the performance of FFTs competes with more complex algorithms and machine learning techniques, such as logistic regression\ (LR), support-vector machines\ (SVM), and random forests\ (RF).
Apart from being faster and requiring less information, FFTs tend to be robust against overfitting, and are easy to interpret, use, and communicate.
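
To make the idea that "every node allows making a decision" concrete, here is a minimal base-R sketch of how an FFT classifies a single case. The cue names, thresholds, and the helper `classify_fft()` are hypothetical and chosen for illustration only; this is not the algorithm that **FFTrees** uses to construct or apply its trees.

```r
# A hypothetical FFT, written as an ordered list of nodes.
# Each node inspects one cue; if its test is TRUE, the tree exits
# immediately with the node's decision (TRUE = signal, FALSE = noise).
fft_nodes <- list(
  list(cue = "chest_pain",     test = function(x) x == "typical", exit = TRUE),
  list(cue = "age",            test = function(x) x < 40,         exit = FALSE),
  list(cue = "blood_pressure", test = function(x) x > 150,        exit = TRUE)
)

# Classify one case and record how many cues had to be inspected.
classify_fft <- function(case, nodes) {
  for (i in seq_along(nodes)) {
    node <- nodes[[i]]
    if (node$test(case[[node$cue]])) {
      return(list(decision = node$exit, cues_used = i))
    }
  }
  # No exit was triggered: the final node decides the opposite of its exit.
  list(decision = !nodes[[length(nodes)]]$exit, cues_used = length(nodes))
}

# Three made-up cases:
case_a <- list(chest_pain = "typical", age = 63, blood_pressure = 120)
case_b <- list(chest_pain = "none",    age = 35, blood_pressure = 160)
case_c <- list(chest_pain = "none",    age = 58, blood_pressure = 130)

res_a <- classify_fft(case_a, fft_nodes)  # exits at the first node
res_b <- classify_fft(case_b, fft_nodes)  # exits at the second node
res_c <- classify_fft(case_c, fft_nodes)  # reaches the final node
```

In this sketch, `cues_used` illustrates frugality: a case that exits at the first node is classified without looking up any of the remaining cues.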


## Installation

The latest release of **FFTrees** is available from [CRAN](https://CRAN.R-project.org/) at <`r url_pkg_CRAN`>:

```{r install-cran, eval = FALSE}
install.packages("FFTrees")
```

The current development version can be installed from its [GitHub](https://github.com) repository at <`r url_pkg_GitHub`>:

```{r install-github, eval = FALSE}
# install.packages("devtools")
devtools::install_github("ndphillips/FFTrees", build_vignettes = TRUE)
```

## Getting started

As an example, let's create an FFT predicting patients' heart disease status (_Healthy_ vs. _Disease_) based on the `heartdisease` dataset included in **FFTrees**:

```{r load-pkg, message = FALSE}
library(FFTrees) # load package
```

### Using data

The `heartdisease` data provides medical information for 303\ patients who were examined for heart disease.
The full data contains a binary criterion variable describing the true state of each patient and was split into two subsets:
a `heart.train` set for fitting decision trees, and a `heart.test` set for testing these trees.
Here are the first rows and columns of both subsets of the `heartdisease` data:

- `heart.train` (the training / fitting data) describes `r nrow(heart.train)` patients:

```{r data-train, echo = FALSE}
n_train <- nrow(heart.train)
c_train <- paste0("**Table 1**: Beginning of the `heart.train` subset (using the data of ", n_train, " patients for fitting/training FFTs).")

knitr::kable(head(heart.train), caption = c_train)
```

- `heart.test` (the testing / prediction data) describes `r nrow(heart.test)` different patients on the same variables:

```{r data-test, echo = FALSE}
n_test <- nrow(heart.test)
c_test <- paste0("**Table 2**: Beginning of the `heart.test` subset (used to predict `diagnosis` for ", n_test, " new patients).")

knitr::kable(head(heart.test), caption = c_test)
```

Our challenge is to predict each patient's `diagnosis` ---\ a column of logical values indicating the true state of each patient (i.e., `TRUE` or\ `FALSE`, based on the patient suffering or not suffering from heart disease)\ --- from the values of potential predictors.

### Questions answered by FFTs

To solve binary classification problems by FFTs, we must answer two key questions:

- Which of the variables should we use to predict the criterion?
- How should we use and combine predictor variables into FFTs?

Once we have created some FFTs, additional questions include:

- How accurate are the predictions of a specific FFT?
- How costly are the predictions of each algorithm?

The **FFTrees** package answers these questions by creating, evaluating, and visualizing FFTs.

### Creating fast-and-frugal trees (FFTs)

We use the main `FFTrees()` function to create FFTs for the `heart.train` data and evaluate their predictive performance on the `heart.test` data:

```{r set-seed, echo = FALSE}
set.seed(246) # for reproducible randomness
```

- The main `FFTrees()` function allows creating an `FFTrees` object for the `heartdisease` data:

```{r example-heart-create, collapse = FALSE, results = 'hide'}
# Create an FFTrees object from the heartdisease data:
heart_fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     decision.labels = c("Healthy", "Disease"))
```

```{r example-2-simple, echo = FALSE, eval = FALSE}
# Create FFTs for the heartdisease data:
x <- FFTrees(formula = diagnosis ~ .,
             data = heartdisease,
             decision.labels = c("Healthy", "Disease"))
```

Evaluating `FFTrees()` analyzes the training data, creates several FFTs, and applies them to the test data.
The results are stored in an object `heart_fft`, which can be printed, plotted and summarized (with options for selecting specific data or trees).

```{r example-heart-print, echo = FALSE, eval = FALSE}
# Print:
heart_fft
```

- Let's plot our `FFTrees` object to visualize a tree and its predictive performance (on the `test` data):

```{r example-heart-plot, eval = TRUE}
# Plot the best tree applied to the test data:
plot(heart_fft,
     data = "test",
     main = "Heart Disease")
```

**Figure\ 1**: A fast-and-frugal tree (FFT) predicting heart disease for `test` data and its performance characteristics.

- A summary of the trees in our `FFTrees` object and their key performance statistics can be obtained by `summary(heart_fft)`.
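
As a sketch of what working with such an object can look like, the following block recreates the object and applies a few generic functions to it. It assumes that the installed version of **FFTrees** provides `print()`, `summary()`, `inwords()`, and `predict()` methods with the arguments shown here; if a call fails, the package documentation for your version is the authority.

```r
library(FFTrees)

# Recreate the object from above so that this block runs on its own:
heart_fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     decision.labels = c("Healthy", "Disease"))

# Overview of the object and of the statistics of its trees:
print(heart_fft)
summary(heart_fft)

# Verbal description of the first tree:
inwords(heart_fft, tree = 1)

# Predictions of the first tree for the test cases:
pred <- predict(heart_fft, newdata = heart.test, tree = 1)
head(pred)
```

Passing a different data frame as `newdata` should classify other cases, provided they contain the variables used by the tree.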

### Building FFTs from verbal descriptions

FFTs are so simple that we can even create them 'from words' and then apply them to data.
For example, let's create a tree with the following three nodes and evaluate its performance on the `heart.test` data:

1. If `sex = 1`, predict _Disease_.
2. If `age < 45`, predict _Healthy_.
3. If `thal = {fd, normal}`, predict _Healthy_,
otherwise, predict _Disease_.

These conditions can directly be supplied to the `my.tree` argument of `FFTrees()`:

```{r example-heart-verbal, eval = TRUE}
# Create custom FFT 'in words' and apply it to test data:

# 1. Create my own FFT (from verbal description):
my_fft <- FFTrees(formula = diagnosis ~ .,
                  data = heart.train,
                  data.test = heart.test,
                  decision.labels = c("Healthy", "Disease"),
                  my.tree = "If sex = 1, predict Disease.
                             If age < 45, predict Healthy.
                             If thal = {fd, normal}, predict Healthy,
                             Otherwise, predict Disease.")

# 2. Plot and evaluate my custom FFT (for test data):
plot(my_fft,
     data = "test",
     main = "My custom FFT")
```

**Figure\ 2**: An FFT predicting heart disease created from a verbal description.

The performance measures (in the bottom panel of **Figure\ 2**) show that this particular tree is somewhat biased:
It has nearly perfect _sensitivity_ (i.e., is good at identifying cases of _Disease_) but suffers from low _specificity_ (i.e., performs poorly in identifying _Healthy_ cases).
Expressed in terms of its errors, `my_fft` incurs few misses at the expense of many false alarms.
Although the _accuracy_ of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (created above) strike a better balance.
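
The relation between these measures and the four possible outcomes of a binary classification can be sketched in base R. The vectors below are made up (they are not the `heartdisease` results), and the baseline is assumed here to be the accuracy of always predicting the more frequent class.

```r
# Hypothetical true states (TRUE = Disease) and predictions for 10 cases:
truth      <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prediction <- c(TRUE, TRUE, TRUE, TRUE, TRUE,  TRUE,  TRUE,  FALSE, FALSE, FALSE)

# The four outcomes of a binary classification:
hi <- sum(prediction & truth)    # hits
mi <- sum(!prediction & truth)   # misses
fa <- sum(prediction & !truth)   # false alarms
cr <- sum(!prediction & !truth)  # correct rejections

# Performance measures derived from these counts:
sens <- hi / (hi + mi)             # sensitivity: share of Disease cases detected
spec <- cr / (cr + fa)             # specificity: share of Healthy cases detected
acc  <- (hi + cr) / length(truth)  # overall accuracy
bacc <- (sens + spec) / 2          # balanced accuracy

# Assumed baseline: always predict the more frequent class.
baseline <- max(mean(truth), 1 - mean(truth))

round(c(sens = sens, spec = spec, acc = acc, bacc = bacc, baseline = baseline), 2)
```

In this toy example, the classifier makes no misses but raises false alarms for half of the healthy cases, so its sensitivity is high, its specificity is mediocre, and its accuracy is only moderately above the baseline.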

Overall, what counts as the "best" tree for a particular problem depends on many factors (e.g., the goal of fitting vs. predicting data and the trade-offs between maximizing accuracy vs. incorporating the costs of cues or errors).
To explore this range of options, the **FFTrees** package enables us to design and evaluate a range of FFTs.
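
One way to see why the "best" tree depends on the goal is to score the same outcome counts under different error costs. The two trees, their counts, and the cost values below are invented for illustration, and the helpers `accuracy()` and `mean_cost()` are hypothetical (they are not functions of **FFTrees**).

```r
# Hypothetical outcome counts of two trees on the same 100 cases
# (hi = hits, mi = misses, fa = false alarms, cr = correct rejections):
tree_a <- c(hi = 38, mi = 2,  fa = 30, cr = 30)  # liberal: few misses, many false alarms
tree_b <- c(hi = 30, mi = 10, fa = 8,  cr = 52)  # more balanced

# Proportion of correct decisions:
accuracy <- function(n) unname((n["hi"] + n["cr"]) / sum(n))

# Average cost per case, given the cost of a miss and of a false alarm:
mean_cost <- function(n, cost_mi, cost_fa) {
  unname((n["mi"] * cost_mi + n["fa"] * cost_fa) / sum(n))
}

acc_a <- accuracy(tree_a)
acc_b <- accuracy(tree_b)

# Equal error costs:
cost_equal_a <- mean_cost(tree_a, cost_mi = 1, cost_fa = 1)
cost_equal_b <- mean_cost(tree_b, cost_mi = 1, cost_fa = 1)

# Misses assumed to be ten times as costly as false alarms:
cost_miss_a <- mean_cost(tree_a, cost_mi = 10, cost_fa = 1)
cost_miss_b <- mean_cost(tree_b, cost_mi = 10, cost_fa = 1)
```

With equal error costs, `tree_b` is both more accurate and cheaper. Once misses are assumed to be much more costly than false alarms, the liberal `tree_a` incurs the lower average cost, even though its accuracy is lower.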

## Resources

The following versions of **FFTrees** and corresponding resources are available:

| Type: | Version: | URL: |
|:---------------------------|:----------------|:-------------------------------|
| A. **FFTrees** (R package): | [Release version](`r url_pkg_CRAN`) | <`r url_pkg_CRAN`> |
|   | [Development version](`r url_pkg_GitHub`) | <`r url_pkg_GitHub`> |
| B. Other resources: | [Online documentation](`r url_doc_GitHub`) | <`r url_doc_GitHub`> |
|   | [Online demo](`r url_app_shiny`) (running v1.3.3) | <`r url_app_shiny`> |

## References

We had fun creating the **FFTrees** package and hope you like it too!
As a comprehensive, yet accessible introduction to FFTs, we recommend our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)).

**Citation** (in APA format):

- Phillips, N. D., Neth, H., Woike, J. K. & Gaissmaier, W. (2017).
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees.
_Judgment and Decision Making_, _12_(4), 344--368.
doi\ [`r doi_JDM`](`r url_JDM_doi`)

We encourage you to read the article to learn more about the history of FFTs and how the **FFTrees** package creates, visualizes, and evaluates them.
When using **FFTrees** in your own work, please cite us and share your experiences (e.g., [on GitHub](`r url_pkg_issues`)) so we can continue developing the package.

By\ 2024, over 130\ scientific publications have used or cited **FFTrees** (see [Google Scholar](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=205528310591558601) for the full list).
Examples include:

- Lötsch, J., Haehner, A., & Hummel, T. (2020). Machine-learning-derived rules set excludes risk of Parkinson’s disease in patients with olfactory or gustatory symptoms with high accuracy.
_Journal of Neurology_, _267_(2), 469--478.
doi\ [10.1007/s00415-019-09604-6](https://doi.org/10.1007/s00415-019-09604-6)

- Kagan, R., Parlee, L., Beckett, B., Hayden, J. B., Gundle, K. R., & Doung, Y. C. (2020).
Radiographic parameter-driven decision tree reliably predicts aseptic mechanical failure of compressive osseointegration fixation.
_Acta Orthopaedica_, _91_(2), 171--176.
doi\ [10.1080/17453674.2020.1716295](https://doi.org/10.1080/17453674.2020.1716295)

- Klement, R. J., Sonke, J. J., Allgäuer, M., Andratschke, N., Appold, S., Belderbos, J., ... & Mantel, F. (2020).
Correlating dose variables with local tumor control in stereotactic body radiotherapy for early stage non-small cell lung cancer: A modeling study on 1500 individual treatments.
_International Journal of Radiation Oncology * Biology * Physics_.
doi\ [10.1016/j.ijrobp.2020.03.005](https://doi.org/10.1016/j.ijrobp.2020.03.005)

- Nobre, G. G., Hunink, J. E., Baruth, B., Aerts, J. C., & Ward, P. J. (2019).
Translating large-scale climate variability into crop production forecast in Europe.
_Scientific Reports_, _9_(1), 1--13.
doi\ [10.1038/s41598-018-38091-4](https://doi.org/10.1038/s41598-018-38091-4)

- Buchinsky, F. J., Valentino, W. L., Ruszkay, N., Powell, E., Derkay, C. S., Seedat, R. Y., ... & Mortelliti, A. J. (2019).
Age at diagnosis, but not HPV type, is strongly associated with clinical course in recurrent respiratory papillomatosis.
_PloS One_, _14_(6).
doi\ [10.1371/journal.pone.0216697](https://doi.org/10.1371/journal.pone.0216697)

----

[File `README.Rmd` last updated on `r Sys.Date()`.]