https://github.com/ndphillips/FFTrees
An R package to create and visualise fast-and-frugal decision trees (FFTs)
- Host: GitHub
- URL: https://github.com/ndphillips/FFTrees
- Owner: ndphillips
- Created: 2016-07-12T06:16:40.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-05-22T08:22:26.000Z (6 months ago)
- Last Synced: 2024-05-22T13:53:02.196Z (6 months ago)
- Language: R
- Homepage: https://journal.sjdm.org/17/17217/jdm17217.pdf
- Size: 22.7 MB
- Stars: 135
- Watchers: 9
- Forks: 23
- Open Issues: 9
Metadata Files:
- Readme: README.Rmd
---
output: github_document
---

```{r setup, include = FALSE}
# Chunk options:
knitr::opts_chunk$set(collapse = FALSE,
                      comment = "#>",
                      message = FALSE,
                      warning = FALSE,
                      # Default figure options:
                      fig.path = "man/figures/README-",
                      fig.align = 'center',  # ignored
                      fig.width = 7.0,
                      fig.height = 6.0,
                      out.width = "650")

# URLs:
url_pkg_CRAN <- "https://CRAN.R-project.org/package=FFTrees"
url_pkg_GitHub <- "https://github.com/ndphillips/FFTrees"
url_pkg_issues <- "https://github.com/ndphillips/FFTrees/issues"
url_doc_GitHub <- "https://ndphillips.github.io/FFTrees/"
url_app_shiny <- "https://econpsychbasel.shinyapps.io/shinyfftrees/"

url_JDM_issue <- "https://journal.sjdm.org/vol12.4.html"
url_JDM_html <- "https://journal.sjdm.org/17/17217/jdm17217.html"
url_JDM_pdf <- "https://journal.sjdm.org/17/17217/jdm17217.pdf"

url_JDM_doi <- "https://doi.org/10.1017/S1930297500006239"
doi_JDM <- "10.1017/S1930297500006239"
```

# FFTrees `r packageVersion("FFTrees")`

[![CRAN status](https://www.r-pkg.org/badges/version/FFTrees)](https://CRAN.R-project.org/package=FFTrees)
[![Downloads/month](https://cranlogs.r-pkg.org/badges/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees)
[![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/FFTrees?color='00a9e0')](https://www.r-pkg.org/pkg/FFTrees)
[![R-CMD-check](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ndphillips/FFTrees/actions/workflows/R-CMD-check.yaml)
The R package **FFTrees** creates, visualizes and evaluates _fast-and-frugal decision trees_ (FFTs) for solving binary classification tasks, using the algorithms and methods described in Phillips, Neth, Woike & Gaissmaier (2017, [`r doi_JDM`](`r url_JDM_doi`)).
## What are fast-and-frugal trees (FFTs)?
_Fast-and-frugal trees_\ (FFTs) are simple and transparent decision algorithms for solving binary classification problems.
The key feature making FFTs faster and more frugal than other decision trees is that every node allows making a decision.
When predicting novel cases, the performance of FFTs competes with more complex algorithms and machine learning techniques, such as logistic regression\ (LR), support-vector machines\ (SVM), and random forests\ (RF).
Apart from being faster and requiring less information, FFTs tend to be robust against overfitting, and are easy to interpret, use, and communicate.
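
To make the idea that every node allows making a decision more concrete, here is a minimal base-R sketch that does not use **FFTrees** itself. The cue names, thresholds, and the helper `classify_fft()` are hypothetical and chosen only for illustration:

```{r sketch-fft-nodes}
# A toy FFT as an ordered list of nodes (hypothetical cues and thresholds).
# Each node tests one cue; if the test succeeds, the node's exit decides.
toy_fft <- list(
  list(cue = "chest_pain", test = function(x) x == "none", exit = "Healthy"),
  list(cue = "age",        test = function(x) x > 60,      exit = "Disease"),
  list(cue = "smoker",     test = function(x) x,           exit = "Disease")
)

# Classify one case: stop at the first node whose test succeeds.
# The final node has two exits, so a failed last test yields `default`.
classify_fft <- function(case, fft, default = "Healthy") {
  for (i in seq_along(fft)) {
    node <- fft[[i]]
    if (isTRUE(node$test(case[[node$cue]]))) {
      return(list(decision = node$exit, cues_used = i))
    }
  }
  list(decision = default, cues_used = length(fft))
}

patient_a <- list(chest_pain = "none",    age = 70, smoker = TRUE)
patient_b <- list(chest_pain = "typical", age = 50, smoker = FALSE)

classify_fft(patient_a, toy_fft)  # decided at the first node, using 1 cue
classify_fft(patient_b, toy_fft)  # reaches the last node, using 3 cues
```

The `cues_used` element shows why such trees are frugal: many cases are classified after looking up only one or two cues.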
## Installation
The latest release of **FFTrees** is available from [CRAN](https://CRAN.R-project.org/) at <`r url_pkg_CRAN`>:
```{r install-cran, eval = FALSE}
install.packages("FFTrees")
```

The current development version can be installed from its [GitHub](https://github.com) repository at <`r url_pkg_GitHub`>:

```{r install-gitub, eval = FALSE}
# install.packages("devtools")
devtools::install_github("ndphillips/FFTrees", build_vignettes = TRUE)
```

## Getting started

As an example, let's create an FFT predicting patients' heart disease status (_Healthy_ vs. _Disease_) based on the `heartdisease` dataset included in **FFTrees**:

```{r load-pkg, message = FALSE}
library(FFTrees) # load package
```

### Using data

The `heartdisease` data provides medical information for 303\ patients that were examined for heart disease.
The full data contains a binary criterion variable describing the true state of each patient and was split into two subsets:
a `heart.train` set for fitting decision trees and a `heart.test` set for testing these trees.
Here are the first rows and columns of both subsets of the `heartdisease` data:

- `heart.train` (the training / fitting data) describes `r nrow(heart.train)` patients:

```{r data-train, echo = FALSE}
n_train <- nrow(heart.train)
c_train <- paste0("**Table 1**: Beginning of the `heart.train` subset (using the data of ", n_train, " patients for fitting/training FFTs).")

knitr::kable(head(heart.train), caption = c_train)
```

- `heart.test` (the testing / prediction data) describes `r nrow(heart.test)` different patients on the same variables:

```{r data-test, echo = FALSE}
n_test <- nrow(heart.test)
c_test <- paste0("**Table 2**: Beginning of the `heart.test` subset (used to predict `diagnosis` for ", n_test, " new patients).")

knitr::kable(head(heart.test), caption = c_test)
```

Our challenge is to predict each patient's `diagnosis` ---\ a column of logical values indicating the true state of each patient (i.e., `TRUE` or\ `FALSE`, based on the patient suffering or not suffering from heart disease)\ --- from the values of potential predictors.

### Questions answered by FFTs
To solve binary classification problems by FFTs, we must answer two key questions:
- Which of the variables should we use to predict the criterion?
- How should we use and combine predictor variables into FFTs?

Once we have created some FFTs, additional questions include:

- How accurate are the predictions of a specific FFT?
- How costly are the predictions of each algorithm?

The **FFTrees** package answers these questions by creating, evaluating, and visualizing FFTs.

### Creating fast-and-frugal trees (FFTs)
We use the main `FFTrees()` function to create FFTs for the `heart.train` data and evaluate their predictive performance on the `heart.test` data:
```{r set-seed, echo = FALSE}
set.seed(246) # for reproducible randomness
```

- The main `FFTrees()` function allows creating an `FFTrees` object for the `heartdisease` data:

```{r example-heart-create, collapse = FALSE, results = 'hide'}
# Create an FFTrees object from the heartdisease data:
heart_fft <- FFTrees(formula = diagnosis ~.,
                     data = heart.train,
                     data.test = heart.test,
                     decision.labels = c("Healthy", "Disease"))
```

```{r example-2-simple, echo = FALSE, eval = FALSE}
# Create FFTs for the heartdisease data:
x <- FFTrees(formula = diagnosis ~.,
             data = heartdisease,
             decision.labels = c("Healthy", "Disease"))
```

Evaluating `FFTrees()` analyzes the training data, creates several FFTs, and applies them to the test data.
The results are stored in an object `heart_fft`, which can be printed, plotted and summarized (with options for selecting specific data or trees).

```{r example-heart-print, echo = FALSE, eval = FALSE}
# Print:
heart_fft
```

- Let's plot our `FFTrees` object to visualize a tree and its predictive performance (on the `test` data):

```{r example-heart-plot, eval = TRUE}
# Plot the best tree applied to the test data:
plot(heart_fft,
     data = "test",
     main = "Heart Disease")
```

**Figure\ 1**: A fast-and-frugal tree (FFT) predicting heart disease for `test` data and its performance characteristics.

- A summary of the trees in our `FFTrees` object and their key performance statistics can be obtained by `summary(heart_fft)`.
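
As a hedged sketch of how the fitted object might be inspected further, the chunk below re-creates `heart_fft` so that it runs on its own, then summarizes, prints, and predicts. It assumes the `print()` method accepts a `data` argument and that `predict()` with `newdata` returns one logical prediction per row; the chunk label and the name `heart_pred` are introduced here for illustration only:

```{r example-heart-inspect, eval = FALSE}
library(FFTrees)

# Re-create the FFTrees object (as above) so that this chunk is self-contained:
heart_fft <- FFTrees(formula = diagnosis ~.,
                     data = heart.train,
                     data.test = heart.test,
                     decision.labels = c("Healthy", "Disease"))

# Summarize the trees and their key performance statistics:
summary(heart_fft)

# Print the object, showing performance for the test data:
print(heart_fft, data = "test")

# Predict the criterion for the test cases (assumed: one logical value per row):
heart_pred <- predict(heart_fft, newdata = heart.test)

# Proportion of test cases that were classified correctly:
mean(heart_pred == heart.test$diagnosis)
```
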
### Building FFTs from verbal descriptions
FFTs are so simple that we can even create them 'from words' and then apply them to data.
For example, let's create a tree with the following three nodes and evaluate its performance on the `heart.test` data:

1. If `sex = 1`, predict _Disease_.
2. If `age < 45`, predict _Healthy_.
3. If `thal = {fd, normal}`, predict _Healthy_,
   otherwise, predict _Disease_.

These conditions can directly be supplied to the `my.tree` argument of `FFTrees()`:

```{r example-heart-verbal, eval = TRUE}
# Create custom FFT 'in words' and apply it to test data:

# 1. Create my own FFT (from verbal description):
my_fft <- FFTrees(formula = diagnosis ~.,
                  data = heart.train,
                  data.test = heart.test,
                  decision.labels = c("Healthy", "Disease"),
                  my.tree = "If sex = 1, predict Disease.
                             If age < 45, predict Healthy.
                             If thal = {fd, normal}, predict Healthy,
                             Otherwise, predict Disease.")

# 2. Plot and evaluate my custom FFT (for test data):
plot(my_fft,
     data = "test",
     main = "My custom FFT")
```

**Figure\ 2**: An FFT predicting heart disease created from a verbal description.

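
Read as plain control flow, the three verbal conditions amount to a short chain of early exits. The following base-R sketch hand-codes them and tallies hits, misses, false alarms, and correct rejections on a few made-up cases. The function `predict_my_tree()` and the `toy` data are hypothetical and are not taken from the `heartdisease` data; results for the real data come from `my_fft`, not from this sketch:

```{r sketch-verbal-tree}
# Hand-coded reading of the three verbal conditions (illustration only).
# Returns TRUE for a predicted Disease and FALSE for a predicted Healthy.
predict_my_tree <- function(sex, age, thal) {
  ifelse(sex == 1, TRUE,                              # 1. sex = 1: Disease
  ifelse(age < 45, FALSE,                             # 2. age < 45: Healthy
  ifelse(thal %in% c("fd", "normal"), FALSE, TRUE)))  # 3. Healthy, otherwise Disease
}

# Hypothetical toy cases (made up for this example):
toy <- data.frame(
  sex       = c(1, 1, 0, 0, 0, 1),
  age       = c(63, 41, 40, 58, 67, 52),
  thal      = c("rd", "normal", "rd", "normal", "rd", "fd"),
  diagnosis = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

pred <- predict_my_tree(toy$sex, toy$age, toy$thal)

# Cross the predictions with the true criterion values:
hi <- sum(pred & toy$diagnosis)    # hits
mi <- sum(!pred & toy$diagnosis)   # misses
fa <- sum(pred & !toy$diagnosis)   # false alarms
cr <- sum(!pred & !toy$diagnosis)  # correct rejections

c(sensitivity = hi / (hi + mi),
  specificity = cr / (cr + fa),
  accuracy    = (hi + cr) / nrow(toy))
```

Because the first node sends every case with `sex = 1` to _Disease_, a tree of this shape tends to trade false alarms for fewer misses.
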
The performance measures (in the bottom panel of **Figure\ 2**) show that this particular tree is somewhat biased:
It has nearly perfect _sensitivity_ (i.e., is good at identifying cases of _Disease_) but suffers from low _specificity_ (i.e., performs poorly in identifying _Healthy_ cases).
Expressed in terms of its errors, `my_fft` incurs few misses at the expense of many false alarms.
Although the _accuracy_ of our custom tree still exceeds the data's baseline by a fair amount, the FFTs in `heart_fft` (created above) strike a better balance.

Overall, what counts as the "best" tree for a particular problem depends on many factors (e.g., the goal of fitting vs. predicting data and the trade-offs between maximizing accuracy vs. incorporating the costs of cues or errors).
To explore this range of options, the **FFTrees** package enables us to design and evaluate a range of FFTs.

## Resources

The following versions of **FFTrees** and corresponding resources are available:
Type: | Version: | URL: |
:---------------------------|:----------------|:-------------------------------|
A. **FFTrees** (R package): | [Release version](`r url_pkg_CRAN`) | <`r url_pkg_CRAN`> |
| [Development version](`r url_pkg_GitHub`) | <`r url_pkg_GitHub`> |
B. Other resources: | [Online documentation](`r url_doc_GitHub`) | <`r url_doc_GitHub`> |
| [Online demo](`r url_app_shiny`) (running v1.3.3) | <`r url_app_shiny`> |

## References

We had fun creating the **FFTrees** package and hope you like it too!
As a comprehensive yet accessible introduction to FFTs, we recommend our article in the journal _Judgment and Decision Making_ ([2017](`r url_JDM_doi`)), entitled _FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees_ (available in [html](`r url_JDM_html`) | [PDF](`r url_JDM_pdf`)).

**Citation** (in APA format):

- Phillips, N. D., Neth, H., Woike, J. K. & Gaissmaier, W. (2017).
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees.
_Judgment and Decision Making_, _12_ (4), 344–368.
doi\ [`r doi_JDM`](`r url_JDM_doi`)

We encourage you to read the article to learn more about the history of FFTs and how the **FFTrees** package creates, visualizes, and evaluates them.
When using **FFTrees** in your own work, please cite us and share your experiences (e.g., [on GitHub](`r url_pkg_issues`)) so we can continue developing the package.

By\ 2024, over 130\ scientific publications have used or cited **FFTrees** (see [Google Scholar](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=205528310591558601) for the full list).
Examples include:

- Lötsch, J., Haehner, A., & Hummel, T. (2020). Machine-learning-derived rules set excludes risk of Parkinson’s disease in patients with olfactory or gustatory symptoms with high accuracy.
_Journal of Neurology_, _267_(2), 469--478.
doi\ [10.1007/s00415-019-09604-6](https://doi.org/10.1007/s00415-019-09604-6)

- Kagan, R., Parlee, L., Beckett, B., Hayden, J. B., Gundle, K. R., & Doung, Y. C. (2020).
Radiographic parameter-driven decision tree reliably predicts aseptic mechanical failure of compressive osseointegration fixation.
_Acta Orthopaedica_, _91_(2), 171--176.
doi\ [10.1080/17453674.2020.1716295](https://doi.org/10.1080/17453674.2020.1716295)

- Klement, R. J., Sonke, J. J., Allgäuer, M., Andratschke, N., Appold, S., Belderbos, J., ... & Mantel, F. (2020).
Correlating dose variables with local tumor control in stereotactic body radiotherapy for early stage non-small cell lung cancer: A modeling study on 1500 individual treatments.
_International Journal of Radiation Oncology * Biology * Physics_.
doi\ [10.1016/j.ijrobp.2020.03.005](https://doi.org/10.1016/j.ijrobp.2020.03.005)

- Nobre, G. G., Hunink, J. E., Baruth, B., Aerts, J. C., & Ward, P. J. (2019).
Translating large-scale climate variability into crop production forecast in Europe.
_Scientific Reports_, _9_(1), 1--13.
doi\ [10.1038/s41598-018-38091-4](https://doi.org/10.1038/s41598-018-38091-4)

- Buchinsky, F. J., Valentino, W. L., Ruszkay, N., Powell, E., Derkay, C. S., Seedat, R. Y., ... & Mortelliti, A. J. (2019).
Age at diagnosis, but not HPV type, is strongly associated with clinical course in recurrent respiratory papillomatosis.
_PloS One_, _14_(6).
doi\ [10.1371/journal.pone.0216697](https://doi.org/10.1371/journal.pone.0216697)

----

[File `README.Rmd` last updated on `r Sys.Date()`.]