https://github.com/tidymodels/tidyposterior
Bayesian comparisons of models using resampled statistics
- Host: GitHub
- URL: https://github.com/tidymodels/tidyposterior
- Owner: tidymodels
- License: other
- Created: 2017-10-15T17:39:33.000Z (about 7 years ago)
- Default Branch: main
- Last Pushed: 2024-10-17T15:29:44.000Z (2 months ago)
- Last Synced: 2024-12-11T00:01:34.407Z (11 days ago)
- Language: R
- Homepage: https://tidyposterior.tidymodels.org
- Size: 32.3 MB
- Stars: 103
- Watchers: 8
- Forks: 10
- Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

[![R-CMD-check](https://github.com/tidymodels/tidyposterior/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/tidyposterior/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/tidyposterior/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/tidyposterior?branch=main)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidyposterior)](https://CRAN.r-project.org/package=tidyposterior)
[![Downloads](http://cranlogs.r-pkg.org/badges/tidyposterior)](https://CRAN.r-project.org/package=tidyposterior)
![](https://img.shields.io/badge/lifecycle-maturing-blue.svg)

This package can be used to conduct _post hoc_ analyses of resampling results generated by models.
For example, if two models are evaluated with the root mean squared error (RMSE) using 10-fold cross-validation, there are 10 paired statistics. These can be used to make comparisons between models without involving a test set.
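
As a minimal sketch of what "paired" means here, the base R snippet below builds a hypothetical table of per-fold RMSE values and computes the within-fold differences. The numbers and the `model_a`/`model_b` names are invented for illustration and do not come from the package.

```r
# Hypothetical per-fold RMSE values for two models evaluated on the same
# 10 folds; these numbers are simulated purely for illustration.
set.seed(1)
rmse <- data.frame(
  id      = sprintf("Fold%02d", 1:10),
  model_a = round(rnorm(10, mean = 2.5, sd = 0.2), 3),
  model_b = round(rnorm(10, mean = 2.4, sd = 0.2), 3)
)

# Both models were assessed on the same folds, so the statistics are paired:
# each row yields one within-fold difference.
rmse$difference <- rmse$model_a - rmse$model_b

nrow(rmse)             # number of paired statistics (one per fold)
mean(rmse$difference)  # average within-fold difference in RMSE
```

The fold-to-fold variation is shared by both models, which is why comparisons are made on the paired values rather than on two independent samples.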
There is a rich literature on the analysis of model resampling results such as McLachlan's [_Discriminant Analysis and Statistical Pattern Recognition_](https://books.google.com/books?id=O_qHDLaWpDUC&lpg=PR7&ots=6GJnIREXZM&dq=%22Discriminant%20Analysis%20and%20Statistical%20Pattern%20Recognition%22&lr&pg=PR7#v=onepage&q=%22Discriminant%20Analysis%20and%20Statistical%20Pattern%20Recognition%22&f=false) and the references therein. This package follows _the spirit_ of [Benavoli _et al._ (2017)](https://people.idsia.ch//~marco/papers/2017jmlr-tests.pdf).
tidyposterior uses Bayesian generalized linear models for this purpose and can be considered an upgraded version of the [`caret::resamples()`](https://topepo.github.io/caret/model-training-and-tuning.html#exploring-and-comparing-resampling-distributions) function. The package works with [rsample](https://rsample.tidymodels.org/) objects natively but any results in a data frame can be used.
See [Chapter 11](https://www.tmwr.org/compare.html) of [_Tidy Modeling with R_](https://www.tmwr.org) for examples and more details.
## Installation
You can install the released version of tidyposterior from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("tidyposterior")
```

And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("tidymodels/tidyposterior")
```
## Example

To illustrate, here are some example objects using 10-fold cross-validation for a simple two-class problem:
```{r setup, results = "hide"}
library(tidymodels)
library(tidyposterior)

data(two_class_dat, package = "modeldata")

set.seed(100)
folds <- vfold_cv(two_class_dat)
```

We can define two different models (for simplicity, with no tuning parameters).
```{r model-specs}
logistic_reg_glm_spec <-
  logistic_reg() %>%
  set_engine('glm')

mars_earth_spec <-
  mars(prod_degree = 1) %>%
  set_engine('earth') %>%
  set_mode('classification')
```

For tidymodels, the `tune::fit_resamples()` function can be used to estimate performance for each model/resample:
```{r tm-resamples}
rs_ctrl <- control_resamples(save_workflow = TRUE)

logistic_reg_glm_res <-
  logistic_reg_glm_spec %>%
  fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

mars_earth_res <-
  mars_earth_spec %>%
  fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)
```

From these, there are several ways to pass the results to the `perf_mod()` function. The most general approach is to have a data frame with the resampling labels (i.e., one or more id columns) as well as columns for each model that you would like to compare.
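
With repeated cross-validation there is more than one id column. The sketch below shows one plausible layout, assuming the rsample naming convention of `id` for the repeat and `id2` for the fold. The AUC values are randomly generated placeholders, not real results, and `repeated_df` is a hypothetical name.

```r
# One row per resample: 2 repeats x 10 folds = 20 rows.
# expand.grid() varies its first argument fastest, so folds cycle within
# each repeat.
layout <- expand.grid(
  id2 = sprintf("Fold%02d", 1:10),
  id  = sprintf("Repeat%d", 1:2),
  stringsAsFactors = FALSE
)

# Placeholder ROC AUC values (made up for illustration only).
set.seed(2)
repeated_df <- data.frame(
  id       = layout$id,
  id2      = layout$id2,
  logistic = runif(20, min = 0.85, max = 0.92),
  mars     = runif(20, min = 0.84, max = 0.91)
)

head(repeated_df)
```

Each combination of `id` and `id2` identifies exactly one resample, and every remaining column holds one model's statistic for that resample.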
For the model results above, `tune::collect_metrics()` can be used along with some basic data manipulation steps:
```{r df-results}
logistic_roc <-
  collect_metrics(logistic_reg_glm_res, summarize = FALSE) %>%
  dplyr::filter(.metric == "roc_auc") %>%
  dplyr::select(id, logistic = .estimate)

mars_roc <-
  collect_metrics(mars_earth_res, summarize = FALSE) %>%
  dplyr::filter(.metric == "roc_auc") %>%
  dplyr::select(id, mars = .estimate)

resamples_df <- full_join(logistic_roc, mars_roc, by = "id")
resamples_df
```

We can then give this directly to `perf_mod()`:
```{r df-mod}
set.seed(101)
roc_model_via_df <- perf_mod(resamples_df, iter = 2000)
```

From this, the posterior distributions for each model can be obtained from the `tidy()` method:
```{r post}
#| fig-alt: "Faceted histogram chart. Area Under the ROC Curve along the x-axis, count along the y-axis. The two facets are logistic and mars. Both histograms look fairly normally distributed, with a mean of 0.89 for logistic and 0.88 for mars. The full range is 0.84 to 0.93."
roc_model_via_df %>%
  tidy() %>%
  ggplot(aes(x = posterior)) +
  geom_histogram(bins = 40, col = "blue", fill = "blue", alpha = .4) +
  facet_wrap(~ model, ncol = 1) +
  xlab("Area Under the ROC Curve")
```

See `contrast_models()` for how to analyze these distributions.
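
As a rough, self-contained sketch of that next step, the snippet below fits `perf_mod()` to a small hypothetical data frame and then contrasts the two models. The AUC values, the `toy_*` object names, the seeds, and the practical effect size of 0.02 are all assumptions made for illustration. With the objects from this README you would pass `roc_model_via_df` to `contrast_models()` instead.

```r
library(tidyposterior)

# Hypothetical per-fold ROC AUC values (invented for this sketch).
toy_df <- data.frame(
  id       = sprintf("Fold%02d", 1:10),
  logistic = c(0.88, 0.90, 0.87, 0.91, 0.89, 0.86, 0.92, 0.88, 0.90, 0.89),
  mars     = c(0.87, 0.89, 0.87, 0.90, 0.88, 0.86, 0.90, 0.87, 0.89, 0.88)
)

# Fit the Bayesian model; `refresh = 0` is passed through to the Stan fit
# to silence sampling progress output.
set.seed(103)
toy_mod <- perf_mod(toy_df, iter = 2000, refresh = 0)

# With no model lists supplied, all pairwise differences are computed.
toy_diff <- contrast_models(toy_mod, seed = 104)

# Summarize the posterior of the difference; `size` is the (assumed)
# smallest difference in AUC that would matter in practice.
summary(toy_diff, size = 0.02)
```

The summary describes the posterior distribution of the difference between the models, so the choice of `size` should reflect what counts as a meaningful change for the metric being compared.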
## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/tidyposterior/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).