https://github.com/norskregnesentral/shapr
Explaining the output of machine learning models with more accurately estimated Shapley values
https://github.com/norskregnesentral/shapr
explainable-ai explainable-ml rcpp rcpparmadillo rstats shapley
Last synced: 5 months ago
JSON representation
Explaining the output of machine learning models with more accurately estimated Shapley values
- Host: GitHub
- URL: https://github.com/norskregnesentral/shapr
- Owner: NorskRegnesentral
- License: other
- Created: 2018-05-16T07:05:13.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2024-05-29T19:29:48.000Z (about 2 years ago)
- Last Synced: 2024-05-29T20:02:54.826Z (about 2 years ago)
- Topics: explainable-ai, explainable-ml, rcpp, rcpparmadillo, rstats, shapley
- Language: R
- Homepage: https://norskregnesentral.github.io/shapr/
- Size: 40.5 MB
- Stars: 135
- Watchers: 8
- Forks: 30
- Open Issues: 31
-
Metadata Files:
- Readme: README.Rmd
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
---
output: github_document
bibliography: ./inst/REFERENCES.bib
link-citations: yes
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
tidy = "styler"
)
```
# shapr 
[](https://cran.r-project.org/package=shapr)
[](https://cran.r-project.org/package=shapr)
[](https://github.com/NorskRegnesentral/shapr/actions?query=workflow%3AR-CMD-check)
[](https://lifecycle.r-lib.org/articles/stages.html)
[](https://opensource.org/license/mit)
[-10.21105/joss.02027-brightgreen.svg)](https://doi.org/10.21105/joss.02027)
[-2504.01842-b31b1b.svg)](https://arxiv.org/abs/2504.01842)
See the pkgdown site at [norskregnesentral.github.io/shapr/](https://norskregnesentral.github.io/shapr/)
for a complete introduction with examples and documentation of the package.
For an overview of the methodology and capabilities of the package (per `shapr` v1.0.8),
see the software paper @jullum2025shapr, available as a preprint [here](https://arxiv.org/abs/2504.01842).
## NEWS
With `shapr` version 1.0.0 (GitHub only, Nov 2024) and version 1.0.1 (CRAN, Jan 2025),
the package underwent a major update, providing a full restructuring of the code base, and
a full suite of new functionality, including:
* A long list of approaches for estimating the contribution/value function $v(S)$, including Variational Autoencoders
and regression-based methods
* Iterative Shapley value estimation with convergence detection
* Parallelized computations with progress updates
* Reweighted Kernel SHAP for faster convergence
* New function `explain_forecast()` for explaining forecasts
* Asymmetric and causal Shapley values
* Several other methodological, computational and user-experience improvements
* Python wrapper `shaprpy` making the core functionality of `shapr` available in Python
See the [NEWS](https://norskregnesentral.github.io/shapr/news/index.html) for a complete list.
### Coming from shapr < 1.0.0?
`shapr` version >= 1.0.0 comes with a number of breaking changes.
Most notably, we moved from using two functions (`shapr()` and `explain()`) to
one function (`explain()`).
In addition, custom models are now explained by passing the prediction function directly to `explain()`.
Several input arguments were renamed, and a few functions for edge cases were removed to simplify the code base.
Click [here](https://github.com/NorskRegnesentral/shapr/blob/cranversion_0.2.2/README.md) to view a version of this README with the old syntax (v0.2.2).
### Python wrapper
We provide a Python wrapper (`shaprpy`) which allows explaining Python models with the methodology
implemented in `shapr`, directly from Python.
The wrapper calls R internally and therefore requires an installation of R.
See [here](https://norskregnesentral.github.io/shapr/shaprpy.html) for installation instructions and examples.
## The package
The `shapr` R package implements an enhanced version of the Kernel SHAP method for approximating Shapley values,
with a strong focus on conditional Shapley values.
The core idea is to remain completely model-agnostic while offering a variety of methods for estimating contribution
functions, enabling accurate computation of conditional Shapley values across different feature types, dependencies,
and distributions.
The package also includes evaluation metrics to compare various approaches.
With features like parallelized computations, convergence detection, progress updates, and extensive plotting options,
shapr is a highly efficient and user-friendly tool, delivering precise estimates of conditional Shapley values,
which are critical for understanding how features truly contribute to predictions.
A basic example is provided below.
Otherwise, we refer to the [pkgdown website](https://norskregnesentral.github.io/shapr/) and the vignettes there for details and further examples.
## Installation
`shapr` is available on [CRAN](https://cran.r-project.org/package=shapr) and can be installed in R as:
```{r, eval = FALSE}
install.packages("shapr")
```
To install the development version of `shapr`, available on GitHub, use
```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr")
```
To also install all dependencies, use
```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```
## Example
`shapr` supports computation of Shapley values with any predictive model that takes a set of numeric features and produces a numeric outcome.
The following example shows how a simple `xgboost` model is trained using the *airquality* dataset, and how `shapr` explains the individual predictions.
We first enable parallel computation and progress updates with the following code chunk.
These are optional, but recommended for improved performance and user-friendliness,
particularly for problems with many features.
```{r init_no_eval,eval = FALSE}
# Enable parallel computation
# Requires the future and future_lapply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features
# Enable progress updates of the v(S) computations
# Requires the progressr package
progressr::handlers(global = TRUE)
progressr::handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)
```
Here is the actual example:
```{r basic_example, warning = FALSE}
library(xgboost)
library(shapr)
data("airquality")
data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]
x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"
ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain, get(y_var)]
x_explain <- data[ind_x_explain, ..x_var]
# Look at the dependence between the features
cor(x_train)
# Fit a basic xgboost model to the training data
model <- xgboost(
x = x_train,
y = y_train,
nround = 20,
verbosity = 0
)
# Specify phi_0, i.e., the expected prediction without any features
p0 <- mean(y_train)
# Compute Shapley values with Kernel SHAP, accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
model = model,
x_explain = x_explain,
x_train = x_train,
approach = "empirical",
phi0 = p0,
seed = 1
)
# Print the Shapley values for the observations to explain.
print(explanation)
# Provide a formatted summary of the shapr object
summary(explanation)
# Finally, we plot the resulting explanations
plot(explanation)
```
See @jullum2025shapr (preprint available [here](https://arxiv.org/abs/2504.01842)) for a software paper with an overview of the methodology and capabilities of the
package (as of v1.0.8).
See the [general usage vignette](https://norskregnesentral.github.io/shapr/articles/general_usage.html) for further
basic usage examples and brief introductions to the methodology.
For more thorough information about the underlying methodology, see methodological papers
@aas2019explaining, @redelmeier2020explaining, @jullum2021efficient, @olsen2022using, @olsen2024comparative and
@olsen2025improving.
See also @sellereite2019shapr for a very brief paper about a previous version (v0.1.1) of the package
(with a different structure, syntax, and significantly less functionality).
## Contribution
All feedback and suggestions are very welcome. Details on how to contribute can be found
[here](https://norskregnesentral.github.io/shapr/CONTRIBUTING.html). If you have any questions or comments, feel
free to open an issue [here](https://github.com/NorskRegnesentral/shapr/issues).
Please note that the `shapr` project is released with a
[Contributor Code of Conduct](https://norskregnesentral.github.io/shapr/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.
## References