Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jemus42/xplainfi

A very early and experimental bit of trying something out.
https://github.com/jemus42/xplainfi

Last synced: 24 days ago
JSON representation

A very early and experimental bit of trying something out.

Awesome Lists containing this project

README

        

---
output: github_document
editor_options:
chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
# Quiet down
lgr::get_logger("mlr3")$set_threshold("warn")

set.seed(123)
```

# `xplainfi`

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/jemus42/xplainfi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jemus42/xplainfi/actions/workflows/R-CMD-check.yaml)

The goal of `xplainfi` is to collect common feature importance methods under a unified and extensible interface.

For now, it is built specifically around [mlr3](https://mlr-org.com/), as available abstractions for learners, tasks, measures, etc. greatly simplify the implementation of importance measures.

## Installation

You can install the development version of `xplainfi` like so:

``` r
# install.packages(pak)
pak::pak("jemus42/xplainfi")
```

## Example: PFI

Here is a basic example on how to calculate PFI for a given learner and task, using repeated cross-validation as resampling strategy and computing PFI within each resampling 5 times:

```{r}
library(xplainfi)
library(mlr3)
library(mlr3learners)

task = tsk("german_credit")
learner = lrn("classif.ranger", num.trees = 100)
measure = msr("classif.ce")

pfi = PFI$new(
task = task,
learner = learner,
measure = measure,
resampling = rsmp("repeated_cv", folds = 3, repeats = 2),
iters_perm = 5
)
```

Compute and print PFI scores:

```{r}
pfi$compute()
```

Retrieve scores later in `pfi$importance`.

When PFI is computed based on resampling with multiple iterations, and / or multiple permutation iterations, the individual scores can be retrieved as a `data.table`:

```{r}
pfi$scores
```

Where `iter_rsmp` corresponds to the resampling iteration, i.e., 3 * 2 = 6 for 2 repeats of 3-fold cross-validation, and `iter_perm` corresponds to the permutation iteration, 5 in this case.
While `pfi$importance` contains the means across all iterations, `pfi$scores` allows you to manually aggregate them in any way you see fit.

In the simplest case, you run PFI with a single resampling iteration (holdout) and a single permutation iteration, and `pfi$importance` will contain the same values as `pfi$scores`.

```{r}
pfi_single = PFI$new(
task = task,
learner = learner,
measure = measure
)

pfi_single$compute()
pfi_single$scores
```