https://github.com/laura-rieger/deep-explanation-penalization
Code for using CDEP from the paper "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge" https://arxiv.org/abs/1909.13584
- Host: GitHub
- URL: https://github.com/laura-rieger/deep-explanation-penalization
- Owner: laura-rieger
- License: mit
- Created: 2019-02-12T21:26:08.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-03-22T02:10:12.000Z (about 4 years ago)
- Last Synced: 2024-11-15T05:32:24.135Z (6 months ago)
- Topics: ai, artificial-intelligence, cdep, convolutional-neural-network, data-science, deep-learning, explainability, explainable-ai, fairness, fairness-ml, feature-importance, interpretability, interpretable-deep-learning, jupyter-notebook, machine-learning, ml, neural-network, python, pytorch, recurrent-neural-network
- Language: Jupyter Notebook
- Homepage:
- Size: 248 MB
- Stars: 127
- Watchers: 9
- Forks: 14
- Open Issues: 1
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
Making interpretations useful (CDEP) 🔨
Regularizes interpretations (computed via contextual decomposition) to improve neural networks. Official code for "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge" (ICML 2020, [pdf](https://arxiv.org/abs/1909.13584)).
Note: this repo is actively maintained. For any questions please file an issue.
# documentation
- fully-contained data/models/code for reproducing and experimenting with CDEP
- the [src](src) folder contains the core code for running and penalizing contextual decomposition
- in addition, we run experiments on 4 datasets, each of which is located in its own folder
- notebooks in these folders show demos for each dataset

# examples
[ISIC skin-cancer classification](isic-skin-cancer) - using CDEP, we can learn to avoid spurious patches present in the training set, improving test performance!
The segmentation maps of the patches can be downloaded [here](https://drive.google.com/drive/folders/1Er2PQMwmDSmg3BThyeu-JKX442OkQJit?usp=sharing)
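If you want to use these maps as the "irrelevant feature" masks that CDEP penalizes, a loading helper might look like the sketch below. The folder layout, the `.png` extension, and the convention that nonzero pixels mark the patch are assumptions about the downloaded files, not guarantees.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image


def load_patch_mask(path, image_size=(224, 224)):
    """Load one segmentation map as a binary mask tensor
    (1 where the spurious patch is, 0 elsewhere).

    Assumes the maps are image files whose nonzero pixels mark the patch;
    adjust the threshold and resizing to match the files you download.
    """
    seg = Image.open(path).convert("L").resize(image_size)
    mask = (np.asarray(seg) > 0).astype(np.float32)
    return torch.from_numpy(mask)


# hypothetical layout: one mask image per training example
masks = {p.stem: load_patch_mask(p) for p in Path("segmentation").glob("*.png")}
```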
[ColorMNIST](mnist) - penalizing the contributions of individual pixels allows us to teach a network to learn a digit's shape instead of its color, improving its test accuracy from 0.5% to 25.1%
[Fixing text gender biases](text) - CDEP can help a network avoid learning spurious biases in a dataset, such as those introduced by gendered words
# using CDEP on your own data
using CDEP requires two steps:
1. run CD/ACD on your model. Specifically, 3 things must be altered:
- the `pred_ims` function must be replaced by a function you write using your own trained model. This function gets predictions from a model given a batch of examples.
- the model must be replaced with your model
- the current CD implementation doesn't always work for all types of networks. If you are getting an error inside of `cd.py`, you may need to write a custom function that iterates through the layers of your network (for examples see `cd.py`)
2. add CD scores to the loss function (see the notebooks and the sketch below)
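To make the two steps concrete, here is a minimal sketch of how the pieces fit together in PyTorch. The CD call is passed in as a generic `cd_fn` callable because the exact function in `src/cd.py` depends on your architecture; `cd_fn`, `irrelevant_mask`, and `lambda_cdep` are illustrative names for this sketch, not the repo's exact API.

```python
import torch.nn.functional as F


def cdep_loss(model, x, y, irrelevant_mask, cd_fn, lambda_cdep=1.0):
    """Prediction loss plus a penalty on the CD importance of features
    that prior knowledge says should be irrelevant.

    cd_fn(x, model, irrelevant_mask) is assumed to return (rel, irrel):
    the part of the output attributable to the masked features and the
    remainder. Adapt this to the CD implementation for your architecture.
    """
    # standard prediction loss
    logits = model(x)
    pred_loss = F.cross_entropy(logits, y)

    # contextual decomposition with respect to the "irrelevant" features
    rel, _irrel = cd_fn(x, model, irrelevant_mask)

    # push the contribution of the irrelevant features toward zero
    explanation_penalty = rel.abs().mean()

    return pred_loss + lambda_cdep * explanation_penalty


def train_step(model, optimizer, x, y, irrelevant_mask, cd_fn):
    """One optimization step with the combined loss (sketch)."""
    optimizer.zero_grad()
    loss = cdep_loss(model, x, y, irrelevant_mask, cd_fn)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The weight `lambda_cdep` trades off predictive accuracy against agreement with the encoded prior knowledge; see the dataset notebooks for how the CD scores are actually added to the loss in the paper's experiments.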
# related work

- ACD (ICLR 2019 [pdf](https://openreview.net/pdf?id=SkEqro0ctQ), [github](https://github.com/csinva/hierarchical-dnn-interpretations)) - extends CD to CNNs / arbitrary DNNs, and aggregates explanations into a hierarchy
- PDR framework (PNAS 2019 [pdf](https://arxiv.org/abs/1901.04592)) - an overarching framework for guiding and framing interpretable machine learning
- TRIM (ICLR 2020 workshop [pdf](https://arxiv.org/abs/2003.01926), [github](https://github.com/csinva/transformation-importance)) - using simple reparameterizations, allows for calculating disentangled importances to transformations of the input (e.g. assigning importances to different frequencies)
- DAC (arXiv 2019 [pdf](https://arxiv.org/abs/1905.07631), [github](https://github.com/csinva/disentangled-attribution-curves)) - finds disentangled interpretations for random forests

# reference
- feel free to use/share this code openly
- if you find this code useful for your research, please cite the following:

```bibtex
@inproceedings{rieger2020interpretations,
title={Interpretations are useful: penalizing explanations to align neural networks with prior knowledge},
author={Rieger, Laura and Singh, Chandan and Murdoch, William and Yu, Bin},
booktitle={International Conference on Machine Learning},
pages={8116--8126},
year={2020},
organization={PMLR}
}
```