https://github.com/csinva/hierarchical-dnn-interpretations

Using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" 🧠 (ICLR 2019)

Host: GitHub
URL: https://github.com/csinva/hierarchical-dnn-interpretations
Owner: csinva
License: mit
Created: 2018-05-18T12:54:43.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2021-08-25T12:27:37.000Z (almost 4 years ago)
Last Synced: 2025-05-13T09:11:33.319Z (about 2 months ago)
Topics: acd, ai, artificial-intelligence, convolutional-neural-networks, data-science, deep-learning, deep-neural-networks, explainability, explainable-ai, feature-importance, iclr, interpretability, interpretation, jupyter-notebook, machine-learning, ml, neural-network, python, pytorch, statistics
Language: Jupyter Notebook
Homepage: https://arxiv.org/abs/1806.05337
Size: 48.7 MB
Stars: 128
Watchers: 8
Forks: 23
Open Issues: 2
Metadata Files:
- Readme: readme.md
- License: LICENSE
- Citation: citation.bib

# Hierarchical neural-net interpretations (ACD) 🧠

Produces hierarchical interpretations for a single prediction made by a PyTorch neural network. Official code for *Hierarchical interpretations for neural network predictions* (ICLR 2019 [pdf](https://arxiv.org/abs/1806.05337)).

Documentation • Demo notebooks

Note: this repo is actively maintained. For any questions, please file an issue.



![](https://csinva.io/hierarchical-dnn-interpretations/intro.svg?sanitize=True)

# examples/documentation

- **installation**: `pip install acd` (or clone and run `python setup.py install`)

- **examples**: the [reproduce_figs](https://github.com/csinva/hierarchical-dnn-interpretations/tree/master/reproduce_figs) folder has notebooks with many demos

- **src**: the [acd](acd) folder contains the source for the method implementation

- allows for different types of interpretations by changing hyperparameters (explained in examples)

- all required data/models/code for reproducing are included in the [dsets](dsets) folder

| Inspecting NLP sentiment models    | Detecting adversarial examples      | Analyzing imagenet models           |
| ---------------------------------- | ----------------------------------- | ----------------------------------- |
| ![](reproduce_figs/figs/fig_2.png) | ![](reproduce_figs/figs/fig_s3.png) | ![](reproduce_figs/figs/fig_s2.png) |

# notes on using ACD on your own data

- the current CD implementation often works out of the box, especially for networks built on common layers, such as AlexNet/VGG/ResNet. However, if you have custom layers or layers not accessible in `net.modules()`, you may need to write a custom function to iterate through some layers of your network (for examples, see `cd.py`).

- to use baselines such as build-up and occlusion, replace the `pred_ims` function with a function that gets predictions from your model given a batch of examples.
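
As a rough illustration of the second note, the sketch below shows the shape such a prediction function might take and how occlusion and build-up scores can be computed from it. It uses NumPy and a toy linear model rather than PyTorch so that it runs standalone. The names `predict_batch`, `occlusion_score`, `build_up_score`, and `toy_model` are hypothetical and not part of the `acd` package, and the real `pred_ims` signature may differ. It assumes the usual definitions: occlusion measures the drop in the class score when a feature group is removed, and build-up measures the class score when only that group is kept.

```python
import numpy as np


def predict_batch(model_fn, batch):
    """Hypothetical stand-in for `pred_ims`.

    Takes a batch of examples and returns an (n_examples, n_classes) array of scores.
    """
    return np.asarray(model_fn(np.asarray(batch, dtype=float)))


def occlusion_score(model_fn, x, mask, class_idx, fill=0.0):
    """Drop in the class score when the masked features are replaced by `fill`."""
    occluded = np.where(mask, fill, x)
    preds = predict_batch(model_fn, np.stack([x, occluded]))
    return preds[0, class_idx] - preds[1, class_idx]


def build_up_score(model_fn, x, mask, class_idx, fill=0.0):
    """Class score when only the masked features are kept."""
    kept = np.where(mask, x, fill)
    return predict_batch(model_fn, kept[None])[0, class_idx]


# Toy linear "model" for illustration only: 2 classes, 4 input features.
W = np.array([[1.0, -2.0, 0.5, 0.0],
              [0.0, 1.0, -1.0, 3.0]])


def toy_model(batch):
    return batch @ W.T


x = np.array([1.0, 1.0, 2.0, 1.0])
mask = np.array([True, False, True, False])  # features 0 and 2 form the group

occ = occlusion_score(toy_model, x, mask, class_idx=0)
bup = build_up_score(toy_model, x, mask, class_idx=0)
print(occ, bup)
```

For a real network, `model_fn` would wrap the forward pass (moving the batch to the right device and returning logits or probabilities as an array); the scoring functions themselves only depend on that batch-in, predictions-out contract.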

# related work

- CDEP (ICML 2020 [pdf](https://arxiv.org/abs/1909.13584), [github](https://github.com/laura-rieger/deep-explanation-penalization)) - penalizes CD / ACD scores during training to make models generalize better

- TRIM (ICLR 2020 workshop [pdf](https://arxiv.org/abs/2003.01926), [github](https://github.com/csinva/transformation-importance)) - using simple reparameterizations, allows for calculating disentangled importances to transformations of the input (e.g. assigning importances to different frequencies)

- PDR framework (PNAS 2019 [pdf](https://arxiv.org/abs/1901.04592)) - an overarching framework for guiding and framing interpretable machine learning

- DAC (arXiv 2019 [pdf](https://arxiv.org/abs/1905.07631), [github](https://github.com/csinva/disentangled-attribution-curves)) - finds disentangled interpretations for random forests

- Baseline interpretability methods - the file `scores/score_funcs.py` also contains simple pytorch implementations of [integrated gradients](https://arxiv.org/abs/1703.01365) and the simple interpretation technique `gradient * input`
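
For readers unfamiliar with the two baseline methods named in the last item, here is a minimal sketch of both on a toy model. It is not the code in `scores/score_funcs.py`: it uses NumPy with a hand-written gradient instead of PyTorch autograd, and the model `f`, its weights, and all function names are assumptions made for illustration. Integrated gradients is approximated with a midpoint Riemann sum along the straight path from a baseline to the input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Toy differentiable model (illustrative): f(x) = sigmoid(w . x + b).
w = np.array([2.0, -1.0, 0.5])
b = -0.25


def f(x):
    return sigmoid(x @ w + b)


def grad_f(x):
    """Analytic gradient of f with respect to x."""
    p = f(x)
    return p * (1.0 - p) * w


def gradient_times_input(x):
    """Elementwise product of the gradient at x with x itself."""
    return grad_f(x) * x


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann approximation of integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad


x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros_like(x)

ig = integrated_gradients(x, baseline)
gi = gradient_times_input(x)

# Completeness check: IG attributions should sum to f(x) - f(baseline).
completeness_gap = abs(ig.sum() - (f(x) - f(baseline)))
print(ig, gi, completeness_gap)
```

The completeness check is a useful sanity test when porting integrated gradients to a new model: if the gap does not shrink as `steps` grows, the gradient or the path is likely wrong.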

# reference

- feel free to use/share this code openly

- if you find this code useful for your research, please cite the following:

```bibtex
@inproceedings{
    singh2019hierarchical,
    title={Hierarchical interpretations for neural network predictions},
    author={Chandan Singh and W. James Murdoch and Bin Yu},
    booktitle={International Conference on Learning Representations},
    year={2019},
    url={https://openreview.net/forum?id=SkEqro0ctQ},
}
```