# Awesome Explainable AI [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

If you find any overlooked papers, please open an issue or pull request and provide the paper(s) in this format:
```
- **[]** Paper Name [[pdf]]() [[code]]()
```

## Papers
- Visualizing and Understanding Convolutional Networks [[pdf]](https://arxiv.org/pdf/1311.2901.pdf)
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps [[pdf]](https://arxiv.org/pdf/1312.6034.pdf) [[saliency code]](https://github.com/sunnynevarekar/pytorch-saliency-maps)
- Striving for Simplicity: The All Convolutional Net [[pdf]](https://arxiv.org/pdf/1412.6806.pdf)
- Understanding Neural Networks Through Deep Visualization [[pdf]](https://arxiv.org/pdf/1506.06579.pdf)
- Synthesizing the preferred inputs for neurons in neural networks via deep generator networks [[pdf]](https://arxiv.org/pdf/1605.09304.pdf)
- Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1602.03616.pdf)
- Understanding Deep Image Representations by Inverting Them [[pdf]](https://arxiv.org/pdf/1412.0035.pdf)
- Visualizing deep convolutional neural networks using natural pre-images [[pdf]](https://arxiv.org/pdf/1512.02017.pdf)
- Understanding Neural Networks via Feature Visualization: A survey [[pdf]](https://arxiv.org/pdf/1904.08939.pdf)
- Conditional iterative generation of images in latent space [[pdf]](https://arxiv.org/pdf/1612.00005.pdf)
- Interpretable Explanations of Black Boxes by Meaningful Perturbation [[pdf]](https://arxiv.org/pdf/1704.03296.pdf) [[code]](https://github.com/jacobgil/pytorch-explain-black-box) [[code]](https://github.com/ruthcfong/pytorch-explain-black-box) [[code]](https://github.com/da2so/Interpretable-Explanations-of-Black-Boxes-by-Meaningful-Perturbation)
- Gradient-Based Attribution Methods [[pdf]](https://cgl.ethz.ch/Downloads/Publications/Papers/2019/Anc19c/Anc19c.pdf)
- Top-down Neural Attention by Excitation Backprop [[pdf]](https://arxiv.org/pdf/1608.00507.pdf) [[code]](https://github.com/ruthcfong/pointing_game)
- Salient Deconvolutional Networks [[pdf]](https://www.robots.ox.ac.uk/~vedaldi/assets/pubs/mahendran16salient.pdf)
- Explaining and Interpreting LSTMs [[pdf]](https://arxiv.org/pdf/1909.12114.pdf)
- Explaining and Harnessing Adversarial Examples [[pdf]](https://arxiv.org/pdf/1412.6572.pdf) [[code]](https://www.tensorflow.org/tutorials/generative/adversarial_fgsm) [[code]](https://github.com/Harry24k/FGSM-pytorch) [[code]](https://pytorch.org/tutorials/beginner/fgsm_tutorial.html)
- Adversarial Training for Free! [[pdf]](https://arxiv.org/pdf/1904.12843.pdf) [[code]](https://github.com/mahyarnajibi/FreeAdversarialTraining) [[video]](https://www.youtube.com/watch?v=v8U9mM1Vwv0&ab_channel=AminJun)
- Fast Adversarial Training with Smooth Convergence [[pdf]](https://arxiv.org/pdf/2308.12857.pdf) [[code]](https://github.com/FAT-CS/ConvergeSmooth)
- Intriguing properties of neural networks [[pdf]](https://arxiv.org/pdf/1312.6199.pdf)
- High Confidence Predictions for Unrecognizable Images [[pdf]](https://arxiv.org/pdf/1412.1897.pdf)
- Contrastive Explanations in Neural Networks [[pdf]](https://arxiv.org/pdf/2008.00178.pdf) [[code]](https://github.com/olivesgatech/Contrastive-Explanations) [[slides]](https://gukyeongkwon.github.io/slides/mohit_icip2020_slides.pdf)
- Towards better understanding of gradient-based attribution methods for Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1711.06104.pdf)
- On the (In)fidelity and Sensitivity of Explanations [[pdf]](https://arxiv.org/pdf/1901.09392.pdf) [[code]](https://github.com/chihkuanyeh/saliency_evaluation)
- Unsupervised learning of object semantic parts from internal states of CNNs by population encoding [[pdf]](https://arxiv.org/pdf/1511.06855.pdf)
- Diverse feature visualizations reveal invariances in early layers of deep neural networks [[pdf]](https://arxiv.org/pdf/1807.10589.pdf)
- Interpretation of Neural Networks is Fragile [[pdf]](https://arxiv.org/pdf/1710.10547.pdf)
- Towards Better Analysis of Deep Convolutional Neural Networks [[pdf]](https://arxiv.org/pdf/1604.07043.pdf)
- Do semantic parts emerge in Convolutional Neural Networks? [[pdf]](https://arxiv.org/pdf/1607.03738.pdf)
- Do Convolutional Neural Networks Learn Class Hierarchy? [[pdf]](https://arxiv.org/pdf/1710.06501.pdf)
- A Benchmark for Interpretability Methods in Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1806.10758.pdf)
- On the Robustness of Interpretability Methods [[pdf]](https://arxiv.org/pdf/1806.08049.pdf)
- Sanity Checks for Saliency Maps [[pdf]](https://arxiv.org/pdf/1810.03292.pdf)
- Sanity Checks for Saliency Metrics [[pdf]](https://arxiv.org/pdf/1912.01451.pdf)
- Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1904.00605.pdf)
- Transformer Interpretability Beyond Attention Visualization [[pdf]](https://arxiv.org/pdf/2012.09838.pdf) [[code]](https://github.com/hila-chefer/Transformer-Explainability) [[video]](https://www.youtube.com/watch?v=a0O_QhE9XFM&ab_channel=DataScienceBond)
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers [[pdf]](https://arxiv.org/pdf/2103.15679.pdf) [[code]](https://github.com/hila-chefer/Transformer-MM-Explainability)
- Optimizing Relevance Maps of Vision Transformers Improves Robustness [[pdf]](https://arxiv.org/pdf/2206.01161.pdf) [[code]](https://github.com/hila-chefer/RobustViT)
- Investigating the influence of noise and distractors on the interpretation of neural networks [[pdf]](https://arxiv.org/pdf/1611.07270.pdf)
- Do Explanations Explain? Model Knows Best [[pdf]](https://arxiv.org/pdf/2203.02269.pdf) [[code]](https://github.com/CAMP-eXplain-AI/Do-Explanations-Explain)
- Visualizing Deep Neural Network Decisions: Prediction Difference Analysis [[pdf]](https://arxiv.org/pdf/1702.04595.pdf) [[code]](https://github.com/lmzintgraf/DeepVis-PredDiff)
- Visualizing and Understanding Generative Adversarial Networks [[pdf]](https://arxiv.org/pdf/1811.10597.pdf) [[code]](https://github.com/CSAILVision/GANDissect) [[website]](http://gandissect.csail.mit.edu/)
- ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness [[pdf]](https://arxiv.org/pdf/1811.12231.pdf) [[code]](https://github.com/rgeirhos/texture-vs-shape)
- Deep Image Prior [[pdf]](https://arxiv.org/pdf/1711.10925.pdf) [[code]](https://github.com/DmitryUlyanov/deep-image-prior) [[code]](https://github.com/safwankdb/Deep-Image-Prior) [[code]](https://mlpeschl.com/post/deepimageprior/) [[website]](https://dmitryulyanov.github.io/deep_image_prior)
- How Do Vision Transformers Work? [[pdf]](https://arxiv.org/pdf/2202.06709.pdf)
- Breaking Batch Normalization for better explainability of Deep Neural Networks through Layer-wise Relevance Propagation [[pdf]](https://arxiv.org/pdf/2002.11018.pdf)
- Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers [[pdf]](https://arxiv.org/pdf/1604.00825.pdf)
- Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis [[pdf]](https://arxiv.org/pdf/2104.10252.pdf)
- Explaining image classifiers by removing input features using generative models [[pdf]](https://arxiv.org/pdf/1910.04256.pdf) [[code]](https://github.com/anguyen8/generative-attribution-methods)
- Do Vision Transformers See Like Convolutional Neural Networks? [[pdf]](https://arxiv.org/pdf/2108.08810.pdf)
- Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball [[pdf]](https://arxiv.org/pdf/1912.09405.pdf)
- Explaining Knowledge Distillation by Quantifying the Knowledge [[pdf]](https://arxiv.org/pdf/2003.03622.pdf)
- Interpreting Super-Resolution Networks with Local Attribution Maps [[pdf]](https://arxiv.org/pdf/2011.11036.pdf)
- Is the deconvolution layer the same as a convolutional layer? [[pdf]](https://arxiv.org/ftp/arxiv/papers/1609/1609.07009.pdf)
- Towards Human-Understandable Visual Explanations: Imperceptible High-frequency Cues Can Better Be Removed [[pdf]](https://arxiv.org/pdf/2104.07954.pdf)
- Gradient Inversion with Generative Image Prior [[pdf]](https://arxiv.org/pdf/2110.14962.pdf) [[code]](https://github.com/ml-postech/gradient-inversion-generative-image-prior)
- Explaining Local, Global, And Higher-Order Interactions In Deep Learning [[pdf]](https://arxiv.org/pdf/2006.08601.pdf)
- Pitfalls of Explainable ML: An Industry Perspective [[pdf]](https://arxiv.org/pdf/2106.07758.pdf)
- Do Feature Attribution Methods Correctly Attribute Features? [[pdf]](https://arxiv.org/pdf/2104.14403.pdf) [[code]](https://github.com/YilunZhou/feature-attribution-evaluation)
- Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis [[pdf]](https://arxiv.org/pdf/2111.04138.pdf) [[code]](https://github.com/fel-thomas/Sobol-Attribution-Method)
- What do neural networks learn in image classification? A frequency shortcut perspective [[pdf]](https://arxiv.org/pdf/2307.09829.pdf)
- The effectiveness of feature attribution methods and its correlation with automatic evaluation scores [[pdf]](https://arxiv.org/pdf/2105.14944.pdf)
- Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations [[pdf]](https://arxiv.org/pdf/2012.03434.pdf)
- The (Un)reliability of saliency methods [[pdf]](https://arxiv.org/pdf/1711.00867.pdf)
- Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation [[pdf]](https://arxiv.org/pdf/2010.00672.pdf)
- Explainable Models with Consistent Interpretations [[pdf]](https://web.cs.ucdavis.edu/~hpirsiav/papers/gc_aaai21.pdf) [[code]](https://github.com/UMBCvision/Explainable-Models-with-Consistent-Interpretations)
- Interpreting Multivariate Shapley Interactions in DNNs [[pdf]](https://arxiv.org/pdf/2010.05045.pdf)
- Finding and Fixing Spurious Patterns with Explanations [[pdf]](https://arxiv.org/pdf/2106.02112.pdf)
- Monitoring Shortcut Learning using Mutual Information [[pdf]](https://arxiv.org/pdf/2206.13034.pdf)
- Dissecting Deep Learning Networks - Visualizing Mutual Information [[pdf]](https://www.mdpi.com/1099-4300/20/11/823)
- Revisiting Backpropagation Saliency Methods [[pdf]](https://arxiv.org/pdf/2004.02866.pdf)
- Towards Visually Explaining Variational Autoencoders [[pdf]](https://arxiv.org/pdf/1911.07389.pdf) [[code]](https://github.com/liuem607/expVAE) [[code]](https://github.com/FrankBrongers/Reproducing_expVAE) [[video]](https://www.youtube.com/watch?v=6FqVcSAfSkI&ab_channel=ComputerVisionFoundationVideos) [[video]](https://www.youtube.com/watch?v=3XOgqhf-GZM&t=1034s&ab_channel=VipulVaibhaw)
- Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution [[pdf]](https://arxiv.org/pdf/2004.10484.pdf)
- Understanding Deep Networks via Extremal Perturbations and Smooth Masks [[pdf]](https://arxiv.org/pdf/1910.08485.pdf) [[code]](https://github.com/facebookresearch/TorchRay)
- Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks [[pdf]](https://arxiv.org/pdf/1908.02686.pdf)
- Towards Robust Interpretability with Self-Explaining Neural Networks [[pdf]](https://arxiv.org/pdf/1806.07538.pdf)
- Influence-Directed Explanations for Deep Convolutional Networks [[pdf]](https://arxiv.org/pdf/1802.03788.pdf)
- Interpretable Basis Decomposition for Visual Explanation [[pdf]](https://openaccess.thecvf.com/content_ECCV_2018/papers/Antonio_Torralba_Interpretable_Basis_Decomposition_ECCV_2018_paper.pdf) [[code]](https://github.com/CSAILVision/IBD)
- Real Time Image Saliency for Black Box Classifiers [[pdf]](https://arxiv.org/pdf/1705.07857.pdf)
- Bias Also Matters: Bias Attribution for Deep Neural Network Explanation [[pdf]](http://proceedings.mlr.press/v97/wang19p/wang19p.pdf)
- Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation [[pdf]](https://arxiv.org/pdf/1902.00407.pdf)
- Distilling Critical Paths in Convolutional Neural Networks [[pdf]](https://arxiv.org/pdf/1811.02643.pdf)
- Understanding intermediate layers using linear classifier probes [[pdf]](https://arxiv.org/pdf/1610.01644.pdf)
- Neural Response Interpretation through the Lens of Critical Pathways [[pdf]](https://arxiv.org/pdf/2103.16886.pdf) [[code]](https://github.com/CAMP-eXplain-AI/PathwayGrad) [[code]](https://github.com/CAMP-eXplain-AI/RoarTorch)
- Interpret Neural Networks by Identifying Critical Data Routing Paths [[pdf]](https://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Interpret_Neural_Networks_CVPR_2018_paper.pdf)
- Reconstructing Training Data from Trained Neural Networks [[pdf]](https://arxiv.org/pdf/2206.07758v1.pdf) [[website]](https://giladude1.github.io/reconstruction/)
- Visualizing Deep Similarity Networks [[pdf]](https://arxiv.org/pdf/1901.00536.pdf) [[code]](https://github.com/GWUvision/Similarity-Visualization)
- Improving Deep Learning Interpretability by Saliency Guided Training [[pdf]](https://arxiv.org/pdf/2111.14338.pdf) [[code]](https://github.com/ayaabdelsalam91/saliency_guided_training)
- Understanding Prediction Discrepancies in Machine Learning Classifiers [[pdf]](https://arxiv.org/pdf/2104.05467.pdf)
- Intriguing Properties of Vision Transformers [[pdf]](https://arxiv.org/pdf/2105.10497.pdf) [[code]](https://github.com/Muzammal-Naseer/Intriguing-Properties-of-Vision-Transformers)
- From Clustering to Cluster Explanations via Neural Networks [[pdf]](https://arxiv.org/pdf/1906.07633.pdf)
- Compositional Explanations of Neurons [[pdf]](https://arxiv.org/pdf/2006.14032.pdf)
- What Does CNN Shift Invariance Look Like? A Visualization Study [[pdf]](https://arxiv.org/pdf/2011.04127.pdf) [[code]](https://github.com/jakehlee/interactive-invariance) [[project]](https://jakehlee.github.io/visualize-invariance)
- Explainability Methods for Graph Convolutional Neural Networks [[pdf]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Pope_Explainability_Methods_for_Graph_Convolutional_Neural_Networks_CVPR_2019_paper.pdf) [[code]](https://github.com/ndey96/GCNN-Explainability)
- What do Vision Transformers Learn? A Visual Exploration [[pdf]](https://arxiv.org/pdf/2212.06727.pdf)
- Learning Accurate and Interpretable Decision Rule Sets from Neural Networks [[pdf]](https://arxiv.org/pdf/2103.02826.pdf)
- Visual Explanation for Deep Metric Learning [[pdf]](https://arxiv.org/pdf/1909.12977.pdf) [[code]](https://github.com/Jeff-Zilence/Explain_Metric_Learning)
- Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations [[pdf]](https://arxiv.org/pdf/1703.03717.pdf)
- Understanding Black-box Predictions via Influence Functions [[pdf]](https://arxiv.org/pdf/1703.04730.pdf) [[code]](https://github.com/nimarb/pytorch_influence_functions)
- Unmasking Clever Hans predictors and assessing what machines really learn [[pdf]](https://www.nature.com/articles/s41467-019-08987-4.pdf)
- Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation [[pdf]](https://arxiv.org/pdf/1903.10992.pdf)
- Quantitative Evaluations on Saliency Methods: An Experimental Study [[pdf]](https://arxiv.org/pdf/2012.15616.pdf)
- Metrics for saliency map evaluation of deep learning explanation methods [[pdf]](https://arxiv.org/pdf/2201.13291.pdf)
- Neural Networks are Decision Trees [[pdf]](https://arxiv.org/pdf/2210.05189.pdf)
- Towards Generating Human-Centered Saliency Maps without Sacrificing Accuracy [[blog]](https://katelyn98.github.io/blog/2022/vlr-project/)
- Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data [[pdf]](https://arxiv.org/pdf/2002.06716.pdf) [[code]](https://github.com/CalculatedContent/ww-trends-2020) [[code]](https://github.com/CalculatedContent/WeightWatcher) [[pip]](https://pypi.org/project/weightwatcher/) [[powerlaw]](https://github.com/jeffalstott/powerlaw)
- Exploring Explainability for Vision Transformers [[blog]](https://jacobgil.github.io/deeplearning/vision-transformer-explainability) [[code]](https://github.com/jacobgil/vit-explain)
- Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces [[pdf]](https://arxiv.org/pdf/2212.14855.pdf)
- Are Transformers More Robust Than CNNs? [[pdf]](https://arxiv.org/pdf/2111.05464.pdf) [[code]](https://github.com/ytongbai/ViTs-vs-CNNs)
- Exploring Corruption Robustness: Inductive Biases in Vision Transformers and MLP-Mixers [[pdf]](https://arxiv.org/pdf/2106.13122.pdf) [[code]](https://github.com/katelyn98/CorruptionRobustness)
- Explanatory Interactive Machine Learning [[pdf]](https://ml-research.github.io/papers/teso2019aies_XIML.pdf)
- Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets [[pdf]](https://ceur-ws.org/Vol-2444/ialatecml_paper1.pdf)
- Studying How to Efficiently and Effectively Guide Models with Explanations [[pdf]](https://openaccess.thecvf.com/content/ICCV2023/papers/Rao_Studying_How_to_Efficiently_and_Effectively_Guide_Models_with_Explanations_ICCV_2023_paper.pdf) [[supp]](https://openaccess.thecvf.com/content/ICCV2023/supplemental/Rao_Studying_How_to_ICCV_2023_supplemental.pdf)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [[pdf]](https://arxiv.org/pdf/2004.09034.pdf)
- Fixing Localization Errors to Improve Image Classification [[pdf]](https://homes.esat.kuleuven.be/~konijn/publications/2020/sun2.pdf)
- Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias [[pdf]](https://arxiv.org/pdf/2001.03152.pdf)
- On Guiding Visual Attention with Language Specification [[pdf]](https://arxiv.org/pdf/2202.08926.pdf)
- Improving Interpretability via Regularization of Neural Activation Sensitivity [[pdf]](https://arxiv.org/pdf/2211.08686.pdf)
- L1-Norm Gradient Penalty for Noise Reduction of Attribution Maps [[pdf]](https://openaccess.thecvf.com/content_CVPRW_2019/papers/Explainable%20AI/Kiritoshi_L1-Norm_Gradient_Penalty_for_Noise_Reduction_of_Attribution_Maps_CVPRW_2019_paper.pdf)
- Identifying Spurious Correlations and Correcting them with an Explanation-based Learning [[pdf]](https://arxiv.org/pdf/2211.08285.pdf)
- Visual Attention Consistency under Image Transforms for Multi-Label Image Classification [[pdf]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Guo_Visual_Attention_Consistency_Under_Image_Transforms_for_Multi-Label_Image_Classification_CVPR_2019_paper.pdf)
- Improving performance of deep learning models with axiomatic attribution priors and expected gradients [[pdf]](https://arxiv.org/pdf/1906.10670.pdf)
- Fast Axiomatic Attribution for Neural Networks [[pdf]](https://arxiv.org/pdf/2111.07668.pdf) [[code]](https://github.com/visinf/fast-axiomatic-attribution)
- Detecting Statistical Interactions from Neural Network Weights [[pdf]](https://arxiv.org/pdf/1705.04977.pdf)
- What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods [[pdf]](https://arxiv.org/pdf/2112.04417.pdf) [[code]](https://github.com/serre-lab/Meta-predictor) [[blog]](https://serre-lab.github.io/Meta-predictor/)
- The Hidden Language of Diffusion Models [[pdf]](https://arxiv.org/pdf/2306.00966.pdf) [[code]](https://github.com/hila-chefer/Conceptor) [[website]](https://hila-chefer.github.io/Conceptor/)
- Investigating Vision Transformer representations [[blog]](https://keras.io/examples/vision/probing_vits/)
- Mean Attention Distance in Vision Transformers [[pdf]](https://arxiv.org/pdf/2010.11929.pdf) [[code]](https://colab.research.google.com/github/all-things-vits/code-samples/blob/main/probing/mean_attention_distance.ipynb)
- Interpreting Vision and Language Generative Models with Semantic Visual Priors [[pdf]](https://arxiv.org/pdf/2304.14986v2.pdf)
- Learning Concise and Descriptive Attributes for Visual Recognition [[pdf]](https://arxiv.org/pdf/2308.03685.pdf)
- Visual Classification via Description from Large Language Models [[pdf]](https://arxiv.org/pdf/2210.07183.pdf) [[code]](https://github.com/sachit-menon/classify_by_description_release) [[website]](https://cv.cs.columbia.edu/sachit/classviadescr/)
- Representation Engineering: A Top-Down Approach to AI Transparency [[pdf]](https://arxiv.org/pdf/2310.01405.pdf) [[code]](https://github.com/andyzoujm/representation-engineering) [[website]](https://www.ai-transparency.org/)
- Multimodal Neurons in Pretrained Text-Only Transformers [[pdf]](https://arxiv.org/pdf/2308.01544.pdf) [[website]](https://multimodal-interpretability.csail.mit.edu/Multimodal-Neurons-in-Text-Only-Transformers/)
- Are Vision Language Models Texture or Shape Biased and Can We Steer Them? [[pdf]](https://arxiv.org/pdf/2403.09193.pdf)
- Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation [[pdf]](https://arxiv.org/pdf/2311.17216.pdf) [[website]](https://interpretdiffusion.github.io/)
- **[OpenXAI]** Towards a Transparent Evaluation of Model Explanations [[pdf]](https://arxiv.org/pdf/2206.11104.pdf) [[code]](https://github.com/AI4LIFE-GROUP/OpenXAI) [[website]](https://open-xai.github.io/)
- **[TracIn]** Estimating Training Data Influence by Tracing Gradient Descent [[pdf]](https://arxiv.org/pdf/2002.08484.pdf) [[code]](https://github.com/frederick0329/TracIn) [[code]](https://github.com/ovyan/TracIn)
- **[VoG]** Estimating Example Difficulty using Variance of Gradients [[pdf]](https://arxiv.org/pdf/2008.11600.pdf) [[code]](https://github.com/chirag126/VOG) [[project]](https://varianceofgradients.github.io/)
- **[D-RISE]** Black-box Explanation of Object Detectors via Saliency Maps [[pdf]](https://arxiv.org/pdf/2006.03204.pdf)
- **[SmoothGrad]** Removing noise by adding noise [[pdf]](https://arxiv.org/pdf/1706.03825.pdf)
- **[Integrated Gradients]** Axiomatic Attribution for Deep Networks [[pdf]](https://arxiv.org/pdf/1703.01365.pdf) [[code]](https://www.tensorflow.org/tutorials/interpretability/integrated_gradients) [[code]](https://vl8r.eu/posts/2021/10/15/how-the-integrated-gradients-method-works/) (a minimal sketch appears at the end of this section)
- **[BlurIG]** Attribution in Scale and Space [[pdf]](https://arxiv.org/pdf/2004.03383.pdf) [[code]](https://github.com/PAIR-code/saliency)
- **[IDGI]** A Framework to Eliminate Explanation Noise from Integrated Gradients [[pdf]](https://arxiv.org/pdf/2303.14242.pdf) [[code]](https://github.com/yangruo1226/IDGI)
- **[GIG]** Guided Integrated Gradients: an Adaptive Path Method for Removing Noise [[pdf]](https://arxiv.org/pdf/2106.09788.pdf) [[code]](https://github.com/PAIR-code/saliency)
- **[SPI]** Beyond Single Path Integrated Gradients for Reliable Input Attribution via Randomized Path Sampling [[pdf]](https://openaccess.thecvf.com/content/ICCV2023/papers/Jeon_Beyond_Single_Path_Integrated_Gradients_for_Reliable_Input_Attribution_via_ICCV_2023_paper.pdf) [[supp]](https://openaccess.thecvf.com/content/ICCV2023/supplemental/Jeon_Beyond_Single_Path_ICCV_2023_supplemental.pdf)
- **[IIA]** Visual Explanations via Iterated Integrated Attributions [[pdf]](https://openaccess.thecvf.com/content/ICCV2023/papers/Barkan_Visual_Explanations_via_Iterated_Integrated_Attributions_ICCV_2023_paper.pdf) [[supp]](https://openaccess.thecvf.com/content/ICCV2023/supplemental/Barkan_Visual_Explanations_via_ICCV_2023_supplemental.pdf) [[code]](https://github.com/iia-iccv23/iia)
- **[Integrated Hessians]** Explaining Explanations: Axiomatic Feature Interactions for Deep Networks [[pdf]](https://arxiv.org/pdf/2002.04138.pdf) [[code]](https://github.com/suinleelab/path_explain)
- **[Archipelago]** How does this interaction affect me? Interpretable attribution for feature interactions [[pdf]](https://arxiv.org/pdf/2006.10965.pdf) [[code]](https://github.com/mtsang/archipelago)
- **[I-GOS]** Visualizing Deep Networks by Optimizing with Integrated Gradients [[pdf]](https://arxiv.org/pdf/1905.00954.pdf)
- **[MoreauGrad]** Sparse and Robust Interpretation of Neural Networks via Moreau Envelope [[pdf]](https://arxiv.org/pdf/2302.05294.pdf) [[code]](https://github.com/buyeah1109/MoreauGrad)
- **[SAGs]** One Explanation is Not Enough: Structured Attention Graphs for Image Classification [[pdf]](https://arxiv.org/pdf/2011.06733.pdf)
- **[LRP]** On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation [[pdf]](https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0130140&type=printable) [[pdf]](https://iphome.hhi.de/samek/pdf/MonXAI19.pdf) [[pdf]](https://www.sciencedirect.com/science/article/pii/S1051200417302385) [[tutorial]](https://git.tu-berlin.de/gmontavon/lrp-tutorial) [[code]](https://github.com/fhvilshoj/TorchLRP) [[code]](https://github.com/deepfindr/xai-series/blob/master/05_lrp.py) [[blog]](https://towardsdatascience.com/indepth-layer-wise-relevance-propagation-340f95deb1ea)
- **[DeepDream]** Inceptionism: Going Deeper into Neural Networks [[blog]](https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html) [[code]](https://github.com/eriklindernoren/PyTorch-Deep-Dream) [[code]](https://github.com/gordicaleksa/pytorch-deepdream) [[code]](https://github.com/ProGamerGov/neural-dream)
- **[RISE]** Randomized Input Sampling for Explanation of Black-box Models [[pdf]](https://arxiv.org/pdf/1806.07421.pdf) [[code]](https://github.com/eclique/RISE) [[website]](https://cs-people.bu.edu/vpetsiuk/rise/)
- **[DeepLIFT]** Learning Important Features Through Propagating Activation Differences [[pdf]](https://arxiv.org/pdf/1704.02685.pdf) [[video]](https://www.youtube.com/playlist?list=PLJLjQOkqSRTP3cLB2cOOi_bQFw6KPGKML) [[code]](https://github.com/kundajelab/deeplift)
- **[ROAD]** A Consistent and Efficient Evaluation Strategy for Attribution Methods [[pdf]](https://arxiv.org/pdf/2202.00449.pdf) [[code]](https://github.com/tleemann/road_evaluation)
- **[Layer Masking]** Towards Improved Input Masking for Convolutional Neural Networks [[pdf]](https://arxiv.org/pdf/2211.14646.pdf) [[code]](https://github.com/SriramB-98/layer_masking)
- **[Summit]** Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations [[pdf]](https://arxiv.org/pdf/1904.02323.pdf)
- **[SHAP]** A Unified Approach to Interpreting Model Predictions [[pdf]](https://arxiv.org/pdf/1705.07874.pdf) [[code]](https://github.com/slundberg/shap)
- **[MM-SHAP]** A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks [[pdf]](https://arxiv.org/pdf/2212.08158.pdf) [[code]](https://github.com/Heidelberg-NLP/MM-SHAP) [[video]](https://www.youtube.com/watch?v=RLaiomLMK9I&ab_channel=AICoffeeBreakwithLetitia)
- **[Anchors]** High-Precision Model-Agnostic Explanations [[pdf]](https://homes.cs.washington.edu/~marcotcr/aaai18.pdf) [[code]](https://github.com/marcotcr/anchor)
- **[Layer Conductance]** How Important Is a Neuron? [[pdf]](https://arxiv.org/pdf/1805.12233.pdf) [[pdf]](https://arxiv.org/pdf/1807.09946.pdf)
- **[BiLRP]** Building and Interpreting Deep Similarity Models [[pdf]](https://arxiv.org/pdf/2003.05431.pdf) [[code]](https://github.com/oeberle/BiLRP_explain_similarity)
- **[CGC]** Consistent Explanations by Contrastive Learning [[pdf]](https://arxiv.org/pdf/2110.00527.pdf) [[code]](https://github.com/UCDvision/CGC)
- **[DeepInversion]** Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion [[pdf]](https://arxiv.org/pdf/1912.08795.pdf) [[code]](https://github.com/NVlabs/DeepInversion)
- **[GradInversion]** See through Gradients: Image Batch Recovery via GradInversion [[pdf]](https://arxiv.org/pdf/2104.07586.pdf)
- **[GradViT]** Gradient Inversion of Vision Transformers [[pdf]](https://arxiv.org/pdf/2203.11894.pdf)
- **[Plug-In Inversion]** Model-Agnostic Inversion for Vision with Data Augmentations [[pdf]](https://proceedings.mlr.press/v162/ghiasi22a/ghiasi22a.pdf)
- **[GIFD]** A Generative Gradient Inversion Method with Feature Domain Optimization [[pdf]](https://arxiv.org/pdf/2308.04699.pdf)
- **[X-OIA]** Explainable Object-induced Action Decision for Autonomous Vehicles [[pdf]](https://arxiv.org/pdf/2003.09405.pdf) [[code]](https://github.com/Twizwei/bddoia_project) [[website]](https://twizwei.github.io/bddoia_project/)
- **[CAT-XPLAIN]** Causality for Inherently Explainable Transformers [[pdf]](https://arxiv.org/pdf/2206.14841.pdf) [[code]](https://github.com/mvrl/CAT-XPLAIN)
- **[CLRP]** Understanding Individual Decisions of CNNs via Contrastive Backpropagation [[pdf]](https://arxiv.org/pdf/1812.02100.pdf) [[code]](https://github.com/JindongGu/Contrastive-LRP)
- **[HINT]** Leveraging Explanations to Make Vision and Language Models More Grounded [[pdf]](https://arxiv.org/pdf/1902.03751.pdf)
- **[BagNet]** Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet [[pdf]](https://arxiv.org/pdf/1904.00760.pdf) [[code]](https://github.com/wielandbrendel/bag-of-local-features-models) [[blog]](https://sh-tsang.medium.com/review-bagnet-approximating-cnns-with-bag-of-local-features-models-works-surprisingly-well-on-125f4295c433)
- **[SMERF]** Sanity Simulations for Saliency Methods [[pdf]](https://arxiv.org/pdf/2105.06506.pdf)
- **[ELUDE]** Generating interpretable explanations via a decomposition into labelled and unlabelled features [[pdf]](https://arxiv.org/pdf/2206.07690.pdf)
- **[C3LT]** Cycle-Consistent Counterfactuals by Latent Transformations [[pdf]](https://arxiv.org/pdf/2203.15064.pdf)
- **[B-cos]** Alignment is All We Need for Interpretability [[pdf]](https://arxiv.org/pdf/2205.10268.pdf) [[code]](https://github.com/moboehle/B-cos)
- **[ShapNets]** Shapley Explanation Networks [[pdf]](https://arxiv.org/pdf/2104.02297.pdf) [[code]](https://github.com/inouye-lab/ShapleyExplanationNetworks)
- **[CALM]** Keep CALM and Improve Visual Feature Attribution [[pdf]](https://arxiv.org/pdf/2106.07861.pdf) [[code]](https://github.com/naver-ai/calm)
- **[SGLRP]** Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation [[pdf]](https://arxiv.org/pdf/1908.04351.pdf)
- **[DTD]** Explaining NonLinear Classification Decisions with Deep Taylor Decomposition [[pdf]](https://arxiv.org/pdf/1512.02479.pdf) [[code]](https://github.com/myc159/Deep-Taylor-Decomposition)
- **[GradCAT]** Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding [[pdf]](https://arxiv.org/pdf/2105.12723.pdf)
- **[FastSHAP]** Real-Time Shapley Value Estimation [[pdf]](https://arxiv.org/pdf/2107.07436.pdf) [[code]](https://github.com/iancovert/fastshap)
- **[VisualBackProp]** Efficient visualization of CNNs [[pdf]](https://arxiv.org/pdf/1611.05418.pdf)
- **[NBDT]** Neural-Backed Decision Trees [[pdf]](https://arxiv.org/pdf/2004.00221.pdf) [[code]](https://github.com/alvinwan/neural-backed-decision-trees)
- **[XRAI]** Better Attributions Through Regions [[pdf]](https://arxiv.org/pdf/1906.02825.pdf)
- **[MeGe, ReCo]** How Good is your Explanation? Algorithmic Stability Measures to Assess the Quality of Explanations for Deep Neural Networks [[pdf]](https://arxiv.org/pdf/2009.04521.pdf)
- **[FCDD]** Explainable Deep One-Class Classification [[pdf]](https://arxiv.org/pdf/2007.01760.pdf) [[code]](https://github.com/liznerski/fcdd)
- **[DiCE]** Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations [[pdf]](https://arxiv.org/pdf/1905.07697.pdf) [[code]](https://github.com/interpretml/DiCE) [[blog]](https://www.microsoft.com/en-us/research/blog/open-source-library-provides-explanation-for-machine-learning-through-diverse-counterfactuals/)
- **[ARM]** Blending Anti-Aliasing into Vision Transformer [[pdf]](https://arxiv.org/pdf/2110.15156.pdf) [[code]](https://github.com/amazon-research/anti-aliasing-transformer)
- **[RelEx]** Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation [[pdf]](https://arxiv.org/pdf/2103.14332.pdf) [[code]](https://github.com/JBNU-VL/RelEx)
- **[X-Pruner]** eXplainable Pruning for Vision Transformers [[pdf]](https://arxiv.org/pdf/2303.04935.pdf) [[code]](https://github.com/vickyyu90/XPruner)
- **[ShearletX]** Explaining Image Classifiers with Multiscale Directional Image Representation [[pdf]](https://arxiv.org/pdf/2211.12857.pdf)
- **[MACO]** Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization [[pdf]](https://arxiv.org/pdf/2306.06805.pdf) [[website]](https://serre-lab.github.io/Lens/)
- **[Guided Zoom]** Questioning Network Evidence for Fine-Grained Classification [[pdf]](https://arxiv.org/pdf/1812.02626.pdf) [[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9335497) [[code]](https://github.com/andreazuna89/Guided-Zoom)
- **[DAAM]** Interpreting Stable Diffusion Using Cross Attention [[pdf]](https://arxiv.org/pdf/2210.04885.pdf) [[code]](https://github.com/castorini/daam) [[demo]](https://huggingface.co/spaces/tetrisd/Diffusion-Attentive-Attribution-Maps)
- **[Diffusion Explainer]** Visual Explanation for Text-to-image Stable Diffusion [[pdf]](https://arxiv.org/pdf/2305.03509.pdf) [[website]](https://poloclub.github.io/diffusion-explainer/) [[video]](https://www.youtube.com/watch?v=Zg4gxdIWDds&ab_channel=PoloClubofDataScience)
- **[ECLIP]** Exploring Visual Explanations for Contrastive Language-Image Pre-training [[pdf]](https://arxiv.org/pdf/2209.07046.pdf)
- **[CNC]** Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations [[pdf]](https://arxiv.org/pdf/2203.01517.pdf)
- **[AMC]** Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations [[pdf]](https://arxiv.org/pdf/2206.15462.pdf)
- **[ClickMe]** Learning what and where to attend [[pdf]](https://arxiv.org/pdf/1805.08819.pdf)
- **[MaskTune]** Mitigating Spurious Correlations by Forcing to Explore [[pdf]](https://arxiv.org/pdf/2210.00055.pdf)
- **[CoDA-Nets]** Convolutional Dynamic Alignment Networks for Interpretable Classifications [[pdf]](https://arxiv.org/pdf/2104.00032.pdf)
- **[ABN]** Attention Branch Network: Learning of Attention Mechanism for Visual Explanation [[pdf]](https://arxiv.org/pdf/1812.10025.pdf) [[pdf]](https://arxiv.org/pdf/1905.03540.pdf)
- **[RES]** A Robust Framework for Guiding Visual Explanation [[pdf]](https://arxiv.org/pdf/2206.13413.pdf)
- **[IAA]** Aligning Eyes between Humans and Deep Neural Network through Interactive Attention Alignment [[pdf]](https://arxiv.org/pdf/2202.02838.pdf)
- **[DiFull]** Towards Better Understanding Attribution Methods [[pdf]](https://openaccess.thecvf.com/content/CVPR2022/papers/Rao_Towards_Better_Understanding_Attribution_Methods_CVPR_2022_paper.pdf) [[code]](https://github.com/sukrutrao/Attribution-Evaluation)
- **[AttentionViz]** A Global View of Transformer Attention [[pdf]](https://arxiv.org/pdf/2305.03210.pdf)
- **[Rosetta Neurons]** Mining the Common Units in a Model Zoo [[pdf]](https://arxiv.org/pdf/2306.09346.pdf) [[code]](https://github.com/yossigandelsman/rosetta_neurons) [[website]](https://yossigandelsman.github.io/rosetta_neurons/)
- **[SAFARI]** Versatile and Efficient Evaluations for Robustness of Interpretability [[pdf]](https://arxiv.org/pdf/2208.09418.pdf)
- **[LANCE]** Stress-testing Visual Models by Generating Language-guided Counterfactual Images [[pdf]](https://arxiv.org/pdf/2305.19164.pdf) [[code]](https://github.com/virajprabhu/LANCE) [[website]](https://virajprabhu.github.io//lance-web/)
- **[FunnyBirds]** A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods [[pdf]](https://arxiv.org/pdf/2308.06248.pdf) [[code]](https://github.com/visinf/funnybirds/)
- **[MAGI]** Multi-Annotated Explanation-Guided Learning [[pdf]](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_MAGI_Multi-Annotated_Explanation-Guided_Learning_ICCV_2023_paper.pdf)
- **[CCE]** Towards Visual Contrastive Explanations for Neural Networks [[pdf]](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Counterfactual-based_Saliency_Map_Towards_Visual_Contrastive_Explanations_for_Neural_Networks_ICCV_2023_paper.pdf)
- **[CNN Filter DB]** An Empirical Investigation of Trained Convolutional Filters [[pdf]](https://openaccess.thecvf.com/content/CVPR2022/papers/Gavrikov_CNN_Filter_DB_An_Empirical_Investigation_of_Trained_Convolutional_Filters_CVPR_2022_paper.pdf) [[code]](https://github.com/paulgavrikov/CNN-Filter-DB)
- **[VLSlice]** Interactive Vision-and-Language Slice Discovery [[pdf]](https://arxiv.org/pdf/2309.06703.pdf) [[code]](https://github.com/slymane/vlslice) [[website]](https://ericslyman.com/vlslice/) [[demo]](https://drive.google.com/file/d/1JkbVXnCds6rOErUx-YWZmp3mQ3IDJuhi/view) [[video]](https://drive.google.com/file/d/1mOuvjphNb2xNDC7shoGbPwyjbfArwud4/view) [[video]](https://www.youtube.com/watch?v=2CMDcGGsMjo&list=PLUxOP3kBxs2JYA5KT0YEmNJEyjqAqLOf3&index=2&ab_channel=CollegeofEngineering-OregonStateUniversity)
- **[Feature Sieve]** Overcoming Simplicity Bias in Deep Networks using a Feature Sieve [[pdf]](https://arxiv.org/pdf/2301.13293.pdf) [[blog]](https://blog.research.google/2024/02/intervening-on-early-readouts-for.html)
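
Several of the attribution methods listed above (SmoothGrad, Integrated Gradients, and their variants) reduce to a few lines of autograd code. Below is a minimal sketch of Integrated Gradients in PyTorch; the model, target class, and zero baseline are illustrative assumptions, and a small Riemann sum stands in for the path integral.

```python
import torch

def integrated_gradients(model, x, target, baseline=None, steps=50):
    """Minimal Integrated Gradients sketch:
    attribution_i ~ (x_i - x'_i) * mean_alpha d f_target(x' + alpha * (x - x')) / d x_i."""
    model.eval()
    if baseline is None:
        baseline = torch.zeros_like(x)                 # all-zero ("black image") baseline
    grads = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        score = model(point)[:, target].sum()          # scalar logit for the target class
        grads.append(torch.autograd.grad(score, point)[0])
    avg_grad = torch.stack(grads).mean(dim=0)          # Riemann approximation of the path integral
    return (x - baseline) * avg_grad                   # attribution map, same shape as the input

# Usage (illustrative names): attr = integrated_gradients(model, images, target=243, steps=32)
```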

## CAM Papers
- **[CAM]** Learning Deep Features for Discriminative Localization [[pdf]](https://arxiv.org/pdf/1512.04150.pdf)
- **[Grad-CAM]** Visual Explanations from Deep Networks via Gradient-based Localization [[pdf]](https://arxiv.org/pdf/1610.02391.pdf) [[code]](https://github.com/ramprs/grad-cam/) [[code]](https://github.com/ruthcfong/pytorch-grad-cam) [[website]](http://gradcam.cloudcv.org/) (see the sketch after this list)
- **[Grad-CAM++]** Improved Visual Explanations for Deep Convolutional Networks [[pdf]](https://arxiv.org/pdf/1710.11063.pdf) [[code]](https://github.com/adityac94/Grad_CAM_plus_plus)
- **[Score-CAM]** Score-Weighted Visual Explanations for Convolutional Neural Networks [[pdf]](https://arxiv.org/pdf/1910.01279.pdf) [[code]](https://github.com/haofanwang/Score-CAM) [[code]](https://github.com/yiskw713/ScoreCAM)
- **[LayerCAM]** Exploring Hierarchical Class Activation Maps for Localization [[pdf]](http://mftp.mmcheng.net/Papers/21TIP_LayerCAM.pdf) [[code]](https://github.com/PengtaoJiang/LayerCAM-jittor)
- **[Eigen-CAM]** Class Activation Map using Principal Components [[pdf]](https://arxiv.org/ftp/arxiv/papers/2008/2008.00299.pdf)
- **[XGrad-CAM]** Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs [[pdf]](https://arxiv.org/pdf/2008.02312.pdf) [[code]](https://github.com/Fu0511/XGrad-CAM)
- **[Ablation-CAM]** Visual Explanations for Deep Convolutional Network via Gradient-free Localization [[pdf]](https://openaccess.thecvf.com/content_WACV_2020/papers/Desai_Ablation-CAM_Visual_Explanations_for_Deep_Convolutional_Network_via_Gradient-free_Localization_WACV_2020_paper.pdf)
- **[Group-CAM]** Group Score-Weighted Visual Explanations for Deep Convolutional Networks [[pdf]](https://arxiv.org/pdf/2103.13859.pdf) [[code]](https://github.com/wofmanaf/Group-CAM)
- **[FullGrad]** Full-Gradient Representation for Neural Network Visualization [[pdf]](https://arxiv.org/pdf/1905.00780.pdf)
- **[Relevance-CAM]** Your Model Already Knows Where to Look [[pdf]](https://openaccess.thecvf.com/content/CVPR2021/papers/Lee_Relevance-CAM_Your_Model_Already_Knows_Where_To_Look_CVPR_2021_paper.pdf) [[code]](https://github.com/mongeoroo/Relevance-CAM)
- **[Poly-CAM]** High resolution class activation map for convolutional neural networks [[pdf]](https://arxiv.org/pdf/2204.13359.pdf) [[code]](https://github.com/aenglebert/polycam)
- **[Smooth Grad-CAM++]** An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models [[pdf]](https://arxiv.org/pdf/1908.01224.pdf) [[code]](https://github.com/yiskw713/SmoothGradCAMplusplus)
- **[Zoom-CAM]** Generating Fine-grained Pixel Annotations from Image Labels [[pdf]](https://arxiv.org/pdf/2010.08644.pdf)
- **[FD-CAM]** Improving Faithfulness and Discriminability of Visual Explanation for CNNs [[pdf]](https://arxiv.org/pdf/2206.08792.pdf) [[code]](https://github.com/crishhh1998/FD-CAM)
- **[LIFT-CAM]** Towards Better Explanations of Class Activation Mapping [[pdf]](https://arxiv.org/pdf/2102.05228.pdf)
- **[Shap-CAM]** Visual Explanations for Convolutional Neural Networks based on Shapley Value [[pdf]](https://arxiv.org/pdf/2208.03608.pdf)
- **[HiResCAM]** Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks [[pdf]](https://arxiv.org/pdf/2011.08891.pdf)
- **[FAM]** Visual Explanations for the Feature Representations from Deep Convolutional Networks [[pdf]](https://openaccess.thecvf.com/content/CVPR2022/papers/Wu_FAM_Visual_Explanations_for_the_Feature_Representations_From_Deep_Convolutional_CVPR_2022_paper.pdf)
- **[MinMaxCAM]** Improving object coverage for CAM-based Weakly Supervised Object Localization [[pdf]](https://arxiv.org/pdf/2104.14375.pdf)
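
The CAM variants above share one skeleton: capture the activations of a late convolutional layer, weight its channels by some importance score, and upsample the weighted sum to input resolution. A minimal Grad-CAM sketch in PyTorch illustrates the gradient-weighted case; the `conv_layer` handle (e.g. `resnet.layer4`) and target class are illustrative assumptions, not taken from the reference implementations linked above.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target, conv_layer):
    """Minimal Grad-CAM sketch: feature maps of `conv_layer` weighted by their
    spatially averaged gradients w.r.t. the target logit, then ReLU and upsample."""
    model.eval()
    feats, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        score = model(x)[:, target].sum()              # scalar logit for the target class
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)        # alpha_k: pooled gradients per channel
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)   # normalise each map to [0, 1]

# Usage (illustrative names): heatmap = grad_cam(resnet, images, target=243, conv_layer=resnet.layer4)
```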

## LIME-based
- **[LIME]** "Why Should I Trust You?": Explaining the Predictions of Any Classifier [[pdf]](https://arxiv.org/pdf/1602.04938.pdf) [[code]](https://github.com/marcotcr/lime) (see the sketch after this list)
- **[InteractionLIME]** Model-Agnostic Visual Explanations via Approximate Bilinear Models [[pdf]](https://cris.vub.be/ws/portalfiles/portal/97448865/ICIP_2023_InteractionLIME_openAccess.pdf)
- **[NormLime]** A New Feature Importance Metric for Explaining Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1909.04200.pdf)
- **[GALE]** Global Aggregations of Local Explanations for Black Box models [[pdf]](https://arxiv.org/pdf/1907.03039.pdf)
- **[D-LIME]** A Deterministic Local Interpretable Model-Agnostic Explanations Approach for Computer-Aided Diagnosis Systems [[pdf]](https://arxiv.org/pdf/1906.10263.pdf)
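
The LIME family fits an interpretable surrogate, typically a sparse or regularized linear model over superpixels, to the black box in the neighbourhood of a single input. The sketch below is a simplified from-scratch illustration rather than the `lime` package: `predict_fn` (a batch-of-images-to-probabilities callable) and a precomputed `segments` map are assumed inputs, and a Ridge surrogate with a simple proximity kernel stands in for the full method.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_superpixels(image, segments, predict_fn, target,
                     num_samples=1000, kernel_width=0.25, seed=0):
    """Minimal LIME-style sketch: switch superpixels on/off, query the black box,
    and fit a locally weighted linear surrogate whose coefficients score each superpixel."""
    rng = np.random.default_rng(seed)
    n_segments = int(segments.max()) + 1
    masks = rng.integers(0, 2, size=(num_samples, n_segments))   # binary interpretable samples
    masks[0] = 1                                                  # keep the unperturbed image

    fill = image.mean(axis=(0, 1))                                # grey out switched-off superpixels
    perturbed = []
    for m in masks:
        img = image.copy()
        img[~np.isin(segments, np.flatnonzero(m))] = fill
        perturbed.append(img)
    probs = predict_fn(np.stack(perturbed))[:, target]            # black-box score for the target class

    removed = 1.0 - masks.mean(axis=1)                            # fraction of superpixels turned off
    weights = np.exp(-(removed ** 2) / kernel_width ** 2)         # simple proximity kernel
    surrogate = Ridge(alpha=1.0).fit(masks, probs, sample_weight=weights)
    return surrogate.coef_                                        # per-superpixel importance
```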

## Concept Bottleneck Models
- **[CBM]** Concept Bottleneck Models [[pdf]](https://arxiv.org/pdf/2007.04612.pdf) [[code]](https://github.com/yewsiang/ConceptBottleneck) (see the sketch after this list)
- **[Label-free CBM]** Label-Free Concept Bottleneck Models [[pdf]](https://arxiv.org/pdf/2304.06129.pdf) [[code]](https://github.com/Trustworthy-ML-Lab/Label-free-CBM)
- **[PCBMs]** Post-hoc Concept Bottleneck Models [[pdf]](https://arxiv.org/pdf/2205.15480.pdf) [[code]](https://github.com/mertyg/post-hoc-cbm)
- **[CDM]** Sparse Linear Concept Discovery Models [[pdf]](https://arxiv.org/pdf/2308.10782.pdf) [[code]](https://github.com/konpanousis/ConceptDiscoveryModels)
- **[BotCL]** Learning Bottleneck Concepts in Image Classification [[pdf]](https://arxiv.org/pdf/2304.10131.pdf) [[code]](https://github.com/wbw520/BotCL)
- **[LaBo]** Language Model Guided Concept Bottlenecks for Interpretable Image Classification [[pdf]](https://arxiv.org/pdf/2211.11158.pdf) [[code]](https://github.com/YueYANG1996/LaBo)
- **[CompMap]** Do Vision-Language Pretrained Models Learn Composable Primitive Concepts? [[pdf]](https://arxiv.org/pdf/2203.17271.pdf) [[code]](https://github.com/tttyuntian/vlm_primitive_concepts) [[website]](https://vlm-primitive-concepts.github.io/)
- **[FVLC]** Faithful Vision-Language Interpretation via Concept Bottleneck Models [[pdf]](https://openreview.net/pdf?id=rp0EdI8X4e)
- Promises and Pitfalls of Black-Box Concept Learning Models [[pdf]](https://arxiv.org/pdf/2106.13314.pdf)
- Do Concept Bottleneck Models Learn as Intended? [[pdf]](https://arxiv.org/pdf/2105.04289.pdf)
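
Architecturally, a concept bottleneck model inserts a layer of human-interpretable concept predictions between the backbone and the label head, so the final decision depends on the features only through the predicted concepts. A minimal joint-training sketch in PyTorch, assuming a backbone that returns flat feature vectors and a dataset yielding (image, concept, label) triples:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneck(nn.Module):
    """Minimal CBM sketch: backbone features -> concept logits -> linear label head."""
    def __init__(self, backbone, feat_dim, n_concepts, n_classes):
        super().__init__()
        self.backbone = backbone                         # any feature extractor returning (N, feat_dim)
        self.concept_head = nn.Linear(feat_dim, n_concepts)
        self.label_head = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concept_logits = self.concept_head(self.backbone(x))
        label_logits = self.label_head(torch.sigmoid(concept_logits))   # decision goes through concepts
        return concept_logits, label_logits

def joint_loss(concept_logits, label_logits, concepts, labels, lam=0.5):
    """Joint objective: label cross-entropy plus lam * per-concept binary cross-entropy."""
    return (F.cross_entropy(label_logits, labels)
            + lam * F.binary_cross_entropy_with_logits(concept_logits, concepts.float()))
```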

## Neuron Annotation
- **[Network Dissection]** Quantifying Interpretability of Deep Visual Representations [[pdf]](https://arxiv.org/pdf/1704.05796.pdf) [[code]](https://github.com/CSAILVision/NetDissect) [[website]](http://netdissect.csail.mit.edu/)
- **[CLIP-Dissect]** Automatic Description of Neuron Representations in Deep Vision Networks [[pdf]](https://arxiv.org/pdf/2204.10965.pdf) [[code]](https://github.com/Trustworthy-ML-Lab/CLIP-dissect)
- **[Net2Vec]** Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1801.03454.pdf)
- **[MILAN]** Natural Language Descriptions of Deep Visual Features [[pdf]](https://arxiv.org/pdf/2201.11114.pdf) [[code]](https://github.com/evandez/neuron-descriptions) [[website]](http://milan.csail.mit.edu/)
- **[INViTE]** INterpret and Control Vision-Language Models with Text Explanations [[pdf]](https://openreview.net/pdf?id=5iENGLEJKG) [[code]](https://github.com/tonychenxyz/vit-interpret)
- **[CLIP-Decomposition]** Interpreting CLIP's Image Representation via Text-Based Decomposition [[pdf]](https://arxiv.org/pdf/2310.05916.pdf) [[code]](https://github.com/yossigandelsman/clip_text_span) [[website]](https://yossigandelsman.github.io/clip_decomposition/)
- **[ZS-A2T]** Zero-shot Translation of Attention Patterns in VQA Models to Natural Language [[pdf]](https://arxiv.org/pdf/2311.05043.pdf) [[code]](https://github.com/ExplainableML/ZS-A2T)
- **[FALCON]** Identifying Interpretable Subspaces in Image Representations [[pdf]](https://arxiv.org/pdf/2307.10504.pdf) [[code]](https://github.com/NehaKalibhat/falcon-explain)
- **[STAIR]** Learning Sparse Text and Image Representation in Grounded Tokens [[pdf]](https://arxiv.org/pdf/2301.13081.pdf)
- **[DISCOVER]** Making Vision Networks Interpretable via Competition and Dissection [[pdf]](https://openreview.net/pdf?id=sWNOvNXGLP)
- **[DeViL]** Decoding Vision features into Language [[pdf]](https://arxiv.org/pdf/2309.01617.pdf) [[code]](https://github.com/ExplainableML/DeViL)
- **[LaViSE]** Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention [[pdf]](https://arxiv.org/pdf/2204.04601.pdf) [[code]](https://github.com/YuYang0901/LaViSE)

## Prototype/Concept-Based
- **[ProtoTrees]** Neural Prototype Trees for Interpretable Fine-grained Image Recognition [[pdf]](https://arxiv.org/pdf/2012.02046.pdf) [[code]](https://github.com/M-Nauta/ProtoTree)
- **[ProtoPNet]** This Looks Like That: Deep Learning for Interpretable Image Recognition [[pdf]](https://arxiv.org/pdf/1806.10574.pdf) [[code]](https://github.com/cfchen-duke/ProtoPNet)
- **[ST-ProtoPNet]** Learning Support and Trivial Prototypes for Interpretable Image Classification [[pdf]](https://arxiv.org/pdf/2301.04011.pdf)
- **[Deformable ProtoPNet]** An Interpretable Image Classifier Using Deformable Prototypes [[pdf]](https://arxiv.org/pdf/2111.15000.pdf)
- **[SPARROW]** Semantically Coherent Prototypes for Image Classification [[pdf]](https://www.bmvc2021-virtualconference.com/assets/papers/0896.pdf)
- **[Proto2Proto]** Can you recognize the car, the way I do? [[pdf]](https://arxiv.org/pdf/2204.11830.pdf) [[code]](https://github.com/archmaester/proto2proto)
- **[PDiscoNet]** Semantically consistent part discovery for fine-grained recognition [[pdf]](https://arxiv.org/pdf/2309.03173.pdf) [[code]](https://github.com/robertdvdk/part_detection)
- **[ProtoPool]** Interpretable Image Classification with Differentiable Prototypes Assignment [[pdf]](https://arxiv.org/pdf/2112.02902.pdf) [[code]](https://github.com/gmum/ProtoPool)
- **[ProtoPShare]** Prototype Sharing for Interpretable Image Classification and Similarity Discovery [[pdf]](https://arxiv.org/pdf/2011.14340.pdf)
- **[PW-Net]** Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes [[pdf]](https://openreview.net/pdf?id=hWwY_Jq0xsN) [[code]](https://openreview.net/attachment?id=hWwY_Jq0xsN&name=supplementary_material)
- **[ProtoPDebug]** Concept-level Debugging of Part-Prototype Networks [[pdf]](https://arxiv.org/pdf/2205.15769.pdf)
- **[DSX]** Describe, Spot and Explain: Interpretable Representation Learning for Discriminative Visual Reasoning [[pdf]](https://ieeexplore.ieee.org/document/10106785)
- **[HINT]** Hierarchical Neuron Concept Explainer [[pdf]](https://arxiv.org/pdf/2203.14196.pdf) [[code]](https://github.com/AntonotnaWang/HINT)
- **[ConceptSHAP]** On Completeness-aware Concept-Based Explanations in Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1910.07969.pdf) [[code]](https://github.com/chihkuanyeh/concept_exp)
- **[CW]** Concept Whitening for Interpretable Image Recognition [[pdf]](https://arxiv.org/pdf/2002.01650.pdf)
- **[VRX]** Interpreting with Structural Visual Concepts [[pdf]](https://arxiv.org/pdf/2105.00290.pdf)
- **[MOCE]** Extracting Model-Oriented Concepts for Explaining Deep Neural Networks [[pdf]](https://ieeexplore.ieee.org/document/10412652) [[code]](https://github.com/gyeomo/MOCE)
- **[ConceptExplainer]** Interactive Explanation for Deep Neural Networks from a Concept Perspective [[pdf]](https://arxiv.org/pdf/2204.01888.pdf)
- **[ProtoSim]** Prototype-based Dataset Comparison [[pdf]](https://arxiv.org/pdf/2309.02401.pdf) [[code]](https://github.com/Nanne/ProtoSim) [[website]](https://nanne.github.io/ProtoSim/)
- **[TCAV]** Quantitative Testing with Concept Activation Vectors [[pdf]](https://arxiv.org/pdf/1711.11279.pdf) [[code]](https://github.com/tensorflow/tcav) [[book chapter]](https://christophm.github.io/interpretable-ml-book/detecting-concepts.html) (see the sketch after this list)
- **[SACV]** Hidden Layer Interpretation with Spatial Activation Concept Vector [[pdf]](https://arxiv.org/pdf/2205.11511.pdf) [[code]](https://github.com/AntonotnaWang/Spatial-Activation-Concept-Vector)
- **[ACE]** Towards Automatic Concept-based Explanations [[pdf]](https://arxiv.org/pdf/1902.03129.pdf) [[code]](https://github.com/amiratag/ACE)
- **[DFF]** Deep Feature Factorization For Concept Discovery [[pdf]](https://arxiv.org/pdf/1806.10206.pdf) [[code]](https://github.com/edocollins/DFF) [[code]](https://github.com/jacobgil/pytorch-grad-cam/blob/master/pytorch_grad_cam/feature_factorization/deep_feature_factorization.py) [[blog and code]](https://jacobgil.github.io/pytorch-gradcam-book/Deep%20Feature%20Factorizations.html)
- **[CRP]** From “Where” to “What”: Towards Human-Understandable Explanations through Concept Relevance Propagation [[pdf]](https://arxiv.org/pdf/2206.03208.pdf) [[code]](https://github.com/rachtibat/zennit-crp)
- **[FeatUp]** A Model-Agnostic Framework for Features at Any Resolution [[pdf]](https://arxiv.org/pdf/2403.10516.pdf) [[code]](https://github.com/mhamilton723/FeatUp) [[colab]](https://colab.research.google.com/github/mhamilton723/FeatUp/blob/main/example_usage.ipynb) [[website]](https://mhamilton.net/featup.html) [[demo]](https://huggingface.co/spaces/mhamilton723/FeatUp)
- **[LENS]** A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation [[pdf]](https://arxiv.org/pdf/2306.07304.pdf) [[website]](https://serre-lab.github.io/Lens/)
- **[CRAFT]** Concept Recursive Activation FacTorization for Explainability [[pdf]](https://arxiv.org/pdf/2211.10154.pdf) [[code]](https://github.com/deel-ai/Craft) [[website]](https://serre-lab.github.io/Lens/)
- Deep ViT Features as Dense Visual Descriptors [[pdf]](https://arxiv.org/pdf/2112.05814.pdf) [[supp]](https://dino-vit-features.github.io/sm/index.html) [[code]](https://github.com/ShirAmir/dino-vit-features) [[website]](https://dino-vit-features.github.io/index.html)
- Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks [[pdf]](https://arxiv.org/pdf/2212.05946.pdf) [[code]](https://github.com/hqhQAQ/EvalProtoPNet)
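
Among the concept-based methods above, TCAV has a particularly compact recipe: fit a linear classifier separating layer activations of concept examples from random examples, take the boundary normal as the concept activation vector (CAV), and report the fraction of class inputs whose directional derivative along the CAV is positive. A minimal sketch, assuming the layer activations and per-example gradients of the class logit have already been extracted as numpy arrays:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def tcav_score(acts_concept, acts_random, grads_class):
    """Minimal TCAV sketch: the CAV is the normal of a concept-vs-random linear classifier;
    the score is the fraction of class examples whose logit gradient points along the CAV."""
    X = np.concatenate([acts_concept, acts_random])
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])     # unit concept activation vector
    directional = grads_class @ cav                        # directional derivative per class example
    return float((directional > 0).mean())
```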

## Distill Papers
- [Distill](https://distill.pub/)
- Multimodal Neurons in Artificial Neural Networks [[paper]](https://distill.pub/2021/multimodal-neurons/) [[blog]](https://openai.com/blog/multimodal-neurons/) [[code]](https://github.com/openai/CLIP-featurevis)
- The Building Blocks of Interpretability [[paper]](https://distill.pub/2018/building-blocks/)
- Visualizing the Impact of Feature Attribution Baselines [[paper]](https://distill.pub/2020/attribution-baselines/)
- An Overview of Early Vision in InceptionV1 [[paper]](https://distill.pub/2020/circuits/early-vision/)
- Feature Visualization [[paper]](https://distill.pub/2017/feature-visualization/)
- Differentiable Image Parameterizations [[paper]](https://distill.pub/2018/differentiable-parameterizations/)
- Deconvolution and Checkerboard Artifacts [[paper]](https://distill.pub/2016/deconv-checkerboard/)
- Visualizing memorization in RNNs [[paper]](https://distill.pub/2019/memorization-in-rnns/)
- Exploring Neural Networks with Activation Atlases [[paper]](https://distill.pub/2019/activation-atlas/)

## XAI/Analysis of Self-Supervised Models and Transfer Learning
- High Fidelity Visualization of What Your Self-Supervised Representation Knows About [[pdf]](https://arxiv.org/pdf/2112.09164.pdf)
- How Well Do Self-Supervised Models Transfer? [[pdf]](https://arxiv.org/pdf/2011.13377.pdf) [[code]](https://github.com/linusericsson/ssl-transfer)
- A critical analysis of self-supervision, or what we can learn from a single image [[pdf]](https://arxiv.org/pdf/1904.13132.pdf) [[code]](https://github.com/yukimasano/linear-probes) [[video]](https://www.youtube.com/watch?v=l5he9JNJqHA&t=24s&ab_channel=YannicKilcher) (see the linear-probe sketch at the end of this section)
- How transferable are features in deep neural networks? [[pdf]](https://arxiv.org/pdf/1411.1792.pdf)
- Understanding the Role of Self-Supervised Learning in Out-of-Distribution Detection Task [[pdf]](https://arxiv.org/pdf/2110.13435v1.pdf)
- Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning [[pdf]](https://arxiv.org/pdf/2206.08347.pdf) [[code]](https://github.com/mgwillia/unsupervised-analysis) [[website]](https://mgwillia.github.io/exploring-unsupervised/)
- Revealing the Dark Secrets of Masked Image Modeling [[pdf]](https://arxiv.org/pdf/2205.13543.pdf)
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [[pdf]](https://arxiv.org/pdf/2012.02166.pdf) [[code]](https://github.com/shirgur/AGFVisualization)
- Understanding Failure Modes of Self-Supervised Learning [[pdf]](https://arxiv.org/pdf/2203.01881.pdf)
- Explaining Self-Supervised Image Representations with Visual Probing [[pdf]](https://www.ijcai.org/proceedings/2021/0082.pdf) [[pdf]](https://arxiv.org/pdf/2106.11054v1.pdf) [[code]](https://github.com/BioNN-InfoTech/visual-probes)
- Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations [[pdf]](https://arxiv.org/pdf/2304.13089.pdf)
- What Happens to the Source Domain in Transfer Learning? [[pdf]](https://openreview.net/pdf?id=BsqmRU5hkB)
- Overwriting Pretrained Bias with Finetuning Data [[pdf]](https://arxiv.org/pdf/2303.06167.pdf)
- Exploring Model Transferability through the Lens of Potential Energy [[pdf]](https://arxiv.org/pdf/2308.15074.pdf) [[code]](https://github.com/lixiaotong97/PED)
- How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability [[pdf]](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_How_Far_Pre-trained_Models_Are_from_Neural_Collapse_on_the_ICCV_2023_paper.pdf) [[supp]](https://openaccess.thecvf.com/content/ICCV2023/supplemental/Wang_How_Far_Pre-trained_Models_Are_from_Neural_Collapse_on_the_ICCV_2023_supplemental.pdf)
- What Contrastive Learning Learns Beyond Class-wise Features? [[pdf]](https://openreview.net/pdf?id=T-NiH_wB1O)
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [[pdf]](https://arxiv.org/pdf/2112.10740.pdf)
- What makes instance discrimination good for transfer learning? [[pdf]](https://arxiv.org/pdf/2006.06606.pdf) [[website]](http://nxzhao.com/projects/good_transfer/)
- Revisiting the Transferability of Supervised Pretraining: an MLP Perspective [[pdf]](https://arxiv.org/pdf/2112.00496.pdf)
- Intriguing Properties of Contrastive Losses [[pdf]](https://arxiv.org/pdf/2011.02803.pdf) [[code]](https://github.com/google-research/simclr/tree/master/colabs/intriguing_properties)
- When Does Contrastive Visual Representation Learning Work? [[pdf]](https://arxiv.org/pdf/2105.05837.pdf)
- What Makes for Good Views for Contrastive Learning? [[pdf]](https://arxiv.org/pdf/2005.10243v1.pdf) [[code]](https://github.com/HobbitLong/PyContrast)
- What Should Not Be Contrastive in Contrastive Learning [[pdf]](https://arxiv.org/pdf/2008.05659.pdf)
- Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases [[pdf]](https://arxiv.org/pdf/2007.13916.pdf)
- Are all negatives created equal in contrastive instance discrimination? [[pdf]](https://arxiv.org/pdf/2010.06682.pdf)
- Improving Pixel-based MIM by Reducing Wasted Modeling Capability [[pdf]](https://arxiv.org/pdf/2308.00261.pdf) [[code]](https://github.com/open-mmlab/mmpretrain)
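
A recurring evaluation protocol in the papers above is linear probing: freeze the pretrained encoder, extract features, and fit a linear classifier on top. Below is a minimal sketch of that protocol; the encoder, dataset, and hyperparameters are illustrative assumptions, not taken from any specific paper.

```python
# Linear-probe sketch: freeze a pretrained encoder and fit a linear classifier
# on its features. Encoder, dataset, and hyperparameters are illustrative choices.
import torch
import torchvision
from torchvision import transforms
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen encoder: a torchvision ResNet-50 with its classification head removed
encoder = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2)
encoder.fc = torch.nn.Identity()  # keep the 2048-d pooled features
encoder.eval().to(device)

tfm = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = torchvision.datasets.CIFAR10(root="data", train=True, download=True, transform=tfm)
loader = torch.utils.data.DataLoader(dataset, batch_size=256)

features, labels = [], []
with torch.no_grad():
    for x, y in loader:
        features.append(encoder(x.to(device)).cpu())
        labels.append(y)
features = torch.cat(features).numpy()
labels = torch.cat(labels).numpy()

# The "probe": a linear classifier trained on frozen features
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("linear-probe train accuracy:", probe.score(features, labels))
```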

## Circuits/Mechanistic Interpretability
- Circuits [[series]](https://distill.pub/2020/circuits/)
- Transformer Circuits [[series]](https://transformer-circuits.pub/)
- Progress measures for grokking via mechanistic interpretability [[pdf]](https://arxiv.org/pdf/2301.05217.pdf)
- Circuit Component Reuse Across Tasks in Transformer Language Models [[pdf]](https://openreview.net/attachment?id=fpoAYV6Wsk&name=pdf)
- Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks [[pdf]](https://openreview.net/attachment?id=A0HKeKl4Nl&name=pdf)
- [TransformerLens](https://github.com/neelnanda-io/TransformerLens)
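
Much of the work above depends on caching and inspecting internal activations. A minimal sketch with TransformerLens (the model name and cache key are assumptions based on the library's documented usage; check the docs for your installed version):

```python
# TransformerLens sketch: load GPT-2 small and cache all intermediate activations.
# The model name and cache key are illustrative; see the TransformerLens docs.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The capital of France is")

# run_with_cache returns the logits plus a cache of every intermediate activation
logits, cache = model.run_with_cache(tokens)

# e.g. the attention pattern of layer 0: shape (batch, n_heads, seq, seq)
print(cache["pattern", 0].shape)

# greedy next-token prediction from the final position
print(model.to_string(logits[0, -1].argmax()))
```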

## Natural Language Explanations (Supervised)
- **[GVE]** Generating Visual Explanations [[pdf]](https://arxiv.org/pdf/1603.08507.pdf)
- **[PJ-X]** Multimodal Explanations: Justifying Decisions and Pointing to the Evidence [[pdf]](https://arxiv.org/pdf/1802.08129.pdf) [[code]](https://github.com/Seth-Park/MultimodalExplanations)
- **[FME]** Faithful Multimodal Explanation for Visual Question Answering [[pdf]](https://arxiv.org/pdf/1809.02805.pdf)
- **[RVT]** Natural Language Rationales with Full-Stack Visual Reasoning [[pdf]](https://arxiv.org/pdf/2010.07526.pdf) [[code]](https://github.com/allenai/visual-reasoning-rationalization)
- **[e-UG]** e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks [[pdf]](https://arxiv.org/pdf/2105.03761.pdf) [[code]](https://github.com/maximek3/e-ViL)
- **[NLX-GPT]** A Model for Natural Language Explanations in Vision and Vision-Language Tasks [[pdf]](https://arxiv.org/pdf/2203.05081.pdf) [[code]](https://github.com/fawazsammani/nlxgpt)
- **[Uni-NLX]** Unifying Textual Explanations for Vision and Vision-Language Tasks [[pdf]](https://arxiv.org/pdf/2308.09033.pdf) [[code]](https://github.com/fawazsammani/uni-nlx)
- **[Explain Yourself]** Leveraging Language Models for Commonsense Reasoning [[pdf]](https://arxiv.org/pdf/1906.02361.pdf)
- **[e-SNLI]** Natural Language Inference with Natural Language Explanations [[pdf]](https://arxiv.org/pdf/1812.01193.pdf)
- **[CLEVR-X]** A Visual Reasoning Dataset for Natural Language Explanations [[pdf]](https://arxiv.org/pdf/2204.02380.pdf) [[code]](https://github.com/ExplainableML/CLEVR-X) [[website]](https://explainableml.github.io/CLEVR-X/)
- **[VQA-E]** Explaining, Elaborating, and Enhancing Your Answers for Visual Questions [[pdf]](https://arxiv.org/pdf/1803.07464.pdf)
- **[PtE]** Are Training Resources Insufficient? Predict First Then Explain! [[pdf]](https://arxiv.org/pdf/2110.02056.pdf)
- **[WT5]** Training Text-to-Text Models to Explain their Predictions [[pdf]](https://arxiv.org/pdf/2004.14546.pdf)
- **[RExC]** Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations [[pdf]](https://arxiv.org/pdf/2106.13876.pdf) [[code]](https://github.com/majumderb/rexc)
- **[ELV]** Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [[pdf]](https://arxiv.org/pdf/2011.05268.pdf) [[code]](https://github.com/JamesHujy/ELV)
- **[FEB]** Few-Shot Self-Rationalization with Natural Language Prompts [[pdf]](https://arxiv.org/pdf/2111.08284.pdf)
- **[CALeC]** Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations [[pdf]](https://arxiv.org/pdf/2207.11401.pdf)
- **[OFA-X]** Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations [[pdf]](https://arxiv.org/pdf/2212.04231.pdf) [[code]](https://github.com/ofa-x/OFA-X)
- **[S3C]** Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning [[pdf]](https://openaccess.thecvf.com/content/CVPR2023/papers/Suo_S3C_Semi-Supervised_VQA_Natural_Language_Explanation_via_Self-Critical_Learning_CVPR_2023_paper.pdf)
- **[ReVisE]** A Recursive Approach Towards Vision-Language Explanation [[pdf]](https://arxiv.org/pdf/2311.12391.pdf) [[code]](https://github.com/para-lost/ReVisE)
- **[Multimodal-CoT]** Multimodal Chain-of-Thought Reasoning in Language Models [[pdf]](https://arxiv.org/pdf/2302.00923.pdf) [[code]](https://github.com/amazon-science/mm-cot)
- **[CCoT]** Compositional Chain-of-Thought Prompting for Large Multimodal Models [[pdf]](https://arxiv.org/pdf/2311.17076.pdf)
- Grounding Visual Explanations [[pdf]](https://arxiv.org/pdf/1807.09685.pdf)
- Textual Explanations for Self-Driving Vehicles [[pdf]](https://arxiv.org/pdf/1807.11546.pdf) [[code]](https://github.com/JinkyuKimUCB/explainable-deep-driving)
- Measuring Association Between Labels and Free-Text Rationales [[pdf]](https://arxiv.org/pdf/2010.12762.pdf) [[code]](https://github.com/allenai/label_rationale_association)
- Reframing Human-AI Collaboration for Generating Free-Text Explanations [[pdf]](https://arxiv.org/pdf/2112.08674.pdf)
- Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations [[pdf]](https://arxiv.org/pdf/2112.06204.pdf)

## XAI for NLP
- Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned [[pdf]](https://arxiv.org/pdf/1905.09418.pdf) [[code]](https://github.com/lena-voita/the-story-of-heads)
- Quantifying Attention Flow in Transformers [[pdf]](https://arxiv.org/pdf/2005.00928.pdf)
- Locating and Editing Factual Associations in GPT [[pdf]](https://arxiv.org/pdf/2202.05262.pdf) [[code]](https://github.com/kmeng01/rome) [[colab]](https://colab.research.google.com/github/kmeng01/rome/blob/main/notebooks/rome.ipynb) [[colab]](https://colab.research.google.com/github/kmeng01/rome/blob/main/notebooks/causal_trace.ipynb) [[video]](https://www.youtube.com/watch?v=_NMQyOu2HTo&ab_channel=YannicKilcher) [[website]](https://rome.baulab.info/)
- Visualizing and Understanding Neural Machine Translation [[pdf]](https://aclanthology.org/P17-1106.pdf)
- Transformer Feed-Forward Layers Are Key-Value Memories [[pdf]](https://arxiv.org/pdf/2012.14913.pdf)
- A Diagnostic Study of Explainability Techniques for Text Classification [[pdf]](https://arxiv.org/pdf/2009.13295.pdf) [[code]](https://github.com/copenlu/xai-benchmark)
- A Survey of the State of Explainable AI for Natural Language Processing [[pdf]](https://arxiv.org/pdf/2010.00711.pdf)
- How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [[pdf]](https://arxiv.org/pdf/2004.14992.pdf) [[code]](https://github.com/nicola-decao/diffmask)
- Why use attention as explanation when we have saliency methods? [[pdf]](https://arxiv.org/pdf/2010.05607.pdf)
- Attention is Not Only a Weight: Analyzing Transformers with Vector Norms [[pdf]](https://arxiv.org/pdf/2004.10102.pdf)
- Attention is not Explanation [[pdf]](https://arxiv.org/pdf/1902.10186.pdf)
- Attention is not not Explanation [[pdf]](https://arxiv.org/pdf/1908.04626.pdf)
- Analyzing Individual Neurons in Pre-trained Language Models [[pdf]](https://arxiv.org/pdf/2010.02695.pdf)
- Identifying and Controlling Important Neurons in Neural Machine Translation [[pdf]](https://arxiv.org/pdf/1811.01157.pdf)
- “Will You Find These Shortcuts?” A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification [[pdf]](https://arxiv.org/pdf/2111.07367.pdf) [[blog]](https://ai.googleblog.com/2022/12/will-you-find-these-shortcuts.html)
- Interpreting Language Models with Contrastive Explanations [[pdf]](https://arxiv.org/pdf/2202.10419.pdf) [[code]](https://github.com/kayoyin/interpret-lm)
- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers [[pdf]](https://arxiv.org/pdf/2212.10559.pdf) [[pdf]](https://openreview.net/pdf?id=fzbHRjAd8U) [[code]](https://github.com/microsoft/LMOps)
- Discretized Integrated Gradients for Explaining Language Models [[pdf]](https://arxiv.org/pdf/2108.13654.pdf) [[code]](https://github.com/INK-USC/DIG)
- Did the Model Understand the Question? [[pdf]](https://arxiv.org/pdf/1805.05492.pdf)
- Explaining Compositional Semantics for Neural Sequence Models [[pdf]](https://arxiv.org/pdf/1911.06194.pdf) [[code]](https://github.com/INK-USC/hierarchical-explanation-neural-sequence-models)
- Fooling Explanations in Text Classifiers [[pdf]](https://openreview.net/pdf?id=j3krplz_4w6)
- Interpreting GPT: The Logit Lens [[blog]](https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens) (see the sketch after this list)
- A Circuit for Indirect Object Identification in GPT-2 small [[pdf]](https://openreview.net/pdf?id=NpsVSN6o4ul)
- Inside BERT from BERT-related-papers Github [[link]](https://github.com/tomohideshibata/BERT-related-papers#inside-bert)
- Massive Activations in Large Language Models [[pdf]](https://arxiv.org/pdf/2402.17762.pdf) [[code]](https://github.com/locuslab/massive-activations) [[website]](https://eric-mingjie.github.io/massive-activations/index.html)
- Language Models Represent Space and Time [[pdf]](https://arxiv.org/pdf/2310.02207.pdf) [[code]](https://github.com/wesg52/world-models)
- [Awesome LLM Interpretability](https://github.com/JShollaj/awesome-llm-interpretability)
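
The logit-lens entry above decodes intermediate hidden states through the final layer norm and the unembedding matrix to see what the model "predicts" at every layer. A minimal sketch of the idea, assuming the standard GPT2LMHeadModel layout in HuggingFace transformers:

```python
# Logit-lens sketch on GPT-2: project each layer's hidden state through the final
# layer norm and the unembedding (lm_head) to read off a per-layer top prediction.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tokenizer("The Eiffel Tower is located in the city of", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (n_layers + 1) tensors of shape (batch, seq, d_model)
for layer, hidden in enumerate(outputs.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1]))
    print(f"layer {layer:2d}: {tokenizer.decode(logits.argmax(dim=-1))!r}")
```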

## Review Papers
- Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications [[pdf]](https://arxiv.org/pdf/2003.07631.pdf)
- Benchmarking and Survey of Explanation Methods for Black Box Models [[pdf]](https://arxiv.org/pdf/2102.13076.pdf)
- An Empirical Study of Deep Neural Network Explanation Methods [[pdf]](https://proceedings.neurips.cc/paper/2020/file/2c29d89cc56cdb191c60db2f0bae796b-Paper.pdf) [[code]](https://github.com/nesl/Explainability-Study)
- Methods for Interpreting and Understanding Deep Neural Networks [[pdf]](https://arxiv.org/pdf/1706.07979.pdf)
- From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI [[pdf]](https://arxiv.org/pdf/2201.08164.pdf)
- Leveraging Explanations in Interactive Machine Learning: An Overview [[pdf]](https://arxiv.org/pdf/2207.14526.pdf)

## Object-Centric Learning
- **[SLOT-Attention]** Object-Centric Learning with Slot Attention [[pdf]](https://arxiv.org/pdf/2006.15055.pdf) [[code]](https://github.com/lucidrains/slot-attention) [[code]](https://github.com/evelinehong/slot-attention-pytorch)
- **[SCOUTER]** Slot Attention-based Classifier for Explainable Image Recognition [[pdf]](https://arxiv.org/pdf/2009.06138.pdf) [[code]](https://github.com/wbw520/scouter)
- **[SPOT]** Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers [[pdf]](https://arxiv.org/pdf/2312.00648.pdf) [[code]](https://github.com/gkakogeorgiou/spot)

## XAI Libraries for Vision
- [Captum](https://captum.ai/) (minimal attribution example after this list)
- PyTorch Grad-CAM [[github]](https://github.com/jacobgil/pytorch-grad-cam) [[docs]](https://jacobgil.github.io/pytorch-gradcam-book/introduction.html)
- Lucid [[tensorflow]](https://github.com/tensorflow/lucid) [[pytorch]](https://github.com/greentfrapp/lucent)
- Zennit [[github]](https://github.com/chr5tphr/zennit) [[docs]](https://zennit.readthedocs.io/en/latest/) [[paper]](https://arxiv.org/pdf/2106.13200.pdf)
- TorchCAM [[github]](https://github.com/frgfm/torch-cam) [[docs]](https://frgfm.github.io/torch-cam/) [[demo]](https://huggingface.co/spaces/frgfm/torch-cam)
- [pytorch-cnn-visualizations](https://github.com/utkuozbulak/pytorch-cnn-visualizations)
- VL-InterpreT [[pdf]](https://arxiv.org/pdf/2203.17247.pdf) [[github]](https://github.com/IntelLabs/VL-InterpreT) [[demo]](http://vlinterpret38-env-2.eba-bgxp4fxk.us-east-2.elasticbeanstalk.com/) [[video]](https://www.youtube.com/watch?v=4Rj15Hi_Pdo&ab_channel=CognitiveAI)
- [DeepExplain](https://github.com/marcoancona/DeepExplain)
- TorchRay [[github]](https://github.com/facebookresearch/TorchRay) [[docs]](https://facebookresearch.github.io/TorchRay/)
- [grad-cam-pytorch](https://github.com/kazuto1011/grad-cam-pytorch)
- [ViT-Prisma](https://github.com/soniajoseph/ViT-Prisma)
- [CLIP Explainability](https://github.com/sMamooler/CLIP_Explainability)
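
Most of these libraries follow the same basic flow: wrap a trained model, pick a target class, and compute attributions of that class score with respect to the input. A minimal sketch with Captum's Integrated Gradients (the model and the random input are illustrative placeholders):

```python
# Captum sketch: Integrated Gradients attributions for one prediction.
# The model and the random input are placeholders; any differentiable classifier works.
import torch
import torchvision
from captum.attr import IntegratedGradients

model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a normalized image
target = model(x).argmax(dim=1).item()              # explain the predicted class

ig = IntegratedGradients(model)
# attributions have the same shape as the input: one score per channel/pixel
attributions = ig.attribute(x, target=target, n_steps=50)
print(attributions.shape)
```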

## XAI Libraries for NLP
- [BertViz](https://github.com/jessevig/bertviz) (attention-extraction sketch after this list)
- [Transformers Interpret](https://github.com/cdpierse/transformers-interpret)
- [Ecco](https://github.com/jalammar/ecco)
- LIT [[github]](https://github.com/PAIR-code/lit) [[website]](https://pair-code.github.io/lit/) [[blog]](https://ai.googleblog.com/2020/11/the-language-interpretability-tool-lit.html)
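
These tools typically consume the attention tensors that HuggingFace models return with `output_attentions=True`. A minimal sketch feeding BERT attentions into BertViz's head view (the model name is illustrative; the view renders inside a Jupyter notebook):

```python
# BertViz sketch: extract attention weights from a HuggingFace model and render
# the head view (meant to be run in a Jupyter notebook).
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
outputs = model(**inputs)

# outputs.attentions: tuple of n_layers tensors, each (batch, heads, seq, seq)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)
```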

## Other Awesomes
- [awesome-explainable-ai](https://github.com/wangyongjie-ntu/Awesome-explainable-AI)
- [awesome-xai](https://github.com/altamiracorp/awesome-xai)

## Other Resources
- [Explainable AI: Interpreting, Explaining and Visualizing Deep Learning](https://link.springer.com/book/10.1007/978-3-030-28954-6)
- [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/)
- [Transformer Circuits](https://transformer-circuits.pub/)
- [OpenAI Microscope](https://microscope.openai.com/models)
- [Summary - Captum](https://captum.ai/docs/attribution_algorithms)
- [Alibi Docs](https://docs.seldon.io/projects/alibi/en/stable/)
- [jacobgil blogs](https://jacobgil.github.io/)
- [Stanford CS231n slides](http://cs231n.stanford.edu/slides/2022/lecture_8_ruohan.pdf)
- [TU Berlin Notes](https://www3.math.tu-berlin.de/numerik/CoSIPICDL2017/Talks/mueller.pdf)
- [Tutorial Notebooks](https://github.com/1202kbs/Understanding-NN)
- NPTEL-NOC IITM Videos [[Early Methods]](https://www.youtube.com/watch?v=a4TDSLGhKi8&ab_channel=NPTEL-NOCIITM) [[Visualization Methods]](https://www.youtube.com/watch?v=u3FBpyUA1dc&ab_channel=NPTEL-NOCIITM) [[CAM Methods]](https://www.youtube.com/watch?v=VmbBnSv3otc&ab_channel=NPTEL-NOCIITM) [[Recent Methods]](https://www.youtube.com/watch?v=9OzwN-Ub6Lg&ab_channel=NPTEL-NOCIITM) [[Beyond Explaining]](https://www.youtube.com/watch?v=9Moxmab_Y4I&ab_channel=NPTEL-NOCIITM)
- [AI Explained Video Series by Fiddler AI](https://www.youtube.com/watch?v=TORUp11Of-8&list=PL9ekywqME2AiINPJUmAy2bk0DoSRlltP-&index=1&ab_channel=FiddlerAI)
- [XAI Explained Video Series by DeepFindr](https://www.youtube.com/watch?v=OZJ1IgSgP9E&list=PLV8yxwGOxvvovp-j6ztxhF3QcKXT6vORU&index=1&ab_channel=DeepFindr)
- [Visualizing and Understanding Stanford Video](https://www.youtube.com/watch?v=6wcs6szJWMY&t=2668s&ab_channel=StanfordUniversitySchoolofEngineering)
- [CVPR 2021 Tutorial](https://interpretablevision.github.io/)
- [CVPR 2023 Tutorial](https://all-things-vits.github.io/atv/)
- [CS231n Assignments Solutions](https://github.com/srinadhu/CS231n)
- Filter and Feature Maps Visualization [[blog]](https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030) [[blog]](https://www.kaggle.com/code/magokecol/pytorch-feature-maps-visualizer-snake-version/notebook) [[blog]](https://debuggercafe.com/visualizing-filters-and-feature-maps-in-convolutional-neural-networks-using-pytorch/) [[pytorch discuss]](https://discuss.pytorch.org/t/visualize-feature-map/29597/2)
- Hooks in PyTorch [[tutorial]](https://web.stanford.edu/~nanbhas/blog/forward-hooks-pytorch/) [[tutorial]](https://towardsdatascience.com/the-one-pytorch-trick-which-you-should-know-2d5e9c1da2ca) [[tutorial]](https://medium.com/the-dl/how-to-use-pytorch-hooks-5041d777f904) [[tutorial]](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html) (see the forward-hook sketch below)
- Feature Extraction using Torch FX [[tutorial]](https://pytorch.org/blog/FX-feature-extraction-torchvision/)
- Feature extraction for model inspection [[tutorial]](https://pytorch.org/vision/stable/feature_extraction.html)
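
The hook and feature-extraction tutorials above build on the same PyTorch primitive: registering a forward hook that stores a layer's output for later visualization. A minimal sketch (the hooked layer is an illustrative choice):

```python
# Forward-hook sketch: capture an intermediate feature map for visualization.
# The hooked layer (layer4) is illustrative; any nn.Module can be hooked.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
captured = {}

def save_output(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

handle = model.layer4.register_forward_hook(save_output("layer4"))

with torch.no_grad():
    _ = model(torch.rand(1, 3, 224, 224))

print(captured["layer4"].shape)  # torch.Size([1, 512, 7, 7])
handle.remove()  # clean up the hook when done
```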