{"id":3220,"url":"https://github.com/BirkhoffG/Explainable-ML-Papers","name":"Explainable-ML-Papers","description":"A list of research papers of explainable machine learning.","projects_count":59,"last_synced_at":"2026-07-04T15:00:19.130Z","repository":{"id":121540389,"uuid":"294112700","full_name":"BirkhoffG/Explainable-ML-Papers","owner":"BirkhoffG","description":"A list of research papers of explainable machine learning.","archived":false,"fork":false,"pushed_at":"2021-06-25T12:13:01.000Z","size":14,"stargazers_count":49,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-06-16T22:04:46.251Z","etag":null,"topics":["academic","awesome","counterfactual-explanations","explainability","explainable-ml","explanations","human-ai-interaction","human-in-the-loop","human-in-the-loop-machine-learning","interpretability","interpretable-ml","interpretable-models","machine-learning","paper","recourse","research","survey","trustworthy-machine-learning","xai"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BirkhoffG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-09-09T12:54:35.000Z","updated_at":"2026-05-05T13:32:51.000Z","dependencies_parsed_at":"2024-01-07T04:54:20.649Z","dependency_job_id":null,"html_url":"https://github.com/BirkhoffG/Explainable-ML-Papers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BirkhoffG/Explainable-ML-Papers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/BirkhoffG%2FExplainable-ML-Papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BirkhoffG%2FExplainable-ML-Papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BirkhoffG%2FExplainable-ML-Papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BirkhoffG%2FExplainable-ML-Papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BirkhoffG","download_url":"https://codeload.github.com/BirkhoffG/Explainable-ML-Papers/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BirkhoffG%2FExplainable-ML-Papers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35125718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"created_at":"2024-01-04T20:29:26.780Z","updated_at":"2026-07-04T15:00:19.130Z","primary_language":null,"list_of_lists":false,"displayable":true,"categories":["1. General Idea","5. Evaluate Explainable ML","6. Useful Resources","2. Global Explanation","3. Local Explanation","4. 
Explainability in Human-in-the-loop ML"],"sub_categories":["Survey","Evaluating Interpretability","Courses \u0026 Talks","Interpretable Models","Model Distillation","Representation-based Explanation","Self-Explaining Neural Network","Feature-based Explanation","Example-based Explanation","Counterfactual Explanation","Evaluating Faithfulness","Toolbox","Collections of Resources"],"readme":"# Papers on Explainable Machine Learning\n\nThis repository includes a collection of awesome research papers on **Explainable Machine Learning** (also referred to as Explainable AI/XAI or Interpretable Machine Learning). Because the field is emerging rapidly, getting started can be frustrating: newcomers are buried under an enormous number of papers (and inconsistent terminology). I hope this repository helps new ML researchers and practitioners learn about the field with less pain and stress.\n\nUnlike most repositories on GitHub, which maintain a comprehensive list of Explainable ML resources, I try to keep this list short to make it less intimidating for beginners. It is definitely a subjective selection based on my preferences and research tastes. \n\n\u003e Papers marked in **bold** are highly recommended to read.\n\n## 1. General Idea\n\n### Survey\n\n- The Mythos of Model Interpretability. *Lipton, 2016* [pdf](https://arxiv.org/abs/1606.03490)\n\n- Open the Black Box Data-Driven Explanation of Black Box Decision Systems. *Pedreschi et al.* [pdf](https://arxiv.org/pdf/1806.09936.pdf) \n\n- Techniques for Interpretable Machine Learning. *Du et al. 2018* [pdf](https://arxiv.org/pdf/1808.00033.pdf)\n  \u003cdetails\u003e\u003csummary\u003enotes\u003c/summary\u003e\n\n  - interpretable models (adding interpretable constraints, mimic learning) \n  - post-hoc global explanation, and post-hoc local explanation\n  \u003c/details\u003e\n\n- Explaining Explanations in AI. *Mittelstadt et. 
al., 2019* [pdf](https://arxiv.org/pdf/1811.01439.pdf)\n\n- **Explanation in artificial intelligence: Insights from the social sciences. *Miller, 2019*** [pdf](https://www.sciencedirect.com/science/article/pii/S0004370218305988)\n\n- **Explaining Explanations: An Overview of Interpretability of Machine Learning. *Gilpin et al. 2019*** [pdf](https://arxiv.org/pdf/1806.00069.pdf)\n  \u003cdetails\u003e\u003csummary\u003enotes\u003c/summary\u003e\n\n  - *tradeoff* between **interpretability** and **completeness**: \n    - **interpretability**: describe the internals of a system in a way that is *understandable* to humans.\n    - **completeness**: describe the operation of a system in an accurate way.  \n  \u003c/details\u003e\n\n- **Interpretable machine learning: definitions, methods, and applications. *Murdoch et al. 2019*** [pdf](https://arxiv.org/pdf/1901.04592v1.pdf) \n  \n- Explaining Deep Neural Networks. *Camburu, 2020* [pdf](https://arxiv.org/pdf/2010.01496.pdf)\n\n\n## 2. Global Explanation\n\n### Interpretable Models\n\n- **Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.** *Rudin, 2019* [pdf](https://www.nature.com/articles/s42256-019-0048-x.pdf)\n\n**Generalized Additive Model**\n\n- Accurate intelligible models with pairwise interactions. *Lou et. al., 2013* [pdf](http://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf)\n\n- Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. *Caruana et. al., 2015* [pdf](https://dl.acm.org/doi/pdf/10.1145/2783258.2788613) | [InterpretML](https://github.com/interpretml/interpret)\n\n**Rule-based Method**\n\n- Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. *Letham et. al., 2015* [pdf](https://arxiv.org/pdf/1511.01644.pdf)\n\n- Interpretable Decision Sets: A Joint Framework for Description and Prediction. *Lakkaraju et. 
al., 2016* [pdf](https://dl.acm.org/doi/pdf/10.1145/2939672.2939874)\n\n**Scoring System**\n\n- Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice. *Rudin, 2018* [pdf](https://pubsonline.informs.org/doi/pdf/10.1287/inte.2018.0957)\n\n\n### Model Distillation\n\u003e Use interpretable models to approximate blackbox learning;  similar to the imitation learning in RL.\n\n- Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. *Tan et. al., 2018* [pdf](https://arxiv.org/pdf/1710.06169.pdf)\n\n- Faithful and Customizable Explanations of Black Box Models. *Lakkaraju et. al. 2019* [pdf](https://dl.acm.org/doi/pdf/10.1145/3306618.3314229)\n\n\n### Representation-based Explanation\n\n- Interpretability Beyond Feature Attribution:  Quantitative Testing with Concept Activation Vectors (TCAV), *Kim et. al. 2018* [pdf](http://proceedings.mlr.press/v80/kim18d/kim18d.pdf)\n\n- This Looks Like That: Deep Learning for Interpretable Image Recognition. *Chen et al., 2019* [pdf](http://arxiv.org/abs/1806.10574)\n  \u003cdetails\u003e\u003csummary\u003eRelated papers\u003c/summary\u003e\n\n  - This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition. *Nauta et al., 2020* [pdf](https://arxiv.org/pdf/2011.02863.pdf)\n  - Learning to Explain With Complemental Examples. *Kanehira \u0026 Harada, 2019* [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Kanehira_Learning_to_Explain_With_Complemental_Examples_CVPR_2019_paper.pdf)\n  \u003c/details\u003e\n\n### Self-Explaining Neural Network\n\u003e Also offers example-based explanation\n\n\n- Towards Robust Interpretability with Self-Explaining Neural Networks. *Alvarez-Melis et. al., 2018* [pdf](http://papers.nips.cc/paper/8003-towards-robust-interpretability-with-self-explaining-neural-networks.pdf) \n\n- Deep Weighted Averaging Classifiers. *Card et al., 2019* [pdf](http://arxiv.org/pdf/1811.02579.pdf)\n\n\n## 3. 
Local Explanation\n\n\u003e Note: aggregating multiple local explanations can be viewed as constructing a global explanation.\n\n### Feature-based Explanation\n\n- Permutation importance: a corrected feature importance measure. *Altmann et. al. 2010* [link](https://academic.oup.com/bioinformatics/article/26/10/1340/193348) | [sklearn](https://scikit-learn.org/stable/modules/permutation_importance.html)\n\n- **\"Why Should I Trust You?\" Explaining the Predictions of Any Classifier. *Ribeiro et. al., 2016*** [pdf](http://sameersingh.org/files/papers/lime-kdd16.pdf) | [LIME](https://github.com/marcotcr/lime)\n\n- A Unified Approach to Interpreting Model Predictions. *Lundberg \u0026 Lee, 2017* [pdf](http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf) | [SHAP](https://github.com/slundberg/shap)\n\n- Anchors: High-Precision Model-Agnostic Explanations. *Ribeiro et. al. 2018* [pdf](https://homes.cs.washington.edu/~marcotcr/aaai18.pdf)\n\n### Example-based Explanation\n\n- Examples are not enough, learn to criticize! Criticism for Interpretability. *Kim et. al., 2016* [pdf](https://beenkim.github.io/papers/KIM2016NIPS_MMD.pdf)\n\n### Counterfactual Explanation\n\n\u003e Also referred to as algorithmic recourse or contrastive explanation.\n\n- Counterfactual Explanations for Machine Learning: A Review. *Verma et al., 2020* [pdf](https://arxiv.org/pdf/2010.10596.pdf)\n- A survey of algorithmic recourse: definitions, formulations, solutions, and prospects. *Karimi et al., 2020* [pdf](http://arxiv.org/pdf/2010.04050.pdf)\n\n**Minimize distance counterfactuals**\n\n- Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. *Wachter et. al., 2017* [pdf](https://arxiv.org/ftp/arxiv/papers/1711/1711.00399.pdf)\n\n- Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations. 
*Mothilal et al., 2019* [pdf](https://arxiv.org/pdf/1905.07697.pdf)\n\n**Minimize cost (algorithmic recourse)**\n\n- Actionable Recourse in Linear Classification. *Ustun et al., 2019* [pdf](https://arxiv.org/pdf/1809.06514.pdf)\n\n- Algorithmic Recourse: from Counterfactual Explanations to Interventions. *Karimi et al., 2021* [pdf](https://arxiv.org/pdf/2002.06278.pdf)\n\n**Causal constraints**\n\n- Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers. *Mahajan et al., 2020* [pdf](http://arxiv.org/pdf/1912.03277.pdf)\n\n\n## 4. Explainability in Human-in-the-loop ML\n\n\u003e HCI's perspective of Explainable ML\n\n- Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models. *Krause et. al., 2016* [pdf](https://dl.acm.org/doi/pdf/10.1145/2858036.2858529)\n- Human-centered Machine Learning: a Machine-in-the-loop Approach. *Tan, 2018* [blog](https://medium.com/@ChenhaoTan/human-centered-machine-learning-a-machine-in-the-loop-approach-ed024db34fe7)\n- Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda. *Abdul et. al., 2018* [pdf](https://www.cs.ubc.ca/~conati/522/532b-2019/papers/LinExplanationSurveyCHI2018Survey.pdf)\n- Explaining models: an empirical study of how explanations impact fairness judgment. *Dodge et. al., 2019* [pdf](https://arxiv.org/pdf/1901.07694.pdf)\n- Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making. *Cai et. al, 2019* [pdf](https://dl.acm.org/doi/pdf/10.1145/3290605.3300234)\n- **Designing Theory-Driven User-Centric Explainable AI. *Wang et. al., 2019*** [pdf](https://dl.acm.org/doi/pdf/10.1145/3290605.3300831)\n- **Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance.** *Bansal et al., 2021* [pdf](https://arxiv.org/pdf/2006.14779.pdf)\n\n\n## 5. 
Evaluate Explainable ML\n\n\u003e Evaluation of explainable ML can be loosely categorized into two classes:\n\u003e - *faithfulness*: how well the explanation reflects the true inner behavior of the black-box model.\n\u003e - *interpretability*: how understandable the explanation is to humans.\n\n- The Price of Interpretability. *Bertsimas et. al., 2019* [pdf](http://www.mit.edu/~jaillet/general/1907.03419.pdf)\n\n- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. *Ribeiro et. al., 2020* [pdf](https://arxiv.org/pdf/2005.04118.pdf) @ ACL 2020 Best Paper\n\n### Evaluating Faithfulness\n\n\u003e Evaluate whether or not the explanation faithfully reflects how the model works (it turns out that post-hoc explanations are often not 100% faithful).\n\n- Sanity Checks for Saliency Maps. *Adebayo et al., 2018* [pdf](https://papers.nips.cc/paper/2018/file/294a8ed24b1ad22ec2e7efea049b8737-Paper.pdf)\n- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? *Jacovi \u0026 Goldberg, 2020* [ACL](https://www.aclweb.org/anthology/2020.acl-main.386/)\n\n**Robust Explanation**\n\n- Interpretation of Neural Networks Is Fragile. *Ghorbani et. al., 2019* [pdf](https://www.aaai.org/ojs/index.php/AAAI/article/view/4252)\n\n- Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods. *Slack et. al., 2020* [pdf](https://arxiv.org/pdf/1911.02508.pdf)\n\n- Robust and Stable Black Box Explanations. *Lakkaraju et. al., 2020* [pdf](https://proceedings.icml.cc/static/paper_files/icml/2020/5945-Paper.pdf)\n\n### Evaluating Interpretability\n\u003e Evaluate interpretability (do the explanations make sense to humans or not).\n\n- **Towards A Rigorous Science of Interpretable Machine Learning.** *Doshi-Velez \u0026 Kim. 
2017* [pdf](https://arxiv.org/pdf/1702.08608.pdf)\n- 'It's Reducing a Human Being to a Percentage'; Perceptions of Justice in Algorithmic Decisions *Binns et al., 2018* [pdf](https://arxiv.org/pdf/1801.10408)\n- Human Evaluation of Models Built for Interpretability.  *Lage et. al., 2019* [pdf](https://aaai.org/ojs/index.php/HCOMP/article/view/5280/5132)\n- Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning. *Kaur et. al., 2019* [pdf](https://dl.acm.org/doi/pdf/10.1145/3313831.3376219)\n- Manipulating and Measuring Model Interpretability. *Poursabzi-Sangdeh et al., 2021* [pdf](https://arxiv.org/pdf/1802.07810.pdf)\n\n## 6. Useful Resources\n\n### Courses \u0026 Talks\n\n- **Tutorial on Explainable ML** [Website](https://explainml-tutorial.github.io/)\n- Interpretability and Explainability in Machine Learning, Fall 2019 *@ Harvard University by Hima Lakkaraju* [Course](https://interpretable-ml-class.github.io/)\n- Human-centered Machine Learning *@University of Colorado Boulder by Chenhao Tan* [course](https://github.com/BoulderDS/human-centered-machine-learning)\n- Model Explainability Forum *by TWIML AI Podcast* [YouTube](https://www.youtube.com/watch?v=B2QBnVnbt7A) | [link](https://twimlai.com/2020-model-explainability-forum/)\n\n### Collections of Resources\n- XAI-Papers [GitHub](https://github.com/anguyen8/XAI-papers)\n\n### Toolbox\n\n- InterpretML [GitHub](https://github.com/interpretml/interpret)\n\n","projects_url":"https://awesome.ecosyste.ms/api/v1/lists/birkhoffg%2Fexplainable-ml-papers/projects"}