awesome-machine-learning-interpretability
A curated list of awesome responsible machine learning resources.
https://github.com/jphall663/awesome-machine-learning-interpretability
Technical Resources
Open Source/Access Responsible AI Software Packages
- Scikit-learn (Sparse Principal Components) - "a variant of [principal component analysis, PCA], with the goal of extracting the set of sparse components that best reconstruct the data." (scikit-learn.org/stable/modules/decomposition.html#sparse-principal-components-analysis-sparsepca-and-minibatchsparsepca)
- arules
- elasticnet - "provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for doing sparse PCA."
- glmnet - "extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression."
- rpart
- Hugging Face, BiasAware: Dataset Bias Detection
- TensorBoard Projector
- What-if Tool
- algofairness
- Bayesian Case Model
- Bayesian Rule List (BRL)
- cdt15, Causal Discovery Lab., Shiga University - causal discovery tools such as LiNGAM, which estimate structural equation models based on the non-Gaussianity of the data.
- Falling Rule List (FRL)
- Grad-CAM - Grad-CAM is a technique for making convolutional neural networks more transparent by visualizing the regions of input that are important for predictions in computer vision models.
- Sparse Principal Components (GLRM)
- Monotonic
- parity-fairness
- ProtoPNet - "This code package implements the prototypical part network (ProtoPNet) from the paper 'This Looks Like That: Deep Learning for Interpretable Image Recognition' (NeurIPS 2019), by Chaofan Chen (Duke University), Oscar Li (Duke University), Chaofan Tao (Duke University), Alina Jade Barnett (Duke University), Jonathan Su (MIT Lincoln Laboratory), and Cynthia Rudin (Duke University)."
- rationale
- Decision Trees - "a non-parametric supervised learning method used for classification and regression."
- Generalized Linear Models
- scikit-multiflow
- text_explainability - "provides a generic architecture from which well-known state-of-the-art explainability approaches for text can be composed."
- text_sensitivity
- ALEPlot - "visualizes the main effects of individual predictor variables and their second-order interaction effects in black-box supervised learning models."
- DALEXtra: Extension for 'DALEX' Package
- interpret: Fit Interpretable Machine Learning Models
- fairness
- forestmodel
- fscaret
- gam
- glm2
- Penalized Generalized Linear Models
- Monotonic GBM
- Sparse Principal Components (GLRM)
- ICEbox: Individual Conditional Expectation Plot Toolbox
- live
- modelDown
- modelOriented - GitHub repositories of the Warsaw-based MI².AI group.
- quantreg
- RuleFit
- Scalable Bayesian Rule Lists (SBRL)
- shapper
- smbinning
- Scikit-Explain - "a user-friendly Python module for machine learning explainability," featuring PD and ALE plots, LIME, SHAP, permutation importance and Friedman's H, among other methods.
- LDNOOBW
- RuleFit
- fairness-comparison - "meant to facilitate the benchmarking of fairness aware machine learning algorithms."
- BlackBoxAuditing - "Research code for auditing and exploring black box machine-learning models."
- themis-ml - "A Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms."
- fairml - "a python toolbox auditing the machine learning models for bias."
- Themis - "A testing-based approach for measuring discrimination in a software system."
- Explainable Boosting Machine (EBM/GA2M) - "an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior, or understand the reasons behind individual predictions."
- vip - "An R package for constructing variable importance plots (VIPs)."
- tensorflow/lucid - "a collection of infrastructure and tools for research in neural network interpretability."
- tensorflow/model-analysis - "a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer. These metrics can be computed over different slices of data and visualized in Jupyter notebooks."
- ingredients - "A collection of tools for assessment of feature importance and feature effects."
- DrWhyAI - "DrWhy is [a] collection of tools for eXplainable AI (XAI). It's based on shared principles and simple grammar for exploration, explanation and visualisation of predictive models."
- iBreakDown - "A model agnostic tool for explanation of predictions from black boxes ML models."
- dalex - "moDel Agnostic Language for Exploration and eXplanation."
- xgboostExplainer - "An R package that makes xgboost models fully interpretable."
- PAIR-code / facets - "Visualizations for machine learning datasets."
- Keras-vis - "a high-level toolkit for visualizing and debugging your trained keras neural net models."
- tensorflow/model-card-toolkit - "streamlines and automates generation of Model Cards, machine learning documents that provide context and transparency into a model's development and performance. Integrating the MCT into your ML pipeline enables you to share model metadata and metrics with researchers, developers, reporters, and more."
- yellowbrick - "A suite of visual diagnostic tools called 'Visualizers' that extend the scikit-learn API to allow human steering of the model selection process."
- xai - "A Machine Learning library that is designed with AI explainability in its core."
- cleverhans - "An adversarial example library for constructing attacks, building defenses, and benchmarking both."
- allennlp - "An open-source NLP research library, built on PyTorch."
- ydata-profiling - "Provide(s) a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution."
- lime - "explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images, with a package called lime (short for local interpretable model-agnostic explanations)." See the usage sketch after this list.
- TensorWatch - "a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Microsoft Research. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and perform several other key analysis tasks for your models and data."
- xdeep - "An open source Python library for Interpretable Machine Learning."
- tf-explain - "Implements interpretability methods as Tensorflow 2.x callbacks to ease neural network's understanding."
- tensorflow/lattice - "a library that implements constrained and interpretable lattice based models. It is an implementation of Monotonic Calibrated Interpolated Look-Up Tables in TensorFlow."
- MindsDB - "enables developers to build AI tools that need access to real-time data to perform their tasks."
- DiCE - "Generate Diverse Counterfactual Explanations for any machine learning model."
- fairlearn - "a Python package that empowers developers of artificial intelligence (AI) systems to assess their system's fairness and mitigate any observed unfairness issues. Fairlearn contains mitigation algorithms as well as metrics for model assessment. Besides the source code, this repository also contains Jupyter notebooks with examples of Fairlearn usage." See the metric sketch after this list.
- causalml - "Uplift modeling and causal inference with machine learning algorithms."
- captum - "Model interpretability and understanding for PyTorch."
- Causal Discovery Toolbox - "Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included."
- iNNvestigate neural nets - A comprehensive Python library to analyze and interpret neural network behaviors in Keras, featuring a variety of methods like Gradient, LRP, and Deep Taylor.
- Optimal Sparse Decision Trees - "This accompanies the paper, ['Optimal Sparse Decision Trees'](https://arxiv.org/abs/1904.12847) by Xiyang Hu, Cynthia Rudin, and Margo Seltzer."
- sklearn-expertsys - "a scikit-learn compatible wrapper for the Bayesian Rule List classifier developed by Letham et al., 2015, extended by a minimum description length-based discretizer (Fayyad & Irani, 1993) for continuous data, and by an approach to subsample large datasets for better performance."
- imodels - "Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use."
- pyGAM - "Generalized Additive Models in Python."
- shapley - "a Python library for evaluating binary classifiers in a machine learning ensemble."
- ml_privacy_meter - "an open-source library to audit data privacy in statistical and machine learning algorithms. The tool can help in the data protection impact assessment process by providing a quantitative analysis of the fundamental privacy risks of a (machine learning) model."
- AI Fairness 360 - "A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models."
- aequitas - "Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools."
- deepvis - "the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization."
- Alibi - "Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The focus of the library is to provide high-quality implementations of black-box, white-box, local and global explanation methods for classification and regression models."
- pytorch-grad-cam - "a package with state of the art methods for Explainable AI for computer vision. This can be used for diagnosing model predictions, either in production or while developing models. The aim is also to serve as a benchmark of algorithms and metrics for research of new explainability methods."
- tensorflow/privacy - "the source code for TensorFlow Privacy, a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. The library comes with tutorials and analysis tools for computing the privacy guarantees provided."
- manifold - "A model-agnostic visual debugging tool for machine learning."
- PiML-Toolbox - "a new Python toolbox for interpretable machine learning model development and validation. Through low-code interface and high-code APIs, PiML supports a growing list of inherently interpretable ML models."
- xplique - "A Python toolkit dedicated to explainability. The goal of this library is to gather the state of the art of Explainable AI to help you understand your complex neural network models."
- dtreeviz - "A python library for decision tree visualization and model interpretation."
- explainerdashboard - "Quickly build Explainable AI dashboards that show the inner workings of so-called 'blackbox' machine learning models."
- explainX - "Explainable AI framework for data scientists. Explain & debug any blackbox machine learning model with a single line of code."
- gplearn - "implements Genetic Programming in Python, with a scikit-learn inspired and compatible API."
- ecco - "Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTA, T5, and T0)."
- REVISE: REvealing VIsual biaSEs - "A tool that automatically detects possible forms of bias in a visual dataset along the axes of object-based, attribute-based, and geography-based patterns, and from which next steps for mitigation are suggested."
- ExplainaBoard - "a tool that inspects your system outputs, identifies what is working and what is not working, and helps inspire you with ideas of where to go next."
- PDPbox - "Python Partial Dependence Plot toolbox. Visualize the influence of certain features on model predictions for supervised machine learning algorithms, utilizing partial dependence plots." A partial dependence sketch follows this list.
- Giskard - "The testing framework dedicated to ML models, from tabular to LLMs. Scan AI models to detect risks of biases, performance issues and errors. In 4 lines of code."
- anchor - "Code for 'High-Precision Model-Agnostic Explanations' paper."
- foolbox - "A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX."
- TextFooler - "A Model for Natural Language Attack on Text Classification and Inference."
- casme - "contains the code originally forked from the ImageNet training in PyTorch that is modified to present the performance of classifier-agnostic saliency map extraction, a practical algorithm to train a classifier-agnostic saliency mapping by simultaneously training a classifier and a saliency mapping."
- ContrastiveExplanation (Foil Trees) - "provides an explanation for why an instance had the current outcome (fact) rather than a targeted outcome of interest (foil). These counterfactual explanations limit the explanation to the features relevant in distinguishing fact from foil, thereby disregarding irrelevant features."
- DeepLIFT - "This repository implements the methods in 'Learning Important Features Through Propagating Activation Differences' by Shrikumar, Greenside & Kundaje, as well as other commonly-used methods such as gradients, gradient-times-input (equivalent to a version of Layerwise Relevance Propagation for ReLU networks), guided backprop and integrated gradients."
- eli5 - "A library for debugging/inspecting machine learning classifiers and explaining their predictions."
- Integrated-Gradients - "a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision)."
- L2X - "Code for replicating the experiments in the paper [Learning to Explain: An Information-Theoretic Perspective on Model Interpretation](https://arxiv.org/pdf/1802.07814.pdf) at ICML 2018, by Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael I. Jordan."
- lofo-importance - "LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric."
- pyBreakDown - See [dalex](https://dalex.drwhy.ai/).
- responsibly - "Toolkit for Auditing and Mitigating Bias and Fairness of Machine Learning Systems."
- treeinterpreter - "Package for interpreting scikit-learn's decision tree and random forest predictions." A decomposition sketch follows this list.
- woe - "Tools for WoE Transformation mostly used in ScoreCard Model for credit rating."
- PyCEbox - "Python Individual Conditional Expectation Plot Toolbox."
- skope-rules - "a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license."
- checklist - "Beyond Accuracy: Behavioral Testing of NLP models with CheckList."
- SALib - "Python implementations of commonly used sensitivity analysis methods. Useful in systems modeling to calculate the effects of model inputs or exogenous factors on outputs of interest."
- solas-ai-disparity - "a collection of tools that allows modelers, compliance, and business stakeholders to test outcomes for bias or discrimination using widely accepted fairness metrics."
- DIANNA - "DIANNA is a Python package that brings explainable AI (XAI) to your research project. It wraps carefully selected XAI methods in a simple, uniform interface. It's built by, with and for (academic) researchers and research software engineers working on machine learning projects."
- fastshap - "The goal of fastshap is to provide an efficient and speedy approach (at least relative to other implementations) for computing approximate Shapley values, which help explain the predictions from any machine learning model."
- flashlight - "The goal of this package is [to] shed light on black box machine learning models."
- OptBinning - "a library written in Python implementing a rigorous and flexible mathematical programming formulation to solve the optimal binning problem for a binary, continuous and multiclass target type, incorporating constraints not previously addressed."
- ml-fairness-gym - "a set of components for building simple simulations that explore the potential long-run impacts of deploying machine learning-based decision systems in social environments."
- Quantus - "Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations."
- robustness - "a package we (students in the [MadryLab](http://madry-lab.ml/)) created to make training, evaluating, and exploring neural networks flexible and easy."
- RISE - "contains source code necessary to reproduce some of the main results in the paper: [Vitali Petsiuk](http://cs-people.bu.edu/vpetsiuk/), [Abir Das](http://cs-people.bu.edu/dasabir/), [Kate Saenko](http://ai.bu.edu/ksaenko.html) (BMVC, 2018) [and] [RISE: Randomized Input Sampling for Explanation of Black-box Models](https://arxiv.org/abs/1806.07421)."
- DeepExplain - "provides a unified framework for state-of-the-art gradient and perturbation-based attribution methods. It can be used by researchers and practitioners for better understanding the recommended existing models, as well as for benchmarking other attribution methods."
- tensorflow/fairness-indicators - "designed to support teams in evaluating, improving, and comparing models for fairness concerns in partnership with the broader Tensorflow toolkit."
- explabox - "aims to support data scientists and machine learning (ML) engineers in explaining, testing and documenting AI/ML models, developed in-house or acquired externally. The explabox turns your ingestibles (AI/ML model and/or dataset) into digestibles (statistics, explanations or sensitivity insights)."
- modelStudio - "Automates the explanatory analysis of machine learning predictive models."
- tensorflow/tcav - "Testing with Concept Activation Vectors (TCAV) is a new interpretability method to understand what signals your neural networks models uses for prediction."
- Born-again Tree Ensembles - "Born-Again Tree Ensembles: Transforms a random forest into a single, minimal-size, tree with exactly the same prediction function in the entire feature space (ICML 2020)."
- lime - "R port of the Python lime package."
- lit - "The Learning Interpretability Tool (LIT, formerly known as the Language Interpretability Tool) is a visual, interactive ML model-understanding tool that supports text, image, and tabular data. It can be run as a standalone server, or inside of notebook environments such as Colab, Jupyter, and Google Cloud Vertex AI notebooks."
- CalculatedContent, WeightWatcher - "The WeightWatcher tool for predicting the accuracy of Deep Neural Networks."
- tensorflow/model-remediation - "a library that provides solutions for machine learning practitioners working to create and train models in a way that reduces or eliminates user harm resulting from underlying performance biases."
- scikit-fairness - Historical link. Merged with [fairlearn](https://fairlearn.org/).
- learning-fair-representations - "Python numba implementation of Zemel et al. 2013 <http://www.cs.toronto.edu/~toni/Papers/icml-final.pdf>"
- debiaswe - "Remove problematic gender bias from word embeddings."
- pySS3 - "The SS3 text classifier is a novel and simple supervised machine learning model for text classification which is interpretable, that is, it has the ability to naturally (self)explain its rationale."
- LangFair - "LangFair is a Python library for conducting use-case level LLM bias and fairness assessments."
- DoWhy - "DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks."
- shap - "a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions." See the usage sketch after this list.
- shapFlex - Computes stochastic Shapley values for machine learning models to interpret them and evaluate fairness, including causal constraints in the feature space.
- pymc3 - "PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems."
- langtest - "LangTest: Deliver Safe & Effective Language Models"
- effector - "eXplainable AI for Tabular Data"
- DiscriLens - "Discrimination in Machine Learning."
- PAIR-code / datacardsplaybook - "The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation."
- PAIR-code / knowyourdata - "A tool to help researchers and product teams understand datasets with the goal of improving data quality, and mitigating fairness and bias issues."
- Certifiably Optimal RulE ListS - "CORELS is a custom discrete optimization technique for building rule lists over a categorical feature space."
- Secure-ML - "Secure Linear Regression in the Semi-Honest Two-Party Setting."
- acd - "Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Official code for *Hierarchical interpretations for neural network predictions*."
- ALEPython - "Python Accumulated Local Effects package."
- Aletheia - "A Python package for unwrapping ReLU DNNs."
- Bayesian Ors-Of-Ands - "This code implements the Bayesian or-of-and algorithm as described in the BOA paper. We include the tictactoe dataset in the correct formatting to be used by this code."
- fair-classification - "Python code for training fair logistic regression classifiers."
- fairness_measures_code - "contains implementations of measures used to quantify discrimination."
- H2O-3 Monotonic GBM
- h2o-LLM-eval - "Large-language Model Evaluation framework with Elo Leaderboard and A-B testing."
- hate-functional-tests - HateCheck: A dataset and test suite from an ACL 2021 paper, offering functional tests for hate speech detection models, including extensive case annotations and testing functionalities.
- interpret_with_rules - "induces rules to explain the predictions of a trained neural network, and optionally also to explain the patterns that the model captures from the training data, and the patterns that are present in the original dataset."
- InterpretME - "integrates knowledge graphs (KG) with machine learning methods to generate interesting meaningful insights. It helps to generate human- and machine-readable decisions to provide assistance to users and enhance efficiency."
- keract - Keract is a tool for visualizing activations and gradients in Keras models; it's meant to support a wide range of Tensorflow versions and to offer an intuitive API with Python examples.
- LiFT - "The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness and the mitigation of bias in large-scale machine learning workflows. The measurement module includes measuring biases in training data, evaluating fairness metrics for ML models, and detecting statistically significant differences in their performance across different subgroups."
- lilac - "Curate better data for LLMs."
- lrp_toolbox - "The Layer-wise Relevance Propagation (LRP) algorithm explains a classifier's prediction specific to a given data point by attributing relevance scores to important components of the input by using the topology of the learned model itself."
- mlxtend - "a Python library of useful tools for the day-to-day data science tasks."
- mllp - "This is a PyTorch implementation of Multilayer Logical Perceptrons (MLLP) and Random Binarization (RB) method to learn Concept Rule Sets (CRS) for transparent classification tasks, as described in our paper: [Transparent Classification with Multilayer Logical Perceptrons and Random Binarization](https://arxiv.org/abs/1912.04695)."
- Monotonic Constraints
- parity-fairness
- pjsaelin / Cubist - "A Python package for fitting Quinlan's Cubist regression model."
- Privacy-Preserving-ML - "Implementation of privacy-preserving SVM assuming public model private data scenario (data is encrypted but model parameters are unencrypted) using adequate partial homomorphic encryption."
- pytorch-innvestigate - "PyTorch implementation of Keras already existing project: [https://github.com/albermax/innvestigate/](https://github.com/albermax/innvestigate/)."
- Risk-SLIM - "a machine learning method to fit simple customized risk scores in python."
- SAGE - "SAGE (Shapley Additive Global importancE) is a game-theoretic approach for understanding black-box machine learning models. It quantifies each feature's importance based on how much predictive power it contributes, and it accounts for complex feature interactions using the Shapley value."
- Super-sparse Linear Integer Models (SLIMs) - "a package to learn customized scoring systems for decision-making problems."
- tensorfuzz - "a library for performing coverage guided fuzzing of neural networks."
- TRIAGE - "This repository contains the implementation of TRIAGE, a 'Data-Centric AI' framework for data characterization tailored for regression."
- XGBoost
- Causal SVM - "We present a new machine learning approach to estimate whether a treatment has an effect on an individual, in the setting of the classical potential outcomes framework with binary outcomes."
- ExplainPrediction - "Generates explanations for classification and regression models and visualizes them."
- fairmodels - "Flexible tool for bias detection, visualization, and mitigation. Use models explained with DALEX and calculate fairness classification metrics based on confusion matrices using fairness_check() or try newly developed module for regression models using fairness_check_regression()."
- featureImportance - "An extension for the mlr package that allows to compute the permutation feature importance in a model-agnostic manner."
- H2O-3 Monotonic GBM
- H2O-3 Penalized Generalized Linear Models
- H2O-3 Sparse Principal Components
- iml - "An R package that interprets the behavior and explains predictions of machine learning models."
- lightgbmExplainer - "An R package that makes LightGBM models fully interpretable."
- mcr - "An R package for Model Reliance and Model Class Reliance."
- shapleyR - "An R package that provides some functionality to use mlr tasks and models to generate shapley values."
- contextual-AI - "Contextual AI adds explainability to different stages of machine learning pipelines (data, training, and inference), thereby addressing the trust gap between such ML systems and their users. It does not refer to a specific algorithm or ML method; instead, it takes a human-centric view and approach to AI."
- counterfit - "a CLI that provides a generic automation layer for assessing the security of ML models."
- TorchUncertainty - "A package designed to help you leverage uncertainty quantification techniques and make your deep neural networks more reliable."
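
A few of the Python entries above are easiest to grasp from a one-screen usage sketch. First, a local explanation with lime's tabular explainer; a minimal sketch assuming `lime` and `scikit-learn` are installed, with a toy dataset and model standing in for your own:

```python
# Minimal lime sketch: explain one prediction of a toy tabular model.
import lime.lime_tabular
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = lime.lime_tabular.LimeTabularExplainer(
    data.data,                         # training data defines the perturbation space
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    discretize_continuous=True,
)
# Fit a local, interpretable surrogate around a single instance
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(exp.as_list())                   # per-feature contributions for this prediction
```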
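Similarly, the shap entry connects local Shapley attributions to a global summary; a minimal sketch assuming the `shap` package, using its fast path for tree ensembles:

```python
# Minimal shap sketch: Shapley attributions for a tree-based regressor.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # exact, fast Shapley values for trees
shap_values = explainer.shap_values(X)  # one attribution per feature per row
shap.summary_plot(shap_values, X)       # aggregate local attributions into a global view
```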
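The treeinterpreter entry decomposes each forest prediction into a bias term plus per-feature contributions; a minimal sketch, again with placeholder data:

```python
# Minimal treeinterpreter sketch: decompose a random forest prediction.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# prediction == bias (training-set mean) + sum of per-feature contributions
prediction, bias, contributions = ti.predict(model, X[:1])
print(prediction[0], bias[0], contributions[0].sum())
```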
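Several entries above (ALEPlot, ICEbox, PDPbox, PyCEbox) revolve around partial dependence and individual conditional expectation curves. As a library-agnostic sketch, scikit-learn, itself listed above, draws both in a few lines:

```python
# Minimal PD/ICE sketch with scikit-learn's built-in inspection tools.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" overlays per-row ICE curves on the averaged partial dependence curve
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "s5"], kind="both")
plt.show()
```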
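Finally, a group-metric audit in the style of the fairlearn entry; a minimal sketch in which the labels, predictions, and sensitive feature are toy placeholders:

```python
# Minimal fairlearn sketch: compare metrics across groups of a sensitive feature.
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 1])                # observed outcomes (toy)
y_pred = np.array([1, 0, 0, 1, 0, 1])                # model predictions (toy)
sensitive = np.array(["a", "a", "b", "b", "a", "b"]) # group membership (toy)

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive,
)
print(mf.by_group)      # metric values per group
print(mf.difference())  # largest between-group gap for each metric
```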
Common or Useful Datasets
- Data.gov
- Adult income dataset
- COMPAS Recidivism Risk Score Data and Analysis
- All Lending Club loan data
- Amazon Open Data
- Home Mortgage Disclosure Act (HMDA) Data
- MIMIC-III Clinical Database
- UCI ML Data Repository
- FANNIE MAE Single Family Loan Performance
- NYPD Stop, Question and Frisk Data
- Statlog (German Credit Data)
- Wikipedia Talk Labels: Personal Attacks
- Have I Been Trained?
- Presidential Deepfakes Dataset
- Bruegel, A dataset on EU legislation for the digital world
- Balanced Faces in the Wild
- nikhgarg / EmbeddingDynamicStereotypes
- socialfoundations / folktables - see the usage sketch after this list.
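
The folktables entry above packages American Community Survey data into ready-made prediction tasks; a minimal sketch assuming the `folktables` package (it downloads Census data on first run):

```python
# Minimal folktables sketch: build the ACSIncome prediction task for one state.
from folktables import ACSDataSource, ACSIncome

data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["CA"], download=True)

# features/labels for the income task, plus a group column for fairness analyses
features, labels, groups = ACSIncome.df_to_numpy(acs_data)
```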
Machine Learning Environment Management Tools
- mlflow - see the tracking sketch after this list.
- dvc
- gigantum - a platform for reproducible, data-driven science.
- mlmd - "For recording and retrieving metadata associated with ML developer and data scientist workflows."
- modeldb - "Open Source ML Model Versioning, Metadata, and Experiment Management."
- Opik - "Evaluate, test, and ship LLM applications across your dev and production lifecycles."
- neptune
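
As a sketch of the tracking workflow these tools share, here is mlflow logging one run's parameters and metrics (assumes the `mlflow` package; names and values are placeholders):

```python
# Minimal mlflow sketch: record one experiment run for later comparison.
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 5)  # hyperparameter used in this run
    mlflow.log_metric("auc", 0.87)    # evaluation result for this run

# Inspect and compare logged runs in a local UI with: mlflow ui
```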
Benchmarks
- Sociotechnical Safety Evaluation Repository
- HELM
- Nvidia MLPerf
- OpenML Benchmarking Suites
- Real Toxicity Prompts (Allen Institute for AI)
- GEM
- TrustLLM-Benchmark
- Trust-LLM-Benchmark Leaderboard
- SafetyPrompts.com
- WAVES: Benchmarking the Robustness of Image Watermarks
- MLCommons, MLCommons AI Safety v0.5 Proof of Concept
- ML.ENERGY Leaderboard - "Large language models (LLMs), especially fine-tuned ones, can generate human-like responses to chat prompts. Using Zeus for energy measurement, we created a leaderboard for LLM chat energy consumption."
- benchm-ml - "A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.)."
- Hugging Face, evaluate - "Evaluate: A library for easily evaluating machine learning models and datasets." See the usage sketch after this list.
- EleutherAI, Language Model Evaluation Harness - "A framework for few-shot evaluation of language models."
- TruthfulQA - "TruthfulQA: Measuring How Models Imitate Human Falsehoods."
- DecodingTrust - "A Comprehensive Assessment of Trustworthiness in GPT Models."
- Bias Benchmark for QA (BBQ) - "Repository for the Bias Benchmark for QA dataset."
- jphall663, Generative AI Risk Management Resources - "A place for ideas and drafts related to GAI risk management."
- Winogender Schemas - "Data for evaluating gender bias in coreference resolution systems."
- MLCommons, AI Luminate: A collaborative, transparent approach to safer AI
- Evidently AI, 100+ LLM benchmarks and evaluation datasets
- i-gallegos, Fair-LLM-Benchmark - Benchmark from "Bias and Fairness in Large Language Models: A Survey"
- MLCommons, Introducing v0.5 of the AI Safety Benchmark from MLCommons
- ModelSlant.com
- Wild-Time: A Benchmark of in-the-Wild Distribution Shifts over Time - "Benchmark for Natural Temporal Distribution Shift (NeurIPS 2022)."
- yandex-research / tabred - "A Benchmark of Tabular Machine Learning in-the-Wild with real-world industry-grade tabular datasets."
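
As a sketch of the Hugging Face evaluate entry above, metrics load by name and score predictions against references (assumes the `evaluate` package; the toy labels are illustrative):

```python
# Minimal evaluate sketch: load a metric by name and compute it.
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}
```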
Personal Data Protection Tools
- LLM Dataset Inference: Did you train on my dataset? - "Official Repository for Dataset Inference for LLMs."
Archived
- Artificial Intelligence and Worker Well-Being: Principles and Best Practices for Developers and Employers
- Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People, HTML
- CISA Roadmap for Artificial Intelligence 2023–2024
- Data Availability and Transparency Act 2022
- Developing Financial Sector Resilience in a Digital World: Selected Themes in Technology and Related Risks
- Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
- FACT SHEET: Biden-Harris Administration Announces New AI Actions and Receives Additional Major Voluntary Commitment on AI
- FACT SHEET: Biden-Harris Administration Outlines Coordinated Approach to Harness Power of AI for U.S. National Security
- FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI
- FACT SHEET: Biden-Harris Administration Takes New Steps to Advance Responsible Artificial Intelligence Research, Development, and Deployment
- FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
- Federal Register of Legislation, Data Availability and Transparency Act 2022
- Generative Artificial Intelligence Lexicon
- Generative Artificial Intelligence Risk Assessment SIMM 5305-F
- Generative Artificial Intelligence Risk Assessment SIMM 5305-F February 2025 update
- Guidelines on the Application of Republic Act No. 10173 or the Data Privacy Act of 2012 DPA, Its Implementing Rules and Regulations, and the Issuances of the Commission to Artificial Intelligence Systems Processing Personal Data NPC Advisory No. 2024-04
- Introducing the DATA Scheme
- M-21-06 Memorandum for the Heads of Executive Departments and Agencies, Guidance for Regulation of Artificial Intelligence Applications
- M-24-18 Memorandum for the Heads of Executive Departments and Agencies, Advancing the Responsible Acquisition of Artificial Intelligence in Government
- Memorandum on Advancing the United States’ Leadership in Artificial Intelligence; Harnessing Artificial Intelligence to Fulfill National Security Objectives; and Fostering the Safety, Security, and Trustworthiness of Artificial Intelligence
- National Artificial Intelligence Research and Development Strategic Plan 2023 Update
- National Science and Technology Council
- Office of Science and Technology Policy
- Aiming for truth, fairness, and equity in your company’s use of AI
- Using Artificial Intelligence and Algorithms
Community and Official Guidance Resources
Official Policy, Frameworks, and Guidance
- U.S. Web Design System (USWDS) Design principles
- Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
- 12 CFR Part 1002 - Equal Credit Opportunity Act (Regulation B)
- Algorithmic Accountability Act of 2023
- Algorithm Charter for Aotearoa New Zealand
- A Regulatory Framework for AI: Recommendations for PIPEDA Reform
- Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self-assessment - Shaping Europe’s digital future - European Commission
- Audit of Governance and Protection of Department of Defense Artificial Intelligence Data and Technology
- Commodity Futures Trading Commission (CFTC), A Primer on Artificial Intelligence in Securities Markets
- Biometric Information Privacy Act
- Booker Wyden Health Care Letters
- California Consumer Privacy Act (CCPA)
- California Department of Justice, How to Read a Privacy Policy
- California Privacy Rights Act (CPRA)
- 2023-08-16 Can’t lose what you never had: Claims about digital ownership and creation in the age of generative AI
- Children's Online Privacy Protection Rule ("COPPA")
- Civil liability regime for artificial intelligence
- Congressional Research Service, Artificial Intelligence: Overview, Recent Advances, and Considerations for the 118th Congress
- Consumer Data Protection Act (Code of Virginia)
- DARPA, Explainable Artificial Intelligence (XAI) (Archived)
- Data Availability and Transparency Act 2022 (Australia)
- data.gov, Privacy Policy and Data Policy
- Defense Technical Information Center, Computer Security Technology Planning Study, October 1, 1972
- Department for Science, Innovation and Technology, Frontier AI: capabilities and risks - discussion paper (United Kingdom)
- United States Department of Commerce, Intellectual property
- United States Department of Defense, AI Principles: Recommendations on the Ethical Use of Artificial Intelligence
- United States Department of Defense, Chief Data and Artificial Intelligence Officer (CDAO) Assessment and Assurance
- RAI Toolkit
- Developing Financial Sector Resilience in a Digital World: Selected Themes in Technology and Related Risks
- The Digital Services Act package (EU Digital Services Act and Digital Markets Act)
- Directive on Automated Decision Making (Canada)
- EEOC Letter (from U.S. senators re: hiring software)
- European Commission, Hiroshima Process International Guiding Principles for Advanced AI system
- Executive Order 13960 (2020-12-03), Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government
- Facial Recognition and Biometric Technology Moratorium Act of 2020
- FDA Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, updated January 2021
- FDA Software as a Medical Device (SAMD) guidance (December 8, 2017)
- FDIC Supervisory Guidance on Model Risk Management
- Federal Consumer Online Privacy Rights Act (COPRA)
- Federal Reserve Bank of Dallas, Regulation B, Equal Credit Opportunity, Credit Scoring Interpretations: Withdrawal of Proposed Business Credit Amendments, June 3, 1982
- FHA model risk management/model governance guidance
- FTC Business Blog
- 2021-01-11 Facing the facts about facial recognition
- 2021-04-19 Aiming for truth, fairness, and equity in your company’s use of AI
- 2022-07-11 Location, health, and other sensitive information: FTC committed to fully enforcing the law against illegal use and sharing of highly sensitive data
- 2023-07-25 Protecting the privacy of health information: A baker’s dozen takeaways from FTC cases
- 2023-08-22 For business opportunity sellers, FTC says “AI” stands for “allegedly inaccurate”
- 2023-09-15 Updated FTC-HHS publication outlines privacy and security laws and rules that impact consumer health data
- 2023-09-18 Companies warned about consequences of loose use of consumers’ confidential data
- 2023-09-27 Could PrivacyCon 2024 be the place to present your research on AI, privacy, or surveillance?
- 2022-05-20 Security Beyond Prevention: The Importance of Effective Breach Disclosures
- 2023-02-01 Security Principles: Addressing underlying causes of risk in complex systems
- 2023-06-29 Generative AI Raises Competition Concerns
- 2023-12-19 Coming face to face with Rite Aid’s allegedly unfair use of facial recognition technology
- FTC Privacy Policy
- Government Accountability Office: Artificial Intelligence: An Accountability Framework for Federal Agencies and Other Entities
- General Data Protection Regulation (GDPR)
- Article 22 EU GDPR "Automated individual decision-making, including profiling"
- General principles for the use of Artificial Intelligence in the financial sector
- Gouvernance des algorithmes d’intelligence artificielle dans le secteur financier (France)
- Guidelines for secure AI system development
- Innovation spotlight: Providing adverse action notices when using AI/ML models
- Justice in Policing Act
- National Conference of State Legislatures (NCSL) 2020 Consumer Data Privacy Legislation
- National Institute of Standards and Technology (NIST), AI 100-1 Artificial Intelligence Risk Management Framework (NIST AI RMF 1.0)
- National Institute of Standards and Technology (NIST), Four Principles of Explainable Artificial Intelligence, Draft NISTIR 8312, 2020-08-17
- National Institute of Standards and Technology (NIST), Four Principles of Explainable Artificial Intelligence, NISTIR 8312, 2021-09-29
- National Institute of Standards and Technology (NIST), Measurement Uncertainty
- International Bureau of Weights and Measures (BIPM), Evaluation of measurement data—Guide to the expression of uncertainty in measurement
- National Institute of Standards and Technology (NIST), NIST Special Publication 800-30 Revision 1, Guide for Conducting Risk Assessments
- National Science and Technology Council (NSTC), Select Committee on Artificial Intelligence, National Artificial Intelligence Research and Development Strategic Plan 2023 Update
- New York City Automated Decision Systems Task Force Report (November 2019)
- OECD, Open, Useful and Re-usable data (OURdata) Index: 2019 - Policy Paper
- Office of the Director of National Intelligence (ODNI), The AIM Initiative: A Strategy for Augmenting Intelligence Using Machines
- Office of Management and Budget, Guidance for Regulation of Artificial Intelligence Applications, finalized November 2020
- Office of Science and Technology Policy, Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People
- Office of the Comptroller of the Currency (OCC), 2021 Model Risk Management Handbook
- Online Harms White Paper: Full government response to the consultation (United Kingdom)
- Online Privacy Act of 2023
- Online Safety Bill (United Kingdom)
- Principles of Artificial Intelligence Ethics for the Intelligence Community
- Privacy Act 1988 (Australia)
- Proposal for a Regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act)
- Amendments adopted by the European Parliament on 14 June 2023 on the proposal for a regulation of the European Parliament and of the Council on laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts
- Psychological Foundations of Explainability and Interpretability in Artificial Intelligence
- The Public Sector Bodies (Websites and Mobile Applications) Accessibility Regulations 2018 (United Kingdom)
- Questions and Answers to Clarify and Provide a Common Interpretation of the Uniform Guidelines on Employee Selection Procedures
- Questions from the Commission on Protecting Privacy and Preventing Discrimination
- RE: Use of External Consumer Data and Information Sources in Underwriting for Life Insurance
- Singapore Personal Data Protection Commission (PDPC), Companion to the Model AI Governance Framework – Implementation and Self-Assessment Guide for Organizations
- Singapore Personal Data Protection Commission (PDPC), Compendium of Use Cases: Practical Illustrations of the Model AI Governance Framework
- Singapore Personal Data Protection Commission (PDPC), Model Artificial Intelligence Governance Framework (Second Edition)
- Supervisory Guidance on Model Risk Management
- Testing the Reliability, Validity, and Equity of Terrorism Risk Assessment Instruments
- UNESCO, Artificial Intelligence: examples of ethical dilemmas
- AI Risk Management Playbook (AIRMP)
- AI Use Case Inventory (DOE Use Cases Releasable to Public in Accordance with E.O. 13960)
- Digital Climate Solutions Inventory
- United States Department of Homeland Security, Use of Commercial Generative Artificial Intelligence (AI) Tools
- United States Department of Justice, Privacy Act of 1974
- United States Department of Justice, Overview of The Privacy Act of 1974 (2020 Edition)
- United States Patent and Trademark Office (USPTO), Public Views on Artificial Intelligence and Intellectual Property Policy
- U.S. Army Concepts Analysis Agency, Proceedings of the Thirteenth Annual U.S. Army Operations Research Symposium, Volume 1, October 29 to November 1, 1974
- National Institute of Standards and Technology
- Department for Science, Innovation and Technology, The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023
- Using Artificial Intelligence and Algorithms
- Consumer Financial Protection Bureau (CFPB), Chatbots in consumer finance
- Department for Science, Innovation and Technology, Guidance, Introduction to AI Assurance
- National Security Commission on Artificial Intelligence, Final Report
- (Draft Guideline) E-23 – Model Risk Management
- Office of the United Nations High Commissioner for Human Rights
- United States Department of Energy Artificial Intelligence and Technology Office
- Securities and Exchange Commission, SEC Charges Two Investment Advisers with Making False and Misleading Statements About Their Use of Artificial Intelligence
- National Telecommunications and Information Administration, AI Accountability Policy Report
- United States Department of the Treasury, Managing Artificial Intelligence-Specific Cybersecurity Risks in the Financial Services Sector, March 2024
- State of California, Department of Technology, Office of Information Security, Generative Artificial Intelligence Risk Assessment, SIMM 5305-F, March 2024
- United States Department of Commerce Internet Policy Task Force, Commercial Data Privacy and Innovation in the Internet Economy: A Dynamic Policy Framework
- The White House, Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy, February 2012
- OECD.AI, The Bias Assessment Metrics and Measures Repository
- Bundesamt für Sicherheit in der Informationstechnik, Generative AI Models - Opportunities and Risks for Industry and Authorities
- California Department of Technology, GenAI Executive Order
- Commodity Futures Trading Commission (CFTC), Responsible Artificial Intelligence in Financial Markets
- Mississippi Department of Education, Artificial Intelligence Guidance for K-12 Classrooms
- National Security Agency, Central Security Service, Artificial Intelligence Security Center
- United States Department of Homeland Security, Safety and Security Guidelines for Critical Infrastructure Owners and Operators
- Autoriteit Persoonsgegevens, Scraping bijna altijd illegaal (Dutch Data Protection Authority, "Scraping is almost always illegal")
- NATO, Narrative Detection and Topic Modelling in the Baltics
- Department for Science, Innovation and Technology and AI Safety Institute, International Scientific Report on the Safety of Advanced AI
- National Physical Laboratory (NPL), Beginner's guide to measurement GPG118
- National Institute of Standards and Technology (NIST), Assessing Risks and Impacts of AI (ARIA)
- AI Safety Institute (AISI), Advanced AI evaluations at AISI: May update
- Colorado General Assembly, SB24-205 Consumer Protections for Artificial Intelligence, Concerning consumer protections in interactions with artificial intelligence systems
- European Data Protection Supervisor, First EDPS Orientations for EUIs using Generative AI
- Health Canada, Transparency for machine learning-enabled medical devices: Guiding principles
- National framework for the assurance of artificial intelligence in government (Australia)
- Callaghan Innovation, EU AI Fact Sheet 4, High-risk AI systems
- European Data Protection Board (EDPB), AI Auditing documents
- European Labour Authority (ELA), Artificial Intelligence and Algorithms in Risk Assessment: Addressing Bias, Discrimination and other Legal and Ethical Issues
- OECD, AI, data governance and privacy: Synergies and areas of international co-operation
- Generative Artificial Intelligence Reference Guide
- Algorithmic Impact Assessment tool
- European Parliament, The impact of the General Data Protection Regulation (GDPR) on artificial intelligence
- Singapore Personal Data Protection Commission (PDPC), Privacy Enhancing Technology (PET): Proposed Guide on Synthetic Data Generation
- Office of Educational Technology, Designing for Education with Artificial Intelligence: An Essential Guide for Developers
- Examining Proposed Uses of LLMs to Produce or Assess Assurance Arguments
- The Artificial Intelligence and Data Act Companion document
- OECD Digital Economy Papers, No. 341, November 2022, Measuring the Environmental Impacts of Artificial Intelligence Compute and Applications: The AI Footprint
- Ethics Guidelines for Trustworthy AI, High-Level Expert Group on Artificial Intelligence
- The AI Impact Navigator
- Copyright and Artificial Intelligence Part 2 Copyrightability
- AI Governance: Leadership insights and the Voluntary AI Safety Standard in practice
- Artificial Intelligence Model Clauses
- Australia’s AI Ethics Principles
- Guidance for AI Adoption
- Guidance for AI Adoption: Foundations v1.0
- Guidance for AI Adoption: Implementation practices v1.0
- Introducing mandatory guardrails for AI in high-risk settings: proposals paper
- Voluntary AI Safety Standard
- Evaluation of the whole-of-government trial of Microsoft 365 Copilot: Summary of evaluation findings
- Policy for the responsible use of AI in government
- Guidance on privacy and developing and training generative AI models
- Guidance on privacy and the use of commercially available AI products
- Technical standard for government’s use of artificial intelligence
- Understanding Responsibilities in AI Practices
- An Act to enact the Consumer Privacy Protection Act, the Personal Information and Data Protection Tribunal Act and the Artificial Intelligence and Data Act and to make consequential and related amendments to other Acts
- AI in Canada
- Artificial Intelligence and Data Act
- Responsible use of artificial intelligence in government
- 人工智能全球治理行动计划 (Global AI Governance Action Plan)
- Presidency of the Republic of Colombia, Marco Ético para la Inteligencia Artificial en Colombia
- Ministerio de Ciencia, Innovación, Tecnología y Telecomunicaciones
- Etički kodeks za pripremu i provedbu projekata financiranih projektom Digitalne, inovativne i zelene tehnologije, DIGIT PROJEKT (Code of ethics for the preparation and implementation of projects financed by the Digital, Innovative and Green Technologies project, DIGIT PROJECT)
- Digital Croatia Strategy for the period until 2032
- Nacionalni program zaštite potrošača za razdoblje do 2028. godine (National consumer protection programme for the period up to 2028)
- Pametna sigurnost: Praktična primjena umjetne inteligencije i nosivih senzora u građevinarstvu (Smart safety: practical application of artificial intelligence and wearable sensors in construction)
- Progress in Implementing the European Union Coordinated Plan on Artificial Intelligence Volume 1 Croatia (PDF here)
- National Strategy for Artificial Intelligence
- Finland's Age of Artificial Intelligence: Turning Finland into a leading country in the application of artificial intelligence. Objective and recommendations for measures
- Challenges and opportunities of artificial intelligence in the fight against information manipulation
- German-French recommendations for the use of AI programming assistants
- Germany AI Strategy Report
- OECD-Bericht zu Künstlicher Intelligenz in Deutschland
- Recommendations of the Data Ethics Commission for the Federal Government's Strategy on Artificial Intelligence
- Artificial Intelligence: Model Personal Data Protection Framework
- Aðgerðaáætlun um gervigreind 2024–2026 (Action Plan on Artificial Intelligence 2024–2026), November 2024
- Efnahagsleg tækifæri gervigreindar á Íslandi (Economic opportunities of artificial intelligence in Iceland)
- AI Governance Framework for India 2025-26
- Stakeholders consultation on "Draft Standard for the Schema and Taxonomy of an AI Incident Database in Telecommunications and Critical Digital Infrastructure"
- AI - Here for Good: A National Artificial Intelligence Strategy for Ireland
- AI Standards & Assurance Roadmap: Action under 'AI - Here for Good,' the National Artificial Intelligence Strategy for Ireland
- Artificial Intelligence: Friend or Foe? Summary and Public Policy Considerations
- Interim Guidelines for Use of AI in the Public Service
- Ireland's National AI Strategy: AI - Here for Good
- Bozza di linee guida per l’adozione di IA nella pubblica amministrazione (Draft guidelines for the adoption of AI in public administration)
- Linee Guida per l’Introduzione dell’Intelligenza Artificiale nelle Istituzioni Scolastiche (Guidelines for the introduction of artificial intelligence in schools)
- Piano Triennale per l’Informatica nella Pubblica Amministrazione (Three-year plan for IT in public administration)
- Strategia Italiana per l’Intelligenza Artificiale 2024–2026 (Italian Strategy for Artificial Intelligence 2024–2026)
- National Artificial Intelligence Policy Recommendations
- Guide to Evaluation Perspectives on AI Safety
- Guide to Red Teaming Methodology on AI Safety
- Diplomat's Playbook on Artificial Intelligence—Shaping a Safe, Secure, Inclusive, and Trustworthy AI Future: Kenya's Strategic Leadership in AI Global Diplomacy
- Kenya Artificial Intelligence Strategy 2025-2030
- The National Guidelines on AI Governance & Ethics
- Recomendaciones para el Tratamiento de Datos Personales Derivado del Uso de la Inteligencia Artificial (Recommendations for the processing of personal data arising from the use of artificial intelligence)
- Cartea Albă cu Privire la Inteligența Artificială și Guvernanța Datelor (White Book on Artificial Intelligence and Data Governance)
- AI Act Guide
- AI Impact Assessment: The tool for a responsible AI project
- Call for input on prohibition on AI systems for emotion recognition in the areas of workplace or education institutions
- Accredited Employer Work Visa: Use of Adept for Automated Processing of Migrant Gateway
- Algorithm Assessment Report
- Algorithm impact assessment user guide: Algorithm Charter for Aotearoa New Zealand
- Artificial intelligence frameworks and regulation: An intelligence perspective, Inspector-General of Intelligence and Security, August 2024
- Automated decision-making in MSD: Proposed legislative and policy framework
- Automated Decision Making Standard
- Discussion Paper: International Data Ethics Frameworks
- Government Use of Artificial Intelligence in New Zealand: Final Report on Phase 1 of the New Zealand Law Foundation's Artificial Intelligence and Law in New Zealand Project
- Initial advice on Generative Artificial Intelligence in the public service
- New Zealand Income Insurance: service model and automated decision making
- New Zealand's Strategy for Artificial Intelligence: Investing with confidence
- Public Scrutiny of Automated Decisions: Early Lessons and Emerging Methods
- NAIS National Artificial Intelligence Strategy
- Artificial Intelligence and Democratic Elections — International Experiences and National Recommendations
- National Strategy for Artificial Intelligence
- Artificial Intelligence Model Risk Management: Observations from a Thematic Review
- Guide for Using Generative AI in the Legal Sector
- National Artificial Intelligence Strategy: Advancing Our Smart Nation Journey
- The Singapore Consensus on Global AI Safety Research Priorities
- 2030 Digital Transformation Strategy for Slovakia: Strategy for transformation of Slovakia into a successful digital country
- Analýza a návrh možností výskumu, vývoja a aplikácie umelej inteligencie na Slovensku – Dielo č. 2: Manuál pre firmy na zavedenie umelej inteligencie (Analysis and proposal of options for AI research, development and application in Slovakia – Deliverable No. 2: Manual for companies on adopting artificial intelligence)
- Analýza a návrh možností výskumu, vývoja a aplikácie umelej inteligencie na Slovensku (Analysis and proposal of options for AI research, development and application in Slovakia)
- Preliminary position of the Slovak Republic on the “White Paper on Artificial Intelligence – A European approach to excellence and trust”
- Umelá inteligencia (Artificial intelligence)
- Umelá inteligencia vo vzdelávaní: Plán zodpovedného využívania AI vo vzdelávaní na Slovensku 2025–2027 (Artificial intelligence in education: a plan for the responsible use of AI in education in Slovakia 2025–2027)
- Akcijski načrt strategije Digitalna Slovenija 2030 (Action plan for the Digital Slovenia 2030 strategy)
- Zaveze za uporabo orodij generativne umetne inteligence, dostopnih na spletu (Commitments for the use of generative AI tools available online)
- Digital Public Services Strategy 2030
- Digitalna Slovenija 2030
- National Programme to Promote the Development and Use of Artificial Intelligence in the Republic of Slovenia by 2025 NpAI
- Register rabe UI (Register of AI use)
- Computer Applications Technology: Learner Guidelines for Practical Assessment Tasks, Grade 12, 2025
- South Africa's Artificial Intelligence Planning: Adoption of AI by Government
- AI Safety Institute of Korea
- Basic Act on the Promotion of Artificial Intelligence Development and Establishment of a Trust Framework
- 인공지능 발전과 신뢰 기반 조성 등에 관한 기본법 (Korean text of the Basic Act above)
- 생성형 인공지능(AI) 개발·활용을 위한 개인정보 처리 안내서(안) (Draft guide on personal data processing for the development and use of generative AI)
- Digital Switzerland Strategy 2025
- Artificial Intelligence Readiness Assessment Report
- Дорожня карта з регулювання штучного інтелекту в Україні: Bottom-Up Підхід (Roadmap for AI regulation in Ukraine: a bottom-up approach)
- Guidelines on the Responsible Use of Artificial Intelligence in the News Media
- AI and the Law: A Discussion Paper
- AI Safety Institute, Safety cases at AISI
- Artificial Intelligence Playbook for the UK Government
- Evaluation of the Cyber AI Hub programme, January 8, 2025
- Generative Artificial Intelligence in the Education System
- Global Coalition on Telecommunications: principles on AI adoption in the telecommunications industry
- Information Commissioner's Office, AI tools in recruitment
- Media literacy - 25
- Northern Ireland response to the AI Council AI Roadmap
- Parliamentary Office of Science and Technology
- The safe and effective use of AI in education: Leadership toolkit video transcripts
- Trusted third-party AI assurance roadmap
- US AISI and UK AISI Joint Pre-Deployment Test: Anthropic's Claude 3.5 Sonnet
- US AISI and UK AISI Joint Pre-Deployment Test: OpenAI o1
- Use of AI in Legislatures
- Bureau of Labor Statistics Report to the Committees on Appropriations of the House of Representatives and the Senate on Measuring the Effects of New Technologies on the American Workforce
- Incorporating AI impacts in BLS employment projections: occupational case studies
- H.R. 9720, AI Incident Reporting and Security Enhancement Act
- Artificial Intelligence in Health Care
- Artificial Intelligence and Machine Learning in Financial Services
- Artificial Intelligence: Background, Selected Issues, and Policy Considerations
- Highlights of the 2023 Executive Order on Artificial Intelligence for Congress
- Copyright and Artificial Intelligence, Part 1: Digital Replicas
- Copyright and Artificial Intelligence, Part 3: Generative AI Training
- Fiscal Year 2025-2026 AI Strategy
- Artificial intelligence
- Bureau of Industry and Security
- Department of Commerce Rescinds Biden-Era Artificial Intelligence Diffusion Rule, Strengthens Chip-Related Export Controls
- Framework for Artificial Intelligence Diffusion
- Evaluation of DeepSeek AI Models
- National Telecommunications and Information Administration
- AI System Documentation
- NTIA Artificial Intelligence Accountability Policy Report
- Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (updated), NIST AI 100-2e2025
- Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1
- NOAA Artificial Intelligence Strategy: Analytics for Next-Generation Earth Science
- Generative Artificial Intelligence and Open Data
- Outline: Proposed Zero Draft for a Standard on AI Testing, Evaluation, Verification, and Validation
- SP 800-53 Control Overlays for Securing AI Systems
- AI Data Security
- Content Credentials: Strengthening Multimedia Integrity in the Generative AI Era
- U.S. Department of Defense Responsible Artificial Intelligence Strategy and Implementation Pathway
- Inventory of U.S. Department of Education AI Use Cases
- Artificial Intelligence and Technology Office
- Strategic Plan for the Use of Artificial Intelligence in Health, Human Services, and Public Health
- Acquisition and Use of Artificial Intelligence and Machine Learning Technologies by DHS Components, Policy Statement 139-06, August 8, 2023
- Artificial Intelligence and Autonomous Systems
- Artificial Intelligence Safety and Security Board
- Department of Homeland Security Artificial Intelligence Roadmap 2024
- DHS Has Taken Steps to Develop and Govern Artificial Intelligence, But More Action is Needed to Ensure Appropriate Use, OIG-25-10, January 30, 2025
- DHS Playbook for Public Sector Generative Artificial Intelligence Deployment
- Roles and Responsibilities Framework for Artificial Intelligence in Critical Infrastructure
- The Department of Homeland Security Simplified Artificial Intelligence Use Case Inventory
- AI at DHS: A Deep Dive into our Use Case Inventory
- Artificial Intelligence Strategy for the U.S. Department of Justice
- Civil Rights Division, Artificial Intelligence and Civil Rights
- Shaping the Department's Artificial Intelligence Efforts 2021-2025
- Artificial Intelligence Use Case Inventory
- Validation of Employee Selection Procedures
- Artificial Intelligence
- AI Inventory 2024
- Enterprise Artificial Intelligence Strategy FY2024-FY2025: Empowering Diplomacy through Responsible AI
- Interim Policy for AI Governance
- Building the Future: VA’s Strategy for Adopting High-Impact Artificial Intelligence to Improve Services for Veterans
- Fact Sheet Eliminating Barriers for Federal Artificial Intelligence Use and Procurement
- Winning the Race: America's AI Action Plan
- Roadmap for Artificial Intelligence Safety Assurance
- Advisory Bulletin AB 2013-07 Model Risk Management Guidance
- Privacy Policy
- Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations
- AI Guide for Government
- Highlights of GAO-21-519SP
- Artificial Intelligence: Generative AI Use and Management at Federal Agencies
- Artificial Intelligence: Use and Oversight in Financial Services GAO-25-107197
- Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products Draft Guidance
- Fraud and Improper Payments: Data Quality and a Skilled Workforce Are Essential for Unlocking the Benefits of Artificial Intelligence
- Generative AI's Environmental and Human Effects
- Veteran Suicide: VA Efforts to Identify Veterans at Risk through Analysis of Health Record Information
- NASA Framework for the Ethical Use of Artificial Intelligence, NASA/TM-20210012886, April 2021
- Potential Labor Market Impacts of Artificial Intelligence: An Empirical Analysis
- Letter to Congress Opposing AI Preemption Amendment
- Policy on the Use of Artificial Intelligence for NEH Grant Proposals
- AI in Financial Services Remarks at NFHA Responsible AI Symposium
- Artificial Intelligence Ethics Framework for the Intelligence Community v 1.0 as of June 2020
- Annual Threat Assessment of the U.S. Intelligence Community
- M-25-21 Memorandum for the Heads of Executive Departments and Agencies - Accelerating Federal Use of AI through Innovation, Governance, and Public Trust
- M-25-22 Memorandum for the Heads of Executive Departments and Agencies - Driving Efficient Acquisition of Artificial Intelligence in Government
- The Artificial Intelligence Classification Policy and Talent Acquisition Guidance - The AI in Government Act of 2020
- Investor Advisory Committee Meeting Agenda for Thursday
- Compliance Plan for OMB Memoranda M-24-10
- Bipartisan House Task Force Report on Artificial Intelligence
- Letter to Inflection AI re: AI Censorship
- Decoupling America’s Artificial Intelligence Capabilities from China Act
- Driving U.S. Innovation in Artificial Intelligence: A Roadmap for Artificial Intelligence Policy in the United States Senate
- Letter to DOJ Re FARA AI Violation
- Letter to Sundar Pichai concerning Google's decision to reverse its previous safety and ethical commitments on its development of AI products
- Generative AI Task Force Final Report
- Office of the Attorney General, California Attorney General's Legal Advisory on the Application of Existing California Laws to Artificial Intelligence
- California Privacy Protection Agency, Draft Risk Assessment and Automated Decisionmaking Technology Regulations
- The California Report on Frontier AI Policy
- Sonoma County Administrative Policy 9-6 Information Technology Artificial Intelligence Policy
- Report and Recommendations: Artificial Intelligence Impact Task Force
- State of Connecticut Judicial Branch JBAPPM Policy 1013 Artificial Intelligence Responsible Use Framework, Meaningful Guardrails + Workforce Empowerment and Education + Purposeful Use = Responsible AI Innovation
- State of Connecticut Policy AI-01 AI Responsible Use Framework, Meaningful Guardrails + Workforce Empowerment and Education + Purposeful Use = Responsible AI Innovation
- Provenance of Digital Content Florida HB 369 Bill Analysis
- Report on Miami-Dade County's Policy on Artificial Intelligence, Directive No. 231203, Miami-Dade County, March 22, 2024
- Second Report on Miami-Dade County's Policy on Artificial Intelligence, Directive No. 231203, Miami-Dade County, April 8, 2025
- Illinois Supreme Court Policy on Artificial Intelligence
- State of Indiana Artificial Intelligence
- 080.101 AI/Gen AI Policy Version 1.1
- Artificial Intelligence Guidance Brief 2024
- Research Report No. 491 Executive Branch Use of Artificial Intelligence Technology
- Generative Artificial Intelligence Policy
- Enterprise Use and Development of Generative Artificial Intelligence Policy
- Nebraska Information Technology Commission 8-609 Artificial intelligence policy
- Legal Practice Preliminary Guidelines on the Use of Artificial Intelligence by New Jersey Lawyers
- Acceptable Use of Artificial Intelligence Technologies
- The New York City Artificial Intelligence Action Plan
- New York State Emerging Technology Advisory Board: Recommendations for making NY a leader in responsible AI
- New York State Artificial Intelligence Governance - S-50, April 2025
- AI Accelerator
- North Carolina State Government Responsible Use of Artificial Intelligence Framework
- South Carolina State Agencies Artificial Intelligence Strategy
- Artificial Intelligence Policy
- Lessons from Pennsylvania's Generative AI Pilot with ChatGPT
- Artificial Intelligence and Generative AI Policy ISM 20
- Enterprise Artificial Intelligence policy 200-POL-007
- Artificial Intelligence Framework for Utah P-12 Education
- Policy Standards for the Utilization of Artificial Intelligence by the Commonwealth of Virginia
- City of Seattle Generative Artificial Intelligence Policy POL-209
- Washington Technology Solutions Reports & Documents
- Guidelines for Deployment of Generative AI
- Implementing risk assessments for high-risk AI systems
- Initial procurement guidelines for public sector procurement, deployment, and monitoring of Generative AI Technology
- Interim Guidelines for Purposeful and Responsible Use of Generative Artificial Intelligence
- Office of Privacy and Data Protection Performance Report
- Responsible AI in the Public Sector: How the Washington State Government Uses & Governs Artificial Intelligence
- State of Washington Generative Artificial Intelligence Report
- AI Guidance Resources
- Guidance for Wyoming School Districts on Developing Artificial Intelligence Use Policy
- ASEAN Guide on AI Governance and Ethics
- Democracy and the Rule of Law
- Discussion paper on Draft Recommendation on AI literacy
- European Audiovisual Observatory, IRIS, AI and the audiovisual sector: navigating the current legal landscape
- Guidelines on the Responsible Implementation of Artificial Intelligence Systems in Journalism
- On the Use of Artificial Intelligence in the Framework of the Syrian War
- Privacy and Data Protection Risks in Large Language Models
- Recommendation CM/Rec(2020)1 of the Committee of Ministers to member States on the human rights impacts of algorithmic systems
- The Framework Convention on Artificial Intelligence
- Explanatory Report to the Council of Europe Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law
- AI Act: Commission issues draft guidance and reporting template on serious AI incidents, and seeks stakeholders' feedback
- AI-driven Innovation in Medical Imaging: Focus on Lung Cancer and Cardiovascular Diseases
- Addressing AI risks in the workplace: Workers and algorithms
- Artificial Intelligence and Civil Liability: A European Perspective, Directorate-General for Citizens' Rights, Justice and Institutional Affairs, July 2025
- Artificial intelligence and human rights: Using AI as a weapon of repression and its impact on human rights
- Analysis of the preliminary AI standardisation work plan in support of the AI Act
- Communication from the Commission, Artificial Intelligence for Europe
- Data Protection Certification Mechanisms: Study on Articles 42 and 43 of the Regulation 2016/679
- Data quality and artificial intelligence - mitigating bias and error to protect fundamental rights
- Ethical guidelines on the use of artificial intelligence and data in teaching and learning for Educators
- Ethics By Design and Ethics of Use Approaches for Artificial Intelligence
- Ethics Guidelines for Trustworthy AI, High-Level Expert Group on Artificial Intelligence, April 8, 2019
- European approach to artificial intelligence
- First Draft of the General-Purpose AI Code of Practice published, written by independent experts
- A Framework to Categorise Modified General-Purpose AI Models as New Models Based on Behavioural Changes
- Generative AI and the EUDPR. Orientations for ensuring data protection compliance when using Generative AI systems. Version 2
- Living Guidelines on the Responsible Use of Generative AI in Research
- Living repository to foster learning and exchange on AI literacy
- Living Repository of AI Literacy Practices
- Policy and Investment Recommendations for Trustworthy AI, High-Level Expert Group on Artificial Intelligence
- Procurement of AI Updated EU AI model contractual clauses
- Work in the Digital Era: How Technology is Transforming Work and Occupations
- Proposal for a directive on adapting non-contractual civil liability rules to artificial intelligence: Complementary impact assessment
- Proposal for a Regulation laying down harmonised rules on artificial intelligence
- The impact of the General Data Protection Regulation on artificial intelligence
- Roadmap for lawful and effective access to data for law enforcement
- Artificial intelligence act: Council and Parliament strike a deal on the first rules for AI in the world
- Council Conclusions on the Use of Artificial Intelligence in the Field of Justice
- AI Auditing documents
- Data Protection Authority of Belgium General Secretariat, Artificial Intelligence Systems and the GDPR: A Data Protection Perspective
- Generative AI and the EUDPR. First EDPS Orientations for ensuring data protection compliance when using Generative AI systems
- Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models
- Training curriculum on AI and data protection: Fundamentals of Secure AI Systems with Personal Data
- Analysis of EU AI Office stakeholder consultations: defining AI systems and prohibited applications
- Multi-Stakeholder Consultation for Commission Guidelines on the Application of the Definition of an AI System and the Prohibited AI Practices Established in the AI Act
- The changing DNA of serious and organised crime
- Guiding Principles for Automated Decision-Making in the EU
- Trustworthiness for AI in Defence: Developing Responsible, Ethical, and Trustworthy AI Systems for European Defence
- Algorithm Impact Assessment Toolkit
- OECD.AI Catalogue of Tools & Metrics for Trustworthy AI, Anekanta AI, Responsible AI Governance Framework for boards
- OECD Artificial Intelligence Papers
- No. 1, September 18, 2023, Initial policy considerations for generative artificial intelligence
- No. 2, October 17, 2023, Emerging trends in AI skill demand across 14 OECD countries
- No. 3, October 27, 2023, The state of implementation of the OECD AI Principles four years on
- No. 4, October 27, 2023, Stocktaking for the development of an AI incident definition
- No. 5, November 7, 2023, Common guideposts to promote interoperability in AI risk management
- No. 6, November 13, 2023, What technologies are at the core of AI?
- No. 7, November 24, 2023, Using AI to support people with disability in the labour market
- No. 8, March 5, 2024, Explanatory memorandum on the updated OECD definition of an AI system
- No. 9, December 15, 2023, Generative artificial intelligence in finance
- No. 10, January 19, 2024, Collective action for responsible AI in health
- No. 11, March 15, 2024, Using AI in the workplace
- No. 12, March 22, 2024, Generative AI for anti-corruption and integrity in government
- No. 13, April 10, 2024, Artificial intelligence and wage inequality
- No. 14, April 10, 2024, Artificial intelligence and the changing demand for skills in the labour market
- No. 15, April 16, 2024, The impact of Artificial Intelligence on productivity, distribution and growth
- No. 16, May 6, 2024, Defining AI incidents and related terms
- No. 17, May 30, 2024, Artificial intelligence and the changing demand for skills in Canada
- No. 18, May 24, 2024, Artificial intelligence, data and competition
- No. 19, June 13, 2024, A new dawn for public employment services
- No. 20, June 13, 2024, Governing with Artificial Intelligence
- No. 21, June 24, 2024, Using AI to manage minimum income benefits and unemployment assistance
- No. 22, June 26, 2024, AI, data governance and privacy
- No. 23, August 14, 2024, The potential impact of Artificial Intelligence on equity and inclusion in education
- No. 24, September 5, 2024, Regulatory approaches to Artificial Intelligence in finance
- No. 25, September 5, 2024, Measuring the demand for AI skills in the United Kingdom
- No. 26, October 31, 2024, Who will be the workers most affected by AI?
- No. 27, November 14, 2024, Assessing potential future artificial intelligence risks, benefits and policy imperatives
- No. 28, November 20, 2024, Artificial Intelligence and the health workforce
- No. 29, November 22, 2024, Miracle or Myth? Assessing the macroeconomic productivity gains from Artificial Intelligence
- No. 30, December 12, 2024, A Sectoral Taxonomy of AI Intensity
- No. 31, February 6, 2025, Algorithmic Management in the Workplace: New Evidence from an OECD Employer Survey
- No. 32, February 7, 2025, Steering AI's Future: Strategies for Anticipatory Governance
- No. 33, February 9, 2025, Intellectual Property Issues in Artificial Intelligence Trained on Scraped Data
- No. 34, February 28, 2025, Towards a Common Reporting Framework for AI Incidents
- No. 35, February 28, 2025, AI Skills and Capabilities in Canada
- OECD Legal Instruments, Recommendation of the Council on Artificial Intelligence, adopted May 22, 2019, amended May 3, 2024
- Open, Useful and Re-usable data (OURdata) Index: 2019
- Measuring the environmental impacts of artificial intelligence compute and applications
- #SAIFE Resource Hub: Spotlight on Artificial Intelligence and Freedom of Expression
- Artificial Intelligence and Disinformation: State-Aligned Information Operations and the Distortion of the Public Sphere
- Spotlight on Artificial Intelligence and Freedom of Expression: A Policy Manual
- AI in Precision Persuasion. Unveiling Tactics and Risks on Social Media
- "NATO-Mation": Strategies for Leading in the Age of Artificial Intelligence
- Summary of the NATO Artificial Intelligence Strategy
- An Artificial Intelligence Strategy for NATO
- Summary of NATO's revised Artificial Intelligence strategy
- Virtual Manipulation Brief 2025: From War and Fear to Confusion and Uncertainty
- Report of the Artificial Intelligence, Data Sovereignty, and Cybersecurity Task Force
- A Framework for Ethical AI at the United Nations, March 15, 2021
- A matter of choice: People and possibilities in the age of AI
- Casinos, cyber fraud, and trafficking in persons for forced criminality in Southeast Asia
- Governing AI for Humanity, Final Report
- High-Level Advisory Body on Artificial Intelligence, Office of the Secretary-General's Envoy on Technology
- AI and education: guidance for policy-makers
- AI and the future of education: disruptions, dilemmas and directions
- Caribbean Artificial Intelligence Policy Roadmap
- Consultation paper on AI regulation: emerging approaches across the world
- Global AI Ethics and Governance Observatory
- Readiness assessment methodology: a tool of the Recommendation on the Ethics of Artificial Intelligence
- Recommendation on the Ethics of Artificial Intelligence
- Smarter, smaller, stronger: resource-efficient generative AI & the future of digital transformation
- Policy guidance on AI for children, Recommendations for building AI policies and systems that uphold child rights
- Principles for the ethical use of artificial intelligence in the United Nations system - 10-27
- Terms of Reference and Modalities for the Establishment and Functioning of the Independent International Scientific Panel on Artificial Intelligence and the Global Dialogue on Artificial Intelligence Governance
- Främja den offentliga förvaltningens förmåga att använda AI (Promoting the public administration's capacity to use AI)
- GDPR och AI (GDPR and AI)
- Myndigheterna och AI (Government agencies and AI)
- Nationella AI-uppdraget: AI-guide, förtroendemodell m.m. (The national AI mandate: AI guide, trust model, and more)
- Sweden AI Strategy Report
- Data Union Strategy: Unlocking Data for AI
- Autoridade Nacional de Proteção de Dados, Technology Radar – short version in English, no. 1: Generative Artificial Intelligence
- A Regulatory Framework for AI: Recommendations for PIPEDA Reform
- Directive on Automated Decision Making
- E-23 – Model Risk Management
- Artificial Intelligence: Friend or Foe? Summary and Public Policy Considerations
- Artificial Intelligence and Automation: Preserving Human Agency in a World of Automation
- U.S. Artificial Intelligence Safety Institute
- RAI Toolkit
- Office of Educational Technology
- Designing for Education with Artificial Intelligence: An Essential Guide for Developers
- Empowering Education Leaders: A Toolkit for Safe, Ethical, and Equitable AI Integration
- Framework to Advance AI Governance and Risk Management in National Security
- Principles of Artificial Intelligence Ethics for the Intelligence Community
- Artificial Intelligence Strategy
- Artificial Intelligence Governance Policy AI-GV-P1
Community Frameworks and Guidance
- 8 Principles of Responsible ML
- A Brief Overview of AI Governance for Responsible Machine Learning Systems
- AI Verify Foundation
- Cataloguing LLM Evaluations, Draft for Discussion
- Generative AI: Implications for Trust and Governance
- AI Snake Oil
- AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
- AI Canon
- Anthropic's Responsible Scaling Policy
- AuditBoard: 5 AI Auditing Frameworks to Encourage Accountability
- Auditing machine learning algorithms: A white paper for public auditors
- Data Privacy FAQ
- Privacy Notice
- What is Data Governance?
- BIML Interactive Machine Learning Risk Framework
- November 7, 2023, The Executive Order on Safe, Secure, and Trustworthy AI: Decoding Biden’s AI Policy Roadmap
- October 2023, Decoding Intentions: Artificial Intelligence and Costly Signals
- August 11, 2023, Understanding AI Harms: An Overview
- August 1, 2023, Large Language Models (LLMs): An Explainer
- July 21, 2023, Making AI (more) Safe, Secure, and Transparent: Context and Research from CSET
- July 2023, Adding Structure to AI Harm: An Introduction to CSET's AI Harm Framework
- June 2023, The Inigo Montoya Problem for Trustworthy AI: The Use of Keywords in Policy and Research
- June 2023, A Matrix for Selecting Responsible AI Frameworks
- March 2023, Reducing the Risks of Artificial Intelligence for Military Decision Advantage
- February 2023, One Size Does Not Fit All: Assessment, Safety, and Trust for the Diverse Range of AI Products, Tools, Services, and Resources
- January 2023, Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk
- October 2022, A Common Language for Responsible AI: Evolving and Defining DOD Terms for Implementation
- December 2021, AI and the Future of Disinformation Campaigns: Part 1: The RICHDATA Framework
- December 2021, AI and the Future of Disinformation Campaigns: Part 2: A Threat Model
- July 2021, AI Accidents: An Emerging Threat: What Could Happen and What to Do
- May 2021, Truth, Lies, and Automation: How Language Models Could Change Disinformation
- March 2021, Key Concepts in AI Safety: An Overview
- February 2021, Trusted Partners: Human-Machine Teaming and the Future of Military AI
- AI Audit
- Internal auditor's AI safety checklist
- DAIR Prompt Engineering Guide
- The Data Cards Playbook
- Data Provenance Explorer
- AI Red-Teaming Is Not a One-Stop Solution to AI Harms: Recommendations for Using Red-Teaming for AI Accountability
- Dealing with Bias and Fairness in AI/ML/Data Science Systems
- Debugging Machine Learning Models
- Decision Points in AI Governance
- Evaluating LLMs is a minefield
- Extracting Training Data from ChatGPT
- FATML Principles and Best Practices
- ForHumanity Body of Knowledge
- The Foundation Model Transparency Index
- From Principles to Practice: An interdisciplinary framework to operationalise AI ethics
- Frontier Model Forum: What is Red Teaming?
- Gage Repeatability and Reproducibility
- Georgetown University Library's Artificial Intelligence (Generative) Resources
- Data governance in the cloud - part 1 - People and processes
- Data Governance in the Cloud - part 2 - Tools
- Evaluating social and ethical risks from generative AI
- Generative AI Prohibited Use Policy
- Principles and best practices for data governance in the cloud
- Responsible AI Framework
- H2O.ai Algorithms
- How to Perform an AI Audit for UK Organisations
- Hogan Lovells, The AI Act is coming: EU reaches political agreement on comprehensive regulation of artificial intelligence
- The Landscape of ML Documentation Tools
- IAPP EU AI Act Cheat Sheet
- A checklist for auditing AI systems
- P3119 Standard for the Procurement of Artificial Intelligence and Automated Decision Systems
- Std 1012-1998 Standard for Software Verification and Validation
- Independent Audit of AI Systems
- Identifying and Eliminating CSAM in Generative ML Training Data and Models
- Identifying and Overcoming Common Data Mining Mistakes
- First of its kind Generative AI Evaluation Sandbox for Trusted AI by AI Verify Foundation and IMDA
- Institute of Internal Auditors: Artificial Intelligence Auditing Framework, Practical Applications, Part A, Special Edition
- Auditing Artificial Intelligence
- Auditing Guidelines for Artificial Intelligence
- Capability Maturity Model Integration Resources
- Large language models, explained with a minimum of math and jargon
- Information System Contingency Planning Guidance
- Llama 2 Responsible Use Guide
- LLM Visualization
- Machine Learning Attack_Cheat_Sheet
- Machine Learning Quick Reference: Algorithms
- Machine Learning Quick Reference: Best Practices
- Towards Traceability in Data Ecosystems using a Bill of Materials Model
- Microsoft AI Red Team building future of safer AI
- Microsoft Responsible AI Standard, v2
- NewsGuard AI Tracking Center
- Open Sourcing Highly Capable Foundation Models
- OpenAI Red Teaming Network
- Organization and Training of a Cyber Security Team
- Our Data Our Selves, Data Use Policy
- ABOUT ML Reference Document
- Responsible Practices for Synthetic Media: A Framework for Collective Action
- Real-World Strategies for Model Debugging
- RecoSense: Phases of an AI Data Audit – Assessing Opportunity in the Enterprise
- Red Teaming of Advanced Information Assurance Concepts
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
- Robust ML
- Safe and Reliable Machine Learning
- SHRM Generative Artificial Intelligence (AI) Chatbot Usage Policy
- Responsible AI at Stanford: Enabling innovation through AI best practices
- The Rise of Generative AI and the Coming Era of Social Media Manipulation 3.0: Next-Generation Chinese Astroturfing and Coping with Ubiquitous AI
- Taskade: AI Audit PBC Request Checklist Template
- TechTarget: 9 questions to ask when auditing your AI systems
- Troubleshooting Deep Neural Networks
- Twitter Algorithmic Bias Bounty
- Unite.AI: How to perform an AI Audit in 2023
- University of California, Berkeley, Center for Long-Term Cybersecurity, A Taxonomy of Trustworthiness for Artificial Intelligence
- University of California, Berkeley, Information Security Office, How to Write an Effective Website Privacy Statement
- Warning Signs: The Future of Privacy and Security in an Age of Machine Learning
- When Not to Trust Your Explanations
- You Created A Machine Learning Application Now Make Sure It's Secure
- Artificial Intelligence in the Securities Industry
- CDAO frameworks, guidance, and best practices for AI test & evaluation
- Lakera AI's Gandalf
- PAIR Explorables: Datasets Have Worldviews
- Know Your Data
- System cards
- University of Washington Tech Policy Lab, Data Statements
- coolaj86 / Chat GPT "DAN" (and other "Jailbreaks")
- Jay Alammar, Interfaces for Explaining Transformer Language Models
- CSET Publications
- ISO/IEC 42001:2023, Information technology — Artificial intelligence — Management system
- Azure AI Content Safety
- Harm categories in Azure AI Content Safety
- Berryville Institute of Machine Learning, Architectural Risk Analysis of Large Language Models (requires free account login)
- Fairly's Global AI Regulations Map
- How to implement LLM guardrails
- CivAI, GenAI Toolkit for the NIST AI Risk Management Framework: Thinking Through the Risks of a GenAI Chatbot
- ACL 2024 Tutorial: Vulnerabilities of Large Language Models to Adversarial Attacks
- Trustible, Is It AI? How different laws & frameworks define AI
- @dotey on X/Twitter exploring GPT prompt security and prevention measures
- Exploiting Novel GPT-4 APIs
- Learn Prompting, Prompt Hacking
- MiesnerJacob / learn-prompting, Prompt Hacking
- r/ChatGPTJailbreak
- Y Combinator, ChatGPT Grandma Exploit
- Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing
- AI Incident Collection: An Observational Study of the Great AI Experiment
- Repurposing the Wheel: Lessons for AI Standards
- Translating AI Risk Management Into Practice
- Coalition for Content Provenance and Authenticity
- HackerOne Blog
- Building an early warning system for LLM-aided biological threat creation
- Guidance for Safe Foundation Model Deployment: A Framework for Collective Action
- Framework for Identifying Highly Consequential AI Use Cases
- The Complete Guide to Crowdsourced Security Testing, Government Edition
- Future of Privacy Forum, The Spectrum of Artificial Intelligence
- CSET, What Does AI-Red Teaming Actually Mean?
- HackerOne, An Emerging Playbook for AI Red Teaming with HackerOne
- Jailbreaking Black Box Large Language Models in Twenty Queries
- How do I cite generative AI in MLA style?
- On Risk Assessment and Mitigation for Algorithmic Systems
- Open Source Audit Tooling (OAT) Landscape
- Analyzing Harms from AI-Generated Images and Safeguarding Online Authenticity
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Oliver Patel's Cheat Sheets
- 10 Key Pillars for Enterprise AI Governance
- AI Governance in 2023
- China AI Law Cheat Sheet
- EU AI Act Cheat Sheet
- Governance Audit, Model Audit, and Application Audit
- Gulf Countries AI Policy Cheat Sheet
- Singapore AI Policy Cheat Sheet
- UK AI Policy Cheat Sheet
- developer mode fixed
- An Overview of Catastrophic AI Risks
- Center for AI and Digital Policy Reports
- Ethics for people who work in tech
- Towards Effective Governance of Foundation Models and Generative AI
- Transformed by AI: How Generative Artificial Intelligence Could Affect Work in the UK—And How to Manage It
- Boston University AI Task Force Report on Generative AI in Education and Research
- Tech Policy Press - Artificial Intelligence
- Phil Lee, AI Act: Difference between AI systems and AI models
- Phil Lee, AI Act: Meet the regulators! (Arts 30, 55b, 56 and 59)
- Phil Lee, How the AI Act applies to integrated generative AI
- Phil Lee, Overview of AI Act requirements for deployers of high risk AI systems
- Phil Lee, Overview of AI Act requirements for providers of high risk AI systems
- Attention Is All You Need
- How Can We Tackle AI-Fueled Misinformation and Disinformation in Public Health?
- Model Transparency Ratings
- Perspectives on Issues in AI Governance
- AI-Relevant Regulatory Precedents: A Systematic Search Across All Federal Agencies
- GDPR and Generative AI: A Guide for Public Sector Organizations
- AppliedAI Institute, Navigating the EU AI Act: A Process Map for making AI Systems available
- Canada AI Law & Policy Cheat Sheet
- India AI Policy Cheat Sheet
- LLM Agents can Autonomously Exploit One-day Vulnerabilities
- No, LLM Agents can not Autonomously Exploit One-day Vulnerabilities
- Humane Intelligence, SeedAI, and DEFCON AI Village, Generative AI Red Teaming Challenge: Transparency Report 2024
- AI Governance Needs Sociotechnical Expertise: Why the Humanities and Social Sciences Are Critical to Government Efforts
- Applying Sociotechnical Approaches to AI Governance in Practice
- IBM, The CEO's Guide to Generative AI
- GraphRAG: Unlocking LLM discovery on narrative private data
- BCG Robotaxonomy
- Purpose and Means AI Explainer Series - issue #4 - Navigating the EU AI Act
- Deloitte, Trust in the age of automation and Generative AI
- AI in the Public Service: From Principles to Practice
- Casey Flores, AIGP Study Guide
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
- Center for Security and Emerging Technology (CSET), High Level Comparison of Legislative Perspectives on Artificial Intelligence US vs. EU
- Future of Privacy Forum, EU AI Act: A Comprehensive Implementation & Compliance Timeline
- Definitions, Scope & Applicability EU AI Act Cheat Sheet Series, Part 1
- Model Governance Framework for Generative AI
- World Privacy Forum, Risky Analysis: Assessing and Improving AI Governance Tools
- IAPP, EU AI Act Compliance Matrix
- IAPP, EU AI Act Compliance Matrix - At a Glance
- EU AI Act Cheat Sheet Series 2, Prohibited AI Systems
- Berkeley Center for Long-Term Cybersecurity (CLTC), Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models
- Ada Lovelace Institute, Code and Conduct: How to Create Third-Party Auditing Regimes for AI Systems
- Ravit Dotan's Projects
- AI Policy
- Trustible, Enhancing the Effectiveness of AI Governance Committees
- EU AI Act Cheat Sheet Series 1, Definitions, Scope & Applicability
- EU AI Act Cheat Sheet Series 3, High-Risk AI Systems
- AI Ethics and Governance in Practice
- EU AI Act Cheat Sheet Series 4, Requirements for Providers
- World Economic Forum, Responsible AI Playbook for Investors
- A-LIGN, ISO 42001 Requirement, NIST SP 800-218A Task, Recommendations and Considerations
- Responsible Data Stewardship in Practice
- Digital Policy Alert, The Anatomy of AI Rules: A systematic comparison of AI rules across the globe
- Federation of American Scientists, A NIST Foundation To Support The Agency’s AI Mandate
- Advancing AI responsibly
- Open Data Institute, Understanding data governance in AI: Mapping governance
- EU AI Act Cheat Sheet Series 5, Requirements for Deployers
- Backpack Language Models
- The Remarkable Robustness of LLMs: Stages of Inference?
- Neuronpedia
- European Data Protection Board (EDPB), Checklist for AI Auditing
- Instruction finetuning an LLM from scratch
- EU AI Act Cheat Sheet Series 6, General-Purpose AI Models
- EU AI Act Cheat Sheet Series 7, Compliance & Conformity Assessment
- Pivot to AI
- Jay Alammar, Finding the Words to Say: Hidden State Visualizations for Language Models
- ChatGPT_system_prompt
- DAIR Prompt Engineering Guide (GitHub)
- model-cards-and-datasheets
- Adversarial ML Threat Matrix
- Evals
- Azure's PyRIT
- In-The-Wild Jailbreak Prompts on LLMs
- 0xk1h0 / ChatGPT "DAN" (and other "Jailbreaks")
- Membership Inference Attacks and Defenses on Machine Learning Models Literature
- Acceptable Use Policies for Foundation Models
- Access Now, Regulatory Mapping on Artificial Intelligence in Latin America: Regional AI Public Policy Report
- CSET's Harm Taxonomy for the AI Incident Database
- Demos, AI – Trustworthy By Design: How to build trust in AI systems, the institutions that create them and the communities that use them
- Global AI Governance Law and Policy: Canada, EU, Singapore, UK and US
- LC Labs AI Planning Framework, Library of Congress
- Manifest MLBOM Wiki
- Sample AI Incident Response Checklist
- What Access Protections Do AI Companies Provide for Independent Safety Research?
- LLM Security & Privacy
- AI Model Registries: A Foundational Tool for AI Governance
- AI Standards Hub
- Language Model Risk Cards: Starter Set
- Deepfake Pornography Goes to Washington: Measuring the Prevalence of AI-Generated Non-Consensual Intimate Imagery Targeting Congress
- University of California, Los Angeles, Artificial Intelligence Tools and Academic Use
- AI Accidents: An Emerging Threat: What Could Happen and What to Do, CSET Policy Brief, July 2021
- Chinese Critiques of Large Language Models: Finding the Path to General Intelligence
- An In-Depth Guide To Help You Start Auditing Your AI Models
- Center for Countering Digital Hate (CCDH), YouTube's Anorexia Algorithm: How YouTube Recommends Eating Disorders Videos to Young Girls
- AI Policy & Governance
- Assessing AI: Surveying the Spectrum of Approaches to Understanding and Auditing AI Systems
- In Deep Trouble: Surfacing Tech-Powered Sexual Harassment in K-12 Schools
- Centre for International Governance Innovation Publications
- The Ethics of AI Ethics: An Evaluation of Guidelines
- The Ethics of Developing, Implementing, and Using Advanced Warehouse Technologies: Top-Down Principles Versus The Guidance Ethics Approach
- Guidelines on the Application of the Definition of an AI System in the AI Act: ELI Proposal for a Three-Factor Approach
- Fairness and Bias in Algorithmic Hiring: A Multidisciplinary Survey
- Generative AI Vendor Risk Assessment Guide - ISAC, February 2024
- Best Practice Tools: Examples supporting responsible AI maturity
- The AI Act is coming: EU reaches political agreement on comprehensive regulation of artificial intelligence
- AI ethics in action: An enterprise guide to progressing trustworthy AI
- Design for AI
- Principles and Practices for Building More Trustworthy AI, October 6, 2021
- A Flexible Maturity Model for AI Governance Based on the NIST AI Risk Management Framework
- The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems, General Principles
- An Overview of Artificial Intelligence Ethics
- ITI's AI Security Policy Principles
- International Bar Association and the Center for AI and Digital Policy, The Future Is Now: Artificial Intelligence and the Legal Profession
- Key questions for the International Network of AI Safety Institutes
- Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis
- Understanding the First Wave of AI Safety Institutes: Characteristics, Functions, and Challenges
- The Implications of Artificial Intelligence in Cybersecurity: Shifting the Offense-Defense Balance
- Institute of Internal Auditors
- Guidelines for AI in parliaments, Inter-Parliamentary Union, December 2024
- Gen-AI: Artificial Intelligence and the Future of Work
- Implementing the AI Act in Belgium: Scope of Application and Authorities
- AI Assurance: A Repeatable Process for Assuring AI-enabled Systems
- A Digital Pandemic: Uncovering the Role of 'Yahoo Boys' in the Surge of Social Media-Enabled Financial Sextortion Targeting Minors
- Guide for Preparing and Responding to Deepfake Events: From the OWASP Top 10 for LLM Applications Team
- Mitigating the risk of generative AI models creating Child Sexual Abuse Materials: An analysis by child safety nonprofit Thorn
- PwC's Responsible AI
- US Tort Liability for Large-Scale Artificial Intelligence Damages, A Primer for Developers and Policymakers
- Assessing the Implementation of Federal AI Leadership and Compliance Mandates, Stanford Institute for Human-Centered Artificial Intelligence (HAI)
- Open Problems in Technical AI Governance: A repository of open problems in technical AI governance
- International AI Safety Report: The International Scientific Report on the Safety of Advanced AI
- AI Risk Atlas: Taxonomy and Tooling for Navigating AI Risks and Resources
- The AI Act between Digital and Sectoral Regulations
- US Open-Source AI Governance: Balancing Ideological and Geopolitical Considerations with China Competition
- 2024 State of the AI Regulatory Landscape
- Artificial Intelligence Impact Assessment
- How Microsoft names threat actors
- Forging Global Cooperation on AI Risks: Cyber Policy as a Governance Blueprint
- AI Inventories: Practical Challenges for Organizational Risk Management
- Recommendations for the Independent International Scientific Panel on AI and the Global Dialogue on AI Governance
- Toward an evaluation science for generative AI systems
- Navigating the AI Frontier: A Primer on the Evolution and Impact of AI Agents
- AI-Generated Disinformation in Europe and Africa: Use Cases, Solutions and Transnational Learning
- Character Flaws: School Shooters, Anorexia Coaches, and Sexualized Minors: A Look at Harmful Character Chatbots and the Communities That Build Them
- leondz / garak
- AI Governance Alliance Briefing Paper Series
- AI in Africa
- AI Liability Along the Value Chain
- Putting Explainable AI to the Test: A Critical Look at AI Evaluation Approaches
- Children & AI Design Code: A Protocol for the development and use of AI systems that impact children
- EU AI Act – Provider Only: Certification Scheme v1.5
- Governing Artificial Intelligence From Ethical Principles Toward Organizational AI Governance Practices
- Institute for AI Policy and Strategy
- AI Agent Governance: A Field Guide
- Just Security's Artificial Intelligence Archive
- Navigating AI Compliance Part 1 Tracing Failure Patterns in History
- Navigating AI Compliance Part 2 Risk Mitigation Strategies for Safeguarding Against Future Failures
- Chain-of-thought Faithfulness
- Responsible AI practices
- AI Decision-Making and the Courts: A guide for Judges, Tribunal Members, and Court Administrators
- AI Ethics and Governance in Practice: AI Safety in Practice
- AI Safety in Practice
- CEN-CENELEC JTC21 AI Standards: Complete Detailed Overview
- 2025 Responsible AI Transparency Report: How we build, support our customers, and grow
- Taxonomy of Failure Mode in Agentic AI Systems
- Multi-Agent Risks from Advanced AI
- Disrupting malicious uses of AI: June 2025
- OWASP AI Testing Guide
- Real People in Fake Porn: How a Federal Right of Publicity Could Assist in the Regulation of Deepfake Pornography
- Risk Taxonomy and Thresholds for Frontier AI Frameworks
- Risk Tiers: Towards a Gold Standard for Advanced AI
- Adverse Event Reporting for AI: Developing the Information Infrastructure Government Needs to Learn and Act
- Artificial Intelligence Controls Matrix Bundle
- AI Model Risk Management Framework
- AI Governance and the EU's Strategic Role in 2025
- Estimating the usage and utility of LLMs in the US general public
- Mapping AI Risk Mitigations: Evidence Scan and Draft Mitigation Taxonomy
- Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents
- Ahead of the Curve: Governing AI Agents Under the EU AI Act - Polet, The Future Society, June 2025
- AI Act Governance: Best Practices for Implementing the EU AI Act
- AI alignment vs AI ethical treatment: Ten challenges
- AI Ethics & Governance 2025: A Framework for Malaysia's Tech Industry
- AI-Generated Algorithmic Virality
- AI Safety Governance, the Southeast Asian Way
- AI Sustainability Outlook: The Challenges, Potential, and Path Forward
- AI Won't Replace the General: Algorithms, Decision-making and Battlefield Command
- Artificial Intelligence in Africa: Challenges and Opportunities
- Artificial Intelligence Tools Versus Practice in Conflict Prediction: The Case of Mali
- Doing AI Differently: Rethinking the foundations of AI via the humanities
- Emotional Manipulation by AI Companions
- EU AI Act Handbook
- Evidence of CCP Censorship, Propaganda in U.S. LLM Responses
- Explainable AI in Finance: Addressing the Needs of Diverse Stakeholders - Ann Wilson, CFA Institute, Research & Policy Center, August 2025
- Fake Friend: How ChatGPT betrays vulnerable teens by encouraging dangerous behavior
- Generative AI: A New Threat for Online Child Sexual Exploitation and Abuse
- Guide for Australian Business: Understanding 42001
- Human-Calibrated Automated Testing and Validation of Generative Language Models: An Overview
- Key Considerations When Using Artificial Intelligence in the Public Sector
- Learning from other domains to advance AI evaluation and testing
- Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism: What Policymakers Should Know
- Raising Standards: Data and Artificial Intelligence in Southeast Asia
- Regulating Under Uncertainty: Governance Options for Generative AI
- Responsible Enterprise AI in the Agentic Era
- State of Agentic AI Security and Governance: OWASP Gen AI Security Project Agentic Security Initiative
- State of AI Safety in China
- Summary Report: Workshop on the Geopolitics of Critical Minerals and the AI Supply Chain
- Synthetic Data: The New Data Frontier
- AI Governance: A Framework for Responsible and Compliant Artificial Intelligence
- AI Governance InternationaL Evaluation (AGILE) Index 2025
- Countries With Draft AI Legislation or Frameworks
- A Guide to AI in Schools: Perspectives for the Perplexed
- Sovereign AI and Sustainable Computation for Indigenous Communities
- Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph (https://github.com/Ipazia-AI/latent-explorer)
- Agentic AI: Fostering Responsible and Beneficial Development and Adoption
- AI for a Planet Under Pressure
- Ethical and social risks of harm from Language Models
- How People Around the World View AI
- International AI Safety Report
- ISO policy brief: Harnessing international standards for responsible AI development and governance
- News Integrity in AI Assistants: An international PSM study
- Technology Trends Outlook 2025
- What Are High-Risk AI Systems Within the Meaning of the EU’s AI Act, and What Requirements Apply to Them?
- Who Should Develop Which AI Evaluations?
- Why We Need to Know More: Exploring the State of AI Incident Documentation Practices
- AI Governance on the Ground: Canada’s Algorithmic Impact Assessment Process and Algorithm has evolved
- AI Value Alignment: Guiding Artificial Intelligence Towards Shared Human Goals
- Risky Analysis: Assessing and Improving AI Governance Tools
- Worldwide AI Ethics: A Review of 200 Guidelines and Recommendations for AI Governance
- AI Risk-Management Standards Profile for General-Purpose AI and Foundation Models
- Intolerable Risk Threshold Recommendations for Artificial Intelligence: Key Principles, Considerations, and Case Studies to Inform Frontier AI Safety Frameworks for Industry and Government
- Foundation Model Development Cheatsheet
- EU AI Act: A Comprehensive Implementation & Compliance Timeline
- Generative AI framework and Generative AI value tree modelling diagram
- Global Index for AI Safety: AGILE Index on Global AI Safety Readiness Feb 2025
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
- GenAI Red Teaming Guide: A Practical Approach to Evaluating AI Vulnerabilities
- Red Teaming for GenAI Harms: Revealing the Risks and Rewards for Online Safety
- A Safe Harbor for AI Evaluation and Red Teaming
- Tracing the thoughts of a large language model
- Columbia Business School, Generative AI Policy
- Columbia University, Considerations for AI Tools in the Classroom
- Columbia University, Generative AI Policy
- Georgetown University, Artificial Intelligence and Homework Support Policies
- Georgetown University, Teaching with AI
- George Washington University, Faculty Resources: Generative AI
- George Washington University, Guidelines for Using Generative Artificial Intelligence at the George Washington University April 2023
- George Washington University, Guidelines for Using Generative Artificial Intelligence in Connection with Academic Work
- Harvard Business School, 2.1.2 Using ChatGPT & Artificial Intelligence Tools
- Harvard Graduate School of Education, HGSE AI Policy
- Harvard University, Guidelines for Using ChatGPT and other Generative AI tools at Harvard
- Massachusetts Institute of Technology, Guidance for use of Generative AI tools
- Massachusetts Institute of Technology, Generative AI & Your Course
- Stanford Graduate School of Business, Course Policies on Generative AI Use
- Stanford University, Artificial Intelligence Teaching Guide
- Stanford University, Creating your course policy on AI
- Stanford University, Generative AI Policy Guidance
- University of California, AI Governance and Transparency
- University of California, Applicable Law and UC Policy
- University of California, Legal Alert: Artificial Intelligence Tools
- University of California, Berkeley, AI at UC Berkeley
- University of California, Irvine, Generative AI for Teaching and Learning
- University of California, Irvine, Statement on Generative AI Detection
- University of California, Los Angeles, ChatGPT and AI Resources
- University of California, Los Angeles, Generative AI
- University of California, Los Angeles, Teaching Guidance for ChatGPT and Related AI Developments
- University of Notre Dame, AI Recommendations for Instructors
- University of Notre Dame, AI@ND Policies and Guidelines
- University of Notre Dame, Generative AI Policy for Students
- University of Southern California, Using Generative AI in Research
- University of Washington, AI+Teaching
- University of Washington, AI+Teaching, Sample syllabus statements regarding student use of artificial intelligence
- Yale University, AI at Yale
- Yale University, AI Guidance for Teachers
- Yale University, Yale University AI guidelines for staff
- Yale University, Guidelines for the Use of Generative AI Tools
- Architectural Risk Analysis of Large Language Models
- Artificial Intelligence Harm and Human Rights: A High Level Exploration of the Interaction of AI Harms
- Distill
- People + AI Guidebook
Conferences and Workshops
- AAAI Conference on Artificial Intelligence
- ACM FAccT (Fairness, Accountability, and Transparency)
- FAT/ML (Fairness, Accountability, and Transparency in Machine Learning)
- AIES (AAAI/ACM Conference on AI, Ethics, and Society)
- Black in AI
- Computer Vision and Pattern Recognition (CVPR)
- International Conference on Machine Learning (ICML)
- 2nd ICML Workshop on Human in the Loop Learning (HILL)
- 5th ICML Workshop on Human Interpretability in Machine Learning (WHI)
- Challenges in Deploying and Monitoring Machine Learning Systems
- Economics of privacy and data labor
- Federated Learning for User Privacy and Data Confidentiality
- Healthcare Systems, Population Health, and the Role of Health-tech
- Law & Machine Learning
- ML Interpretability for Scientific Discovery
- MLRetrospectives: A Venue for Self-Reflection in ML Research
- Participatory Approaches to Machine Learning
- XXAI: Extending Explainable AI Beyond Deep Models and Classifiers
- Human-AI Collaboration in Sequential Decision-Making
- Machine Learning for Data: Automated Creation, Privacy, Bias
- ICML Workshop on Algorithmic Recourse
- ICML Workshop on Human in the Loop Learning (HILL)
- ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI
- Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning (ITR3)
- International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21)
- Interpretable Machine Learning in Healthcare
- Self-Supervised Learning for Reasoning and Perception
- The Neglected Assumptions In Causal Inference
- Theory and Practice of Differential Privacy
- Uncertainty and Robustness in Deep Learning
- Workshop on Computational Approaches to Mental Health @ ICML 2021
- Workshop on Distribution-Free Uncertainty Quantification
- Workshop on Socially Responsible Machine Learning
- 1st ICML 2022 Workshop on Safe Learning for Autonomous Driving (SL4AD)
- 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH)
- DataPerf: Benchmarking Data for Data-Centric AI
- Disinformation Countermeasures and Machine Learning (DisCoML)
- Responsible Decision Making in Dynamic Environments
- Spurious correlations, Invariance, and Stability (SCIS)
- The 1st Workshop on Healthcare AI and COVID-19
- Theory and Practice of Differential Privacy
- Workshop on Human-Machine Collaboration and Teaming
- 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning
- 2nd Workshop on Formal Verification of Machine Learning
- 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH)
- Challenges in Deployable Generative AI
- “Could it have been different?” Counterfactuals in Minds and Machines
- Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities
- Generative AI and Law (GenLaw)
- Interactive Learning with Implicit Human Feedback
- Neural Conversational AI Workshop - What’s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable and Human-centric) chatbots?
- The Second Workshop on Spurious Correlations, Invariance and Stability
- Knowledge Discovery and Data Mining (KDD)
- 2nd ACM SIGKDD Workshop on Ethical Artificial Intelligence: Methods and Applications
- KDD Data Science for Social Good 2023
- Neural Information Processing Systems (NeurIPS)
- 5th Robot Learning Workshop: Trustworthy Robotics
- Algorithmic Fairness through the Lens of Causality and Privacy
- Causal Machine Learning for Real-World Impact
- Challenges in Deploying and Monitoring Machine Learning Systems
- Cultures of AI and AI for Culture
- Empowering Communities: A Participatory Approach to AI for Mental Health
- Federated Learning: Recent Advances and New Challenges
- Gaze meets ML
- HCAI@NeurIPS 2022, Human Centered AI
- Human Evaluation of Generative Models
- Human in the Loop Learning (HiLL) Workshop at NeurIPS 2022
- I Can’t Believe It’s Not Better: Understanding Deep Learning Through Empirical Falsification
- Learning Meaningful Representations of Life
- Machine Learning for Autonomous Driving
- Progress and Challenges in Building Trustworthy Embodied AI
- Tackling Climate Change with Machine Learning
- Trustworthy and Socially Responsible Machine Learning
- Workshop on Machine Learning Safety
- AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics
- Algorithmic Fairness through the Lens of Time
- Attributing Model Behavior at Scale (ATTRIB)
- Backdoors in Deep Learning: The Good, the Bad, and the Ugly
- Computational Sustainability: Promises and Pitfalls from Theory to Deployment
- I Can’t Believe It’s Not Better (ICBINB): Failure Modes in the Age of Foundation Models
- Socially Responsible Language Modelling Research (SoLaR)
- Regulatable ML: Towards Bridging the Gaps between Machine Learning Research and Regulations
- Workshop on Distribution Shifts: New Frontiers with Foundation Models
- XAI in Action: Past, Present, and Future Applications
- Oxford Generative AI Summit Slides
- Evaluating Generative AI Systems: the Good, the Bad, and the Hype (April 15, 2024)
- IAPP, AI Governance Global 2024, June 4-7, 2024
- Mission Control AI, Booz Allen Hamilton, and The Intellectual Forum at Jesus College, Cambridge, The 2024 Leaders in Responsible AI Summit, March 22, 2024
- OECD.AI, Building the foundations for collaboration: The OECD-African Union AI Dialogue
- NAACL 24 Tutorial: Explanations in the Era of Large Language Models
-
Documents in Legal Genres
- AI Learning Agenda - LR-142A
- An Act Addressing Innovations in Artificial Intelligence
- An Act relating to artificial intelligence; requiring disclosure of deepfakes in campaign communications; relating to cybersecurity; and relating to data privacy.
- Agenda Book for Advisory Committee on Evidence Rules – Panel on Artificial Intelligence and the Rules of Evidence
- Algorithmic Accountability Act of 2023
- Arizona, House Bill 2685
- California, Civil Rights Council - First Modifications to Proposed Employment Regulations on Automated-Decision Systems, Title 2, California Code of Regulations
- California, Consumer Privacy Act of 2018 - DIVISION 3. OBLIGATIONS [1427 - 3273.69]
- California, Senate Bill No. 53
- Cherkin et al. v. PowerSchool Holdings Inc. N.D. Cal. May 2024 – EdTech Privacy Class Action
- Popa v. Harriet Carter Gifts Inc. W.D. Pa. Mar. 2025 – Class Action on Digital Wiretapping
- Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government (Executive Order 13960, 2020-12-03)
- GDPR Complaint Filed by noyb Against OpenAI
- Germany, Bundesrat Drucksache 222/24 - Entwurf eines Gesetzes zum strafrechtlichen Schutz von Persönlichkeitsrechten vor Deepfakes
- In re Clearview AI Inc. N.D. Ill. Aug. 2022 – MDL Opinion on Amended Complaint & Retail Defendants
- Nebraska, LB1203 - Regulate artificial intelligence in media and political advertisements under the Nebraska Political Accountability and Disclosure Act
- The New York Times Company v. Microsoft Corp. OpenAI Inc. et al. December 2023 – Complaint
- The New York Times Company v. Microsoft Corporation OpenAI Inc. et al. November 2024 – Opinion & Order on Discovery Dispute
- Rhode Island, Executive Order 24-06: Artificial Intelligence and Data Centers of Excellence
- Silverman et al. v. Meta Platforms Inc. N.D. Cal. 2023 Class Action Complaint
- State of North Carolina Executive Order No. 24, Advancing Trustworthy Artificial Intelligence That Benefits All North Carolinians
- Texas draft of responsible AI bill by Capriglione
- Thaler v. Perlmutter March 2025 – Appellate Opinion on Copyright and Artificial Intelligence
- Washington State, SB 6513 - 2019-20
- United States Congress, 118th Congress, H.R.5586 - DEEPFAKES Accountability Act - 2024
- United States Congress, 118th Congress, H.R. 9720, AI Incident Reporting and Security Enhancement Act - 2024
- United States Congress, 118th Congress, S.4769 - VET Artificial Intelligence Act - 2024
- Willis v. Bank National Association as Trustee Igloo Series Trust LLC
- Illinois, Biometric Information Privacy Act
-
-
Miscellaneous Resources
-
AI Law, Policy, and Guidance Trackers
- IAPP Global AI Legislation Tracker
- IAPP US State Privacy Legislation Tracker
- George Washington University Law School's AI Litigation Database
- University of North Texas, Artificial Intelligence (AI) Policy Collection
- The Ethical AI Database
- Institute for the Future of Work, Tracking international legislation relevant to AI at work
- Legal Nodes, Global AI Regulations Tracker: Europe, Americas & Asia-Pacific Overview
- OECD.AI, National AI policies & strategies
- Raymond Sun, Global AI Regulation Tracker
- Runway Strategies, Global AI Regulation Tracker
- VidhiSharma.AI, Global AI Governance Tracker
-
AI Incident Information Sharing Resources
- AI Incident Database (Responsible AI Collaborative)
- AI Vulnerability Database (AVID)
- AIAAIC
- OECD AI Incidents Monitor
- Verica Open Incident Database (VOID)
- Merging AI Incidents Research with Political Misinformation Research: Introducing the Political Deepfakes Incidents Database
- AI Badness: An open catalog of generative AI badness
- AI Risk Database
- EthicalTech@GW, Deepfakes & Democracy Initiative
-
Challenges and Competitions
-
Curated Bibliographies
- Proposed Guidelines for Responsible Use of Explainable Machine Learning (presentation, bibliography)
- Proposed Guidelines for Responsible Use of Explainable Machine Learning (paper, bibliography)
- A Responsible Machine Learning Workflow (paper, bibliography) - information-2019?style=social)
- Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) Scholarship
- Blair Attard-Frost, INF1005H1S: Artificial Intelligence Policy Supplementary Reading List
- White & Case, AI Watch: Global regulatory tracker - United States
-
List of Lists
- AI Ethics Resources
- AI Tools and Platforms
- OECD-NIST Catalogue of AI Tools and Metrics
- OpenAI Cookbook - cookbook?style=social)
- Worldwide AI ethics: A review of 200 guidelines and recommendations for AI governance
- Tech & Ethics Curricula
- Ravit Dotan's Resources
- AI Ethics Guidelines Global Inventory
- Casey Fiesler's AI Ethics & Policy News spreadsheet
-
Critiques of AI
- Machine Learning: The High Interest Credit Card of Technical Debt
- Emergent and Predictable Memorization in Large Language Models
- Data and its (dis)contents: A survey of dataset development and use in machine learning research
- Against predictive optimization
- AI chatbots use racist stereotypes even after anti-racism training
- AI Is a Lot of Work
- AI Tools Still Permitting Political Disinfo Creation, NGO Warns
- Are Emergent Abilities of Large Language Models a Mirage?
- Artificial Hallucinations in ChatGPT: Implications in Scientific Writing
- Artificial intelligence and illusions of understanding in scientific research
- Aylin Caliskan's publications
- Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
- Evaluating Language-Model Agents on Realistic Autonomous Tasks
- Generative AI: UNESCO study reveals alarming evidence of regressive gender stereotypes
- Get Ready for the Great AI Disappointment
- Identifying and Eliminating CSAM in Generative ML Training Data and Models
- Insanely Complicated, Hopelessly Inadequate
- Lazy use of AI leads to Amazon products called “I cannot fulfill that request”
- Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
- Low-Resource Languages Jailbreak GPT-4
- Measuring the predictability of life outcomes with a scientific mass collaboration
- Most CEOs aren’t buying the hype on generative AI benefits
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
- Researchers surprised by gender stereotypes in ChatGPT
- Scalable Extraction of Training Data from (Production) Language Models
- Task Contamination: Language Models May Not Be Few-Shot Anymore
- The Cult of AI
- The Data Scientific Method vs. The Scientific Method
- Why We Must Resist AI’s Soft Mind Control
- Winner's Curse? On Pace, Progress, and Empirical Rigor
- Futurism, Disillusioned Businesses Discovering That AI Kind of Sucks
- Generative AI’s environmental costs are soaring — and mostly secret
- LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks
- Long-context LLMs Struggle with Long In-context Learning
- Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models
- The mechanisms of AI hype and its planetary and social costs
- Nepotistically Trained Generative-AI Models Collapse
- Non-discrimination Criteria for Generative Language Models
- Sustainable AI: Environmental Implications, Challenges and Opportunities
- AI Safety Is a Narrative Problem
- Meta’s AI chief: LLMs will never reach human-level intelligence
- The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’
- Speed of AI development stretches risk assessments to breaking point
- AI already uses as much energy as a small country. It’s only the beginning.
- FABLES: Evaluating faithfulness and content selection in book-length summarization
- Ghost in the Cloud: Transhumanism’s simulation theology
- Internet of Bugs, Debunking Devin: "First AI Software Engineer" Upwork lie exposed! (video)
- Re-evaluating GPT-4’s bar exam performance
- There Is No A.I.
- What’s in a Name? Experimental Evidence of Gender Bias in Recommendation Letters Generated by ChatGPT
- Ed Zitron's Where's Your Ed At
- Julia Angwin, Press Pause on the Silicon Valley Hype Machine
- Which Humans?
- Ryan Allen, Explainable AI: The What’s and Why’s, Part 1: The What
- Companies like Google and OpenAI are pillaging the internet and pretending it’s progress
- Toward Sociotechnical AI: Mapping Vulnerabilities for Machine Learning in Context
- ChatGPT is bullshit
- Theory Is All You Need: AI, Human Cognition, and Decision Making
- I Will Fucking Piledrive You If You Mention AI Again
- Quantifying Memorization Across Neural Language Models
- How AI lies, cheats, and grovels to succeed - and what we need to do about it
- The AI Carbon Footprint and Responsibilities of AI Scientists
- The Environmental Impact of AI: A Case Study of Water Consumption by Chat GPT
- The Environmental Price of Intelligence: Evaluating the Social Cost of Carbon in Machine Learning
- The Hidden Environmental Impact of AI
- Promoting Sustainability: Mitigating the Water Footprint in AI-Embedded Data Centres
- AI is effectively ‘useless’—and it’s created a ‘fake it till you make it’ bubble that could end in disaster, veteran market watcher warns
- Are Language Models Actually Useful for Time Series Forecasting?
- Gen AI: Too Much Spend, Too Little Benefit?
- Meta AI Chief: Large Language Models Won't Achieve AGI
- We still don't know what generative AI is good for
-
Groups and Organizations
-
-
Education Resources
-
Comprehensive Software Examples and Tutorials
- COMPAS Analysis Using Aequitas
- Explaining Quantitative Measures of Fairness (with SHAP)
- Getting a Window into your Black Box Model
- H2O.ai, From GLM to GBM Part 1
- H2O.ai, From GLM to GBM Part 2
- IML
- Interpreting Machine Learning Models with the iml Package
- Interpretable Machine Learning using Counterfactuals
- Machine Learning Explainability by Kaggle Learn
- Model Interpretability with DALEX
- The Importance of Human Interpretable Machine Learning
- Model Interpretation Strategies
- Hands-on Machine Learning Model Interpretation
- Interpreting Deep Learning Models for Computer Vision
- Partial Dependence Plots in R (a minimal Python partial dependence sketch appears at the end of this list)
- PiML Medium Tutorials
- PiML-Toolbox Examples - Toolbox?style=social)
- Saliency Maps for Deep Learning
- Visualizing ML Models with LIME
- Visualizing and debugging deep convolutional networks
- What does a CNN see?
- Interpretable Machine Learning with Python
- Reliable-and-Trustworthy-AI-Notebooks - and-Trustworthy-AI-Notebooks?style=social)
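Many of the tutorials above walk through the same basic recipe for model-agnostic explanation: fit a model, then sweep one feature at a time and plot the averaged predictions. Below is a minimal partial dependence sketch using scikit-learn's `PartialDependenceDisplay`; the synthetic dataset and gradient boosting model are illustrative assumptions, not drawn from any listed tutorial.

```python
# Minimal partial dependence sketch (illustrative only; the dataset and
# model choices are assumptions, not taken from the tutorials above).
import matplotlib.pyplot as plt
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Synthetic regression data: 10 features, 5 of which drive the response.
X, y = make_friedman1(n_samples=500, n_features=10, random_state=0)

# Any fitted estimator with a predict method works here.
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Sweep features 0 and 1 while averaging the model's predictions over
# the data; the resulting curves are the partial dependence plots.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()
```

Passing `kind="individual"` to the same call yields ICE curves, the per-row analogue of partial dependence.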
-
Free-ish Books
- César A. Hidalgo, Diana Orghian, Jordi Albo-Canals, Filipa de Almeida, and Natalia Martin, 2021, *How Humans Judge Machines*
- Charles Perrow, 1984, *Normal Accidents: Living with High-Risk Technologies*
- Charles Perrow, 1999, *Normal Accidents: Living with High-Risk Technologies with a New Afterword and a Postscript on the Y2K Problem*
- Deborah G. Johnson and Keith W. Miller, 2009, *Computer Ethics: Analyzing Information Technology*, Fourth Edition
- Ed Dreby and Keith Helmuth (contributors) and Judy Lumb (editor), 2009, *Fueling Our Future: A Dialogue about Technology, Ethics, Public Policy, and Remedial Action*
- George Reynolds, 2002, *Ethics in Information Technology*
- George Reynolds, 2002, *Ethics in Information Technology*, Instructor's Edition
- Kenneth Vaux (editor), 1970, *Who Shall Live? Medicine, Technology, Ethics*
- Kush R. Varshney, 2022, *Trustworthy Machine Learning: Concepts for Developing Accurate, Fair, Robust, Explainable, Transparent, Inclusive, Empowering, and Beneficial Machine Learning Systems*
- Marsha Cook Woodbury, 2003, *Computer and Information Ethics*
- M. David Ermann, Mary B. Williams, and Claudio Gutierrez, 1990, *Computers, Ethics, and Society*
- Morton E. Winston and Ralph D. Edelbach, 2000, *Society, Ethics, and Technology*, First Edition
- Morton E. Winston and Ralph D. Edelbach, 2003, *Society, Ethics, and Technology*, Second Edition
- Morton E. Winston and Ralph D. Edelbach, 2006, *Society, Ethics, and Technology*, Third Edition
- Patrick Hall and Navdeep Gill, 2019, *An Introduction to Machine Learning Interpretability: An Applied Perspective on Fairness, Accountability, Transparency, and Explainable AI*, Second Edition
- Patrick Hall, Navdeep Gill, and Benjamin Cox, 2021, *Responsible Machine Learning: Actionable Strategies for Mitigating Risks & Driving Adoption*
- Paula Boddington, 2017, *Towards a Code of Ethics for Artificial Intelligence*
- Przemyslaw Biecek and Tomasz Burzykowski, 2020, *Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models. With examples in R and Python*
- Przemyslaw Biecek, 2023, *Adversarial Model Analysis*
- Raymond E. Spier (editor), 2003, *Science and Technology Ethics*
- Richard A. Spinello, 1995, *Ethical Aspects of Information Technology*
- Richard A. Spinello, 1997, *Case Studies in Information and Computer Ethics*
- Richard A. Spinello, 2003, *Case Studies in Information Technology Ethics*, Second Edition
- Solon Barocas, Moritz Hardt, and Arvind Narayanan, 2022, *Fairness and Machine Learning: Limitations and Opportunities*
- Soraj Hongladarom and Charles Ess, 2007, *Information Technology Ethics: Cultural Perspectives*
- Stephen H. Unger, 1982, *Controlling Technology: Ethics and the Responsible Engineer*, First Edition
- Stephen H. Unger, 1994, *Controlling Technology: Ethics and the Responsible Engineer*, Second Edition
- Christoph Molnar, 2021, *Interpretable Machine Learning: A Guide for Making Black Box Models Explainable*
- christophM/interpretable-ml-book - ml-book?style=social)
- Artificial Intelligence and Fundamental Rights: The AI Act of the European Union and its implications for global technology regulation
- Computer Power and Human Reason: From Judgment to Calculation
- Regulating under Uncertainty: Governance Options for Generative AI
- The Cambridge Handbook of the Law, Ethics and Policy of Artificial Intelligence
- Trustworthy AI: African Perspectives
-
Glossaries and Dictionaries
- A.I. For Anyone: The A-Z of AI
- The Alan Turing Institute: Data science and AI glossary
- Appen Artificial Intelligence Glossary
- Brookings: The Brookings glossary of AI and emerging technologies
- Built In, Responsible AI Explained
- Center for Security and Emerging Technology: Glossary
- CompTIA: Artificial Intelligence (AI) Terminology: A Glossary for Beginners
- Council of Europe Artificial Intelligence Glossary
- Coursera: Artificial Intelligence (AI) Terms: A to Z Glossary
- Dataconomy: AI dictionary: Be a native speaker of Artificial Intelligence
- Dennis Mercadal, 1990, *Dictionary of Artificial Intelligence*
- G2: 70+ A to Z Artificial Intelligence Terms in Technology
- General Services Administration: AI Guide for Government: Key AI terminology
- Google Developers Machine Learning Glossary
- H2O.ai Glossary
- IAPP Glossary of Privacy Terms
- IAPP International Definitions of Artificial Intelligence
- IAPP Key Terms for AI Governance
- ISO: Information technology — Artificial intelligence — Artificial intelligence concepts and terminology
- Jerry M. Rosenberg, 1986, *Dictionary of Artificial Intelligence & Robotics*
- MakeUseOf: A Glossary of AI Jargon: 29 AI Terms You Should Know
- Moveworks: AI Terms Glossary
- National Institute of Standards and Technology (NIST), The Language of Trustworthy AI: An In-Depth Glossary of Terms
- Oliver Houdé, 2004, *Dictionary of Cognitive Science: Neuroscience, Psychology, Artificial Intelligence, Linguistics, and Philosophy*
- Otto Vollnhals, 1992, *A Multilingual Dictionary of Artificial Intelligence (English, German, French, Spanish, Italian)*
- Raoul Smith, 1989, *The Facts on File Dictionary of Artificial Intelligence*
- Raoul Smith, 1990, *Collins Dictionary of Artificial Intelligence*
- Salesforce: AI From A to Z: The Generative AI Glossary for Business Leaders
- Stanford University HAI Artificial Intelligence Definitions
- TechTarget: Artificial intelligence glossary: 60+ terms to know
- TELUS International: 50 AI terms every beginner should know
- University of New South Wales, Bill Wilson, The Machine Learning Dictionary
- Wikipedia: Glossary of artificial intelligence
- William J. Raynor, Jr, 1999, *The International Dictionary of Artificial Intelligence*, First Edition
- William J. Raynor, Jr, 2009, *International Dictionary of Artificial Intelligence*, Second Edition
- IBM: AI glossary
- Artificial intelligence and illusions of understanding in scientific research (glossary on second page)
- European Commission, EU-U.S. Terminology and Taxonomy for Artificial Intelligence - Second Edition
- Open Access Vocabulary
- European Commission, Glossary of human-centric artificial intelligence
- National Institute of Standards and Technology (NIST), NIST AI 100-2 E2023: Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
- IEEE, A Glossary for Discussion of Ethics of Autonomous and Intelligent Systems, Version 1
- ISO/IEC DIS 22989(en) Information technology — Artificial intelligence — Artificial intelligence concepts and terminology
- Siemens, Artificial Intelligence Glossary
- Towards AI, Generative AI Terminology — An Evolving Taxonomy To Get You Started
- UK Parliament, Artificial intelligence (AI) glossary
- VAIR (Vocabulary of AI Risks)
-
Open-ish Classes
- An Introduction to Data Ethics
- Certified Ethical Emerging Technologist
- Coursera, DeepLearning.AI, Generative AI for Everyone
- Coursera, DeepLearning.AI, Generative AI with Large Language Models
- Coursera, Google Cloud, Introduction to Generative AI
- Coursera, Vanderbilt University, Prompt Engineering for ChatGPT
- CS103F: Ethical Foundations of Computer Science
- Fairness in Machine Learning
- Fast.ai Data Ethics course
- Human-Centered Machine Learning
- Introduction to AI Ethics
- INFO 4270: Ethics and Policy in Data Science
- Machine Learning Fairness by Google
- Jay Alammar, Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
- Introduction to Generative AI
- DeepLearning.AI
- Carnegie Mellon University, Computational Ethics for NLP
- Piotr Sapieżyński's CS 4910 - Special Topics in Computer Science: Algorithm Audits
- Transformer Models and BERT Model
- Introduction to Responsible AI
- Google Cloud Skills Boost
- OECD.AI, Disability-Centered AI And Ethics MOOC
- Build a Large Language Model (From Scratch) - from-scratch?style=social)
- Attention Mechanism
- Create Image Captioning Models
- Encoder-Decoder Architecture
- Introduction to Image Generation
- Introduction to Large Language Models
- Introduction to Vertex AI Studio
- IBM SkillsBuild
- Awesome LLM Courses - ai/awesome-llm-courses?style=social)
- ETH Zürich ReliableAI 2022 Course Project repository - Trustworthy-AI?style=social)
- AWS Skill Builder
- Trustworthy Deep Learning
- Introduction to Responsible Machine Learning
- Generative AI for Educators
-
Podcasts and Channels
-
-
AI Incidents, Critiques, and Research Resources
-
Groups and Organizations
- Algorithmic Justice League
- Partnership on AI
- Stanford University Human-Centered Artificial Intelligence
- The Alan Turing Institute
- Center for Humane Technology
- Future of Life Institute
- TheGovLab
- Center for Democracy and Technology
- AI Now Institute
- Data & Society
- Aapti Institute
- AI & Faith
- AI Ethics Lab
- AI for Good Foundation
- AI Forum New Zealand, AI Governance Working Group
- AI Hub for Sustainable Development
- AI Policy Exchange
- AI Transparency Institute
- AI Village
- Berkman Klein Center for Internet & Society at Harvard University
- Center for Advancing Safety of Machine Intelligence
- Center for Security and Emerging Technology
- Convergence Analysis
- Distributed AI Research Institute
- Global Center on AI Governance
- Indigenous Protocol and Artificial Intelligence Working Group
- Institute for Ethics and the Common Good, Notre Dame-IBM Technology Ethics Lab
- Leverhulme Centre for the Future of Intelligence
- Montreal AI Ethics Institute
- Responsible Artificial Intelligence Institute
- Ada Lovelace Institute
- Center for AI and Digital Policy
-
Critiques of AI
- Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
- Towards Environmentally Equitable AI via Geographical Load Balancing
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
- Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
- Sustainable AI: AI for sustainability and the sustainability of AI
- The Leaderboard Illusion
- Medical large language models are vulnerable to data-poisoning attacks
- AI as Normal Technology
- AI Bias is Not Ideological. It's Science.
- AI can only do 5% of jobs, says MIT economist who fears tech stock crash
- AI coding assistants do not boost productivity or prevent burnout, study finds
- AI hype as a cyber security risk: the moral responsibility of implementing generative AI in business
- AI hype, promotional culture, and affective capitalism
- Anthropomorphism in AI: hype and fallacy
- Are Emergent Abilities of Large Language Models a Mirage?
- Artificial Hype
- Artificial intelligence-powered chatbots in search engines: a cross-sectional study on the quality and risks of drug information for patients
- Artificial Intelligence: Hope for Future or Hype by Intellectuals?
- ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
- Authoritarian by Design: AI, Big Tech, and the Architecture of Control
- Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
- Beyond Preferences in AI Alignment
- Can We Trust AI Agents? An Experimental Study Towards Trustworthy LLM-Based Multi-Agent Systems for AI Ethics
- ChatGPT is bullshit
- Companies like Google and OpenAI are pillaging the internet and pretending it’s progress
- Data and its (dis)contents: A survey of dataset development and use in machine learning research
- Does current AI represent a dead end?
- Evaluating Language-Model Agents on Realistic Autonomous Tasks
- Gen AI: Too Much Spend, Too Little Benefit?
- Handling the hype: Implications of AI hype for public interest tech projects
- How AI hype impacts the LGBTQ + community
- How to Tell if Something is AI-Written
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- It’s Time to Stop Taking Sam Altman at His Word
- Large Language Models are Unreliable for Cyber Threat Intelligence
- Large Language Models Do Not Simulate Human Psychology
- Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
- Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
- LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks
- Meta AI Chief: Large Language Models Won't Achieve AGI
- MIT Technology Review, Introducing: The AI Hype Index
- The Most Dangerous Fiction: The Rhetoric and Reality of the AI Race
- On the Very Real Dangers of the Artificial Intelligence Hype Machine
- OpenAI—written evidence, House of Lords Communications and Digital Select Committee inquiry: Large language models
- Former OpenAI Researcher Says the Company Broke Copyright Law
- Open Problems in Technical AI Governance
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
- Prohibiting Generative AI in any Form of Weapon Control
- Promising the future, encoding the past: AI hype and public media imagery
- Re-evaluating GPT-4’s bar exam performance
- Sam Altman’s imperial reach
- Scalable Extraction of Training Data from Production Language Models
- Talking existential risk into being: a Habermasian critical discourse perspective to AI hype
- Task Contamination: Language Models May Not Be Few-Shot Anymore
- The Fallacy of AI Functionality
- The harms of terminology: why we should reject so-called “frontier AI”
- The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’
- The Price of Emotion: Privacy, Manipulation, and Bias in Emotional AI
- There’s Nothing Magical in the Machine
- This AI Pioneer Thinks AI Is Dumber Than a Cat
- Three different types of AI hype in healthcare
- Toward Sociotechnical AI: Mapping Vulnerabilities for Machine Learning in Context
- We still don't know what generative AI is good for
- What’s in a Name? Experimental Evidence of Gender Bias in Recommendation Letters Generated by ChatGPT
- Which Humans?
- Why the AI Hype is Another Tech Bubble
- Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
- A bottle of water per email: the hidden environmental costs of using AI chatbots
- AI already uses as much energy as a small country. It’s only the beginning.
- AI, Climate, and Regulation: From Data Centers to the AI Act
- Artificial Intelligence and Environmental Impact: Moving Beyond Humanizing Vocabulary and Anthropocentrism
- Beyond AI as an environmental pharmakon: Principles for reopening the problem-space of machine learning's carbon footprint
- Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models
- The Climate and Sustainability Implications of Generative AI
- Data centre water consumption
- Ecological footprints, carbon emissions, and energy transitions: the impact of artificial intelligence
- Ensuring a carbon-neutral future for artificial intelligence
- Environment and sustainability development: A ChatGPT perspective
- Generative AI’s environmental costs are soaring — and mostly secret
- Green Intelligence Resource Hub
- Measuring the Environmental Impact of Delivering AI at Google Scale
- Microsoft’s Hypocrisy on AI
- Power Hungry Processing: Watts Driving the Cost of AI Deployment?
- Powering artificial intelligence: A study of AI's environmental footprint—today and tomorrow, November 2024
- The Carbon Footprint of Artificial Intelligence
- The carbon impact of artificial intelligence
- The growing energy footprint of artificial intelligence
- The Hidden Cost of AI: Carbon Footprint and Mitigation Strategies
- The Hidden Cost of AI: Unraveling the Power-Hungry Nature of Large Language Models
- The Hidden Costs of AI-driven Data Center Demand: Five Systemic Tensions
- The Hidden Environmental Impact of AI
- The mechanisms of AI hype and its planetary and social costs
- Toward Responsible AI Use: Considerations for Sustainability Impact Assessment
- Towards A Comprehensive Assessment of AI's Environmental Impact
- Towards green and sustainable artificial intelligence: quantifying the energy footprint of logistic regression and decision tree algorithms
- Tracking the carbon footprint of global generative artificial intelligence
- Unraveling the Hidden Environmental Impacts of AI Solutions for Environment Life Cycle Assessment of AI Solutions
- We did the math on AI's energy footprint. Here's the story you haven't heard.
- Health Care Misinformation: An artificial intelligence challenge for low-resource languages
- The Serendipity of Claude AI: Case of the 13 Low-Resource National Languages of Mali
- AI Slop Might Finally Cure Our Internet Addiction
- Living the Slop Life
-
List of Lists
- Awesome interpretable machine learning - interpretable-machine-learning?style=social)
- Awful AI - ai?style=social)
- Awesome-ML-Model-Governance - ML-Model-Governance?style=social)
- Awesome MLOps - mlops?style=social)
- Awesome Responsible AI
- Machine Learning Interpretability Resources - resources?style=social)
- criticalML
- Machine Learning Ethics References - Learning-Ethics-References?style=social)
- XAI Resources
- awesomelistsio/Awesome AI Ethics - ai-ethics?style=social)
- MIT AI Agent Index
- Awesome AI Guidelines - artificial-intelligence-guidelines?style=social)
- A Living and Curated Collection of Explainable AI Methods
- 2024 AI Resources
- A review of 200 guidelines and recommendations for AI governance
- Awesome-explainable-AI - ntu/Awesome-explainable-AI?style=social)
- Comments Received for RFI on Artificial Intelligence Risk Management Framework - BTG
- GET Program for AI Ethics and Governance Standards
- Global Digital Policy Roundup March 2025
- LLM-Evals-Catalogue - BTG/LLM-Evals-Catalogue?style=social) | IMDA-BTG
- private-ai-resources - ai-resources?style=social)
- Ravit Dotan's Resources
- ResponsibleAI
- Ultraopxt/Awesome AI Ethics & Safety - AI-Ethics-Safety/?style=social)
- xaience
- AI Ethics Guidelines Global Inventory
- Awesome Production Machine Learning - machine-learning-operations?style=social)
-
AI Law, Policy, and Guidance Trackers
-
AI Incident Information Sharing Resources
- Atlas of AI Risks
- Brennan Center for Justice, Artificial Intelligence Legislation Tracker
- Mitre's AI Risk Database - atlas/ai-risk-database?style=social)
- Resemble.AI Deepfake Incident Database
- A comprehensive taxonomy of hallucinations in Large Language Models
- AI Ethics Issues in Real World: Evidence from AI Incident Database
- Artificial Intelligence Incidents & Ethics: A Narrative Review
- Artificial Intelligence Safety and Cybersecurity: A Timeline of AI Failures
- Deployment Corrections: An Incident Response Framework for Frontier AI Models
- Exploring Trust With the AI Incident Database
- Indexing AI Risks with Incidents, Issues, and Variants
- Good Systems, Bad Data?: Interpretations of AI Hype and Failures
- Hidden Risks: Artificial Intelligence and Hermeneutic Harm
- How Does AI Fail Us? A Typological Theorization of AI Failures
- New Noodlophile Stealer Distributes Via Fake AI Video Generation Platforms
- Omission and Commission Errors Underlying AI Failures
- Ontologies for Reasoning about Failures in AI Systems
- Planning for Natural Language Failures with the AI Playbook
- Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database
- SoK: How Artificial-Intelligence Incidents Can Jeopardize Safety and Security
- The Atlas of AI Incidents in Mobile Computing: Visualizing the Risks and Benefits of AI Gone Mobile
- Understanding and Avoiding AI Failures: A Practical Guide
- When Your AI Becomes a Target: AI Security Incidents and Best Practices
- AI Risk Database
-
Challenges and Competitions
-
Curated Bibliographies
-