{"id":13779451,"url":"https://github.com/deel-ai/influenciae","last_synced_at":"2025-05-16T00:16:32.369Z","repository":{"id":64341127,"uuid":"413582472","full_name":"deel-ai/influenciae","owner":"deel-ai","description":"👋 Influenciae is a Tensorflow Toolbox for Influence Functions","archived":false,"fork":false,"pushed_at":"2024-04-18T15:47:20.000Z","size":2129,"stargazers_count":63,"open_issues_count":9,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-11T12:37:22.639Z","etag":null,"topics":["explainability","explainable-ai","fairness-ai","influence-functions","misclassification","outlier-detection"],"latest_commit_sha":null,"homepage":"https://deel-ai.github.io/influenciae","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deel-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-04T21:03:50.000Z","updated_at":"2025-04-24T20:52:44.000Z","dependencies_parsed_at":"2023-02-10T16:00:29.163Z","dependency_job_id":"e2665c26-4dcc-494e-ad37-e27d27c38826","html_url":"https://github.com/deel-ai/influenciae","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deel-ai%2Finfluenciae","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deel-ai%2Finfluenciae/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deel-ai%2Finfluenciae/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/deel-ai%2Finfluenciae/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deel-ai","download_url":"https://codeload.github.com/deel-ai/influenciae/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254442822,"owners_count":22071878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["explainability","explainable-ai","fairness-ai","influence-functions","misclassification","outlier-detection"],"created_at":"2024-08-03T18:01:05.336Z","updated_at":"2025-05-16T00:16:32.233Z","avatar_url":"https://github.com/deel-ai.png","language":"Python","funding_links":[],"categories":["Libraries"],"sub_categories":["Task Agnostic"],"readme":"\u003cdiv align=\"center\"\u003e\r\n    \u003cimg src=\"docs/assets/banner2.png\" width=\"75%\" alt=\"Influenciae\" align=\"center\" /\u003e\r\n\u003c/div\u003e\r\n\u003cbr\u003e\r\n\r\n\u003cdiv align=\"center\"\u003e\r\n    \u003ca href=\"#\"\u003e\r\n        \u003cimg src=\"https://img.shields.io/badge/Python-3.7, 3.8, 3.9, 3.10-efefef\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"#tf\"\u003e\r\n        \u003cimg src=\"https://img.shields.io/badge/TensorFlow-2.7, 2.8, 2.9-00458A\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"https://github.com/deel-ai/influenciae/actions/workflows/linter.yml\"\u003e\r\n        \u003cimg alt=\"PyLint\" src=\"https://github.com/deel-ai/influenciae/actions/workflows/linter.yml/badge.svg\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"https://github.com/deel-ai/influenciae/actions/workflows/tests.yml\"\u003e\r\n        
\u003cimg alt=\"Tox\" src=\"https://github.com/deel-ai/influenciae/actions/workflows/tests.yml/badge.svg\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"https://github.com/deel-ai/influenciae/actions/workflows/publish.yml\"\u003e\r\n        \u003cimg alt=\"Pypi\" src=\"https://github.com/deel-ai/influenciae/actions/workflows/publish.yml/badge.svg\"\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"#\"\u003e\r\n        \u003cimg src=\"https://img.shields.io/badge/License-MIT-efefef\"\u003e\r\n    \u003c/a\u003e\r\n    \u003cbr\u003e\r\n    \u003ca href=\"https://deel-ai.github.io/influenciae/\"\u003e\u003cstrong\u003eExplore Influenciae docs »\u003c/strong\u003e\u003c/a\u003e\r\n\u003c/div\u003e\r\n\u003cbr\u003e\r\n\r\nInfluenciae is a Python toolkit dedicated to computing influence values for the discovery of potentially problematic samples in a dataset and the generation of data-centric explanations for deep learning models. In this library based on TensorFlow, we gather state-of-the-art methods for estimating the importance of training samples and their influence on test data-points for validating the quality of datasets and of the models trained on them.\r\n\r\n## 🔥 Tutorials\r\n\r\nWe propose some hands-on tutorials to get familiar with the library and its API:\r\n\r\n- [**Getting Started**](https://colab.research.google.com/drive/1vQ6seX6KOr48zx4nLELoy9j1X4jzQv1p?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vQ6seX6KOr48zx4nLELoy9j1X4jzQv1p?usp=sharing) \u003c/sub\u003e\r\n- [**Benchmarking with Mislabeled sample detection**](https://colab.research.google.com/drive/1_5-RC_YBHptVCElBbjxWfWQ1LMU20vOp?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_5-RC_YBHptVCElBbjxWfWQ1LMU20vOp?usp=sharing) \u003c/sub\u003e\r\n- [**Using the first order influence 
calculator**](https://colab.research.google.com/drive/1WlYcQNu5obhVjhonN2QYi8ybKyZJl4iY?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WlYcQNu5obhVjhonN2QYi8ybKyZJl4iY?usp=sharing) \u003c/sub\u003e\r\n- [**Using the second order influence calculator**](https://colab.research.google.com/drive/1qNvKiU3-aZWhRA0rxS6X3ebeNkoznJJe?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1qNvKiU3-aZWhRA0rxS6X3ebeNkoznJJe?usp=sharing) \u003c/sub\u003e\r\n- [**Using Arnoldi Influence Calculator**](https://colab.research.google.com/drive/1rQU33sbD0YW1cZMRlJmS15EW5O16yoDE?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1rQU33sbD0YW1cZMRlJmS15EW5O16yoDE?usp=sharing) \u003c/sub\u003e\r\n- [**Using TracIn**](https://colab.research.google.com/drive/1E94cGF46SUQXcCTNwQ4VGSjXEKm7g21c?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1E94cGF46SUQXcCTNwQ4VGSjXEKm7g21c?usp=sharing) \u003c/sub\u003e\r\n- [**Using Representer Point Selection - L2 (RPS_L2)**](https://colab.research.google.com/drive/17W5s30LbxABbDd8hbdwYE56abyWjSC4u?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/17W5s30LbxABbDd8hbdwYE56abyWjSC4u?usp=sharing) \u003c/sub\u003e\r\n- [**Using Representer Point Selection - Local Jacobian Expansion (RPS_LJE)**](https://colab.research.google.com/drive/14e7wwFRQJhY-huVYmJ7ri355kfLJgAPA?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14e7wwFRQJhY-huVYmJ7ri355kfLJgAPA?usp=sharing) \u003c/sub\u003e\r\n- [**Using 
Boundary-based Influence**](https://colab.research.google.com/drive/1785eHgT91FfqG1f25s7ovqd6JhP5uklh?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1785eHgT91FfqG1f25s7ovqd6JhP5uklh?usp=sharing) \u003c/sub\u003e\r\n\r\n## 🚀 Quick Start\r\n\r\nInfluenciae requires Python 3.7 or higher and several libraries, including TensorFlow and NumPy. It can be installed from PyPI:\r\n\r\n```bash\r\npip install influenciae\r\n```\r\n\r\nOnce Influenciae is installed, there are two major applications for the different modules, which all follow the same API.\r\nExcept for group-specific functions, which are only available in the `influence` module, all the classes can compute self-influence values and the influence of one point with respect to another, and can find the top-k samples in both situations.\r\n\r\n### Discovering influential examples\r\n\r\nParticularly useful when validating datasets, influence functions (and related notions) give insight into which samples the model considers \"important\". 
For this, the training dataset and a trained model are needed.\r\n\r\n```python\r\nfrom deel.influenciae.common import InfluenceModel, ExactIHVP\r\nfrom deel.influenciae.influence import FirstOrderInfluenceCalculator\r\nfrom deel.influenciae.utils import ORDER\r\n\r\n# load the model, the training loss (without reduction) and the training data (with the labels and in a batched TF dataset)\r\n\r\ninfluence_model = InfluenceModel(model, start_layer=target_layer, loss_function=loss_function)\r\nihvp_calculator = ExactIHVP(influence_model, train_dataset)\r\ninfluence_calculator = FirstOrderInfluenceCalculator(influence_model, train_dataset, ihvp_calculator)\r\ndata_and_influence_dataset = influence_calculator.compute_influence_values(train_dataset)\r\n# or influence_calculator.compute_top_k_from_training_dataset(train_dataset, k_samples, ORDER.DESCENDING) when the\r\n# dataset is too large\r\n```\r\n\r\nThis is also explained in more depth in the [Getting Started tutorial](https://colab.research.google.com/drive/1vQ6seX6KOr48zx4nLELoy9j1X4jzQv1p?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vQ6seX6KOr48zx4nLELoy9j1X4jzQv1p?usp=sharing) \u003c/sub\u003e\r\n\r\n### Explaining neural networks through their training data\r\n\r\nAnother application is to explain a model's predictions by looking at the training samples they are based on. 
Again, the training dataset, the model and the samples we wish to explain are needed.\r\n\r\n```python\r\nfrom deel.influenciae.common import InfluenceModel, ExactIHVP\r\nfrom deel.influenciae.influence import FirstOrderInfluenceCalculator\r\nfrom deel.influenciae.utils import ORDER\r\n\r\n# load the model, the training loss (without reduction), the training data and\r\n# the data to explain (with the labels and in a batched TF dataset)\r\n\r\ninfluence_model = InfluenceModel(model, start_layer=target_layer, loss_function=loss_function)\r\nihvp_calculator = ExactIHVP(influence_model, train_dataset)\r\ninfluence_calculator = FirstOrderInfluenceCalculator(influence_model, train_dataset, ihvp_calculator)\r\ndata_and_influence_dataset = influence_calculator.estimate_influence_values_in_batches(samples_to_explain, train_dataset)\r\n# or influence_calculator.top_k(samples_to_explain, train_dataset, k_samples, ORDER.DESCENDING) when the\r\n# dataset is too large\r\n```\r\n\r\nThis is also explained in more depth in the [Getting Started tutorial](https://colab.research.google.com/drive/1vQ6seX6KOr48zx4nLELoy9j1X4jzQv1p?usp=sharing) \u003csub\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vQ6seX6KOr48zx4nLELoy9j1X4jzQv1p?usp=sharing) \u003c/sub\u003e\r\n\r\n### Determining the influence of groups of samples\r\n\r\nThe previous examples use notions of influence that are applied individually to each data-point, but it is possible to extend this to groups. For example, one can ask what a model would look like if it had not seen a whole group of data-points during training. 
This can be computed using the `FirstOrderInfluenceCalculator`, which ignores pairwise interactions between data-points, and the `SecondOrderInfluenceCalculator`, which takes them into account.\r\n\r\nFor obtaining the groups' influence:\r\n\r\n```python\r\nfrom deel.influenciae.common import InfluenceModel, ExactIHVP\r\nfrom deel.influenciae.influence import SecondOrderInfluenceCalculator\r\n\r\n# load the model, the training loss (without reduction), the training data and\r\n# the data to explain (with the labels and in a batched TF dataset)\r\n\r\ninfluence_model = InfluenceModel(model, start_layer=target_layer, loss_function=loss_function)\r\nihvp_calculator = ExactIHVP(influence_model, train_dataset)\r\ninfluence_calculator = SecondOrderInfluenceCalculator(influence_model, train_dataset, ihvp_calculator)  # or FirstOrderInfluenceCalculator\r\ndata_and_influence_dataset = influence_calculator.estimate_influence_values_group(groups_train, groups_to_explain)\r\n```\r\n\r\nFor the data-centric explanations:\r\n\r\n```python\r\nfrom deel.influenciae.common import InfluenceModel, ExactIHVP\r\nfrom deel.influenciae.influence import SecondOrderInfluenceCalculator\r\n\r\n# load the model, the training loss (without reduction), the training data and\r\n# the data to explain (with the labels and in a batched TF dataset)\r\n\r\ninfluence_model = InfluenceModel(model, start_layer=target_layer, loss_function=loss_function)\r\nihvp_calculator = ExactIHVP(influence_model, train_dataset)\r\ninfluence_calculator = SecondOrderInfluenceCalculator(influence_model, train_dataset, ihvp_calculator)  # or FirstOrderInfluenceCalculator\r\ndata_and_influence_dataset = influence_calculator.estimate_influence_values_group(groups_train)\r\n```\r\n\r\n## 📦 What's Included\r\n\r\nAll the influence calculation methods work on TensorFlow models trained for any sort of task and on any type of data. 
Visualization functionality is implemented for image datasets only (for the moment).\r\n\r\n| **Influence Method**                                    | Source                                                                                             |                                                                              Tutorial                                                                               |\r\n|:--------------------------------------------------------|:---------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\r\n| Influence Functions                                     | [Paper](https://arxiv.org/abs/1703.04730)                                                          | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WlYcQNu5obhVjhonN2QYi8ybKyZJl4iY?usp=sharing) |\r\n| RelatIF                                                 | [Paper](https://arxiv.org/pdf/2003.11630.pdf)                                                      | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WlYcQNu5obhVjhonN2QYi8ybKyZJl4iY?usp=sharing) |\r\n| Influence Functions  (first order, groups)              | [Paper](https://arxiv.org/abs/1905.13289)                                                          | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WlYcQNu5obhVjhonN2QYi8ybKyZJl4iY?usp=sharing) |\r\n| Influence Functions  (second order, groups)             | [Paper](https://arxiv.org/abs/1911.00418)                                                          | [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1qNvKiU3-aZWhRA0rxS6X3ebeNkoznJJe?usp=sharing) |\r\n| Arnoldi iteration (Scaling Up Influence Functions)      | [Paper](https://arxiv.org/abs/2112.03052)  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1rQU33sbD0YW1cZMRlJmS15EW5O16yoDE?usp=sharing)  |\r\n| Trac-In                                                 | [Paper](https://arxiv.org/abs/2002.08484)                                                          | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1E94cGF46SUQXcCTNwQ4VGSjXEKm7g21c?usp=sharing) |\r\n| Representer Point Selection  (L2)                       | [Paper](https://arxiv.org/abs/1811.09720)                                                          | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/17W5s30LbxABbDd8hbdwYE56abyWjSC4u?usp=sharing) |\r\n| Representer Point Selection  (Local Jacobian Expansion) | [Paper](https://proceedings.neurips.cc/paper/2021/file/c460dc0f18fc309ac07306a4a55d2fd6-Paper.pdf) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14e7wwFRQJhY-huVYmJ7ri355kfLJgAPA?usp=sharing) |\r\n| Boundary-based influence                                | --                                                                                                 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1785eHgT91FfqG1f25s7ovqd6JhP5uklh?usp=sharing) |\r\n\r\n## 👀 See Also\r\n\r\nThis library proposes implementations of some of the different popular ways of calculating the influence of data-points on TF, but there are also other ones using other frameworks. 
\r\n\r\nSome other tools for efficiently computing influence functions:\r\n\r\n- [Scaling Up Influence Functions](https://github.com/google-research/jax-influence) a Python library using JAX implementing a scalable algorithm for computing influence functions.\r\n- [FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging](https://github.com/salesforce/fast-influence-functions) a Python library using PyTorch implementing another scalable algorithm for computing influence functions.\r\n\r\nMore from the DEEL project:\r\n\r\n- [Xplique](https://github.com/deel-ai/xplique) a Python library exclusively dedicated to explaining neural networks.\r\n- [deel-lip](https://github.com/deel-ai/deel-lip) a Python library for training k-Lipschitz neural networks on TF.\r\n- [deel-torchlip](https://github.com/deel-ai/deel-torchlip) a Python library for training k-Lipschitz neural networks on PyTorch.\r\n- [DEEL White paper](https://arxiv.org/abs/2103.10529) a summary by the DEEL team of the challenges of certifiable AI and the role of data quality, representativity and explainability for this purpose.\r\n\r\n## 🙏 Acknowledgments\r\n\r\n\u003cimg align=\"right\" src=\"https://www.deel.ai/wp-content/uploads/2021/05/logo-DEEL.png\" width=\"25%\"\u003e\r\nThis project received funding from the French “Investing for the Future – PIA3” program within the Artificial and Natural Intelligence Toulouse Institute (ANITI). The authors gratefully acknowledge the support of the \u003ca href=\"https://www.deel.ai/\"\u003e DEEL \u003c/a\u003e project.\r\n\r\n## 👨‍🎓 Creators\r\n\r\nThis library was first created as a research tool by [Agustin Martin PICARD](mailto:agustin-martin.picard@irt-saintexupery.com) in the context of the DEEL project with the help of [David Vigouroux](mailto:david.vigouroux@irt-saintexupery.com) and [Thomas FEL](http://thomasfel.fr). 
Later on, [Lucas Hervier](https://github.com/lucashervier) joined the team to transform the code base into a practical, user-(almost)-friendly and efficient tool.\r\n\r\n## 🗞️ Citation\r\n\r\nIf you use Influenciae as part of your workflow in a scientific publication, please consider citing the 🗞️ [official paper](https://hal.science/hal-04284178/):\r\n\r\n```\r\n@unpublished{picard:hal-04284178,\r\n  TITLE = {Influenci{\\ae}: A library for tracing the influence back to the data-points},\r\n  AUTHOR = {Picard, Agustin Martin and Hervier, Lucas and Fel, Thomas and Vigouroux, David},\r\n  URL = {https://hal.science/hal-04284178},\r\n  NOTE = {working paper or preprint},\r\n  YEAR = {2023},\r\n  MONTH = Nov,\r\n  KEYWORDS = {Data-centric ai ; XAI ; Explainability ; Influence Functions ; Open-source toolbox},\r\n  PDF = {https://hal.science/hal-04284178/file/ms.pdf},\r\n  HAL_ID = {hal-04284178},\r\n  HAL_VERSION = {v1},\r\n}\r\n```\r\n\r\n## 📝 License\r\n\r\nThe package is released under the \u003ca href=\"https://choosealicense.com/licenses/mit\"\u003e MIT license\u003c/a\u003e.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeel-ai%2Finfluenciae","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeel-ai%2Finfluenciae","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeel-ai%2Finfluenciae/lists"}