awesome-privacy-engineering

A curated list of resources related to privacy engineering
https://github.com/mplspunk/awesome-privacy-engineering


Awesome Privacy Engineering [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
- Books
  - The Privacy Engineer's Manifesto: Getting from Policy to Code to QA to Value (Michelle Dennedy, Jonathan Fox, Tom Finneran)
  - The Algorithmic Foundations of Differential Privacy (Cynthia Dwork, Aaron Roth)
  - Privacy Engineering: A Dataflow and Ontological Approach (Ian Oliver)
  - Strategic Privacy by Design (R. Jason Cronk)
  - Privacy Is Hard and Seven Other Myths: Achieving Privacy through Careful Design (Jaap-Henk Hoepman)
  - Building an Anonymization Pipeline: Creating Safe Data (Luk Arbuckle, Khaled El Emam)
  - The Architecture of Privacy: On Engineering Technologies that Can Deliver Trustworthy Safeguards (Courtney Bowman, Ari Gesher, John K. Grant, Daniel Slate, Elissa Lerner)
  - Practical Data Privacy (Katharine Jarmul)
  - Threat Modeling: Designing for Security (Adam Shostack)
  - Threat Modeling: A Practical Guide for Development Teams (Izar Tarandach, Matthew J. Coles)
  - Data Privacy: A Runbook for Engineers (Nishant Bhajaria)
  - Information Privacy Engineering and Privacy by Design: Understanding Privacy Threats, Technology, and Regulations Based on Standards and Best Practices (William Stallings)
  - Privacy Design Strategies (The Little Blue Book) (Jaap-Henk Hoepman)
- Data Deletion, Data Mapping, and Data Subject Access Requests
  - Deleting Data Distributed Throughout Your Microservices Architecture - Microservices architectures tend to distribute responsibility for data throughout an organization. This poses challenges to ensuring that data is deleted.
  - Best Practices: GDPR and CCPA Compliance Using Delta Lake - Article that describes how to use Delta Lake on Databricks to manage General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) compliance for a data lake.
  - OpenDSR - A common framework enabling companies to work together to protect consumers' privacy and data rights (formerly known as OpenGDPR).
  - Detecting PII Using Amazon Comprehend - To detect entities that contain personally identifiable information (PII) in a document, use the Amazon Comprehend DetectPiiEntities operation.
  - How to Delete User Data in an AWS Data Lake - This post walks through a framework that helps you purge individual user data within your organization’s AWS hosted data lake, and an analytics solution that uses different AWS storage layers, along with sample code targeting Amazon S3.
  - Privado - Privado is an open source static code analysis tool to discover data flows in the code. It detects the personal data being processed, and further maps the journey of the data from the point of collection to going to interesting sinks such as third parties, databases, logs, and internal APIs.
  - Fides - An open-source tool that allows you to easily declare your systems' privacy characteristics, track privacy related changes to systems and data in version control, and enforce policies in both your source code and your runtime infrastructure.
  - Fideslang - Open-source description language for privacy to declare data types and data behaviors in your tech stack in order to simplify data privacy globally. Supports GDPR, CCPA, LGPD and ISO 19944.
  - Fidesops - DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests. (deprecated)
  - Amazon S3 Find and Forget
  - Klaro! - Klaro is a simple consent management platform (CMP) and privacy tool that helps you to be transparent about the third-party applications on your website.
  - PrivacyBot - PrivacyBot is a simple automated service to initiate CCPA deletion requests with data brokers. (deprecated)
  - Cookie Consent - An open-source, lightweight JavaScript plugin for alerting users about the use of cookies on a website. It is designed to help quickly comply with the European Union Cookie Law, CCPA, GDPR and other privacy laws.
  - Octopii - Octopii is an open-source AI-powered PII scanner that can look for image assets such as Government IDs, passports, photos and signatures in a directory.
  - Data Profiler - DataProfiler is a Python library created by Capital One to make data analysis, monitoring, and sensitive data detection easy.
  - PII Catcher - Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub.
  - Data Purging AWS Data Lake
  - Handling Data Erasure Requests in Your Data Lake with Amazon S3 Find and Forget - Amazon S3 Find and Forget enables you to find and delete records automatically in data lakes on Amazon S3.
- Machine Learning and Algorithmic Bias
  - Trustworthy ML Initiative - The Trustworthy ML Initiative is a community of researchers and practitioners working on topics related to machine learning models and algorithms that are accurate, explainable, fair, privacy-preserving, causal, and robust.
  - Measuring Fairness - How do you make sure a model works equally well for different groups of people?
  - How Randomized Response Can Help Collect Sensitive Information Responsibly - Giant datasets are revealing new patterns in cancer, income inequality and other important areas. However, the widespread availability of fast computers that can cross reference public data is making it harder to collect private information without inadvertently violating people's privacy. Modern randomization techniques can help preserve anonymity.
  - Can a Model Be Differentially Private and Fair? - Training with differential privacy limits the information about any one data point that is extractable but in some cases there’s an unexpected side-effect: reduced accuracy with underrepresented subgroups disparately impacted.
  - Hidden Bias - Models trained on real-world data can encode real-world bias. Hiding information about protected classes doesn't always fix things — sometimes it can even hurt.
  - How Federated Learning Protects Privacy - With federated learning, it’s possible to collaboratively train a model with data from multiple users without any raw data leaving their devices.
  - Privacy Considerations in Large Language Models
  - Explaining Decisions Made with AI - Guidance by the UK's Information Commissioner's Office (ICO) and The Alan Turing Institute aims to give organisations practical advice to help explain the processes, services and decisions delivered or assisted by AI, to the individuals affected by them.
  - Responsible AI by Design - Microsoft's hub for policies, practices, and tools that make up its framework for Responsible AI by Design. Includes a Responsible AI Standard, Responsible AI Impact Assessment Guide, and Responsible AI Impact Assessment Template.
  - Ethical Machine Learning - Spotting and Preventing Proxy Bias - Jupyter Notebook from rOpenSciLabs that explores several ways of detecting unintentional bias and removing it from a predictive model. (deprecated)
  - AI Fairness 360 (AIF360) - A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
  - Adversarial Robustness Toolbox (ART) - Python library from the Linux Foundation AI & Data Foundation (LF AI & Data) that enables developers and researchers to defend and evaluate machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference.
  - Pribot and Polisis - Polisis is a unique way of visualizing privacy policies. Using deep learning, it allows you to know what the company is collecting about you, what it is sharing, etc.
  - Fairness in Machine Learning Engineering - Google's Machine Learning Crash Course includes a 70-minute section on fairness.
  - Why Some Models Leak Data - Machine learning models use large amounts of data, some of which can be sensitive. If they're not trained correctly, sometimes that data is inadvertently revealed.
  - Considerations for Sensitive Data within Machine Learning Datasets - This Google Cloud article aims to highlight some strategies for identifying and protecting sensitive information, and processes to help address security concerns you might have with your machine learning data.
  - Explainable Artificial Intelligence - This course syllabus from Harvard University aims to familiarize students with the recent advances in the emerging field of eXplainable Artificial Intelligence (XAI).
  - SecretFlow - SecretFlow is a unified framework for privacy-preserving data analysis and machine learning.
  - Private AI Bootcamp - YouTube playlist of lectures from the Private AI Bootcamp at Microsoft Research Redmond in December 2019.
  - Model Card Toolkit - Google's Model Card Toolkit streamlines and automates generation of [Model Cards](https://modelcards.withgoogle.com/about), machine learning documents that provide context and transparency into a model's development and performance.
  - Fairlearn - A Python package to assess and improve fairness of machine learning models.
  - Of Oaths and Checklists - A checklist for people who are working on data projects, authored by DJ Patil, Hilary Mason, and Mike Loukides.
  - Intro to AI Ethics - A Kaggle Learn course to explore practical tools to guide the moral design of AI systems.
  - Fairness and Machine Learning: Limitations and Opportunities - An online textbook by Solon Barocas, Moritz Hardt, and Arvind Narayanan.
  - Aequitas - An open source bias audit toolkit developed by the Center for Data Science and Public Policy at the University of Chicago. It can be used to audit the predictions of machine learning based risk assessment tools, to understand different types of biases, and to make informed decisions about developing and deploying such systems.
  - DrivenData: Deon - A command line tool to easily add an ethics checklist to your data science projects.
  - Machine Unlearning - A compilation of existing literature about machine unlearning, a process through which a machine learning model can be made to forget one of its training data points.
  - AI Nutrition Facts Labels - Tool from Twilio that allows generation of AI Nutrition Labels intended to give consumers and businesses a more transparent and clear view into ‘what's in the box’.
  - How to Incorporate Ethics and Risk into Your Machine Learning Development Process - To help highlight ethics and risk in machine learning, this article looks at the six steps involved in developing an ML system, what happens in each step, and the risk and ethics questions that arise.
  - ML Privacy Meter - A tool to quantify the privacy risks of machine learning models with respect to inference attacks, notably membership inference attacks
  - Responsible AI Toolbox - Responsible AI Toolbox is a suite of tools from Microsoft that provides a collection of model and data exploration and assessment user interfaces that enable a better understanding of AI systems. The Toolbox consists of four dashboards: an Error Analysis dashboard, an Interpretability dashboard, a Fairness dashboard, and a Responsible AI dashboard.
  - Apple Privacy-Preserving Machine Learning Workshop 2022 - In June 2022, Apple hosted the Workshop on Privacy-Preserving Machine Learning (PPML), which brought Apple and members of the academic research communities together to discuss the state of the art in the field of privacy-preserving machine learning through a series of talks and discussions. This post includes highlights from workshop discussions and recordings of select workshop talks.
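
The randomized response idea mentioned in this section can be illustrated with a minimal sketch. It assumes the classic two-coin-flip variant, which may differ from the scheme used in the linked article; the function names `randomized_response` and `estimate_true_rate` are hypothetical and chosen only for illustration.

```python
import random
from typing import Sequence


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report a sensitive yes/no answer with plausible deniability.

    First coin flip: heads -> report the truth.
    Tails -> report the outcome of a second, independent coin flip.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: Sequence[bool]) -> float:
    """Recover the population-level 'yes' rate from noisy reports.

    Under the scheme above, P(report yes) = 0.5 * p + 0.25,
    so p = 2 * (P(report yes) - 0.25).
    """
    if not reports:
        raise ValueError("at least one report is required")
    observed_yes = sum(reports) / len(reports)
    return 2.0 * (observed_yes - 0.25)
```

Any single report reveals little about the respondent, yet the aggregate estimate converges on the true rate as the number of respondents grows. The 50/50 coin probabilities are an assumption of this sketch; other biases trade accuracy against deniability.
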
- Homomorphic Encryption
  - blyss - Open-source SDK for accessing data privately using homomorphic encryption.
  - Palisade Homomorphic Encryption Software Library - An open-source project that provides efficient implementations of lattice cryptography building blocks and homomorphic encryption schemes.
  - TFHE - The original version of TFHE (Fast Fully Homomorphic Encryption Library over the Torus) that implements the base arithmetic and functionalities (bootstrapped and leveled), allowing you to perform computations over encrypted data.
  - Building Safe A.I.: A Tutorial for Encrypted Deep Learning - Blogpost on how to train a neural network that is fully encrypted during training.
  - HElib - HElib is an open-source software library that implements homomorphic encryption.
  - Microsoft SEAL - Microsoft SEAL is an easy-to-use open-source (MIT licensed) homomorphic encryption library developed by the Cryptography and Privacy Research group at Microsoft.
  - Concrete - The concrete ecosystem is a set of crates (packages in the Rust language) that implements Zama's variant of TFHE, while most of the complexity of fully homomorphic encryption is hidden under high-level APIs.
  - FHE.org - Community of researchers and developers interested in advancing Fully Homomorphic Encryption (FHE) and other secure computation techniques.
  - swift-homomorphic-encryption - Apple's open source [Swift package](https://www.swift.org/blog/announcing-swift-homomorphic-encryption/) that utilizes Private Information Retrieval (PIR).
  - nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data - Intel Research proposes an extension to its deep learning compiler to operate on homomorphically encrypted data.
  - Google Fully-Homomorphic-Encryption - This repository created by Google contains open-source libraries and tools to perform fully homomorphic encryption operations on an encrypted data set.
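
To make "computing over encrypted data" concrete, here is a toy sketch of an additively homomorphic scheme. It swaps in textbook Paillier encryption rather than the lattice-based schemes the libraries above implement, it uses tiny primes that offer no security, and all function names are hypothetical. It assumes Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random


def generate_keypair(p: int, q: int):
    """Build a toy Paillier key pair from two distinct primes.

    Returns the public modulus n and the private pair (lam, mu).
    The generator is fixed to g = n + 1, which makes mu = lam^-1 mod n.
    """
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n: int, message: int, rng: random.Random) -> int:
    """Encrypt an integer in [0, n) as c = (n + 1)^m * r^n mod n^2."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible modulo n
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n: int, lam: int, mu: int, ciphertext: int) -> int:
    """Decrypt with m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)
```

With the key from `generate_keypair(101, 113)`, encrypting 17 and 25, combining the ciphertexts with `add_encrypted`, and decrypting the result yields their sum without the party doing the addition ever seeing either input. Fully homomorphic schemes extend this to multiplication as well.
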
- Synthetic Data
  - GDPR Helpers - Generative models to automatically anonymize data to meet GDPR & CCPA standards.
  - Pynonymizer - Pynonymizer is a universal tool for translating sensitive production database dumps into anonymized copies.
  - Synthetic Data Vault - The Synthetic Data Vault (SDV) enables end users to easily generate Synthetic Data for different data modalities, including single table, multi-table and time series data.
  - Synthetic Data Generation: Quality, Privacy, Bias (Workshop at ICLR 2021) - Workshop on the intersection of challenges regarding quality, privacy and bias in synthetic data generation.
  - Bogus - Bogus is a simple fake data generator for .NET languages like C#, F# and VB.NET.
  - Gretel Synthetics - Synthetic data generators for structured and unstructured text, featuring differentially private learning.
  - Faker - Faker is a Python package that generates fake data for you.
  - Synthea - An open-source, synthetic patient generator that models the medical history of synthetic patients.
  - Anonymize Tabular Data to Meet GDPR Privacy Requirements - A blog post covering how to use Gretel's GDPR Helpers.
  - Differentially Private Synthetic Data via Foundation Model APIs (DPSDA) - This repo is a Python library to generate differentially private synthetic data without the need of any ML model training.
  - Data Synthesizer - DataSynthesizer generates synthetic data that simulates a given dataset.
  - synthpop - R package for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis.
  - Mimesis - Mimesis is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages.
  - plaitpy - plait.py is a program for generating fake data from composable yaml templates.
- Tagging Personally Identifiable Information
  - Categorizing Your AWS S3 Storage Using Tags - In addition to data classification, tagging offers benefits such as fine-grained access control of permissions and object lifecycle management.
  - Using Policy Tags in Google Cloud's BigQuery - Use policy tags to define access to your data, for example, when you use BigQuery column-level security.
  - Managing Tags in AWS Resource Groups - Tags are words or phrases that act as metadata that you can use to identify and organize your AWS resources. A resource can have up to 50 user-applied tags.
  - Quickstart for Tagging Tables in Google Cloud - Tutorial shows how to create a BigQuery dataset, copy data to a new table in your dataset, create a tag template, and attach the tag to your table.
  - Adding a Tag-Based PII Policy in Cloudera - How to add a PII tag-based policy. In this example, the author creates a tag-based policy for objects tagged "PII" in Atlas.
  - BigQuery PII Classifier - Google Cloud BigQuery PII Classifier is a solution to automate the process of discovering and tagging PII data across BigQuery tables and applying column-level access controls to restrict specific PII data types to certain users/groups.
- Regulatory and Framework Resources
  - European Data Protection Supervisor Website Evidence Collector - The Website Evidence Collector tool automates the collection of evidence of personal data processing, such as cookies, or requests to third parties.
  - webXray - webXray is a tool for legal and compliance professionals to find privacy violations on the web.
  - UN Guide on Privacy-Enhancing Technologies for Official Statistics - This United Nations (UN) document presents methodologies and approaches to mitigating privacy risks when using sensitive or confidential data.
  - Privacy in M&A Transactions: The Playbook - The playbook is directed to mergers and acquisitions (M&A) and privacy teams to help identify potential privacy-related issues.
  - Emerging Privacy Enhancing Technologies: Current Regulatory and Policy Approaches - The Organisation for Economic Co-operation and Development (OECD)'s report reviews recent technological advancements and evaluates the effectiveness of different types of privacy enhancing technologies (PETs), as well as the challenges and opportunities they present.
  - webXray - webXray is a [command line tool](https://github.com/agilemobiledev/webXray) for analyzing webpage traffic and content, extracting legal policies, and identifying the companies which collect user data.
  - European Data Protection Board Guidelines 4/2019 on Article 25, Data Protection by Design and by Default - This document gives general guidance on the obligation of Data Protection by Design and by Default set forth in Article 25 in the GDPR.
  - A Guide to Privacy by Design - This document by Spain's Data Protection Agency, AEPD, provides guidance on implementation of Privacy by Design into systems and applications.
  - Guidance on Anonymisation and Pseudonymisation - This document from the Irish Data Protection Commission (DPC) offers guidance on implementation of anonymization and pseudonymization.
  - An Introduction to Privacy Engineering and Risk Management in Federal Systems (NIST IR 8062) - National Institute of Standards and Technology (NIST) Internal Report 8062 provides an introduction to the concepts of privacy engineering and risk management for US federal systems.
  - US State Privacy Legislation Tracker - The IAPP Westin Research Center actively tracks the proposed and enacted comprehensive privacy bills from across the United States.
  - Global Comprehensive Privacy Law Mapping Chart - The IAPP's Westin Research Center has created this chart mapping several comprehensive data protection laws.
  - Guidelines for Evaluating Differential Privacy Guarantees (NIST SP 800-226) - NIST Special Publication 800-226 is intended to help US federal agencies and practitioners of all backgrounds better understand how to evaluate promises made (and not made) when deploying differential privacy.
  - GDPR Developer Guide - In order to assist web and application developers in making their work GDPR-compliant, France's Data Protection Agency, the CNIL, has drawn up a guide of best practices.
  - De-Identifying Government Datasets: Techniques and Governance (NIST SP 800-188) - NIST Special Publication 800-188 describes the use of deidentification while still allowing for the production of meaningful statistical analysis for US federal agencies.
  - European Data Protection Board Website Auditing Tool - The Website Auditing Tool is used to collect evidence and generate reports regarding trackers that are being used by websites.
  - Data Protection/Privacy Mapping Project - Microsoft's Data Protection/Privacy Mapping Project facilitates consistent global comprehension and implementation of data protection with an open source mapping between ISO/IEC 27701 and global data protection and/or privacy laws and regulations.
- Conferences
  - PEPR 2024 Conference
  - International Workshop on Privacy Engineering (IWPE)
  - PEPR 2023 Conference
  - PEPR 2022 Conference
  - PEPR 2021 Conference
  - PEPR 2020 Conference
  - Symposium on Usable Privacy and Security (SOUPS)
  - USENIX Conference on Privacy Engineering Practice and Respect (PEPR)
  - PEPR 2019 Conference
  - USENIX Enigma Conference
- Differential Privacy and Federated Learning
  - Uber SQL Differential Privacy - This repository contains a query analysis and rewriting framework to enforce differential privacy for general-purpose SQL queries. (deprecated)
  - FedML - A federated learning and distributed training library enabling machine learning anywhere at any scale. It's backed by [FedML, Inc](https://FedML.ai) and supports large-scale geo-distributed training, cross-device federated learning on smartphones/IoTs, cross-silo federated learning on data silos, and research simulation. Best Paper Award at NeurIPS 2020.
  - Federated Compute Platform - This Google repository hosts infrastructure for compiling and running federated programs and computations in the cross-device setting.
  - Microsoft's SmartNoise - This toolkit uses state-of-the-art differential privacy techniques to inject noise into data, to prevent disclosure of sensitive information and manage exposure risk.
  - diffpriv: Easy Differential Privacy - R package that is an implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006).
  - sdcMicro: Statistical Disclosure Control Methods for Anonymization of Microdata and Risk Estimation - R package that can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files.
  - This Week in Machine Learning and AI
  - Differential Privacy Theory & Practice with Aaron Roth
  - Scalable Differential Privacy for Deep Learning with Nicolas Papernot
  - Epsilon Software for Private Machine Learning with Chang Lu
  - Privacy-Preserving Decentralized Data Science with Andrew Trask
  - A List of Real-World Uses of Differential Privacy - Blog post that compiles a list of real-world deployments of differential privacy, with their privacy parameters.
  - Privacy-Preserving AI - Video on Privacy Preserving AI (Andrew Trask) | MIT Deep Learning Series
  - Google Differential Privacy Library - This repository contains libraries to generate ε- and (ε, δ)-differentially private statistics over datasets. Includes differential privacy "building block" libraries in C++, Go, and Java, as well as the following:
    - Privacy on Beam - A differential privacy framework built on top of Apache Beam.
    - ZetaSQL Differential Privacy Extension - Command line interface for running differentially private SQL queries with [ZetaSQL](https://github.com/google/zetasql).
    - DP-Auditorium - Used for auditing differential privacy guarantees.
  - PyDP - Python wrapper for Google's Differential Privacy project. The library provides a set of ε-differentially private algorithms, which can be used to produce aggregate statistics over numeric data sets containing private or sensitive information.
  - IBM's Differential Privacy Library - Diffprivlib is a general-purpose library for experimenting with, investigating and developing applications in, differential privacy.
  - RAPPOR - Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR) is a technology for crowdsourcing statistics from end-user client software, anonymously, with strong privacy guarantees. (deprecated)
  - A Friendly, Non-Technical Introduction to Differential Privacy - Blog post that provides simple explanations for the core concepts behind differential privacy.
  - Differential Privacy at the U.S. Census Bureau - Video on how differential privacy is being implemented in the U.S. Census.
  - Tumult Analytics - Tumult Analytics is a Python library for computing aggregate queries on tabular data using differential privacy.
  - Differential Privacy at Bluecore with Zahi Karam
  - TensorFlow Privacy - Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy.
  - TensorFlow Encrypted - TF Encrypted is a framework for encrypted machine learning in TensorFlow.
  - Sharing Data with Differential Privacy: A Primer
  - Practitioners’ Guide to Accessing Emerging Differential Privacy Tools
  - Evaluating Differential Privacy Tools’ Performance
  - Getting Started with Scalable Differential Privacy Tools on the Cloud
  - FedJAX - Google's JAX-based open source library for federated learning simulations that emphasizes ease-of-use in research.
  - PPRL: Privacy Preserving Record Linkage - R package that is a toolbox for deterministic, probabilistic and privacy-preserving record linkage techniques.
  - A Friendly, Non-Technical Introduction to Differential Privacy - Blog post that provides simple explanations for the core concepts behind differential privacy.
  - PySyft - PySyft is a Python library for secure and private Deep Learning.
  - CrypTen - CrypTen is a framework for Privacy Preserving Machine Learning built on PyTorch.
  - Stochastic Tester - Used to help catch regressions that could make the differential privacy property no longer hold.
  - Differential Privacy Accounting Library - Used for tracking privacy budget.
  - PipelineDP - Write fast, flexible pipelines that use modern techniques to aggregate user data in a privacy-preserving manner.
  - Compute Private Statistics with PipelineDP - This Google Developer Codelab walks through how to produce private statistics with differentially private aggregations using the PipelineDP Python framework.
  - Practical Differential Privacy w/ Apache Beam - Blog post showing how to use Privacy on Beam from Google's differential privacy library.
  - Flower - Originating from a research project at the University of Oxford, [Flower (flwr)](https://github.com/adap/flower) is a framework for building federated learning systems with the goal of making federated learning accessible to everyone.
  - Opacus - A library that enables training PyTorch models with differential privacy.
  - FLUTE - Created by Microsoft Research, Federated Learning Utilities and Tools for Experimentation (FLUTE) is a framework for running large-scale offline federated learning simulations.
  - Computing Private Statistics with Privacy on Beam - This Google Developer Codelab walks through the use of Privacy on Beam to perform differentially private analysis in Go.
  - TensorFlow Federated - TensorFlow Federated (TFF) is an open-source framework for machine learning and other computations on decentralized data.
- De-Identification and Anonymization
  - NIST Privacy Engineering Program - De-Identification Tools
  - Data Anonymizer Extension for PostgreSQL - A set of SQL functions that remove personally identifiable values from a PostgreSQL table and replace them with random-but-plausible values.
  - Transforming Data in Google Cloud Platform - This reference covers the available de-identification techniques, or transformations, that can be applied in Google Cloud's Data Loss Prevention (i.e., redaction, replacement, masking, crypto-based tokenization, bucketing, date shifting, and time extraction).
  - Redacting Sensitive Information with User-Defined Functions in Amazon Athena - Amazon Athena supports user-defined functions, a feature that enables you to write custom scalar functions and invoke them in SQL queries.
  - AWS AI-Powered Health Data Masking - The AI-Powered Health Data Masking solution in the AWS Solutions Library helps healthcare organizations identify and mask health data in images or text. (deprecated)
  - Anonymize Your Data Using Amazon S3 Object Lambda - Leverage AWS S3 Object Lambdas in order to anonymize data.
  - ARX Data Anonymization Tool - ARX is comprehensive open source software for anonymizing sensitive personal data.
  - UTD Anonymization ToolBox - UT Dallas Data Security and Privacy Lab compiled various anonymization methods into a toolbox for public use by researchers.
  - Kodex - An open-source toolkit for privacy and security engineering. It helps you to automate data security and data protection measures in your data engineering workflows.
  - Anonimatron - Free, extendable, open source data anonymization tool.
  - Anonymizer MySQL - This simple tool allows you to make an anonymized clone of your database.
  - MySQL Data Anonymizer - MySQL Data Anonymizer is a PHP library that anonymizes your data in the database.
  - anonymize-it - The Elastic Machine Learning Team's general purpose tool for suppression, masking, and generalization of fields to aid data pseudonymization.
  - k-anonymity
  - delta-presence
  - t-closeness
  - Data Anonymization Tool - The Singapore PDPC has launched a free Data Anonymization tool to help organizations transform simple datasets by applying basic anonymization techniques.
  - Singapore Guide to Anonymization - The Singapore Personal Data Protection Commission (PDPC) has published the Guide on Basic Anonymization to provide more practical guidance for businesses on how to appropriately perform basic anonymization and de-identification of various datasets.
  - k-map
  - l-diversity
  - Anonymizer - Anonymizer is a universal tool to create anonymized DBs for projects.
  - Presidio - Context-aware, pluggable, and customizable PII anonymization service for text and images, developed by Microsoft.
  - Technical Privacy Metrics: a Systematic Survey - Paper by Isabel Wagner and David Eckhoff that discusses over 80 privacy metrics and categorizes them by the aspect of privacy they measure, their required inputs, and the type of data that needs protection. It also presents a method for choosing privacy metrics, based on nine questions that help identify the right metrics for a given scenario.
  - Masked AI - Python SDK and CLI wrappers that enable safer usage of public large language models (LLMs) like OpenAI/GPT4 by removing sensitive data from prompts and replacing it with fake data before submitting to the OpenAI API.
  - A Visual Guide to Practical Data De-Identification (FPF Infographic)
  - Static Data Masking for Azure SQL Database and SQL Server - Microsoft's Static Data Masking is a data protection feature that helps users sanitize sensitive data in a copy of their SQL databases. It is compatible with SQL Server (SQL Server 2012 and newer), Azure SQL Database (DTU and vCore-based hosting options, excluding Hyperscale), and SQL Server on Azure Virtual Machines.
- Courses
  - Privacy by Design: Data Classification - LinkedIn Learning course by Nishant Bhajaria.
  - Privacy by Design: Data Sharing - LinkedIn Learning course by Nishant Bhajaria.
  - Implementing a Privacy, Risk, and Assurance Program - LinkedIn Learning course by Nishant Bhajaria.
  - Data Protocol - Courses to teach developers and technical professionals how to build products responsibly and partner with platforms effectively.
  - Data Privacy and Anonymization in R - Datacamp course that covers publicly releasing data sets with a differential privacy guarantee.
  - Privacy-Conscious Computer Systems - This class at Brown University (CSCI 2390) focuses on how to design computer systems that protect users' privacy.
  - Privacy Quest - A gamified approach to learning about privacy engineering, Privacy Quest uses challenges and competitions to build your privacy and security knowledge.
  - Carnegie Mellon University - Privacy Engineering Certificate - Four-week certificate program that revolves around a combination of mini-tutorials, class discussions, and hands-on exercises designed to ensure that students develop practical knowledge of all key privacy engineering areas.
  - Technical Privacy Masterclass - In four modules, this course from Privado gives privacy leaders and their teams an overview of the pillars of a proactive privacy program.
  - Hitchhiker's Guide to Privacy Engineering - The goal of this creative privacy project is to offer a fun, engaging, and immersive privacy learning experience for privacy lawyers to improve their technical privacy skills.
  - OpenMined Courses
  - Data Privacy and Anonymization in Python - Datacamp course on learning to process sensitive information with privacy-preserving techniques.
  - Secure and Private AI (Udacity) - Udacity course that covers how to extend PyTorch with the tools necessary to train AI models that preserve user privacy.
  - Compliance Detective - A gamified approach to learning about privacy engineering, Compliance Detective (formerly Privacy Quest) uses challenges and competitions to build your privacy and security knowledge.
- Designing for Trust with Users
  - Data Permissions Catalogue - Catalogue created by the data consultancy IF to help teams make decisions about how, when, and why to collect and use data about people.
  - Contract Design Pattern Library - Library of guidelines, explanations, and examples to inspire and support you in exploring user-friendly approaches to contract simplification and visualization.
  - Lean Privacy Review - Carnegie Mellon University researchers developed a fast, easy method to catch privacy issues early in a system’s development process by gathering feedback from users.
  - The UX Guide to Getting Consent - Short guide by the International Association of Privacy Professionals (IAPP) about obtaining consent under the EU's GDPR.
  - Part 4: Privacy-Aware Design Framework
  - How to Protect Your Users with the Privacy by Design Framework - Developers can help to defend their users’ personal privacy by adopting the Privacy by Design (PbD) framework.
  - Part 1: Common Concerns and Privacy in Web Forms
  - Privacy Patterns - UC Berkeley collection of design patterns attempting to standardize language for privacy-preserving technologies, document common solutions to privacy problems, and help designers identify and address privacy concerns.
  - Creepiness-Convenience Tradeoff - As people consider whether to use the new "creepy" technologies, they do a type of cost-benefit analysis weighing the loss of privacy against the benefits they will receive in return.
  - Building a Privacy Policy Users Actually Want to Read - Creation of a user-friendly privacy notice through privacy journeying and using a layered notice approach.
  - Part 2: Better Cookie Consent Experiences
  - Part 3: Better Notifications UX and Permissions Requests
- Deceptive Design Patterns
  - The Dark Side of UX Design - Practitioner-identified examples of stakeholder values superseding user values.
  - Deceptive Design Patterns - Deceptive design patterns (also known as "dark patterns") are tricks used in websites and apps that make you do things that you didn't mean to, like buying or signing up for something.
  - Dark Patterns Tipline - Gallery of deceptive patterns identified and submitted by individuals.
  - 10 Examples of Manipulative Consent Requests - Blog post that illustrates ten examples of manipulative consent patterns in cookie banners.
- Miscellaneous
  - Privacy is an Afterthought in the Software Lifecycle. That Needs to Change
  - The World of Geolocation Data (FPF Infographic)
  - Data and the Connected Car (FPF Infographic)
  - Microphones and the Internet of Things (FPF Infographic)
  - GDPR – A Practical Guide For Developers
  - W3C Self-Review Questionnaire: Security and Privacy
  - OWASP Mobile Application Security Verification Standard (MASVS) - PRIVACY
  - How Uber is Approaching Data Privacy Architecture
  - Microsoft - Code with Engineering Playbook: Privacy Fundamentals
  - Private AI - PETs Decision Tree
  - IAPP Privacy Engineering Section
  - VISCHER's Website and App Tracking Legal Checklist
  - VISCHER's Marketing Communications Legal Checklist
- Other Awesome Privacy Curations
  - awesome-ml-privacy-attacks
  - awesome-privacy-on-blockchains
  - awesome-zero-knowledge-proofs
  - awesome-privacy-papers
  - awesome-federated-computing
  - awesome-gdpr
  - awesome-artificial-intelligence-guidelines
  - awesome-ml-sp-papers
  - awesome-synethetic-data
  - awesome-privacy
  - awesome-threat-modeling
  - awesome-data-privacy
- Facial Recognition
  - Magritte - Google's Magritte is a MediaPipe-based library to redact faces from photos and videos. It provides processing graphs to reliably detect faces, track their movements in videos, and disguise the person's identity by obfuscating their face.
  - Understanding Facial Detection, Characterization and Recognition Technologies (Future of Privacy Forum (FPF) Infographic)
  - Fawkes - A privacy-preserving tool against facial recognition systems, developed by researchers at SANDLab, University of Chicago.
  - LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition - Adversarial filter that accounts for the entire image processing pipeline and is demonstrably effective against industrial-grade pipelines that include face detection and large scale databases. Also includes an [easy-to-use webtool](https://lowkey.umiacs.umd.edu/) that significantly degrades the accuracy of Amazon Rekognition and the Microsoft Azure Face Recognition API.
  - Creating a Serverless Face Blurring Service for Photos in Amazon S3 - This blog post shows how to build a serverless face blurring service for photos uploaded to an Amazon S3 bucket.
- Secure Multi-Party Computation
  - Facebook Private Computation Framework - Facebook Private Computation Framework (FBPCF) library allows developers to perform randomized controlled trials, without leaking information about who participated or what action an individual took. It uses secure multiparty computation to guarantee this privacy. FBPCF is for scaling multi-party computation up via threading.
  - EzPC (Easy Secure Multi-party Computation) - EzPC is a Microsoft Research tool that converts TensorFlow and ONNX models into Secure Multi-Party Computation protocols.
  - Private Join and Compute - Google's implementation of the "Private Join and Compute" functionality. This functionality allows two users, each holding an input file, to privately compute the sum of associated values for records that have common identifiers.
  - Facebook Private Computation Solutions - Facebook Private Computation Solutions (FBPCS) is a secure, privacy-safe, and scalable architecture to deploy multi-party computation applications in a distributed way on virtual private clouds via Private Scaling architecture. FBPCS consists of various services and interfaces that enable private measurement solutions, e.g. Private Lift.
- Privacy Tech Series by [Lea Kissner](https://twitter.com/leakissner?lang=en)
  - Interface Design: The Who/What/Where Rule
  - Vulnerability versus Incident
  - Deidentification versus Anonymization
  - Aggregating Over Anonymized Data
  - Thinking Through ACL-Aware Data Processing
  - Settings and Surfaces
  - Comprehensible Access Control Lists
  - Data Retention in a Distributed System
  - Setting Data Retention Timelines
  - Handling Human Names
- Privacy Threat Modeling
  - LINDDUN - The LINDDUN privacy engineering framework provides systematic support for the elicitation and mitigation of privacy threats in software systems.
  - LINDDUN GO - LINDDUN GO is designed to give you a quick start to privacy threat modeling.
  - PLOT4AI - Privacy Library Of Threats 4 Artificial Intelligence (PLOT4AI) is a threat modeling library to help practitioners build responsible artificial intelligence.
  - Draw.io Libraries for Threat Modeling - Collection of custom libraries for using the Draw.io diagramming application for threat modeling.
  - xCompass - A privacy threat modeling persona framework that developers can use to test and document privacy threats, and find edge cases of privacy harm (formerly named Models of Applied Privacy (MAP)).
  - Privacy Adversarial Framework (PAF) - Developed by Facebook, the Privacy Adversarial Framework (PAF) is a knowledgebase of privacy-focused adversarial tactics and techniques that is heavily inspired by MITRE ATT&CK®.
  - PANOPTIC™ Privacy Threat Model - MITRE PANOPTIC™, the Pattern and Action Nomenclature Of Privacy Threats In Context, is a privacy threat taxonomy for breaking down and describing privacy attacks against individuals and groups of individuals.
- Career
  - What Does a Privacy Engineer Do, Anyway?
  - Ethyca's Privacy Engineering Job Board
  - What to Expect in a Privacy Interview
- Tokenization
  - AWS Serverless Tokenization - Learn how to use Lambda Layers to develop a serverless tokenization solution in AWS.
  - auto-data-tokenize - This repo demonstrates a reference implementation of detecting and tokenizing sensitive structured data within Google Cloud Platform.
- Related GitHub Topics
  - GDPR
  - CCPA
  - Right-to-be-Forgotten
  - Privacy-Tech
  - Privacy-Enhancing-Technologies
  - Differential-Privacy
  - Federated-Learning
  - Privacy
