https://github.com/awslabs/fast-differential-privacy
Fast, memory-efficient, scalable optimization of deep learning with differential privacy
https://github.com/awslabs/fast-differential-privacy
deep-learning differential-privacy neural-network privacy-preserving-machine-learning
Last synced: about 1 year ago
JSON representation
Fast, memory-efficient, scalable optimization of deep learning with differential privacy
- Host: GitHub
- URL: https://github.com/awslabs/fast-differential-privacy
- Owner: awslabs
- License: apache-2.0
- Created: 2022-11-20T15:31:03.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-01-08T16:44:30.000Z (over 1 year ago)
- Last Synced: 2025-03-13T02:37:51.164Z (over 1 year ago)
- Topics: deep-learning, differential-privacy, neural-network, privacy-preserving-machine-learning
- Language: Python
- Homepage:
- Size: 834 KB
- Stars: 115
- Watchers: 3
- Forks: 20
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-differential-privacy - Fast-Differential-Privacy - A Faster way of training PyTorch models with Differential Privacy
README
# Fast Differential Privacy
*Fast Differential Privacy* (**fastDP**) is a library that allows differentially private optimization of PyTorch models, with a few additional lines of code. The goal of this library is to make DP deep learning as similar to the standard non-private learning as possible, in terms of **speed, memory cost, scalability, accuracy and hyperparameter-tuning**. It supports all PyTorch optimizers, popular models in [TIMM](https://github.com/rwightman/pytorch-image-models), [torchvision](https://github.com/pytorch/vision), [HuggingFace](https://huggingface.co/transformers/) (up to supported modules), multiple privacy accountants, multiple clipping functions/styles, most parameter-efficient training methods, and distribute solutions such as DeepSpeed and FSDP. The library has provably little overhead in terms of training time and memory cost, compared with the standard non-private optimization.
---
## Installation.
To install the library after Git clone, run
```bash
python -m setup develop
```
> :warning: **NOTE**: We strongly recommend Python>=3.8 and torch<=1.11 (it is a known issue that torch 1.12 can slow down as much as 3 times).
## Getting started
To train a model with differential privacy, simply create a `PrivacyEngine` and continue the standard training pipeline:
```python
from fastDP import PrivacyEngine
optimizer = SGD(model.parameters(), lr=0.05)
privacy_engine = PrivacyEngine(
model,
batch_size=256,
sample_size=50000,
epochs=3,
target_epsilon=2,
clipping_fn='automatic',
clipping_mode='MixOpt',
origin_params=None,
clipping_style='all-layer',
)
# attaching to optimizers is not needed for multi-GPU distributed learning
privacy_engine.attach(optimizer)
#----- standard training pipeline
loss = F.cross_entropy(model(batch), labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```
We provide details about our privacy engine in `fastDP/README.md`, including the supported modules and the arguments. By default, we use the `'MixOpt'` (hybrid book-keeping [4]) clipping mode (which enjoys almost the same time complexity as non-private optimization), and the automatic clipping function [8] (which does not need to tune the clipping threshold `max_grad_norm`). We support RDP and GLW privacy accountant, and additional accountants can be used through the argument `noise_multiplier`, after its calculation from [[Automating differential privacy computation](https://github.com/yuxiangw/autodp)] library.
Specifically, we allow the gradient accumulation to use very large batch size, which is beneficial to DP optimization:
```python
for i, batch in enumerate(dataloader):
loss = F.cross_entropy(model(batch), labels)
loss.backward()
if i % gradient_accumulation_steps == 0:
optimizer.step()
optimizer.zero_grad()
```
## Foundation model release
We release DP vision foundation models in [v2.1](https://github.com/awslabs/fast-differential-privacy/releases/tag/v2.1): VisionTransformer models (ViT; ~86M param) following [Pre-training Differentially Private Models with Limited Public Data](https://arxiv.org/abs/2402.18752) in NeurIPS 2024. These models have [epsilon=2](https://github.com/awslabs/fast-differential-privacy/releases/download/v2.1/ViT_base_imgnet11k_DP_eps2.pt) and [epsilon=8](https://github.com/awslabs/fast-differential-privacy/releases/download/v2.1/ViT_base_imgnet11k_DP_eps8.pt), pre-trained on ImageNet-1k with AdamW (1k classes, 1 million images) and ImageNet-11k with DP-AdamW (11k classes, 11 million images). More DP foundation models to come!
## Highlights
1. This library enables large model training in the **multi-GPU distributed setting** and **supports mixed precision training** under DeepSpeed and FSDP.
The scalability has been tested on 100B models with 512 GPUs.
2. This library enables DP training to have almost **the same time and space complexity** as the standard non-private training. This is achieved by three key techniques as described in [4]: mixed ghost norm, book-keeping, and ghost differentiation. In practice, we observe <20% memory overhead and <25% slowdown across different tasks.
3. Specifically, this library overcomes the severe memory issues of large model (commonly encountered by Opacus, which computes the per-sample gradients) and high dimensional data (commonly encountered by ghost clipping, e.g. in Private transformers), by leveraging the mixed ghost norm trick [3,8].
4. We **support all optimizers** in [`torch.optim`](https://pytorch.org/docs/stable/optim.html) (SGD, Adam, AdaGrad,...) and a wide range of **models** (BERT, RoBERTa, GPT, ViT, BEiT, CrossViT, DEiT, ResNet, VGG, DenseNet,...), including their parameter-efficient variants. For example, one can run DP bias-term fine-tuning (DP-BiTFiT) by simply freezing non-bias terms, as in `examples/image_classification`.
------
Full fine-tuning results on a single A100 GPU
| Datasets | ε | Setting | Model | Accuracy | Time(min)/epoch |
|----------|---|----------------------------------------------------------|---------------|-----------|-----------------|
| CIFAR10 | 2 | [6] | ViT-large | 98.9 | 7.0 |
| CIFAR100 | 2 | [6] | BEiT-large | 88.7 | 6.5 |
| CelebA | 3 | [6] | ResNet18 | 88.2 | 2.7 |
| SST2 | 3 | [8] | RoBERTa-large | 93.9 | 13.5 |
| QNLI | 3 | [8] | RoBERTa-large | 91.0 | 20.2 |
| QQP | 3 | [8] | RoBERTa-large | 86.8 | 70.0 |
| MNLI | 3 | [8] | RoBERTa-large | 86.3/86.7 | 77.1 |
More datasets, epsilon budgets, models, fine-tuning styles, and different hyperparamters can be found in the related papers.
## Examples
The `examples` folder covers tasks on the table-to-text (E2E and DART datasets with GPT2 models), the text classification (SST2/QNLI/QQP/MNLI datasets with BERT/RoBERTa models), and the image classification (CIFAR10/CIFAR100/CelebA datasets with [TIMM](https://github.com/rwightman/pytorch-image-models)/[torchvision](https://github.com/pytorch/vision) models). Detailed `README.md` can be found in each sub-folder. These examples can be used to reproduce the results in [2,3,4,6,8].
## Citation
Please consider citing the following if you want to use this library in your works:
```
@inproceedings{bu2023differentially,
title={Differentially private optimization on large model at small cost},
author={Bu, Zhiqi and Wang, Yu-Xiang and Zha, Sheng and Karypis, George},
booktitle={International Conference on Machine Learning},
pages={3192--3218},
year={2023},
organization={PMLR}
}
@article{bu2023zero,
title={Zero redundancy distributed learning with differential privacy},
author={Bu, Zhiqi and Chiu, Justin and Liu, Ruixuan and Zha, Sheng and Karypis, George},
booktitle={ICLR 2023 Workshop on Pitfalls of limited data and computation for Trustworthy ML},
journal={arXiv preprint arXiv:2311.11822},
year={2023}
}
@inproceedings{bu2022differentially,
title={Differentially Private Bias-Term Fine-tuning of Foundation Models},
author={Bu, Zhiqi and Wang, Yu-Xiang and Zha, Sheng and Karypis, George},
booktitle={Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022},
year={2022}
}
```
## Acknowledgements
This codebase is largely inspired by [[Opacus (v0.15)]](https://github.com/pytorch/opacus), [[Private transformers (v0.2.3)]](https://github.com/lxuechen/private-transformers), [[Private Vision]](https://github.com/woodyx218/private_vision), and [[FastGradClip]](https://github.com/ppmlguy/fastgradclip).
## References
[1] Ian Goodfellow. "Efficient per-example gradient computations." arXiv preprint arXiv:1510.01799 (2015).
[2] Xuechen Li, Florian Tramer, Percy Liang, and Tatsunori Hashimoto. "Large language models can be strong differentially private learners." ICLR (2022).
[3] Zhiqi Bu, Jialin Mao, and Shiyun Xu. "Scalable and Efficient Training of Large Convolutional Neural Networks with Differential Privacy." NeurIPS (2022).
[4] Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, and George Karypis. "Differentially Private Optimization on Large Model at Small Cost." ICML (2023).
[5] Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen et al. "Opacus: User-friendly differential privacy library in PyTorch." arXiv preprint arXiv:2109.12298 (2021).
[6] Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, and George Karypis. "Differentially Private Bias-Term Fine-tuning of Foundation Models." ICML (2024).
[7] Martin Abadi, et al. "Deep learning with differential privacy." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
[8] Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, and George Karypis. "Automatic clipping: Differentially private deep learning made easier and stronger." NeurIPS (2023).
[9] Zhiqi Bu, Xinwei Zhang, Mingyi Hong, Sheng Zha, and George Karypis. "Pre-training Differentially Private Models with Limited Public Data." NeurIPS (2024).