# PyVacy: Privacy Algorithms for PyTorch

Differentially Private Optimization for PyTorch 👁🙅‍♀️

Basically [TensorFlow Privacy](https://github.com/tensorflow/privacy), but for PyTorch.

The DP-SGD implementation is modeled after the techniques presented in [Deep Learning with Differential Privacy](https://arxiv.org/abs/1607.00133) and [A General Approach to Adding Differential Privacy to Iterative Training Procedures](https://arxiv.org/abs/1812.06210).
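
In outline, each DP-SGD step clips every per-example gradient to a fixed L2 norm, sums the clipped gradients, perturbs the sum with Gaussian noise, and applies the averaged result. Below is a minimal, library-free sketch of that single step, assuming a standard PyTorch model and loss; the helper name `dp_sgd_step` and its default values are illustrative, not part of PyVacy's API.

```python
import torch

def dp_sgd_step(model, loss_fn, X_batch, y_batch,
                lr=0.05, l2_norm_clip=1.0, noise_multiplier=1.1):
    """One DP-SGD update: per-example clipping, Gaussian noise, averaged step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Process the batch one example at a time so each gradient can be
    # clipped individually before being accumulated.
    for x, y in zip(X_batch, y_batch):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # Rescale so this example's total gradient has L2 norm <= l2_norm_clip.
        grad_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        scale = (l2_norm_clip / (grad_norm + 1e-12)).clamp(max=1.0)
        for acc, p in zip(summed, params):
            acc.add_(p.grad * scale)

    with torch.no_grad():
        for acc, p in zip(summed, params):
            # Noise standard deviation is noise_multiplier * l2_norm_clip.
            noise = torch.randn_like(acc) * (noise_multiplier * l2_norm_clip)
            p -= lr * (acc + noise) / len(X_batch)
```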

## Example Usage

```python
import torch.nn as nn
from torch.utils.data import TensorDataset

from pyvacy import optim, analysis, sampling

# train_dataset is assumed to be your torch Dataset of training examples.
training_parameters = {
    'N': len(train_dataset),
    # An upper bound on the L2 norm of each gradient update.
    # A good rule of thumb is to use the median of the L2 norms observed
    # throughout a non-private training loop.
    'l2_norm_clip': 1.0,
    # A coefficient used to scale the standard deviation of the noise applied to gradients.
    'noise_multiplier': 1.1,
    # Each example is selected for a minibatch with probability minibatch_size / N.
    # Hence this value is the expected size of each minibatch, not the actual size.
    'minibatch_size': 128,
    # Each minibatch is partitioned into distinct groups of this size.
    # The smaller this value, the less noise needs to be added to achieve the
    # same privacy, and the faster training is likely to converge, although
    # the runtime per iteration will increase.
    'microbatch_size': 1,
    # The usual privacy parameter for (ε,δ)-differential privacy.
    # A generic choice for this value is 1/(N^1.1), but it is very application dependent.
    'delta': 1e-5,
    # The number of minibatches to process in the training loop.
    'iterations': 15000,
}

model = nn.Sequential(...)
optimizer = optim.DPSGD(params=model.parameters(), **training_parameters)
epsilon = analysis.epsilon(**training_parameters)
loss_function = ...

minibatch_loader, microbatch_loader = sampling.get_data_loaders(**training_parameters)
for X_minibatch, y_minibatch in minibatch_loader(train_dataset):
    optimizer.zero_grad()
    for X_microbatch, y_microbatch in microbatch_loader(TensorDataset(X_minibatch, y_minibatch)):
        optimizer.zero_microbatch_grad()
        loss = loss_function(model(X_microbatch), y_microbatch)
        loss.backward()
        optimizer.microbatch_step()
    optimizer.step()
```
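
The `epsilon` returned by `analysis.epsilon` is the privacy budget that training with these parameters consumes, so together with `delta` it describes the overall (ε, δ) guarantee. Continuing the example above (the reporting line is just a suggestion):

```python
# epsilon and training_parameters come from the example above.
print(f"Achieves ({epsilon:.2f}, {training_parameters['delta']})-differential privacy.")
```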

## Tutorials

`mnist.py`

Implements a basic classifier that identifies which digit a given MNIST image depicts. The model achieves a test set classification accuracy of 96.7%. The architecture and results are inspired by the corresponding tutorial in [TensorFlow Privacy](https://github.com/tensorflow/privacy/tree/master/tutorials).
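
The exact architecture lives in `mnist.py`; purely for orientation, a comparable small convolutional network (the layer sizes here are assumptions modeled on the TensorFlow Privacy tutorial, not a copy of the tutorial code) could look like:

```python
import torch.nn as nn

# Hypothetical MNIST classifier of the kind used in DP-SGD tutorials.
# With 1x28x28 inputs, the feature map entering Flatten is 32x4x4 = 512.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=8, stride=2, padding=3),
    nn.ReLU(),
    nn.MaxPool2d(2, stride=1),
    nn.Conv2d(16, 32, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.MaxPool2d(2, stride=1),
    nn.Flatten(),
    nn.Linear(32 * 4 * 4, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)
```

Such a model can be trained exactly as in the example above by passing `model.parameters()` to `optim.DPSGD`.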

## Disclaimer

Do NOT use the contents of this repository in applications that handle sensitive data. The author accepts no liability for privacy infringements; use the contents of this repository solely at your own discretion.