https://github.com/lucidrains/ddpm-proteins
A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms
- Host: GitHub
- URL: https://github.com/lucidrains/ddpm-proteins
- Owner: lucidrains
- License: mit
- Created: 2021-06-14T16:09:05.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-04-20T22:22:48.000Z (over 3 years ago)
- Last Synced: 2025-04-14T20:43:05.524Z (7 months ago)
- Topics: artificial-intelligence, deep-learning, generative-model, protein-structure
- Language: Python
- Homepage:
- Size: 94.7 KB
- Stars: 139
- Watchers: 4
- Forks: 19
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Denoising Diffusion Probabilistic Model for Proteins
Implementation of Denoising Diffusion Probabilistic Model in PyTorch. It is a new approach to generative modeling that may have the potential to rival GANs. It uses denoising score matching to estimate the gradient of the data distribution, followed by Langevin sampling to sample from the true distribution. This implementation was transcribed from the official TensorFlow version here.
This specific repository uses a heavily modified version of the U-net for learning on protein structure, with eventual conditioning from MSA Transformer attention heads.

*at around 40k iterations*
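As background, the denoising score matching objective and Langevin-style sampling mentioned above can be sketched in a few lines of plain PyTorch. This is only an illustrative sketch with a generic `model(x_t, t)` noise predictor and a linear beta schedule, not this repository's implementation:

```python
import torch
import torch.nn.functional as F

# illustrative DDPM sketch (generic noise-prediction model); not this repo's code
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas = 1. - betas
alphas_cumprod = torch.cumprod(alphas, dim = 0)  # cumulative product, i.e. alpha-bar

def training_loss(model, x0):
    # pick a random timestep per example and corrupt x0 with that much Gaussian noise
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1. - a_bar).sqrt() * noise
    # the network is trained to predict the injected noise (denoising score matching)
    return F.mse_loss(model(x_t, t), noise)

@torch.no_grad()
def reverse_step(model, x_t, t):
    # one Langevin-like step of the learned reverse (sampling) process
    eps = model(x_t, torch.full((x_t.shape[0],), t))
    mean = (x_t - betas[t] / (1. - alphas_cumprod[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean
    return mean + betas[t].sqrt() * torch.randn_like(x_t)
```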
## Install
```bash
$ pip install ddpm-proteins
```
## Training
We are using Weights & Biases for experiment tracking.
First, you need to log in:
```bash
$ wandb login
```
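The actual logging lives inside `train.py`; for reference, the usual Weights & Biases pattern is just an `init` plus periodic `log` calls, roughly as in this sketch (the project name here is an assumption, not necessarily what `train.py` uses):

```python
import wandb

# minimal Weights & Biases sketch; the real logging happens in train.py
wandb.init(project = 'ddpm-proteins')  # assumed project name

for step in range(10):
    loss = 1. / (step + 1)             # placeholder standing in for the training loss
    wandb.log({'loss': loss, 'step': step})

wandb.finish()
```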
Then you will need to cache all the MSA attention embeddings by first running the script below. For some reason, it needs to be run multiple times to cache all the proteins correctly (it does work, though); I'll get around to fixing this.
```bash
$ python cache.py
```
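`cache.py` precomputes the MSA attention embeddings and stores them, one tensor per protein id, under the cache directory mentioned below (see the todo list). The general pattern looks roughly like this sketch, where the function names are illustrative rather than the repository's API:

```python
import os
import torch

# illustrative per-protein caching pattern; not cache.py's actual code
CACHE_DIR = os.path.expanduser('~/.cache.ddpm-proteins')

def cached_embedding(protein_id, compute_fn):
    os.makedirs(CACHE_DIR, exist_ok = True)
    path = os.path.join(CACHE_DIR, f'{protein_id}.pt')
    if os.path.exists(path):
        return torch.load(path)          # reuse the previously cached tensor
    embedding = compute_fn(protein_id)   # e.g. MSA Transformer attention embeddings
    torch.save(embedding, path)          # cache for subsequent runs
    return embedding
```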
Finally, you can begin training by invoking:
```bash
$ python train.py
```
If you would like to clear or recompute the cache (i.e. after changing the MSA fetching function), just run:
```bash
$ rm -rf ~/.cache.ddpm-proteins
```
## Todo
- [x] condition on mask
- [x] condition on MSA transformers (with caching of tensors in specified directory by protein id)
- [x] all-attention network with uformer https://arxiv.org/abs/2106.03106 (with 1d + 2d conv kernels)
- [ ] reach for size 384
- [ ] add all improvements from https://arxiv.org/abs/2105.05233 and https://cascaded-diffusion.github.io/
## Usage
```python
import torch
from ddpm_proteins import Unet, GaussianDiffusion

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
)

training_images = torch.randn(8, 3, 128, 128)
loss = diffusion(training_images)
loss.backward()

# after a lot of training

sampled_images = diffusion.sample(batch_size = 4)
sampled_images.shape # (4, 3, 128, 128)
```
Or, if you simply want to pass in a folder name and the desired image dimensions, you can use the `Trainer` class to easily train a model.
```python
from ddpm_proteins import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
).cuda()

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
).cuda()

trainer = Trainer(
    diffusion,
    'path/to/your/images',
    train_batch_size = 32,
    train_lr = 2e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    fp16 = True                       # turn on mixed precision training with apex
)

trainer.train()
```
Samples and model checkpoints will be logged to `./results` periodically.
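Sampling after training works the same way as in the first example, via `diffusion.sample`. Since the generated maps are 2D distogram-like tensors, one quick way to eyeball a sample is to plot a channel as a heatmap; this sketch assumes matplotlib and is not part of the library:

```python
import matplotlib.pyplot as plt

# inspect the first channel of the first sampled map as a heatmap (illustrative only)
sampled = diffusion.sample(batch_size = 1)
plt.imshow(sampled[0, 0].cpu().numpy(), cmap = 'viridis')
plt.colorbar()
plt.savefig('sample-heatmap.png')
```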
## Citations
```bibtex
@misc{ho2020denoising,
    title         = {Denoising Diffusion Probabilistic Models},
    author        = {Jonathan Ho and Ajay Jain and Pieter Abbeel},
    year          = {2020},
    eprint        = {2006.11239},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG}
}
```
```bibtex
@inproceedings{anonymous2021improved,
    title     = {Improved Denoising Diffusion Probabilistic Models},
    author    = {Anonymous},
    booktitle = {Submitted to International Conference on Learning Representations},
    year      = {2021},
    url       = {https://openreview.net/forum?id=-NEXDKk8gZ},
    note      = {under review}
}
```
```bibtex
@article{Rao2021.02.12.430858,
    author    = {Rao, Roshan and Liu, Jason and Verkuil, Robert and Meier, Joshua and Canny, John F. and Abbeel, Pieter and Sercu, Tom and Rives, Alexander},
    title     = {MSA Transformer},
    year      = {2021},
    publisher = {Cold Spring Harbor Laboratory},
    url       = {https://www.biorxiv.org/content/early/2021/02/13/2021.02.12.430858},
    journal   = {bioRxiv}
}
```