https://github.com/keiserlab/autofragdiff
https://github.com/keiserlab/autofragdiff
Last synced: 11 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/keiserlab/autofragdiff
- Owner: keiserlab
- License: mit
- Created: 2023-10-03T19:45:01.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-23T11:32:54.000Z (about 2 years ago)
- Last Synced: 2025-04-01T13:05:39.763Z (about 1 year ago)
- Language: Python
- Size: 34.9 MB
- Stars: 27
- Watchers: 3
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# AutoFragDiff
This repository is the official implementation of Autoregressive fragment based diffusion model for target-aware ligand design

# Dependencies
- RDKit
- openbabel
- PyTorch
- biopython
- biopandas
- networkx
- py3dmol
- scikit-learn
- tensorboard
- wandb
- pytorch-lightning
## Create conda environment
```
conda create -n autofragdiff
pip install rdkit
conda install -c conda-forge openbabel
pip3 install torch torchvision torchaudio
pip install biopython
pip install biopandas
pip install networkx
pip install py3dmol
pip install scikit-learn
pip install tensorboard
pip install wandb
pip install tqdm
pip install pytorch-lightning==1.6.0
```
The model has been tested with the following software versions:
| Software | Version |
| --------------- | ----------- |
| rdkit | 2023.3.1 |
| openbabel | 3.1.1 |
| pytorch | 2.0.1 |
| biopython | 1.81 |
| biopandas | 0.4.1 |
| networkx | 3.1 |
| py3dmol | 2.0.1. |
| scikit-learn | 1.2.2 |
| tensorboard | 2.13.0 |
| wandb | 0.15.2 |
| pytorch-lightning | 1.6.0 |
## QucikVina2
For Docking with qvina install QuickVina2:
```
wget https://github.com/QVina/qvina/raw/master/bin/qvina2.1
chmod +x qvina2.1
```
We also need MGLTools for preparing the receptor for docking (pdb->pdbqt) but it can mess up the conda environment, so make a new one.
```
conda create -n mgltools -c bioconda mgltools
```
# Data Preparation
## CrossDock
Download and extract the dataset as described by the authors of Pocket2Mol: https://github.com/pengxingang/Pocket2Mol/tree/main/data
process the molecule fragments using a custom fragmentation.
```
python process_crossdock.py --rootdir $CROSSDOCK_PATH --outdir $OUT_DIR \
--dist_cutoff 7. --max-num-frags 8 --split test --max-atoms-single-fragment 22 \
--add-Vina-score --add-QED-score --add-SA-score --n-cores 16
```
- For adding Vina you also need to generate pdbqt files for each receptor and crystallographic ligand.
# Training
## Training AutoFragdiff.
```
python train_frag_diffuser.py --data $CROSSDOCK_DIR --exp_name CROSSDOCK_model_1 \
--lr 0.0001 --n_layers 6 --nf 128 --diffusoin_steps 500 \
--diffusion_loss_type l2 --n_epochs 1000 --batch_size 4
```
## Training anchor predictor
```
python train_anchor_predictor --data $CROSSDOCK_DIR --exp_name CROSDOCK_anchor_model_1 \
--n_layers 4 --inv_sublayers 2 --nf 128 --dataset-type CrossDock
```
# Sampling:
Firt download the trained models from the google drive in the following link
https://drive.google.com/drive/folders/1DQwIfibHIoFPGJP6aHBGiYRp87bCZFA0?usp=share_link
## CrossDock pocket-based molecule generation:
To generate molecules from trained pocket-based model, also use anchor-predictor model. fragment sizes are sampled from the data distribution.
## CrossDock pocket-based molecule generation (with guidance):
To generate molecules for crossdock test set:
```
python sample_crossdock_mols.py --results-path results/ --data-path $(path-to-crossdock-dataset) --use-anchor-model --anchor-model anchor-model.ckpt --n-samples 20 --exp-name test-crossdock --diff-model pocket-gvp.ckpt --device cuda:0
```
To sample molecules from a pdb file:
first run fpocket and identify the correct pocket using:
```
fpocket -f $pdb.pdb
```
fpocket gives multiple pockets, you can visualize the identify the right pocket and run sampling
```
python sample_from_pocket.py --result-path results --pdb $pdbname --anchor-model anchor-model.ckpt --n-samples 10 --device cuda:0 --pocket-number 1
```
## Scaffold-based molecule property optimization
For scaffold-based optimization you need the pdb file of the pocket and the sdf file of the scaffold molecule (and the original molecule).
Scaffold-extension for crossdock test set
```
python extend_scaffold_crossdock.py --data-path $(path-to-crossdock) --results-path scaffold-gen --anchor-model anchor-model.ckpt --n-samples 20 --exp-name scaffold-gen --diff-model pocket-gvp.ckpt --device cuda:0
```
- In order to select the anchor you can add the `--custom-anchors` argument and provide the ids of custom anchors (starts from 0 and based on atomic ids in the scaffold molecule).