https://github.com/genentech/ringer
RINGER: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion
- Host: GitHub
- URL: https://github.com/genentech/ringer
- Owner: Genentech
- License: mit
- Created: 2023-05-23T07:05:23.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-13T18:06:25.000Z (almost 2 years ago)
- Last Synced: 2024-11-25T21:28:55.314Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 115 MB
- Stars: 16
- Watchers: 7
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# RINGER
This repository is the official implementation of [RINGER: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion](https://arxiv.org/abs/2305.19800).

## Requirements
To install requirements:
```setup
conda env create -f environment.yaml
conda activate ringer
pip install -e .
```
## Data
Download `pickle.tar.gz` from the [CREMP Zenodo record](https://doi.org/10.5281/zenodo.7931444) and extract it. Use [train.csv](data/cremp/train.csv) and [test.csv](data/cremp/test.csv) to split the extracted files into training and test sets, and place them in [train](data/cremp/train) and [test](data/cremp/test), respectively.
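The split can be scripted. The following is a minimal sketch and not part of this repository: `partition_pickles` is a hypothetical helper, and it assumes that the extracted archive is a flat directory of `.pickle` files and that each CSV has a column listing one file name per row. The default column name `filename` is a placeholder, so check the CSV header before use.

```python
import csv
import shutil
from pathlib import Path


def partition_pickles(pickle_dir, csv_path, out_dir, column="filename"):
    """Copy the pickles listed in csv_path from pickle_dir into out_dir.

    Assumption: `column` holds names such as "H.A.S.V" or "H.A.S.V.pickle";
    the ".pickle" suffix is appended when missing.
    Returns the list of names that were not found in pickle_dir.
    """
    pickle_dir, out_dir = Path(pickle_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            name = row[column]
            if not name.endswith(".pickle"):
                name += ".pickle"
            source = pickle_dir / name
            if source.is_file():
                shutil.copy2(source, out_dir / name)
            else:
                missing.append(name)
    return missing
```

For example, `partition_pickles("path/to/extracted/pickles", "data/cremp/train.csv", "data/cremp/train")` would fill the training folder; inspect the returned list to confirm that nothing was skipped.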
## Training
To train the full conditional model, run this command:
```train
train conditional.json
```
The config file can be specified by an absolute path or by a path relative to the [configs](configs) folder. Similarly, within the config file, `data_dir` can be an absolute path or a path relative to the [data](data) folder.
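The path rule described above can be illustrated with a short sketch. `resolve_path` is a hypothetical helper written for this example, and the repository's own resolution code may differ in detail.

```python
from pathlib import Path


def resolve_path(path, base_dir):
    """Return `path` unchanged if it is absolute, else join it onto base_dir."""
    path = Path(path)
    return path if path.is_absolute() else Path(base_dir) / path


# Hypothetical repository location, for illustration only.
repo_root = Path("/path/to/ringer")
print(resolve_path("conditional.json", repo_root / "configs"))
print(resolve_path("cremp/train", repo_root / "data"))
print(resolve_path("/scratch/cremp/train", repo_root / "data"))
```

Under this rule, `train conditional.json` and `train /path/to/ringer/configs/conditional.json` would refer to the same file.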
To log a training run with Weights & Biases, fill in your settings in [configs/wandb/wandb.json](configs/wandb/wandb.json) and enable logging with:
```train
train conditional.json --wandb-run
```
## Sampling
The [pre-trained model](assets/models/conditional) is included in this repository.
To generate samples for the CREMP test set, run:
```eval
evaluate \
--model-dir assets/models/conditional \
--data-dir cremp/test \
--split-sizes 0.0 0.0 1.0 \
--sample-only
```
This creates a `sample` directory containing samples for all molecules in `sample/samples.pickle`.
Run `evaluate --help` to see all options available for sampling and evaluation.
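The layout of `samples.pickle` is not documented in this README, so a quick look at its top-level structure is a reasonable first step. The sketch below makes no assumption about the contents beyond it being a standard Python pickle; `summarize_pickle` is a hypothetical helper. Run it inside the `ringer` environment in case the file refers to classes from this package, and only load pickles you trust.

```python
import pickle
from pathlib import Path


def summarize_pickle(path, max_keys=5):
    """Load a pickle and describe its top-level structure as a string."""
    with open(Path(path), "rb") as handle:
        obj = pickle.load(handle)
    summary = type(obj).__name__
    if isinstance(obj, dict):
        keys = list(obj)[:max_keys]
        summary += f" with {len(obj)} entries, first keys: {keys}"
    elif hasattr(obj, "__len__"):
        summary += f" of length {len(obj)}"
    return summary


if __name__ == "__main__":
    samples_path = Path("sample/samples.pickle")
    if samples_path.exists():
        print(summarize_pickle(samples_path))
```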
## Reconstruction
The `evaluate` command can also be used to reconstruct backbones (not including side chains) and to compute evaluation metrics. However, this is not recommended because `evaluate` does not parallelize well across molecules.
Instead, reconstruction (including side chains) is done most effectively for each molecule individually using [scripts/reconstruct_single.py](scripts/reconstruct_single.py). Parallelization can then be efficiently achieved by submitting a batch job array using an HPC job scheduler (e.g., Slurm) and passing the job array index as the first argument to the script. To reconstruct molecule 0, run:
```shell
python scripts/reconstruct_single.py 0 \
cremp/test \
sample/samples.pickle \
sample/reconstructed_mols \
assets/models/conditional/training_mean_distances.json
```
The script first runs an optimization to reconstruct the ring coordinates and then performs a linear (NeRF) reconstruction of the side chains, using the [conformer samples previously generated](#sampling). It saves the resulting molecule in `sample/reconstructed_mols`. Note that even though we point the script to `cremp/test`, it only uses the atom identities and connectivity information from the test molecules; their geometries are entirely set during the reconstruction procedure.
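For readers unfamiliar with NeRF (natural extension reference frame), the sketch below shows the generic textbook step: placing one atom from the three preceding atoms plus a bond length, bond angle, and dihedral. It illustrates the technique only and is not the code used by `scripts/reconstruct_single.py`; the function names are hypothetical.

```python
import numpy as np


def place_atom(a, b, c, bond_length, bond_angle, dihedral):
    """Place atom d from three preceding atoms a, b, c (one NeRF step).

    bond_length is |c-d|, bond_angle is the b-c-d angle, and dihedral is the
    a-b-c-d torsion. Both angles are in radians.
    """
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)  # unit normal of the a-b-c plane
    m = np.cross(n, bc)  # completes the right-handed local frame (bc, m, n)
    # Displacement of d relative to c, expressed in the local frame.
    local = bond_length * np.array([
        -np.cos(bond_angle),
        np.sin(bond_angle) * np.cos(dihedral),
        np.sin(bond_angle) * np.sin(dihedral),
    ])
    return c + local[0] * bc + local[1] * m + local[2] * n


def build_chain(first_three, internal_coords):
    """Extend an open chain one atom at a time from internal coordinates."""
    coords = [np.asarray(x, dtype=float) for x in first_three]
    for bond_length, bond_angle, dihedral in internal_coords:
        coords.append(place_atom(*coords[-3:], bond_length, bond_angle, dihedral))
    return np.array(coords)
```

Because each atom depends only on the three atoms placed before it, this works directly for acyclic fragments such as side chains. A ring has no free end, which is consistent with the script using an optimization for the ring coordinates instead.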
Run `python scripts/reconstruct_single.py --help` for an overview of other parameters available for reconstruction.
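The job-array approach described in this section can be sketched as a small Python wrapper. It assumes Slurm, which sets the `SLURM_ARRAY_TASK_ID` environment variable for each task of an array job, and it reuses the arguments from the example above. The wrapper and its `build_command` function are hypothetical and are not shipped with this repository.

```python
import os
import subprocess
import sys


def build_command(index):
    """Return the reconstruct_single.py command line for molecule `index`."""
    return [
        sys.executable,
        "scripts/reconstruct_single.py",
        str(index),
        "cremp/test",
        "sample/samples.pickle",
        "sample/reconstructed_mols",
        "assets/models/conditional/training_mean_distances.json",
    ]


if __name__ == "__main__" and "SLURM_ARRAY_TASK_ID" in os.environ:
    # Each array task reconstructs the molecule matching its task index.
    task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    subprocess.run(build_command(task_id), check=True)
```

A batch script that calls this wrapper could then be submitted with `sbatch --array=0-99`, where the range is illustrative and should cover the number of molecules to reconstruct.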
## Evaluation
As with reconstruction, computing metrics is best done separately for each molecule using [scripts/compute_metrics_single.py](scripts/compute_metrics_single.py), followed by aggregation across molecules using [scripts/aggregate_metrics.py](scripts/aggregate_metrics.py). For example, to compute metrics for the `H.A.S.V` macrocycle, run:
```shell
python scripts/compute_metrics_single.py \
cremp/test/H.A.S.V.pickle \
sample/reconstructed_mols/H.A.S.V.pickle
```
Run `python scripts/compute_metrics_single.py --help` and `python scripts/aggregate_metrics.py --help` for an overview of other parameters available for computing metrics.
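To cover every reconstructed molecule, the per-molecule command can be generated in a loop. The sketch below assumes that reference and reconstructed pickles share the same file name, as in the `H.A.S.V` example, and it passes absolute paths so that nothing depends on how the script resolves relative ones. `metric_commands` is a hypothetical helper.

```python
import sys
from pathlib import Path


def metric_commands(reference_dir, reconstructed_dir):
    """Build one compute_metrics_single.py command per reconstructed molecule.

    Molecules without a matching reference pickle are skipped.
    """
    reference_dir = Path(reference_dir)
    commands = []
    for reconstructed in sorted(Path(reconstructed_dir).glob("*.pickle")):
        reference = reference_dir / reconstructed.name
        if reference.is_file():
            commands.append([
                sys.executable,
                "scripts/compute_metrics_single.py",
                str(reference.resolve()),
                str(reconstructed.resolve()),
            ])
    return commands
```

Each returned command can be run with `subprocess.run(command, check=True)` or dispatched as one task of a job array, after which `scripts/aggregate_metrics.py` combines the results.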
## Contributing
Install pre-commit hooks to use automated code formatting before committing changes. Make sure you're in the top-level directory and run:
```bash
pre-commit install
```
After that, your code will be automatically reformatted on every new commit.
To manually reformat all files in the project, use:
```bash
pre-commit run -a
```
To update the hooks defined in [.pre-commit-config.yaml](.pre-commit-config.yaml), use:
```bash
pre-commit autoupdate
```
## License
Licensed under the MIT License. See [LICENSE](LICENSE) for additional details.
## Citations
For the code and/or model, please cite:
```
@misc{grambow2023ringer,
title={{RINGER}: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion},
author={Colin A. Grambow and Hayley Weir and Nathaniel L. Diamant and Alex M. Tseng and Tommaso Biancalani and Gabriele Scalia and Kangway V. Chuang},
year={2023},
eprint={2305.19800},
archivePrefix={arXiv},
primaryClass={q-bio.BM}
}
```
To cite the CREMP dataset, please use:
```
@article{grambow2024cremp,
title = {{CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning}},
author = {Grambow, Colin A. and Weir, Hayley and Cunningham, Christian N. and Biancalani, Tommaso and Chuang, Kangway V.},
year = {2024},
journal = {Scientific Data},
doi = {10.1038/s41597-024-03698-y},
pages = {859},
number = {1},
volume = {11}
}
```
You can also cite the CREMP Zenodo repository directly:
```
@dataset{grambow_colin_a_2023_7931444,
author = {Grambow, Colin A. and
Weir, Hayley and
Cunningham, Christian N. and
Biancalani, Tommaso and
Chuang, Kangway V.},
title = {{CREMP: Conformer-Rotamer Ensembles of Macrocyclic
Peptides for Machine Learning}},
month = may,
year = 2023,
publisher = {Zenodo},
version = {1.0.1},
doi = {10.5281/zenodo.7931444},
url = {https://doi.org/10.5281/zenodo.7931444}
}
```