https://github.com/genentech/ringer

RINGER: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion

Host: GitHub
URL: https://github.com/genentech/ringer
Owner: Genentech
License: mit
Created: 2023-05-23T07:05:23.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-08-13T18:06:25.000Z (almost 2 years ago)
Last Synced: 2024-11-25T21:28:55.314Z (over 1 year ago)
Language: Python
Homepage:
Size: 115 MB
Stars: 16
Watchers: 7
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# RINGER

This repository is the official implementation of [RINGER: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion](https://arxiv.org/abs/2305.19800).

![cover](assets/overview.png)

## Requirements

To install requirements:

```setup
conda env create -f environment.yaml
conda activate ringer
pip install -e .
```

## Data

Download and extract the CREMP `pickle.tar.gz` archive from [Zenodo](https://doi.org/10.5281/zenodo.7931444). Use [train.csv](data/cremp/train.csv) and [test.csv](data/cremp/test.csv) to partition the extracted files into training and test data, and place each subset in the [train](data/cremp/train) and [test](data/cremp/test) folders, respectively.
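The partitioning step can be scripted. The following is a minimal sketch, not part of the repository: the helper name `partition_split`, the CSV column name `sequence`, and the `<identifier>.pickle` file naming (inferred from the `H.A.S.V.pickle` example later in this README) are assumptions that should be checked against the CSV header and the extracted archive.

```python
import csv
import shutil
from pathlib import Path


def partition_split(csv_path, source_dir, target_dir, column="sequence"):
    """Copy the pickles listed in one split CSV into target_dir.

    Assumptions (not confirmed by the README): the CSV has a header row,
    `column` names the field identifying each molecule, and each molecule
    is stored as `<identifier>.pickle` inside source_dir.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            name = f"{row[column]}.pickle"
            src = source_dir / name
            if src.is_file():
                shutil.copy2(src, target_dir / name)
                copied.append(name)
            else:
                # Report files that the CSV lists but the archive lacks.
                missing.append(name)
    return copied, missing
```

Calling it once per split, for example `partition_split("data/cremp/train.csv", extracted_dir, "data/cremp/train")` with `extracted_dir` set to wherever the archive was unpacked, returns the copied and missing file names so that gaps are easy to spot.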

## Training

To train the full conditional model, run this command:

```train
train conditional.json
```

The config file can be specified by an absolute path or by a path relative to the [configs](configs) folder. Similarly, within the config file, `data_dir` can be an absolute path or a path relative to the [data](data) folder.
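The path rule described above can be illustrated as follows. This is only a sketch of the stated behavior, not the repository's implementation; `resolve_path` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_path(path, base_dir):
    """Return absolute paths unchanged; resolve others against base_dir."""
    path = Path(path)
    return path if path.is_absolute() else Path(base_dir) / path


# A bare file name is looked up inside the configs folder.
print(resolve_path("conditional.json", "configs"))
# A relative data_dir is looked up inside the data folder.
print(resolve_path("cremp/test", "data"))
```

Under this rule, `train conditional.json` and `train configs/conditional.json` would not be equivalent unless the latter is given as an absolute path.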

To log a training run with Weights & Biases, set up your configuration in [configs/wandb/wandb.json](configs/wandb/wandb.json) and enable logging with:

```train
train conditional.json --wandb-run
```

## Sampling

The [pre-trained model](assets/models/conditional) is included in this repository.

To generate samples for the CREMP test set, run:

```eval
evaluate \
    --model-dir assets/models/conditional \
    --data-dir cremp/test \
    --split-sizes 0.0 0.0 1.0 \
    --sample-only
```

This creates a `sample` directory containing samples for all molecules in `sample/samples.pickle`.

Run `evaluate --help` to see all options available for sampling and evaluation.
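The README does not document the layout of `sample/samples.pickle`, so a quick inspection is a reasonable first step. The sketch below makes no assumption about the contents beyond the file being a standard pickle; `describe_samples` is a hypothetical helper. Loading may require the `ringer` environment to be active if the file contains library-specific objects, and pickles should only be loaded from trusted sources.

```python
import pickle


def describe_samples(path="sample/samples.pickle"):
    """Load the pickle and summarize its top-level structure."""
    with open(path, "rb") as handle:
        samples = pickle.load(handle)
    summary = {"type": type(samples).__name__}
    if hasattr(samples, "keys"):
        # Mapping-like objects: show the first few keys only.
        summary["keys"] = list(samples.keys())[:5]
    if hasattr(samples, "__len__"):
        summary["size"] = len(samples)
    return summary
```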

## Reconstruction

The `evaluate` command can also be used to reconstruct backbones (not including side chains) and to compute evaluation metrics. However, this is not recommended because `evaluate` does not parallelize well across molecules.

Instead, reconstruction (including side chains) is done most effectively for each molecule individually using [scripts/reconstruct_single.py](scripts/reconstruct_single.py). Parallelization can then be efficiently achieved by submitting a batch job array using an HPC job scheduler (e.g., Slurm) and passing the job array index as the first argument to the script. To reconstruct molecule 0, run:

```shell
python scripts/reconstruct_single.py 0 \
    cremp/test \
    sample/samples.pickle \
    sample/reconstructed_mols \
    assets/models/conditional/training_mean_distances.json
```

The script will run the optimization to reconstruct the ring coordinates, followed by a linear (NeRF) reconstruction of the side chains using the [conformer samples previously generated](#sampling), and save the resulting molecule in `sample/reconstructed_mols`. Note that even though we point the script to `cremp/test`, it only uses the atom identities and connectivity information from the test molecules; their geometries are entirely set during the reconstruction procedure.
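For readers unfamiliar with NeRF (Natural Extension Reference Frame), the idea is to place each new atom from a bond length, a bond angle, and a torsion relative to the three atoms before it, one atom at a time. The following is a generic textbook-style sketch of a single placement step, not the code in `scripts/reconstruct_single.py`; the function name and the tuple-based coordinates are illustrative choices.

```python
import math


def _sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _unit(u):
    norm = math.sqrt(sum(x * x for x in u))
    return tuple(x / norm for x in u)


def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d given atoms a, b, c and internal coordinates.

    bond_length is |c-d|, bond_angle is the b-c-d angle, and torsion is the
    a-b-c-d dihedral (both angles in radians; torsion 0 is the cis position).
    """
    bc = _unit(_sub(c, b))             # unit vector along the b->c bond
    n = _unit(_cross(_sub(b, a), bc))  # normal of the a-b-c plane
    m = _cross(n, bc)                  # in-plane axis perpendicular to bc
    # Coordinates of d in the local (bc, m, n) frame centered on c.
    x = -bond_length * math.cos(bond_angle)
    y = bond_length * math.sin(bond_angle) * math.cos(torsion)
    z = bond_length * math.sin(bond_angle) * math.sin(torsion)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))
```

Each placement depends only on three already-placed atoms, so a side chain is built in a single pass over its atoms, which is why the procedure is linear in the number of atoms. A ring additionally has to satisfy closure, which is why the ring coordinates are obtained by optimization instead.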

Run `python scripts/reconstruct_single.py --help` for an overview of other parameters available for reconstruction.
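To run one reconstruction per job-array task, a small wrapper can read the task index from the scheduler and build the same command shown above. This sketch assumes Slurm, which exposes the index as the `SLURM_ARRAY_TASK_ID` environment variable; `build_reconstruct_command` is a hypothetical helper, and the paths are simply those used in the example above.

```python
import os


def build_reconstruct_command(index=None):
    """Build the reconstruct_single.py command for one molecule index.

    If index is None, it is read from SLURM_ARRAY_TASK_ID, which Slurm
    sets for every task of a job array.
    """
    if index is None:
        index = int(os.environ["SLURM_ARRAY_TASK_ID"])
    return [
        "python",
        "scripts/reconstruct_single.py",
        str(index),
        "cremp/test",
        "sample/samples.pickle",
        "sample/reconstructed_mols",
        "assets/models/conditional/training_mean_distances.json",
    ]
```

Inside a job submitted as an array, `subprocess.run(build_reconstruct_command(), check=True)` would then reconstruct the molecule matching that task's index. The array range has to be chosen to cover the number of test molecules.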

## Evaluation

As with reconstruction, computing metrics is best done separately for each molecule using [scripts/compute_metrics_single.py](scripts/compute_metrics_single.py), followed by aggregation across molecules using [scripts/aggregate_metrics.py](scripts/aggregate_metrics.py). For example, to compute metrics for the `H.A.S.V` macrocycle, run:

```shell
python scripts/compute_metrics_single.py \
    cremp/test/H.A.S.V.pickle \
    sample/reconstructed_mols/H.A.S.V.pickle
```

Run `python scripts/compute_metrics_single.py --help` and `python scripts/aggregate_metrics.py --help` for an overview of other parameters available for computing metrics.
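To compute metrics for every reconstructed molecule, the per-molecule command can be generated from the contents of the reconstruction directory. This is a minimal sketch: `build_metrics_commands` is a hypothetical helper, and it assumes, as in the `H.A.S.V` example, that each reconstructed pickle has the same file name as its reference in `cremp/test`.

```python
from pathlib import Path


def build_metrics_commands(reconstructed_dir, reference_prefix="cremp/test"):
    """Return one compute_metrics_single.py command per reconstructed pickle."""
    commands = []
    for recon in sorted(Path(reconstructed_dir).glob("*.pickle")):
        commands.append([
            "python",
            "scripts/compute_metrics_single.py",
            f"{reference_prefix}/{recon.name}",  # reference conformers
            str(recon),                          # reconstructed conformers
        ])
    return commands
```

The resulting commands can be run in a loop with `subprocess.run(cmd, check=True)` or written to a file and dispatched as a job array, one line per task.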

## Contributing

Install pre-commit hooks to use automated code formatting before committing changes. Make sure you're in the top-level directory and run:

```bash
pre-commit install
```

After that, your code will be automatically reformatted on every new commit.

To manually reformat all files in the project, use:

```bash
pre-commit run -a
```

To update the hooks defined in [.pre-commit-config.yaml](.pre-commit-config.yaml), use:

```bash
pre-commit autoupdate
```

## License

Licensed under the MIT License. See [LICENSE](LICENSE) for additional details.

## Citations

For the code and/or model, please cite:

```
@misc{grambow2023ringer,
    title={{RINGER}: Rapid Conformer Generation for Macrocycles with Sequence-Conditioned Internal Coordinate Diffusion},
    author={Colin A. Grambow and Hayley Weir and Nathaniel L. Diamant and Alex M. Tseng and Tommaso Biancalani and Gabriele Scalia and Kangway V. Chuang},
    year={2023},
    eprint={2305.19800},
    archivePrefix={arXiv},
    primaryClass={q-bio.BM}
}
```

To cite the CREMP dataset, please use:

```
@article{grambow2024cremp,
    title = {{CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning}},
    author = {Grambow, Colin A. and Weir, Hayley and Cunningham, Christian N. and Biancalani, Tommaso and Chuang, Kangway V.},
    year = {2024},
    journal = {Scientific Data},
    doi = {10.1038/s41597-024-03698-y},
    pages = {859},
    number = {1},
    volume = {11}
}
```

You can also cite the CREMP Zenodo repository directly:

```
@dataset{grambow_colin_a_2023_7931444,
  author       = {Grambow, Colin A. and
                  Weir, Hayley and
                  Cunningham, Christian N. and
                  Biancalani, Tommaso and
                  Chuang, Kangway V.},
  title        = {{CREMP: Conformer-Rotamer Ensembles of Macrocyclic
                   Peptides for Machine Learning}},
  month        = may,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {1.0.1},
  doi          = {10.5281/zenodo.7931444},
  url          = {https://doi.org/10.5281/zenodo.7931444}
}
```
