https://github.com/ramanathanlab/deepdrivemd

DeepDriveMD implemented with Colmena

Host: GitHub
URL: https://github.com/ramanathanlab/deepdrivemd
Owner: ramanathanlab
License: mit
Created: 2022-10-25T01:57:39.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-03-26T23:55:59.000Z (about 2 years ago)
Last Synced: 2024-03-27T00:37:46.206Z (about 2 years ago)
Topics: biophysics, deep-learning, machine-learning, python, simulation, workflows
Language: Python
Homepage:
Size: 502 KB
Stars: 5
Watchers: 0
Forks: 4
Open Issues: 5
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md

# DeepDriveMD: Coupling streaming AI and HPC ensembles to achieve 100-1000× faster biomolecular simulations

[DeepDriveMD](https://github.com/DeepDriveMD/DeepDriveMD-pipeline) implemented using [Colmena](https://colmena.readthedocs.io/en/latest/).

This implementation of DeepDriveMD enables ML/AI-coupled simulations using three primary components. _Simulation_: simulations explore possible trajectories of a protein or other biomolecular system. _Training_: aggregated trajectories are used to train one or more ML models. _Inference_: trained ML models identify conformations for subsequent iterations of simulations. A _Thinker_ process orchestrates these components to advance the workflow toward an optimization objective.

![DeepDriveMD-Colmena](https://github.com/ramanathanlab/deepdrivemd/assets/38300604/60971c79-b2a5-43fc-b744-9f97beb2e297)

## Table of Contents

1. [Installation](#installation)

2. [Usage](#usage)

3. [Contributing](#contributing)

4. [License](#license)

5. [Citations](#citations)

## Installation

Create a conda environment:

```console
conda create -n deepdrivemd python=3.9 -y
conda activate deepdrivemd
```

To install OpenMM for simulations:

```console
conda install -c conda-forge gcc=12.1.0 -y
conda install -c conda-forge openmm -y
```

To install `deepdrivemd`:

```console
git clone https://github.com/ramanathanlab/deepdrivemd.git
cd deepdrivemd
make install
```
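
After installation, a quick import check can confirm that the environment is usable. The snippet below is a minimal sketch and not part of the repository: `missing_packages` is a hypothetical helper, and it assumes the two packages are importable under the top-level names `openmm` and `deepdrivemd`.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust if your installation differs.
    missing = missing_packages(["openmm", "deepdrivemd"])
    print("missing packages:", ", ".join(missing) if missing else "none")
```

An empty result means both packages were found in the active conda environment.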

## Usage

The workflow can be tested on a workstation (a system with a few GPUs) via:

```console
python -m deepdrivemd.workflows.openmm_cvae -c tests/apps-enabled-workstation/test.yaml
```

This will generate an output directory for the run with logs, results, and task-specific output folders.

Each test will write a timestamped experiment output directory to the `runs/` directory.

Inside the output directory, you will find:

```console
$ ls runs/experiment-170323-091525/
inference  params.yaml  result  run-info  runtime.log  simulation  train
```

- `params.yaml`: the full configuration file (default parameters included)

- `runtime.log`: the workflow log

- `result`: a directory containing the JSON files `simulation.json`, `train.json`, and `inference.json`, which log task results, including success or failure, any error messages, and runtime statistics. This can be helpful for debugging application-level failures.

- `simulation`, `train`, `inference`: output directories, each containing one subdirectory per submitted task, named with a `run-` prefix. Your applications write their output files here (simulation outputs, preprocessed data, model weights, etc.); each subdirectory corresponds to the application workdir.

- `run-info`: Parsl logs
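
The layout above can be inspected programmatically. The following is a minimal sketch that relies only on the directory names listed here; `latest_experiment` and `count_task_runs` are hypothetical helpers, not part of `deepdrivemd`, and the sketch assumes that experiment directories are named `experiment-*` and task subdirectories `run-*`, as in the listing.

```python
from pathlib import Path
from typing import Dict, Optional


def latest_experiment(runs_dir: Path) -> Optional[Path]:
    """Return the most recently modified experiment-* directory, or None."""
    candidates = [p for p in runs_dir.glob("experiment-*") if p.is_dir()]
    # Use modification time: the timestamp in the name may not sort lexically.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def count_task_runs(experiment_dir: Path) -> Dict[str, int]:
    """Count the run-* subdirectories under each task output directory."""
    counts = {}
    for task in ("simulation", "train", "inference"):
        task_dir = experiment_dir / task
        if task_dir.is_dir():
            counts[task] = sum(1 for p in task_dir.glob("run-*") if p.is_dir())
        else:
            counts[task] = 0
    return counts


if __name__ == "__main__":
    experiment = latest_experiment(Path("runs"))
    if experiment is None:
        print("no experiments found under runs/")
    else:
        print(experiment.name, count_task_runs(experiment))
```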

For example, a simulation run directory may look like:

```console
$ ls runs/experiment-170323-091525/simulation/run-08843adb-65e1-47f0-b0f8-34821aa45923
1FME-unfolded.pdb  contact_map.npy  input.yaml  output.yaml  rmsd.npy  sim.dcd  sim.log
```

- `1FME-unfolded.pdb`: the PDB file used to start the simulation

- `contact_map.npy`, `rmsd.npy`: the preprocessed data files which will be input into the train and inference tasks

- `input.yaml`, `output.yaml`: logs of the task function's input and return values. They are helpful for debugging but not strictly necessary

- `sim.dcd`: the simulation trajectory file containing all the coordinate frames

- `sim.log`: a simulation log detailing the energy, steps taken, ns/day, etc.
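
When debugging a failed run, it can help to check which simulation directories are missing the files listed above. This is a minimal sketch under stated assumptions: `missing_outputs` and `incomplete_runs` are hypothetical helpers, and the expected file names are taken from the example listing, so other applications may write different files.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the example listing above (an assumption for other apps).
EXPECTED_OUTPUTS = ("contact_map.npy", "rmsd.npy", "sim.dcd", "sim.log")


def missing_outputs(run_dir: Path) -> List[str]:
    """Return the expected output files that are absent from one run directory."""
    return [name for name in EXPECTED_OUTPUTS if not (run_dir / name).is_file()]


def incomplete_runs(simulation_dir: Path) -> Dict[str, List[str]]:
    """Map each incomplete run-* directory name to its missing files."""
    report = {}
    for run_dir in sorted(simulation_dir.glob("run-*")):
        if not run_dir.is_dir():
            continue
        missing = missing_outputs(run_dir)
        if missing:
            report[run_dir.name] = missing
    return report
```

An empty dictionary from `incomplete_runs` means every run directory contains all of the expected files.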

By default the `runs/` directory is ignored by git.

Production runs can be configured and run analogously. See `examples/bba-folding-workstation/` for a detailed example of folding the [1FME](https://www.rcsb.org/structure/1FME) protein. **The YAML files document the configuration settings and explain the use case**.

### Software Interface

Implement a DeepDriveMD workflow with custom MD simulation engines and AI training/inference methods by inheriting from the `DeepDriveMDWorkflow` interface. This workflow implements the `examples/bba-folding-workstation/` example:

```python
import time
from pathlib import Path
from queue import Queue
from typing import Any

from deepdrivemd.api import DeepDriveMDWorkflow

# The task input/output types used below (CVAETrainInput, CVAETrainOutput,
# CVAEInferenceInput, CVAEInferenceOutput, MDSimulationInput, MDSimulationOutput)
# must also be imported from the application modules that define them.


class DeepDriveMD_OpenMM_CVAE(DeepDriveMDWorkflow):
    def __init__(
        self, simulations_per_train: int, simulations_per_inference: int, **kwargs: Any
    ) -> None:
        super().__init__(**kwargs)
        self.simulations_per_train = simulations_per_train
        self.simulations_per_inference = simulations_per_inference

        # Make sure there has been at least one training task
        # complete before running inference
        self.model_weights_available: bool = False

        # For batching training/inference inputs
        self.train_input = CVAETrainInput(contact_map_paths=[], rmsd_paths=[])
        self.inference_input = CVAEInferenceInput(
            contact_map_paths=[], rmsd_paths=[], model_weight_path=Path()
        )

        # Communicate results between agents
        self.simulation_input_queue: Queue[MDSimulationInput] = Queue()

    def simulate(self) -> None:
        """Submit either a new outlier to simulate, or a starting conformer."""
        with self.simulation_govenor:
            if not self.simulation_input_queue.empty():
                inputs = self.simulation_input_queue.get()
            else:
                inputs = MDSimulationInput(sim_dir=next(self.simulation_input_dirs))

        self.submit_task("simulation", inputs)

    def train(self) -> None:
        """Submit a new training task."""
        self.submit_task("train", self.train_input)

    def inference(self) -> None:
        """Submit a new inference task once model weights are available."""
        # Poll until the first training task has produced model weights
        while not self.model_weights_available:
            time.sleep(1)

        self.submit_task("inference", self.inference_input)

    def handle_simulation_output(self, output: MDSimulationOutput) -> None:
        """When a simulation finishes, decide to train a new model or infer outliers."""
        # Collect simulation results
        self.train_input.append(output.contact_map_path, output.rmsd_path)
        self.inference_input.append(output.contact_map_path, output.rmsd_path)

        # Signal train/inference tasks
        num_sims = len(self.train_input)
        if num_sims % self.simulations_per_train == 0:
            self.run_training.set()

        if num_sims % self.simulations_per_inference == 0:
            self.run_inference.set()

    def handle_train_output(self, output: CVAETrainOutput) -> None:
        """When training finishes, update the model weights to use for inference."""
        self.inference_input.model_weight_path = output.model_weight_path
        self.model_weights_available = True

    def handle_inference_output(self, output: CVAEInferenceOutput) -> None:
        """When inference finishes, update the simulation queue with the latest outliers."""
        with self.simulation_govenor:
            self.simulation_input_queue.queue.clear()  # Remove old outliers
            for sim_dir, sim_frame in zip(output.sim_dirs, output.sim_frames):
                self.simulation_input_queue.put(
                    MDSimulationInput(sim_dir=sim_dir, sim_frame=sim_frame)
                )
```
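
The scheduling logic in the workflow above can be tried in isolation, without `deepdrivemd` or any task server. The following is a toy sketch rather than the library's implementation: `ToyThinker` and its method names are hypothetical, plain strings stand in for the simulation inputs, a `threading.Lock` stands in for `simulation_govenor`, and the starting directories are assumed to be cycled indefinitely.

```python
import itertools
from queue import Queue
from threading import Lock
from typing import Iterable, List


class ToyThinker:
    """Stand-in for the scheduling decisions only; it submits no real tasks."""

    def __init__(
        self, start_dirs: Iterable[str], sims_per_train: int, sims_per_inference: int
    ) -> None:
        # Assumption: starting conformers are reused in a round-robin fashion.
        self.start_dirs = itertools.cycle(start_dirs)
        self.sims_per_train = sims_per_train
        self.sims_per_inference = sims_per_inference
        self.lock = Lock()
        self.outliers: "Queue[str]" = Queue()
        self.num_sims = 0

    def next_simulation_input(self) -> str:
        """Prefer a queued outlier; otherwise fall back to a starting directory."""
        with self.lock:
            if not self.outliers.empty():
                return self.outliers.get()
            return next(self.start_dirs)

    def on_simulation_done(self) -> List[str]:
        """Return which follow-up tasks would be signaled after this simulation."""
        self.num_sims += 1
        signals = []
        if self.num_sims % self.sims_per_train == 0:
            signals.append("train")
        if self.num_sims % self.sims_per_inference == 0:
            signals.append("inference")
        return signals

    def on_inference_done(self, new_outliers: Iterable[str]) -> None:
        """Replace any stale outliers with the latest inference results."""
        with self.lock:
            with self.outliers.mutex:
                self.outliers.queue.clear()
            for item in new_outliers:
                self.outliers.put(item)


if __name__ == "__main__":
    thinker = ToyThinker(["start-a", "start-b"], sims_per_train=4, sims_per_inference=2)
    for _ in range(4):
        print(thinker.next_simulation_input(), thinker.on_simulation_done())
```

Holding one lock in both `next_simulation_input` and `on_inference_done` means a simulation is never handed a stale outlier while the queue is being replaced, which mirrors the role of the shared context manager in the workflow class.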

## Contributing

Please report **bugs**, **enhancement requests**, or **questions** through the [Issue Tracker](https://github.com/ramanathanlab/deepdrivemd/issues).

If you are looking to contribute, please see [`CONTRIBUTING.md`](https://github.com/ramanathanlab/deepdrivemd/blob/main/CONTRIBUTING.md).

## License

DeepDriveMD has an MIT license, as seen in the [`LICENSE.md`](https://github.com/ramanathanlab/deepdrivemd/blob/main/LICENSE.md) file.

## Citations

If you use DeepDriveMD in your research, please cite this paper:

```bibtex
@inproceedings{brace2022coupling,
  title={Coupling streaming {AI} and {HPC} ensembles to achieve 100--1000$\times$ faster biomolecular simulations},
  author={Brace, Alexander and Yakushin, Igor and Ma, Heng and Trifan, Anda and Munson, Todd and Foster, Ian and Ramanathan, Arvind and Lee, Hyungro and Turilli, Matteo and Jha, Shantenu},
  booktitle={2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
  pages={806--816},
  year={2022},
  organization={IEEE}
}
```
