{"id":47995500,"url":"https://github.com/ramanathanlab/deepdrivemd","last_synced_at":"2026-04-04T11:54:48.201Z","repository":{"id":107045118,"uuid":"557053349","full_name":"ramanathanlab/deepdrivemd","owner":"ramanathanlab","description":"DeepDriveMD implemented with Colmena","archived":false,"fork":false,"pushed_at":"2024-03-26T23:55:59.000Z","size":514,"stargazers_count":5,"open_issues_count":5,"forks_count":4,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-03-27T00:37:46.206Z","etag":null,"topics":["biophysics","deep-learning","machine-learning","python","simulation","workflows"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ramanathanlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-10-25T01:57:39.000Z","updated_at":"2023-05-17T19:36:10.000Z","dependencies_parsed_at":"2024-03-15T00:25:48.144Z","dependency_job_id":null,"html_url":"https://github.com/ramanathanlab/deepdrivemd","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ramanathanlab/deepdrivemd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ramanathanlab%2Fdeepdrivemd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ramanathanlab%2Fdeepdrivemd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ramanathanlab%2Fdeepdrivemd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ramanathanlab%2Fdeepdrivemd/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ramanathanlab","download_url":"https://codeload.github.com/ramanathanlab/deepdrivemd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ramanathanlab%2Fdeepdrivemd/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31398770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biophysics","deep-learning","machine-learning","python","simulation","workflows"],"created_at":"2026-04-04T11:54:47.411Z","updated_at":"2026-04-04T11:54:48.193Z","avatar_url":"https://github.com/ramanathanlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepDriveMD: Coupling streaming AI and HPC ensembles to achieve 100-1000× faster biomolecular simulations\n[DeepDriveMD](https://github.com/DeepDriveMD/DeepDriveMD-pipeline) implemented using [Colmena](https://colmena.readthedocs.io/en/latest/).\n\nThis implementation of DeepDriveMD enables ML/AI-coupled simulations using three primary components. 
_Simulation_: Simulations are used to explore possible trajectories of a protein or other biomolecular system. _Training_: Aggregated trajectories are used to train one or more ML models. _Inference_: Trained ML models are used to identify conformations for subsequent iterations of simulations. A _Thinker_ process orchestrates these components to advance the workflow toward an optimization objective.\n\n![DeepDriveMD-Colmena](https://github.com/ramanathanlab/deepdrivemd/assets/38300604/60971c79-b2a5-43fc-b744-9f97beb2e297)\n\n## Table of Contents\n1. [Installation](#installation)\n2. [Usage](#usage)\n3. [Contributing](#contributing)\n4. [License](#license)\n5. [Citations](#citations)\n\n## Installation\n\nCreate a conda environment:\n```console\nconda create -n deepdrivemd python=3.9 -y\nconda activate deepdrivemd\n```\n\nTo install OpenMM for simulations:\n```console\nconda install -c conda-forge gcc=12.1.0 -y\nconda install -c conda-forge openmm -y\n```\n\nTo install `deepdrivemd`:\n```console\ngit clone https://github.com/ramanathanlab/deepdrivemd.git\ncd deepdrivemd\nmake install\n```\n\n## Usage\n\nThe workflow can be tested on a workstation (a system with a few GPUs) via:\n```console\npython -m deepdrivemd.workflows.openmm_cvae -c tests/apps-enabled-workstation/test.yaml\n```\nThis will generate an output directory for the run with logs, results, and task-specific output folders.\n\nEach test will write a timestamped experiment output directory to the `runs/` directory.\n\nInside the output directory, you will find:\n```console\n$ ls runs/experiment-170323-091525/\ninference  params.yaml  result  run-info  runtime.log  simulation  train\n```\n- `params.yaml`: the full configuration file (default parameters included)\n- `runtime.log`: the workflow log\n- `result`: a directory containing the JSON files `simulation.json`, `train.json`, and `inference.json`, which log task results, including success or failure, potential error messages, and runtime statistics. 
This can be helpful for debugging application-level failures.\n- `simulation`, `train`, `inference`: output directories, each containing a subdirectory `run-\u003cuuid\u003e` for each submitted task. This is where the output files of your simulations, preprocessed data, model weights, etc. will be written by your applications (it corresponds to the application workdir).\n- `run-info`: Parsl logs\n\nFor example, a simulation run directory may look like:\n```console\n$ ls runs/experiment-170323-091525/simulation/run-08843adb-65e1-47f0-b0f8-34821aa45923\n1FME-unfolded.pdb  contact_map.npy  input.yaml  output.yaml  rmsd.npy  sim.dcd  sim.log\n```\n- `1FME-unfolded.pdb`: the PDB file used to start the simulation\n- `contact_map.npy`, `rmsd.npy`: the preprocessed data files which will be input into the train and inference tasks\n- `input.yaml`, `output.yaml`: these log the task function's input and return values; they are helpful for debugging but are not strictly necessary\n- `sim.dcd`: the simulation trajectory file containing all the coordinate frames\n- `sim.log`: a simulation log detailing the energy, steps taken, ns/day, etc.\n\nBy default, the `runs/` directory is ignored by git.\n\nProduction runs can be configured and run analogously. See `examples/bba-folding-workstation/` for a detailed example of folding the [1FME](https://www.rcsb.org/structure/1FME) protein. **The YAML files document the configuration settings and explain the use case**.\n\n### Software Interface\n\nImplement a DeepDriveMD workflow with custom MD simulation engines and AI training/inference methods by inheriting from the `DeepDriveMDWorkflow` interface. 
This workflow implements the `examples/bba-folding-workstation/` example:\n```python\nimport time\nfrom pathlib import Path\nfrom queue import Queue\nfrom typing import Any\n\nfrom deepdrivemd.api import DeepDriveMDWorkflow\n\n# The MDSimulation* and CVAE* input/output types are defined by the\n# application modules; their imports are omitted here.\n\nclass DeepDriveMD_OpenMM_CVAE(DeepDriveMDWorkflow):\n    def __init__(\n        self, simulations_per_train: int, simulations_per_inference: int, **kwargs: Any\n    ) -\u003e None:\n        super().__init__(**kwargs)\n        self.simulations_per_train = simulations_per_train\n        self.simulations_per_inference = simulations_per_inference\n\n        # Make sure at least one training task has completed\n        # before running inference\n        self.model_weights_available: bool = False\n\n        # For batching training/inference inputs\n        self.train_input = CVAETrainInput(contact_map_paths=[], rmsd_paths=[])\n        self.inference_input = CVAEInferenceInput(\n            contact_map_paths=[], rmsd_paths=[], model_weight_path=Path()\n        )\n\n        # Communicate results between agents\n        self.simulation_input_queue: Queue[MDSimulationInput] = Queue()\n\n    def simulate(self) -\u003e None:\n        \"\"\"Submit either a new outlier to simulate, or a starting conformer.\"\"\"\n        with self.simulation_govenor:\n            if not self.simulation_input_queue.empty():\n                inputs = self.simulation_input_queue.get()\n            else:\n                inputs = MDSimulationInput(sim_dir=next(self.simulation_input_dirs))\n\n        self.submit_task(\"simulation\", inputs)\n\n    def train(self) -\u003e None:\n        \"\"\"Submit a new training task.\"\"\"\n        self.submit_task(\"train\", self.train_input)\n\n    def inference(self) -\u003e None:\n        \"\"\"Submit a new inference task once model weights are available.\"\"\"\n        while not self.model_weights_available:\n            time.sleep(1)\n\n        self.submit_task(\"inference\", self.inference_input)\n\n    def handle_simulation_output(self, output: MDSimulationOutput) -\u003e None:\n        \"\"\"When a simulation 
finishes, decide to train a new model or infer outliers.\"\"\"\n        # Collect simulation results\n        self.train_input.append(output.contact_map_path, output.rmsd_path)\n        self.inference_input.append(output.contact_map_path, output.rmsd_path)\n\n        # Signal train/inference tasks\n        num_sims = len(self.train_input)\n        if num_sims % self.simulations_per_train == 0:\n            self.run_training.set()\n\n        if num_sims % self.simulations_per_inference == 0:\n            self.run_inference.set()\n\n    def handle_train_output(self, output: CVAETrainOutput) -\u003e None:\n        \"\"\"When training finishes, update the model weights to use for inference.\"\"\"\n        self.inference_input.model_weight_path = output.model_weight_path\n        self.model_weights_available = True\n\n    def handle_inference_output(self, output: CVAEInferenceOutput) -\u003e None:\n        \"\"\"When inference finishes, update the simulation queue with the latest outliers.\"\"\"\n        with self.simulation_govenor:\n            self.simulation_input_queue.queue.clear()  # Remove old outliers\n            for sim_dir, sim_frame in zip(output.sim_dirs, output.sim_frames):\n                self.simulation_input_queue.put(\n                    MDSimulationInput(sim_dir=sim_dir, sim_frame=sim_frame)\n                )\n```\n\n## Contributing\n\nPlease report **bugs**, **enhancement requests**, or **questions** through the [Issue Tracker](https://github.com/ramanathanlab/deepdrivemd/issues).\n\nIf you are looking to contribute, please see [`CONTRIBUTING.md`](https://github.com/ramanathanlab/deepdrivemd/blob/main/CONTRIBUTING.md).\n\n## License\n\nDeepDriveMD is released under the MIT license, as seen in the [`LICENSE.md`](https://github.com/ramanathanlab/deepdrivemd/blob/main/LICENSE.md) file.\n\n## Citations\n\nIf you use DeepDriveMD in your research, please cite this paper:\n\n```bibtex\n@inproceedings{brace2022coupling,\n  title={Coupling streaming ai and hpc ensembles 
to achieve 100--1000$\\times$ faster biomolecular simulations},\n  author={Brace, Alexander and Yakushin, Igor and Ma, Heng and Trifan, Anda and Munson, Todd and Foster, Ian and Ramanathan, Arvind and Lee, Hyungro and Turilli, Matteo and Jha, Shantenu},\n  booktitle={2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},\n  pages={806--816},\n  year={2022},\n  organization={IEEE}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Framanathanlab%2Fdeepdrivemd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Framanathanlab%2Fdeepdrivemd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Framanathanlab%2Fdeepdrivemd/lists"}