{"id":50918241,"url":"https://github.com/kyonofx/mlcgmd","last_synced_at":"2026-06-16T17:07:51.836Z","repository":{"id":41566139,"uuid":"508519448","full_name":"kyonofx/mlcgmd","owner":"kyonofx","description":"[TMLR 2023] Simulate time-integrated coarse-grained MD with multi-scale graph neural networks","archived":false,"fork":false,"pushed_at":"2023-08-26T21:27:31.000Z","size":49242,"stargazers_count":74,"open_issues_count":2,"forks_count":9,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-03T20:20:29.475Z","etag":null,"topics":["coarse-grained-molecular-dynamics","coarse-graining","graph-neural-networks","molecular-dynamics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kyonofx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-29T02:27:42.000Z","updated_at":"2026-01-02T06:10:17.000Z","dependencies_parsed_at":"2023-01-20T19:02:36.764Z","dependency_job_id":null,"html_url":"https://github.com/kyonofx/mlcgmd","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kyonofx/mlcgmd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyonofx%2Fmlcgmd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyonofx%2Fmlcgmd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyonofx%2Fmlcgmd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyonofx%2Fmlcgmd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kyonofx","download_url":"https://codeload.gi
thub.com/kyonofx/mlcgmd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyonofx%2Fmlcgmd/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34415368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-16T02:00:06.860Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coarse-grained-molecular-dynamics","coarse-graining","graph-neural-networks","molecular-dynamics"],"created_at":"2026-06-16T17:07:50.499Z","updated_at":"2026-06-16T17:07:51.823Z","avatar_url":"https://github.com/kyonofx.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learning to Simulate Time-integrated Coarse-grained Molecular Dynamics with Multi-scale Graph Networks [TMLR 2023]\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/chain.gif\" width=\"300\"\u003e\n\u003c/p\u003e\n\nThis codebase implements multi-scale GNN simulators for time-integrated CGMD, without using force/energy! This implementation was tested under `Ubuntu 18.04`, `Python 3.8`, `PyTorch 1.11`, and `CUDA 11.3`. 
Versions of all dependencies can be found in `env.yml`.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/model.png\" /\u003e \n\u003c/p\u003e\n\n[[Paper]](https://openreview.net/forum?id=y8RZoPjEUl) [[Website]](https://xiangfu.co/mlcgmd) [[Video]](https://www.youtube.com/watch?v=l3aGVjQezsc)\n\nIf you find this code useful, please consider citing our paper:\n\n```\n@article{\nfu2023simulate,\ntitle={Simulate Time-integrated Coarse-grained Molecular Dynamics with Multi-scale Graph Networks},\nauthor={Xiang Fu and Tian Xie and Nathan J. Rebello and Bradley Olsen and Tommi S. Jaakkola},\njournal={Transactions on Machine Learning Research},\nissn={2835-8856},\nyear={2023},\nurl={https://openreview.net/forum?id=y8RZoPjEUl},\nnote={}\n}\n``` \n\n## Pretrained model checkpoints\n\n- [single-chain CG polymer (param count 1.6M)](./ckpts/chain)\n- [solid polymer electrolytes (param count 1.6M)](./ckpts/battery)\n\n\n## Installation\nCreate a conda environment with the required dependencies. This may take a few minutes.\n\n```\nconda env create -f env.yml\n```\n\nActivate the conda environment with:\n\n```\nconda activate mlcgmd\n```\n\nThen install `graphwm` (short for graph world models) as a package:\n\n```\npip install -e ./\n```\n\n## Prepare the dataset\n\nOur single-chain CG polymer dataset is available from Zenodo.\n\n[single-chain CG polymer dataset](https://zenodo.org/record/6764836#.YrqHNuxKjzd)\n\nThe solid polymer electrolyte dataset is available [here](https://arxiv.org/abs/2208.01692).\n\n## Configure environment variables\n\nBefore running training/evaluation of the GNN simulator, make a copy of the `.env.template` file and rename it to `.env`. 
Set the following environment variables in `.env`, then copy the file to `mlcgmd/graphwm/.env`.\n\n- `PROJECT_ROOT`: path to the folder that contains this repo\n- `CHAIN_DATASET_DIR`: path to the single-chain polymer training dataset (50k $\\tau$)\n- `BAT_DATASET_DIR`: path to the battery training dataset (5 ns)\n- `CHAIN_TEST_DATASET_DIR`: path to the single-chain polymer test dataset, e.g. \"/scratch/xiangfu/polymer_test\" (used as initialization for testing)\n- `BAT_TEST_DATASET_DIR`: path to the battery evaluation dataset (50 ns)\n- `MODEL_DIR`: path to save model checkpoints\n\n## Logging with Weights and Biases (`wandb`)\n\nWe recommend logging with `wandb`, which is used by default. You need to have a wandb account and log in with `wandb init`. More details at [https://wandb.ai/](https://wandb.ai/).\n\n## Train a CGMD simulator\n\nThe training configurations, including default hyperparameters, can be found at [graphwm/conf](./graphwm/conf). These hyperparameters produce the results reported in our paper, but may not be optimal as we did not do extensive tuning. We trained all models on a single GPU; training takes ~1 day for the single-chain polymer dataset and 7-10 days for the battery dataset. Multi-GPU training is available (cf. [Tips](https://github.com/kyonofx/mlcgmd/tree/main#tips)) and will likely reduce training time.\n\nTrain a model with the [default configurations for the single-chain polymer dataset](./graphwm/conf/train.yaml) with the command:\n\n```\npython train.py\n```\n\nFor the [battery dataset](./graphwm/conf/train_battery.yaml), use:\n\n```\npython train.py --config-name train_battery\n```\n\nWe use `hydra` for config management. Command-line arguments can be passed in conveniently. 
For example, if you want to use a higher radius cut-off of `9.0` with the battery dataset, run:\n\n```\npython train.py --config-name train_battery model.radius=9\n```\n\nFind out more about hydra at [https://hydra.cc/docs/intro/](https://hydra.cc/docs/intro/).\n\n## Simulation using the learned simulator\n\nWith a trained model saved at `MODEL_DIR/chain_gns` (or change the `model_dir` argument in the evaluation config file), run simulation for the [single-chain polymer dataset](./graphwm/conf/eval.yaml) with the command:\n\n```\npython eval.py\n```\n\nFor the battery dataset, run:\n\n```\npython eval.py --config-name eval_battery\n```\n\nNote that the simulation code assumes your model is saved as `{data.name}_{model.name}*`. The rollout trajectories are saved as a torch pickle file. Simulation efficiency is maximized when using a large batch size to parallelize the simulation of many systems on a single GPU. Simulating all 40 testing class-II polymers for 5M τ using a single RTX 2080 Ti GPU takes roughly 2.6 hours. Simulating all 50 testing batteries for 50 ns using a single RTX 2080 Ti GPU takes roughly 4.6 hours.\n\nThe `ld_kwargs` in the config file control the inference process of the score-based refinement module. They are only used with the `PnR` model class.\n\n## Tips\n\n- Training CGMD simulators is data I/O intensive. Training speed will be greatly improved with a faster file system. For example, a local drive is usually much faster than NFS/AFS. \n- The hyperparameter `model.cg_level` controls how many atoms are grouped into a coarse-grained bead. We use METIS for coarse-graining; this algorithm tries to make the number of atoms assigned to each CG bead equal. This balance may not be achieved, however, because atoms not connected by a chemical bond are never grouped together.  
If `model.cg_level=1`, coarse-graining is turned off.\n- Multi-GPU training can be turned on by setting `train.pl_trainer.gpus=X`, where `X` is the number of GPUs.\n- The hyperparameter `model.dilation` controls the time-integration step. It specifies the number of **recorded steps** that the ML simulator advances in a single prediction step. More information about the length of recorded steps is in the next section.\n\n## More about the datasets \n\nThe dataset of single-chain coarse-grained polymers in implicit solvent is adapted from the paper: [Targeted sequence design within the coarse-grained polymer genome](https://www.science.org/doi/10.1126/sciadv.abc6216), and the battery dataset is adapted from the paper: [Accelerating amorphous polymer electrolyte screening by learning to reduce errors in molecular dynamics simulated properties](https://arxiv.org/abs/2101.05339). Please find the simulation details of the datasets in these papers, and consider citing the respective papers if you use the datasets. \n\nThe recording frequency for the single-chain polymer is 5 τ for the training set and 500 τ for the test set. The timestep used in the LAMMPS simulation is 0.01 τ. Our default config uses `dilation=1`, so one step of our learned simulator is 5 τ, which is as long as 500 steps in the LAMMPS simulation.\n\nThe recording frequency for the battery dataset is 2 ps for both the training and the test sets. The integrator used in the LAMMPS simulation is a rRESPA multi-timescale integrator with an outer timestep of 2 fs for non-bonded interactions, and an inner timestep of 0.5 fs. Our default config uses `dilation=100`, so one step of our learned simulator is 0.2 ns, which is as long as $10^5$ steps in the LAMMPS simulation.\n\nThe original MD trajectories were simulated using [LAMMPS](https://www.lammps.org). 
Under [graphwm/preprocess](./graphwm/preprocess) you can find the scripts for preprocessing the raw LAMMPS dump to the `.h5` files that are used for our learned simulators. To use the preprocessing functionality, `mdtraj` needs to be installed through: `pip install mdtraj`.\n\n## Related repos\n\n- [nn-template](https://github.com/grok-ai/nn-template)\n- [DeepMind implementation of GNS](https://github.com/deepmind/deepmind-research/tree/master/learning_to_simulate)\n- [PyG](https://github.com/pyg-team/pytorch_geometric)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyonofx%2Fmlcgmd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyonofx%2Fmlcgmd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyonofx%2Fmlcgmd/lists"}