{"id":21573674,"url":"https://github.com/yuyangw/denoise-pretrain-ml-potential","last_synced_at":"2025-07-07T08:10:02.733Z","repository":{"id":183880196,"uuid":"606105698","full_name":"yuyangw/Denoise-Pretrain-ML-Potential","owner":"yuyangw","description":"Implementation of \"Denoise Pretraining on Non-equilibrium Molecular Conformations for Accurate and Transferable Neural Potentials\" in PyTorch.","archived":false,"fork":false,"pushed_at":"2023-07-26T06:14:01.000Z","size":33359,"stargazers_count":14,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-07T08:09:51.592Z","etag":null,"topics":["chemical-physics","dgl","equivariant-representations","graph-neural-networks","molecular-simulation","pytorch","pytorch-geometric","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yuyangw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-24T16:00:37.000Z","updated_at":"2025-06-26T10:14:43.000Z","dependencies_parsed_at":"2023-09-06T04:15:17.566Z","dependency_job_id":null,"html_url":"https://github.com/yuyangw/Denoise-Pretrain-ML-Potential","commit_stats":null,"previous_names":["yuyangw/denoise-pretrain-ml-potential"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/yuyangw/Denoise-Pretrain-ML-Potential","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuyangw%2FDenoise-Pretrain-ML-Potential","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuyangw%2FDenoise-Pretrain-ML-Potential/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuyangw%2FDenoise-Pretrain-ML-Potential/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuyangw%2FDenoise-Pretrain-ML-Potential/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yuyangw","download_url":"https://codeload.github.com/yuyangw/Denoise-Pretrain-ML-Potential/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuyangw%2FDenoise-Pretrain-ML-Potential/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264040938,"owners_count":23548070,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemical-physics","dgl","equivariant-representations","graph-neural-networks","molecular-simulation","pytorch","pytorch-geometric","self-supervised-learning"],"created_at":"2024-11-24T12:07:41.670Z","updated_at":"2025-07-07T08:10:02.652Z","avatar_url":"https://github.com/yuyangw.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Denoise Pretraining for ML Potentials\n\n\u003cstrong\u003eDenoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials\u003c/strong\u003e \u003c/br\u003e\n\u003cem\u003eJournal of Chemical Theory and Computation\u003c/em\u003e [[Paper]](https://pubs.acs.org/doi/10.1021/acs.jctc.3c00289) [[arXiv]](https://arxiv.org/abs/2303.02216) [[PDF]](https://pubs.acs.org/doi/epdf/10.1021/acs.jctc.3c00289) \u003c/br\u003e \n[Yuyang Wang](https://yuyangw.github.io/), [Changwen 
Xu](https://changwenxu98.github.io/), [Zijie Li](https://scholar.google.com/citations?user=ji7TXTMAAAAJ\u0026hl=en\u0026oi=ao), [Amir Barati Farimani](https://www.meche.engineering.cmu.edu/directory/bios/barati-farimani-amir.html) \u003c/br\u003e\nCarnegie Mellon University \u003c/br\u003e\n\n\u003cimg src=\"figs/framework.png\" width=\"460\"\u003e\n\nThis is the official implementation of \"[Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials](https://pubs.acs.org/doi/10.1021/acs.jctc.3c00289)\". In this work, we propose denoise pretraining on non-equilibrium molecular conformations to achieve more accurate and transferable potential predictions with invariant and equivariant graph neural networks (GNNs). If you find our work useful in your research, please cite:\n\n```\n@article{wang2023denoise,\n  title={Denoise Pre-training on Non-equilibrium Molecules for Accurate and Transferable Neural Potentials},\n  author={Wang, Yuyang and Xu, Changwen and Li, Zijie and Barati Farimani, Amir},\n  journal={Journal of Chemical Theory and Computation},\n  doi={10.1021/acs.jctc.3c00289},\n  year={2023}\n}\n```\n\n## Getting Started\n\n1. [Installation](#installation)\n2. [Dataset](#dataset)\n3. [Pre-training](#pretrain)\n4. [Fine-tuning](#finetune)\n5. 
[Pre-trained models](#models)\n\n### Installation \u003ca name=\"installation\"\u003e\u003c/a\u003e\n\nSet up a conda environment and clone the GitHub repository:\n\n```\n# create a new environment\n$ conda create --name ml_potential python=3.8\n$ conda activate ml_potential\n\n# install requirements\n$ conda install pytorch==1.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge\n$ conda install pyg -c pyg\n$ conda install -c dglteam/label/cu116 dgl\n$ conda install -c conda-forge tensorboard openmm\n$ pip install PyYAML rdkit ase\n$ pip install git+https://github.com/AMLab-Amsterdam/lie_learn\n\n# clone the source code\n$ git clone https://github.com/yuyangw/Denoise-Pretrain-ML-Potential.git\n$ cd Denoise-Pretrain-ML-Potential\n```\n\n### Dataset \u003ca name=\"dataset\"\u003e\u003c/a\u003e\n\nThe datasets used in this work are summarized in the following table, which lists the download link, number of molecules, number of conformations, number of elements, number of atoms per molecule, molecule types, and whether each dataset is used for pre-training (PT) or fine-tuning (FT). GNNs are pre-trained on the combination of ANI-1 and ANI-1x and fine-tuned on each dataset separately.\n\n| Dataset | Link | # Mol. | # Conf. | # Ele. 
| # Atoms | Molecule types | Usage |\n| ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |\n| ANI-1   | [[link]](https://figshare.com/articles/dataset/ANI-1_data_set_20M_DFT_energies_for_non-equilibrium_small_molecules/5287732) | 57,462 | 24,687,809 | 4  | 2~26 | Small molecules | PT \u0026 FT |\n| ANI-1x  | [[link]](https://figshare.com/articles/dataset/ANI-1x_Dataset_Release/10047041/1) | 63,865 | 5,496,771  | 4  | 2~63 | Small molecules | PT \u0026 FT |\n| ISO17   | [[link]](http://quantum-machine.org/datasets/) | 129    | 645,000    | 3  | 19 | Isomers of C7O2H10 | FT |\n| MD22    | [[link]](http://www.sgdml.org/#datasets) | 7 | 223,422 | 4 | 42~370 | Proteins, lipids, carbohydrates, nucleic acids, supramolecules | FT |\n| SPICE   | [[link]](https://zenodo.org/record/7338495#.Y_aCx3bMK38) | 19,238 | 1,132,808  | 15 | 3~50 | Small molecules, dimers, dipeptides, solvated amino acids | FT |\n\n### Pre-training \u003ca name=\"pretrain\"\u003e\u003c/a\u003e\n\nTo pre-train the invariant or equivariant GNNs, run the following command. The configuration and a detailed explanation of each variable can be found in `config_pretrain.yaml`.\n```\n$ python pretrain.py\n```\n\nTo monitor training with TensorBoard, run `tensorboard --logdir {PATH}` and open the URL http://127.0.0.1:6006/.\n\n### Fine-tuning \u003ca name=\"finetune\"\u003e\u003c/a\u003e\n\nTo fine-tune the pre-trained GNN models on molecular potential predictions, run the following command. The configuration and a detailed explanation of each variable can be found in `config.yaml`.\n```\n$ python train.py\n```\n\n### Pre-trained models \u003ca name=\"models\"\u003e\u003c/a\u003e\n\nWe also provide a pre-trained checkpoint `model.pth` and the configuration `config_pretrain.yaml` for each model, both of which can be found in the `ckpt` folder. 
Pre-trained models include: \n- Pre-trained SchNet in `ckpt/schnet` folder\n- Pre-trained SE(3)-Transformer in `ckpt/se3transformer` folder\n- Pre-trained EGNN in `ckpt/egnn` folder\n- Pre-trained TorchMD-Net in `ckpt/torchmdnet` folder\n\n## Acknowledgement\n\nThe implementation of GNNs in this work is based on:\n- Implementation of SchNet: [kyonofx/MDsim](https://github.com/kyonofx/MDsim/blob/main/mdsim/models/schnet.py) \\\u0026 [PyG](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.SchNet.html)\n- Implementation of SE(3)-Transformer: [FabianFuchsML/se3-transformer-public](https://github.com/FabianFuchsML/se3-transformer-public)\n- Implementation of EGNN: [vgsatorras/egnn](https://github.com/vgsatorras/egnn)\n- Implementation of TorchMD-Net: [torchmd/torchmd-net](https://github.com/torchmd/torchmd-net)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyuyangw%2Fdenoise-pretrain-ml-potential","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyuyangw%2Fdenoise-pretrain-ml-potential","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyuyangw%2Fdenoise-pretrain-ml-potential/lists"}