{"id":13958624,"url":"https://github.com/MinkaiXu/GeoDiff","last_synced_at":"2025-07-21T00:31:31.872Z","repository":{"id":41155483,"uuid":"458461908","full_name":"MinkaiXu/GeoDiff","owner":"MinkaiXu","description":"Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).","archived":false,"fork":false,"pushed_at":"2023-05-17T06:31:05.000Z","size":2292,"stargazers_count":359,"open_issues_count":14,"forks_count":79,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-26T06:11:42.756Z","etag":null,"topics":["computational-biology","computational-chemistry","conformation","diffusion-models","generative-models","graph-neural-networks","iclr","iclr2022","molecule","score-matching"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MinkaiXu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-12T08:18:16.000Z","updated_at":"2025-05-21T19:51:57.000Z","dependencies_parsed_at":"2024-09-26T08:41:06.403Z","dependency_job_id":null,"html_url":"https://github.com/MinkaiXu/GeoDiff","commit_stats":{"total_commits":7,"total_committers":1,"mean_commits":7.0,"dds":0.0,"last_synced_commit":"ea0ca48045a2f7abfccd7f0df449e45eb6eae638"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MinkaiXu/GeoDiff","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MinkaiXu%2FGeoDiff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MinkaiXu%2FGeoDi
ff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MinkaiXu%2FGeoDiff/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MinkaiXu%2FGeoDiff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MinkaiXu","download_url":"https://codeload.github.com/MinkaiXu/GeoDiff/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MinkaiXu%2FGeoDiff/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266221269,"owners_count":23894966,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-biology","computational-chemistry","conformation","diffusion-models","generative-models","graph-neural-networks","iclr","iclr2022","molecule","score-matching"],"created_at":"2024-08-08T13:01:46.749Z","updated_at":"2025-07-21T00:31:26.855Z","avatar_url":"https://github.com/MinkaiXu.png","language":"Python","readme":"# GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/MinkaiXu/GeoDiff/blob/main/LICENSE)\n\n[[OpenReview](https://openreview.net/forum?id=PzcvxEMzvQC)] [[arXiv](https://arxiv.org/abs/2203.02923)] [[Code](https://github.com/MinkaiXu/GeoDiff)]\n\nThe official implementation of GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022 **Oral Presentation [54/3391]**).\n\n![cover](assets/geodiff_framework.png)\n\n## Environments\n\n### Install via Conda 
(Recommended)\n\n```bash\n# Clone the environment\nconda env create -f env.yml\n# Activate the environment\nconda activate geodiff\n# Install PyG\nconda install pytorch-geometric=1.7.2=py37_torch_1.8.0_cu102 -c rusty1s -c conda-forge\n```\n\n## Dataset\n\n### Official Dataset\nThe official raw GEOM dataset is available [[here]](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JNGTDF).\n\n### Preprocessed dataset\nWe provide the preprocessed datasets (GEOM) in this [[google drive folder]](https://drive.google.com/drive/folders/1b0kNBtck9VNrLRZxg6mckyVUpJA5rBHh?usp=sharing). After downloading the dataset, place it in the folder specified by the `dataset` variable in the config files `./configs/*.yml`.\n\n### Prepare your own GEOM dataset from scratch (optional)\n\nYou can also download the original full GEOM dataset and prepare your own data split. A guide is available at the previous work ConfGF's [[github page]](https://github.com/DeepGraphLearning/ConfGF#prepare-your-own-geom-dataset-from-scratch-optional).\n\n## Training\n\nAll hyper-parameters and training details are provided in the config files (`./configs/*.yml`). Feel free to tune these parameters.\n\nYou can train the model with the following commands:\n\n```bash\n# Default settings\npython train.py ./configs/qm9_default.yml\npython train.py ./configs/drugs_default.yml\n# An ablation setting with fewer timesteps, as described in Appendix D.2.\npython train.py ./configs/drugs_1k_default.yml\n```\n\nThe model checkpoints, the configuration YAML file, and the training log are saved to the directory specified by `--logdir` in `train.py`.\n\n## Generation\n\nWe provide the checkpoints of two trained models, i.e., `qm9_default` and `drugs_default`, in the [[google drive folder]](https://drive.google.com/drive/folders/1b0kNBtck9VNrLRZxg6mckyVUpJA5rBHh?usp=sharing). 
Put the checkpoints `*.pt` into a path like `${log}/${model}/checkpoints/`, and put the corresponding configuration file `*.yml` into the parent directory `${log}/${model}/`.\n\n\u003cfont color=\"red\"\u003eAttention\u003c/font\u003e: if you want to use the pretrained models, please use the code on the [`pretrain`](https://github.com/MinkaiXu/GeoDiff/tree/pretrain) branch, which is the vanilla codebase for reproducing the results with our pretrained models. We recently noticed some issues in the codebase and updated it, so the `main` branch is not fully compatible with the previous checkpoints.\n\nYou can generate conformations for all or part of the test set with:\n\n```bash\npython test.py ${log}/${model}/checkpoints/${iter}.pt \\\n    --start_idx 800 --end_idx 1000\n```\nHere `start_idx` and `end_idx` indicate the range of the test set to use. All hyper-parameters related to sampling can be set in `test.py`. For the qm9 model, you can add the argument `--w_global 0.3`, which empirically gives slightly better results.\n\nConformations of some drug-like molecules generated by GeoDiff are provided below.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/exp_drugs.png\" /\u003e \n\u003c/p\u003e\n\n## Evaluation\n\nAfter generating conformations with the above commands, the results of all benchmark tasks can be calculated from the generated data.\n\n### Task 1. Conformation Generation\n\nThe `COV` and `MAT` scores on the GEOM datasets can be calculated using the following command:\n\n```bash\npython eval_covmat.py ${log}/${model}/${sample}/sample_all.pkl\n```\n\n\n### Task 2. Property Prediction\n\nFor property prediction, we use a small split of qm9 that differs from the one used in the `Conformation Generation` task. This split is also provided in the [[google drive folder]](https://drive.google.com/drive/folders/1b0kNBtck9VNrLRZxg6mckyVUpJA5rBHh?usp=sharing). 
Generating conformations and evaluating the `mean absolute error (MAE)` metric on this split can be done with the following commands:\n\n```bash\npython test.py ${log}/${model}/checkpoints/${iter}.pt --num_confs 50 \\\n      --start_idx 0 --test_set data/GEOM/QM9/qm9_property.pkl\npython eval_prop.py --generated ${log}/${model}/${sample}/sample_all.pkl\n```\n\n## Visualizing molecules with PyMol\n\nWe also provide a guide for visualizing molecules with PyMol, adapted from the previous work ConfGF's [[github page]](https://github.com/DeepGraphLearning/ConfGF#prepare-your-own-geom-dataset-from-scratch-optional).\n\n### Start Setup\n\n1. `pymol -R`\n2. `Display - Background - White`\n3. `Display - Color Space - CMYK`\n4. `Display - Quality - Maximal Quality`\n5. `Display Grid`\n   1. by object: use `set grid_slot, int, mol_name` to put the molecule into the corresponding slot\n   2. by state: align all conformations in a single slot\n   3. by object-state: align all conformations and put them in separate slots. (`grid_slot` doesn't work!)\n6. `Setting - Line and Sticks - Ball and Stick on - Ball and Stick ratio: 1.5`\n7. `Setting - Line and Sticks - Stick radius: 0.2 - Stick Hydrogen Scale: 1.0`\n\n### Show Molecule\n\n1. To show molecules\n\n   1. `hide everything`\n   2. `show sticks`\n\n2. To align molecules: `align name1, name2`\n\n3. Convert RDKit mol to Pymol\n\n   ```python\n   # Requires a running PyMol server (started with `pymol -R`)\n   from rdkit import Chem\n   from rdkit.Chem import PyMol\n   v = PyMol.MolViewer()\n   rdmol = Chem.MolFromSmiles('C')\n   v.ShowMol(rdmol, name='mol')\n   v.SaveFile('mol.pkl')\n   ```\n\n\n## Citation\nPlease consider citing our paper if you find it helpful. 
Thank you!\n```\n@inproceedings{\nxu2022geodiff,\ntitle={GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation},\nauthor={Minkai Xu and Lantao Yu and Yang Song and Chence Shi and Stefano Ermon and Jian Tang},\nbooktitle={International Conference on Learning Representations},\nyear={2022},\nurl={https://openreview.net/forum?id=PzcvxEMzvQC}\n}\n```\n\n## Acknowledgement\n\nThis repo is built upon the previous work ConfGF's [[codebase]](https://github.com/DeepGraphLearning/ConfGF#prepare-your-own-geom-dataset-from-scratch-optional). Thanks Chence and Shitong!\n\n## Contact\n\nIf you have any questions, please contact me at minkai.xu@umontreal.ca or xuminkai@mila.quebec.\n\n## Known issues\n\n1. The current codebase is not compatible with more recent torch-geometric versions.\n2. The current processed dataset (with PyG data objects) is not compatible with more recent torch-geometric versions.","funding_links":[],"categories":["分子"],"sub_categories":["网络服务_其他"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMinkaiXu%2FGeoDiff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMinkaiXu%2FGeoDiff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMinkaiXu%2FGeoDiff/lists"}