{"id":15140766,"url":"https://github.com/chao1224/moleculesde","last_synced_at":"2025-07-03T23:33:31.083Z","repository":{"id":169521828,"uuid":"632180756","full_name":"chao1224/MoleculeSDE","owner":"chao1224","description":"A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining, ICML'23","archived":false,"fork":false,"pushed_at":"2024-03-29T14:22:24.000Z","size":23631,"stargazers_count":32,"open_issues_count":3,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-06T18:52:11.838Z","etag":null,"topics":["conformation","diffusion","generation","geometry","group-equivariant-neural-network","molecule","pretraining","reflection-antisymmetric","representation","sde","stochastic-differential-equation"],"latest_commit_sha":null,"homepage":"https://chao1224.github.io/MoleculeSDE","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chao1224.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-24T22:04:18.000Z","updated_at":"2025-02-10T09:52:22.000Z","dependencies_parsed_at":"2024-02-22T00:27:26.230Z","dependency_job_id":"6beb9653-9c48-4467-8acb-8f91c8078994","html_url":"https://github.com/chao1224/MoleculeSDE","commit_stats":{"total_commits":6,"total_committers":1,"mean_commits":6.0,"dds":0.0,"last_synced_commit":"06f84c873ba8196e32b4fd4f7436c2bbc3511915"},"previous_names":["chao1224/moleculesde"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chao1224/MoleculeSDE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/chao1224%2FMoleculeSDE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FMoleculeSDE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FMoleculeSDE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FMoleculeSDE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chao1224","download_url":"https://codeload.github.com/chao1224/MoleculeSDE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FMoleculeSDE/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263421460,"owners_count":23464012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conformation","diffusion","generation","geometry","group-equivariant-neural-network","molecule","pretraining","reflection-antisymmetric","representation","sde","stochastic-differential-equation"],"created_at":"2024-09-26T08:40:57.140Z","updated_at":"2025-07-03T23:33:31.014Z","avatar_url":"https://github.com/chao1224.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining\n\n**ICML 2023**\n\nShengchao Liu\u003csup\u003e+\u003c/sup\u003e, Weitao Du\u003csup\u003e+\u003c/sup\u003e, Zhiming Ma, Hongyu Guo, Jian Tang\n\n\u003csup\u003e+\u003c/sup\u003e Equal contribution\n\n[[Project 
Page](https://chao1224.github.io/MoleculeSDE)]\n[[Paper](https://proceedings.mlr.press/v202/liu23h.html)]\n[[ArXiv](https://arxiv.org/abs/2305.18407)]\n[[Checkpoints on HuggingFace](https://huggingface.co/chao1224/MoleculeSDE/tree/main)]\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figure/pipeline.png\" /\u003e \n\u003c/p\u003e\n\n- MoleculeSDE is GraphMVPv2, the follow-up to GraphMVP\n- It includes two components:\n    - Contrastive learning\n    - Generative learning:\n        - One 2D-\u003e3D diffusion model. Frame-based SE(3)-equivariant and reflection anti-symmetric model\n        - One 3D-\u003e2D diffusion model. SE(3)-invariant.\n\nAll the pretrained checkpoints are available on [this HuggingFace link](https://huggingface.co/chao1224/MoleculeSDE/tree/main).\nYou can find the detailed mapping between checkpoints and tables in the file `README_checkpoints.md`.\n\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"figure/demo.gif\" width=\"100%\" /\u003e \n\u003c/p\u003e\n\n\n## Environments\n```bash\nconda create -n Geom3D python=3.7\nconda activate Geom3D\nconda install -y -c rdkit rdkit\nconda install -y numpy networkx scikit-learn\nconda install -y -c conda-forge -c pytorch pytorch=1.9.1\nconda install -y -c pyg -c conda-forge pyg=2.0.2\npip install ogb==1.2.1\n\npip install sympy\n\npip install ase  # for SchNet\n\npip install -e .\n```\n\n## Datasets\n\n- For the PCQM4Mv2 (pretraining) dataset\n  - Download the dataset from the [PCQM4Mv2 website](https://ogb.stanford.edu/docs/lsc/pcqm4mv2/) into the folder `data/PCQM4Mv2/raw`:\n    ```\n      .\n    ├── data\n    │   └── PCQM4Mv2\n    │       └── raw\n    │           ├── data.csv\n    │           ├── data.csv.gz\n    │           ├── pcqm4m-v2-train.sdf\n    │           └── pcqm4m-v2-train.sdf.tar.gz\n    ```\n  - Then run `examples/generate_PCQM4Mv2.py`.\n- For QM9, the dataset is downloaded automatically by the pyg dataset class. The default path is `data/molecule_datasets/QM9`.\n- For MD17, the dataset is downloaded automatically by the pyg dataset class. 
The default path is `data/MD17`.\n- For MoleculeNet, please follow the [GraphMVP instructions](https://github.com/chao1224/GraphMVP). The dataset structure is:\n  ```\n    .\n  ├── data\n  │   ├── molecule_datasets\n  │   │   ├── bace\n  │   │   │   ├── BACE_README\n  │   │   │   └── raw\n  │   │   │       └── bace.csv\n  │   │   ├── bbbp\n  ...............\n  ```\n\n## Pretraining\n\nA quick pretraining demo:\n```\ncd examples\n\npython pretrain_MoleculeSDE.py \\\n--verbose --input_data_dir=../data --dataset=PCQM4Mv2 \\\n--model_3d=SchNet \\\n--lr=1e-4 --epochs=50 --num_workers=0 --batch_size=256 --SSL_masking_ratio=0 --gnn_3d_lr_scale=0.1 --dropout_ratio=0 --graph_pooling=mean --emb_dim=300 --epochs=1 \\\n--SDE_coeff_contrastive=1 --CL_similarity_metric=EBM_node_dot_prod --T=0.1 --normalize --SDE_coeff_contrastive_skip_epochs=0 \\\n--SDE_coeff_generative_2Dto3D=1 --SDE_2Dto3D_model=SDEModel2Dto3D_02 --SDE_type_2Dto3D=VE --use_extend_graph \\\n--SDE_coeff_generative_3Dto2D=1 --SDE_3Dto2D_model=SDEModel3Dto2D_node_adj_dense --SDE_type_3Dto2D=VE --noise_on_one_hot \\\n--output_model_dir=[MODEL_DIR]\n```\n\n**Note** that `[MODEL_DIR]` is the directory where your models/checkpoints will be saved.\n\n## Downstream\n\nThe downstream scripts can be found under the `examples` folder. 
Below we illustrate a few simple examples.\n- `finetune_MoleculeNet.py`:\n  ```\n  python finetune_MoleculeNet.py \\\n  --dataset=tox21 \\\n  --input_model_file=[MODEL_DIR]/model_complete.pth\n  ```\n- `finetune_QM9.py`: \n  ```\n  python finetune_QM9.py \\\n  --dataset=QM9 --task=gap \\\n  --model_3d=SchNet \\\n  --input_model_file=[MODEL_DIR]/model_complete.pth\n  ```\n- `finetune_MD17.py`: \n  ```\n  python finetune_MD17.py \\\n  --dataset=MD17 --task=aspirin \\\n  --model_3d=SchNet \\\n  --input_model_file=[MODEL_DIR]/model_complete.pth\n  ```\n\n## Cite Us\n\nFeel free to cite this work if you find it useful!\n\n```\n@inproceedings{liu2023group,\n  title={A group symmetric stochastic differential equation model for molecule multi-modal pretraining},\n  author={Liu, Shengchao and Du, Weitao and Ma, Zhi-Ming and Guo, Hongyu and Tang, Jian},\n  booktitle={International Conference on Machine Learning},\n  pages={21497--21526},\n  year={2023},\n  organization={PMLR}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchao1224%2Fmoleculesde","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchao1224%2Fmoleculesde","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchao1224%2Fmoleculesde/lists"}