{"id":15140761,"url":"https://github.com/chao1224/graphmvp","last_synced_at":"2025-10-23T17:31:38.256Z","repository":{"id":40495455,"uuid":"414245124","full_name":"chao1224/GraphMVP","owner":"chao1224","description":"Pre-training Molecular Graph Representation with 3D Geometry, ICLR'22 (https://openreview.net/forum?id=xQUe1pOKPam)","archived":false,"fork":false,"pushed_at":"2022-09-20T14:29:48.000Z","size":623,"stargazers_count":176,"open_issues_count":3,"forks_count":22,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-01-15T22:35:10.438Z","etag":null,"topics":["contrastive-learning","generative-model","geometry","graph","molecule","pretraining","self-supervised","self-supervised-learning"],"latest_commit_sha":null,"homepage":"https://chao1224.github.io/GraphMVP","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chao1224.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-06T14:27:54.000Z","updated_at":"2024-12-30T05:58:36.000Z","dependencies_parsed_at":"2022-07-10T01:34:17.656Z","dependency_job_id":null,"html_url":"https://github.com/chao1224/GraphMVP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGraphMVP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGraphMVP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGraphMVP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chao1224%2FGraphMVP/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/chao1224","download_url":"https://codeload.github.com/chao1224/GraphMVP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237869045,"owners_count":19379253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["contrastive-learning","generative-model","geometry","graph","molecule","pretraining","self-supervised","self-supervised-learning"],"created_at":"2024-09-26T08:40:50.859Z","updated_at":"2025-10-23T17:31:37.328Z","avatar_url":"https://github.com/chao1224.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pre-training Molecular Graph Representation with 3D Geometry\n\n**ICLR 2022**\n\nAuthors: Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, Jian Tang\n\n[[Project Page](https://chao1224.github.io/GraphMVP)]\n[[Paper](https://openreview.net/forum?id=xQUe1pOKPam)]\n[[ArXiv](https://arxiv.org/abs/2110.07728)]\n[[Slides](https://drive.google.com/file/d/1-lDWtdgeEgTO009YVPzHK8f7yYbvQ1oY/view?usp=sharing)]\n[[Poster](https://drive.google.com/file/d/1L_XrlgfmCmycfGf47Dt6nnaKpZtiqiN-/view?usp=sharing)]\n\u003cbr\u003e\n[[NeurIPS SSL Workshop 2021](https://sslneurips21.github.io/)]\n[[ICLR GTRL Workshop 2022 (Spotlight)](https://gt-rl.github.io/)]\n\nThis repository provides the source code for the ICLR'22 paper **Pre-training Molecular Graph Representation with 3D Geometry**, with the following task:\n- During pre-training, we consider both the 2D topology and 3D geometry.\n- During downstream, we consider tasks with 2D topology 
only.\n\nIn the future, we will merge it into the [TorchDrug](https://github.com/DeepGraphLearning/torchdrug) package.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"fig/pipeline.png\" /\u003e \n\u003c/p\u003e\n\n## Baselines\nFor implementation, this repository also provides the following graph SSL baselines:\n- Generative Graph SSL:\n  - [Edge Prediction (EdgePred)](https://proceedings.neurips.cc/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf)\n  - [AttributeMasking (AttrMask)](https://openreview.net/forum?id=HJlWWJSFDH)\n  - [GPT-GNN](https://arxiv.org/abs/2006.15437)\n- Contrastive Graph SSL:\n  - [InfoGraph](https://openreview.net/pdf?id=r1lfF2NYvH)\n  - [Context Prediction (ContextPred)](https://openreview.net/forum?id=HJlWWJSFDH)\n  - [GraphLoG](http://proceedings.mlr.press/v139/xu21g/xu21g.pdf)\n  - [Grover-Contextual](https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html)\n  - [GraphCL](https://papers.nips.cc/paper/2020/file/3fe230348e9a12c13120749e3f9fa4cd-Paper.pdf)\n  - [JOAO](https://arxiv.org/abs/2106.07594)\n- Predictive Graph SSL:\n  - [Grover-Motif](https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"fig/baselines.png\" /\u003e \n\u003c/p\u003e\n\n## Environments\nInstall packages under conda env\n```bash\nconda create -n GraphMVP python=3.7\nconda activate GraphMVP\n\nconda install -y -c rdkit rdkit\nconda install -y -c pytorch pytorch=1.9.1\nconda install -y numpy networkx scikit-learn\npip install ase\npip install git+https://github.com/bp-kelley/descriptastorus\npip install ogb\nexport TORCH=1.9.0\nexport CUDA=cu102  # cu102, cu110\n\nwget https://data.pyg.org/whl/torch-${TORCH}%2B${CUDA}/torch_cluster-1.5.9-cp37-cp37m-linux_x86_64.whl\npip install torch_cluster-1.5.9-cp37-cp37m-linux_x86_64.whl\nwget https://data.pyg.org/whl/torch-${TORCH}%2B${CUDA}/torch_scatter-2.0.9-cp37-cp37m-linux_x86_64.whl\npip 
install torch_scatter-2.0.9-cp37-cp37m-linux_x86_64.whl\nwget https://data.pyg.org/whl/torch-${TORCH}%2B${CUDA}/torch_sparse-0.6.12-cp37-cp37m-linux_x86_64.whl\npip install torch_sparse-0.6.12-cp37-cp37m-linux_x86_64.whl\npip install torch-geometric==1.7.2\n```\n\n## Dataset Preprocessing\n\nFor dataset download, please follow the instructions [here](https://github.com/chao1224/GraphMVP/tree/main/datasets).\n\nFor data preprocessing (GEOM), please use the following commands:\n```\ncd src_classification\npython GEOM_dataset_preparation.py --n_mol 50000 --n_conf 5 --n_upper 1000 --data_folder $SLURM_TMPDIR\ncd ..\n\ncd src_regression\npython GEOM_dataset_preparation.py --n_mol 50000 --n_conf 5 --n_upper 1000 --data_folder $SLURM_TMPDIR\ncd ..\n\nmv $SLURM_TMPDIR/GEOM datasets\n```\n\n**Featurization**. We employ two sets of featurization methods on atoms.\n1. For classification tasks, in order to follow the main molecular graph SSL research line, we use the same atom featurization (atom type and chirality).\n2. For regression tasks, the two atom-level features above give poor results, so we use the more comprehensive features from OGB.\n\n## Experiments\n\n### Terminology specification\n\nIn the latest scripts, we use `GraphMVP` for the trivial GraphMVP (Eq. 7 in the paper), and `GraphMVP_hybrid` includes two variants adding extra 2D SSL pretext tasks (Eq. 8 in the paper).\nIn the previous scripts, these two were called `3D_hybrid_02_masking` and `3D_hybrid_03_masking` respectively.\nThe older names may still appear in some pre-training log files [here](https://drive.google.com/drive/folders/1uPsBiQF3bfeCAXSDd4JfyXiTh-qxYfu6?usp=sharing).\n\n| GraphMVP | Latest scripts | Previous scripts |\n| :--: | :--: | :--: |\n| Eq. 
8 | `GraphMVP_hybrid` | `3D_hybrid_03_masking` |\n\n### For GraphMVP pre-training\n\nCheck the following scripts:\n- `scripts_classification/submit_pre_training_GraphMVP.sh`\n- `scripts_classification/submit_pre_training_GraphMVP_hybrid.sh`\n- `scripts_regression/submit_pre_training_GraphMVP.sh`\n- `scripts_regression/submit_pre_training_GraphMVP_hybrid.sh`\n\nThe pre-trained model weights, training logs, and prediction files can be found [here](https://drive.google.com/drive/folders/1uPsBiQF3bfeCAXSDd4JfyXiTh-qxYfu6?usp=sharing).\n\n### For Other SSL pre-training baselines\n\nCheck the following scripts:\n- `scripts_classification/submit_pre_training_baselines.sh`\n- `scripts_regression/submit_pre_training_baselines.sh`\n\n### For Downstream tasks\n\nCheck the following scripts:\n- `scripts_classification/submit_fine_tuning.sh`\n- `scripts_regression/submit_fine_tuning.sh`\n\n## Cite Us\n\nPlease cite this work if you find it useful.\n\n```\n@inproceedings{liu2022pretraining,\n    title={Pre-training Molecular Graph Representation with 3D Geometry},\n    author={Shengchao Liu and Hanchen Wang and Weiyang Liu and Joan Lasenby and Hongyu Guo and Jian Tang},\n    booktitle={International Conference on Learning Representations},\n    year={2022},\n    url={https://openreview.net/forum?id=xQUe1pOKPam}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchao1224%2Fgraphmvp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchao1224%2Fgraphmvp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchao1224%2Fgraphmvp/lists"}