{"id":13751836,"url":"https://github.com/DeepGraphLearning/GearNet","last_synced_at":"2025-05-09T18:32:23.471Z","repository":{"id":60275838,"uuid":"510563695","full_name":"DeepGraphLearning/GearNet","owner":"DeepGraphLearning","description":"GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)","archived":false,"fork":false,"pushed_at":"2023-11-20T01:44:25.000Z","size":524,"stargazers_count":258,"open_issues_count":22,"forks_count":28,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-08-03T09:03:18.880Z","etag":null,"topics":["graph-neural-networks","pre-training","protein-representation-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DeepGraphLearning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-05T02:37:43.000Z","updated_at":"2024-07-29T21:50:08.000Z","dependencies_parsed_at":"2023-02-17T03:45:54.989Z","dependency_job_id":"28436562-5ac6-4536-9752-99000de812ea","html_url":"https://github.com/DeepGraphLearning/GearNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FGearNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FGearNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FGearNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FGearNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DeepGraphLearning","download_url":"https://codeload.github.com/DeepGraphLearning/GearNet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224876976,"owners_count":17384699,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-neural-networks","pre-training","protein-representation-learning"],"created_at":"2024-08-03T09:00:55.554Z","updated_at":"2024-11-16T04:31:46.024Z","avatar_url":"https://github.com/DeepGraphLearning.png","language":"Python","readme":"# GearNet: Geometry-Aware Relational Graph Neural Network\n\n\nThis is the official codebase of the paper\n\n**Protein Representation Learning by Geometric Structure Pretraining**, *ICLR'2023*\n\n[[ArXiv](https://arxiv.org/abs/2203.06125)] [[OpenReview](https://openreview.net/forum?id=to3qCB3tOh9)]\n\n[Zuobai Zhang](https://oxer11.github.io/), [Minghao Xu](https://chrisallenming.github.io/), [Arian Jamasb](https://jamasb.io/), [Vijil 

and the paper

**Enhancing Protein Language Models with Structure-based Encoder and Pre-training**, *ICLR'2023 MLDD Workshop*

[[ArXiv](https://arxiv.org/abs/2303.06275)]

[Zuobai Zhang](https://oxer11.github.io/), [Minghao Xu](https://chrisallenming.github.io/), [Vijil Chenthamarakshan](https://researcher.watson.ibm.com/researcher/view.php?person=us-ecvijil), [Aurelie Lozano](https://researcher.watson.ibm.com/researcher/view.php?person=us-aclozano), [Payel Das](https://researcher.watson.ibm.com/researcher/view.php?person=us-daspa), [Jian Tang](https://jian-tang.com/)

## News

- [2023/10/17] Please check the latest version of the [ESM-GearNet paper](https://arxiv.org/abs/2303.06275) and its [code implementation](https://github.com/DeepGraphLearning/ESM-GearNet)!

- [2023/03/14] The code for ESM_GearNet has been released with [our latest paper](https://arxiv.org/abs/2303.06275).

- [2023/02/25] The code for GearNet_Edge_IEConv & the Fold3D dataset has been released.

- [2023/02/01] Our paper has been accepted at ICLR'2023! We have released the pretrained model weights [here](https://zenodo.org/record/7593637).

- [2022/11/20] We added a scheduler to `downstream.py` and provide a config file for training GearNet-Edge on EC with a single GPU. You can now reproduce the results in the paper.

## Overview

*GeomEtry-Aware Relational Graph Neural Network (**GearNet**)* is a simple yet effective structure-based protein encoder.
It encodes spatial information by adding different types of sequential and structural edges to protein residue graphs and then performing relational message passing, which can be further enhanced by an edge message passing mechanism.
Though conceptually simple, GearNet augmented with edge message passing achieves very strong performance on several benchmarks in the supervised setting.

![GearNet](./asset/GearNet.png)

Five different geometric self-supervised learning methods based on protein structures are further proposed to pretrain the encoder: **Multiview Contrast**, **Residue Type Prediction**, **Distance Prediction**, **Angle Prediction** and **Dihedral Prediction**.
By extensively benchmarking these pretraining techniques on diverse downstream tasks, we set up a solid starting point for pretraining protein structure representations.

![SSL](./asset/SSL.png)

This codebase is based on PyTorch and [TorchDrug] ([TorchProtein](https://torchprotein.ai)).
It supports training and inference with multiple GPUs.
The documentation and implementation of our methods can be found in the [docs](https://torchdrug.ai/docs/) of TorchDrug.
To adapt our model to your setting, you can follow the step-by-step [tutorials](https://torchprotein.ai/tutorials) in TorchProtein.

[TorchDrug]: https://github.com/DeepGraphLearning/torchdrug
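
For a concrete picture of the encoder in code, here is a minimal sketch of building the residue graph and a GearNet encoder with TorchDrug, following the conventions of the TorchProtein tutorials linked above. The hyperparameters are illustrative only; the exact values used in our experiments live in the config files.

```python
from torchdrug import layers, models
from torchdrug.layers import geometry

# Residue-level graph construction: alpha-carbon nodes connected by
# sequential, radius and KNN edges -- the relation types GearNet
# performs relational message passing over.
graph_construction_model = layers.GraphConstruction(
    node_layers=[geometry.AlphaCarbonNode()],
    edge_layers=[
        geometry.SequentialEdge(max_distance=2),
        geometry.SpatialEdge(radius=10.0, min_distance=5),
        geometry.KNNEdge(k=10, min_distance=5),
    ],
    edge_feature="gearnet",
)

# GearNet encoder: 21-dimensional residue-type features and 7 relation
# types (5 sequential offsets + radius + KNN). Illustrative sizes.
gearnet = models.GearNet(
    input_dim=21,
    hidden_dims=[512, 512, 512, 512, 512, 512],
    num_relation=7,
    batch_norm=True,
    concat_hidden=True,
    short_cut=True,
    readout="sum",
)
```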

## Installation

You may install the dependencies via either conda or pip. Generally, GearNet works with Python 3.7/3.8 and PyTorch >= 1.8.0.

### From Conda

```bash
conda install torchdrug pytorch=1.8.0 cudatoolkit=11.1 -c milagraph -c pytorch-lts -c pyg -c conda-forge
conda install easydict pyyaml -c conda-forge
```

### From Pip

```bash
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
pip install torchdrug
pip install easydict pyyaml
```

### Using Docker

First, make sure Docker is set up with GPU support ([guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)).

Next, build the Docker image (note that Docker image names must be lowercase):

```bash
docker build . -t gearnet
```

Once the image is built, you can run training commands from within Docker with the following command:

```bash
docker run -it -v /path/to/dataset/directory/on/disk:/root/scratch/ --gpus all gearnet bash
```

## Reproduction

### Training From Scratch

To reproduce the results of GearNet, use the following command. Alternatively, you may pass `--gpus null` to run GearNet on a CPU. All the datasets are downloaded automatically by the code; the first run takes longer because the datasets also need to be preprocessed.

```bash
# Run GearNet on the Enzyme Commission dataset with 1 GPU
python script/downstream.py -c config/downstream/EC/gearnet.yaml --gpus [0]
```

We provide the hyperparameters for each experiment in configuration files, all of which can be found under the `config/` directory (a loading sketch follows the multi-GPU commands below).

To run GearNet with multiple GPUs, use the following commands.

```bash
# Run GearNet on the Enzyme Commission dataset with 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/downstream/EC/gearnet.yaml --gpus [0,1,2,3]

# Run ESM_GearNet on the Enzyme Commission dataset with 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/downstream/EC/ESM_gearnet.yaml --gpus [0,1,2,3]

# Run GearNet_Edge_IEConv on the Fold3D dataset with 4 GPUs
# You need to first install the latest version of torchdrug from source. See https://github.com/DeepGraphLearning/torchdrug.
python -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/downstream/Fold3D/gearnet_edge_ieconv.yaml --gpus [0,1,2,3]
```
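
As a rough illustration of what these config files contain, the sketch below loads one with `pyyaml` and `easydict` (the dependencies installed above). It assumes the configs follow TorchDrug's Jinja-templated YAML convention, where command-line values such as `gpus` are substituted into `{{ ... }}` placeholders; key names like `optimizer` are assumptions about the YAML layout, not a documented API.

```python
import jinja2  # installed as a torchdrug dependency
import yaml
from easydict import EasyDict

# Assumption: the YAML is a Jinja template with placeholders such as
# {{ gpus }}, so render it with concrete values before parsing.
with open("config/downstream/EC/gearnet.yaml") as f:
    rendered = jinja2.Template(f.read()).render(gpus=[0])

cfg = EasyDict(yaml.safe_load(rendered))
print(cfg.optimizer)  # hypothetical key; inspect cfg.keys() for the real layout
```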

### Pretraining and Finetuning

By default, we use the AlphaFold Database for pretraining.
To pretrain GearNet-Edge with Multiview Contrast, use the following command.
Similarly, all the datasets are downloaded automatically and preprocessed the first time you run the code.

```bash
# Pretrain GearNet-Edge with Multiview Contrast
python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]

# Pretrain ESM_GearNet with Multiview Contrast
python script/pretrain.py -c config/pretrain/mc_esm_gearnet.yaml --gpus [0]
```

After pretraining, you can load the model weights from the saved checkpoint via the `--ckpt` argument and then finetune the model on downstream tasks.

```bash
# Finetune GearNet-Edge on the Enzyme Commission dataset
python script/downstream.py -c config/downstream/EC/gearnet_edge.yaml --gpus [0] --ckpt <path_to_your_model>
```

You can find the pretrained model weights [here](https://zenodo.org/record/7593637), including those pretrained with [Multiview Contrast](https://zenodo.org/record/7593637/files/mc_gearnet_edge.pth), [Residue Type Prediction](https://zenodo.org/record/7593637/files/attr_gearnet_edge.pth), [Distance Prediction](https://zenodo.org/record/7593637/files/distance_gearnet_edge.pth), [Angle Prediction](https://zenodo.org/record/7593637/files/angle_gearnet_edge.pth) and [Dihedral Prediction](https://zenodo.org/record/7593637/files/dihedral_gearnet_edge.pth).
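
If you want to reuse a pretrained checkpoint outside of `downstream.py`, the following is a minimal sketch of loading one into a GearNet encoder with plain PyTorch. The layout of the saved checkpoint is an assumption (pretraining typically wraps the encoder in a task object), so adjust the key and any parameter-name prefixes to match what `torch.load` actually returns.

```python
import torch

# `gearnet` is the encoder instance from the Overview sketch; the .pth file
# is one of the Zenodo checkpoints linked above.
checkpoint = torch.load("mc_gearnet_edge.pth", map_location="cpu")

# Assumption: the checkpoint stores the state dict under "model"; fall back
# to the raw object if it is already a plain state dict.
state_dict = checkpoint.get("model", checkpoint) if isinstance(checkpoint, dict) else checkpoint

# strict=False tolerates task-specific heads that the bare encoder lacks.
gearnet.load_state_dict(state_dict, strict=False)
```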

## Results

Here are the results of GearNet with and without pretraining on standard benchmark datasets. **All the results are obtained with 4 A100 GPUs (40GB). Note that results may differ slightly if the model is trained with 1 GPU and/or a smaller batch size. For EC and GO, the provided config files are for 4 GPUs with batch size 2 on each; if you run the model on 1 GPU, set the batch size to 8.**
More detailed results are listed in the paper.

| Method | EC | GO-BP | GO-MF | GO-CC |
| --- | --- | --- | --- | --- |
| GearNet | 0.730 | 0.356 | 0.503 | 0.414 |
| GearNet-Edge | 0.810 | 0.403 | 0.580 | 0.450 |
| Multiview Contrast | 0.874 | 0.490 | 0.654 | 0.488 |
| Residue Type Prediction | 0.843 | 0.430 | 0.604 | 0.465 |
| Distance Prediction | 0.839 | 0.448 | 0.616 | 0.464 |
| Angle Prediction | 0.853 | 0.458 | 0.625 | 0.473 |
| Dihedral Prediction | 0.859 | 0.458 | 0.626 | 0.465 |
| ESM_GearNet | 0.883 | 0.491 | 0.677 | 0.501 |
| ESM_GearNet (Multiview Contrast) | 0.894 | 0.516 | 0.684 | 0.5016 |

## Citation

If you find this codebase useful in your research, please cite the following papers.

```bibtex
@inproceedings{zhang2022protein,
  title={Protein representation learning by geometric structure pretraining},
  author={Zhang, Zuobai and Xu, Minghao and Jamasb, Arian and Chenthamarakshan, Vijil and Lozano, Aurelie and Das, Payel and Tang, Jian},
  booktitle={International Conference on Learning Representations},
  year={2023}
}
```

```bibtex
@article{zhang2023enhancing,
  title={A Systematic Study of Joint Representation Learning on Protein Sequences and Structures},
  author={Zhang, Zuobai and Wang, Chuanrui and Xu, Minghao and Chenthamarakshan, Vijil and Lozano, Aurelie and Das, Payel and Tang, Jian},
  journal={arXiv preprint arXiv:2303.06275},
  year={2023}
}
```