{"id":28700525,"url":"https://github.com/deepgraphlearning/esm-gearnet","last_synced_at":"2025-07-01T13:42:42.409Z","repository":{"id":200886396,"uuid":"706435082","full_name":"DeepGraphLearning/ESM-GearNet","owner":"DeepGraphLearning","description":"ESM-GearNet for Protein Structure Representation Learning (https://arxiv.org/abs/2303.06275)","archived":false,"fork":false,"pushed_at":"2023-10-23T20:23:50.000Z","size":1638,"stargazers_count":102,"open_issues_count":2,"forks_count":10,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-14T11:08:19.867Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DeepGraphLearning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-18T00:22:37.000Z","updated_at":"2025-06-08T09:38:40.000Z","dependencies_parsed_at":"2023-10-23T21:29:48.383Z","dependency_job_id":null,"html_url":"https://github.com/DeepGraphLearning/ESM-GearNet","commit_stats":null,"previous_names":["deepgraphlearning/esm-gearnet"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DeepGraphLearning/ESM-GearNet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FESM-GearNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FESM-GearNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FESM-GearNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FESM-GearNet/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/DeepGraphLearning","download_url":"https://codeload.github.com/DeepGraphLearning/ESM-GearNet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DeepGraphLearning%2FESM-GearNet/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259884743,"owners_count":22926457,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-14T11:08:10.647Z","updated_at":"2025-07-01T13:42:42.347Z","avatar_url":"https://github.com/DeepGraphLearning.png","language":"Python","readme":"# ESM-GearNet\n\n\nThis is the official codebase of the paper\n\n**A Systematic Study of Joint Representation Learning on Protein Sequences and Structures** \n[[ArXiv](https://arxiv.org/abs/2303.06275)]\n\n[Zuobai Zhang](https://oxer11.github.io/), [Chuanrui Wang*](https://wang-cr.github.io/), [Minghao Xu*](https://chrisallenming.github.io/), [Vijil Chenthamarakshan](https://researcher.watson.ibm.com/researcher/view.php?person=us-ecvijil), [Aurelie Lozano](https://researcher.watson.ibm.com/researcher/view.php?person=us-aclozano), [Payel Das](https://researcher.watson.ibm.com/researcher/view.php?person=us-daspa), [Jian Tang](https://jian-tang.com/)\n\n\n## Overview\n\nTo explore the benefit of combining sequence- and structure-based protein encoders, we conduct a comprehensive investigation into joint protein representation learning.\nOur study combines a state-of-the-art protein language model (PLM), ESM-2, with three distinct structure encoders 
([GVP](https://openreview.net/forum?id=1YLJDvSx6J4), [GearNet](https://openreview.net/forum?id=to3qCB3tOh9), and [CDConv](https://openreview.net/forum?id=P5Z-Zl9XJ7)). \nWe introduce three fusion strategies—serial, parallel, and cross fusion—to combine sequence and structure representations.\n\n![GearNet](./asset/ESM-GearNet.png)\n\nWe further explore six diverse pre-training techniques: ([Residue Type Prediction](https://arxiv.org/abs/2203.06125), [Distance Prediction](https://arxiv.org/abs/2203.06125), [Angle Prediction](https://arxiv.org/abs/2203.06125), [Dihedral Prediction](https://arxiv.org/abs/2203.06125), [Multiview Contrast](https://arxiv.org/abs/2203.06125), [SiamDiff](https://arxiv.org/abs/2301.12068)), employing the optimal model from the aforementioned choices and leveraging pre-training on the AlphaFold Database.\n\n![SSL](./asset/pretrain.png)\n\nYou can find the pre-trained model weights [here](https://zenodo.org/records/10034578), including ESM-GearNet pre-trained with [Multiview Contrast](https://zenodo.org/records/10034578/files/mc_esm_gearnet.pth?download=1), [Residue Type Prediction](https://zenodo.org/records/10034578/files/attr_esm_gearnet.pth?download=1), [Distance Prediction](https://zenodo.org/records/10034578/files/dist_esm_gearnet.pth?download=1), [Angle Prediction](https://zenodo.org/records/10034578/files/angle_esm_gearnet.pth?download=1), [Dihedral Prediction](https://zenodo.org/records/10034578/files/dihedral_esm_gearnet.pth?download=1) and [SiamDiff](https://zenodo.org/records/10034578/files/siamdiff_esm_gearnet.pth?download=1).\n\n\n## Installation\n\nYou may install the dependencies via either conda or pip. 
Generally, ESM-GearNet works\nwith Python 3.7/3.8 and PyTorch version \u003e= 1.12.0.\n\n### From Conda\n\n```bash\nconda install torchdrug pytorch=1.12.1 cudatoolkit=11.6 -c milagraph -c pytorch-lts -c pyg -c conda-forge\nconda install easydict pyyaml -c conda-forge\nconda install transformers==4.14.1 tokenizers==0.10.3 -c huggingface \npip install atom3d\n```\n\n### From Pip\n\n```bash\npip install torch==1.12.1+cu116 -f https://download.pytorch.org/whl/lts/1.12/torch_lts.html\npip install torchdrug\npip install easydict pyyaml\npip install atom3d\npip install transformers==4.14.1 tokenizers==0.10.3\n```\n\n## Reproduction\n\n### Training From Scratch\n\nTo reproduce the results of ESM-{GVP, GearNet, CDConv}, use the following commands. \nYou may change the `gpus` parameter in the config files to run on other GPUs. All datasets are downloaded automatically by the code. \nThe first run takes longer because the dataset has to be preprocessed.\n\n\n```bash\n# Run ESM-GearNet (serial fusion) on the Enzyme Commission dataset with 4 GPUs\npython -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/EC/esm_gearnet.yaml\n\n# ESM-GearNet (parallel fusion)\npython -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/EC/esm_gearnet_parallel.yaml\n\n# ESM-GearNet (cross fusion)\npython -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/EC/esm_gearnet_cross.yaml\n\n# Run ESM-GearNet (serial fusion) on the Gene Ontology dataset\npython -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/GO/esm_gearnet.yaml --branch MF\n\n# Run ESM-GearNet (serial fusion) on the PSR dataset\npython -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/PSR/esm_gearnet.yaml\n\n# Run ESM-GearNet (serial fusion) on the MSP dataset\npython -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c 
config/MSP/esm_gearnet.yaml\n```\n\n### Pre-training and Fine-tuning\nBy default, we use the AlphaFold Database for pre-training. To pre-train ESM-GearNet with Multiview Contrast, use the following command. As before, all datasets are downloaded automatically and preprocessed the first time you run the code.\n\n```bash\n# Run pre-training\npython -m torch.distributed.launch --nproc_per_node=4 script/pretrain.py -c config/pretrain/mc_esm_gearnet.yaml\n```\n\nAfter pre-training, you can load the model weights from the saved checkpoint via the `--ckpt` argument and then fine-tune the model on downstream tasks.\n**Remember to first uncomment the `model_checkpoint: {{ ckpt }}` line in the config file.**\n\n```bash\npython -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/EC/esm_gearnet.yaml --ckpt \u003cpath_to_your_model\u003e\n```\n\n## Citation\nIf you find this codebase useful in your research, please cite the following paper.\n\n```bibtex\n@article{zhang2023enhancing,\n  title={A Systematic Study of Joint Representation Learning on Protein Sequences and Structures},\n  author={Zhang, Zuobai and Wang, Chuanrui and Xu, Minghao and Chenthamarakshan, Vijil and Lozano, Aurelie and Das, Payel and Tang, Jian},\n  journal={arXiv preprint arXiv:2303.06275},\n  year={2023}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgraphlearning%2Fesm-gearnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepgraphlearning%2Fesm-gearnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgraphlearning%2Fesm-gearnet/lists"}