# PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding

This is the official codebase of the paper [PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding](https://arxiv.org/pdf/2206.02096.pdf),
accepted at the **NeurIPS 2022 Datasets and Benchmarks Track** ([OpenReview link](https://openreview.net/forum?id=QgTZ56-zJou)).

[Minghao Xu*](https://chrisallenming.github.io),
[Zuobai Zhang*](https://oxer11.github.io),
[Jiarui Lu](https://mila.quebec/en/person/jiarui-lu/),
[Zhaocheng Zhu](https://kiddozhu.github.io),
[Yangtian Zhang](https://zytzrh.github.io/),
[Chang Ma](https://github.com/chang-github-00),
[Runcheng Liu](https://www.runchengliu.com/),
[Jian Tang](https://jian-tang.com)
(*equal contribution)

## News ##

- [2022/09/19] Initial release! We release the source code and configs for 14 tasks in the PEER benchmark.
- [2022/10/16] Full PEER benchmark released!
We newly release the source code and configs for three protein function prediction tasks from FLIP.

## Overview ##

PEER is a **comprehensive** and **multi-task** benchmark for protein sequence understanding.
It contains 17 protein understanding tasks spanning five categories:
protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction.
On this benchmark, we evaluate different types of sequence-based methods on each task,
including traditional feature engineering approaches, different sequence encoders, and large-scale pre-trained protein language models,
under both **single-task learning** and **multi-task learning** settings.

![PEER Benchmark](asset/benchmark.png)

This codebase is based on PyTorch and [TorchProtein], an extension of [TorchDrug] specific to protein applications.
It supports training and inference with multiple GPUs or multiple machines.

[TorchProtein]: https://torchprotein.ai/
[TorchDrug]: https://torchdrug.ai/

## Installation ##

You may install the dependencies of TorchProtein and the PEER benchmark as below.
Generally, they work with Python 3.7/3.8 and PyTorch >= 1.8.0.

```bash
conda create -n protein python=3.7
conda activate protein

conda install pytorch==1.8.0 cudatoolkit=10.2 -c pytorch
conda install scikit-learn pandas decorator ipython networkx tqdm matplotlib -y
conda install pytorch-scatter pytorch-cluster -c pyg -c conda-forge
pip install fair-esm transformers easydict pyyaml lmdb

python -m pip install git+https://github.com/DeepGraphLearning/torchdrug/
```

## Reproduction ##

### Experimental Configurations ###

We provide a YAML-based config for each benchmark experiment in our paper.
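The `gpus` field of a config controls the launch mode. The values below mirror the launch examples in the next section (only one such line appears in a given config):

```yaml
gpus: [0]                        # single GPU
gpus: [0, 1, 2, 3]               # one machine with 4 GPUs
gpus: [0, 1, 2, 3, 0, 1, 2, 3]   # two machines with 4 GPUs each
```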
The configs of all baselines for single-task and multi-task learning are stored in `./config/` with the following folder structure:

```
config
 ├── single_task
 │   ├── ESM
 │   │   ├── Task_ESM.yaml
 │   │   └── Task_ESM_fix.yaml
 │   ├── ProtBert
 │   ├── LSTM
 │   ├── BERT
 │   ├── CNN
 │   ├── ResNet
 │   ├── DDE
 │   └── Moran
 └── multi_task
     ├── ESM
     │   └── CenterTask_AuxiliaryTask_ESM.yaml
     ├── BERT
     └── CNN
```

### Launch Experiments ###

In each config, we give a **suggested GPU configuration** that balances *dataset size* against *computational resources*.
We assume **Tesla V100 32GB GPUs** as the computational resource.
You can change this default configuration to match your own hardware.

*Note.* The benchmark results can be reproduced by taking the mean and std of three runs with `--seed 0`, `--seed 1` and `--seed 2`.

**Single-GPU.** By setting `gpus: [0]`, the experiment runs on a single GPU.
You can use the following commands to run with seed 0; all datasets are downloaded automatically by the code.

Single-task learning experiment:
```bash
python script/run_single.py -c config/single_task/$model/$yaml_config --seed 0
```

Multi-task learning experiment:
```bash
python script/run_multi.py -c config/multi_task/$model/$yaml_config --seed 0
```

**Multi-GPU.** By setting `gpus: [0,1,2,3]`, the experiment runs on a single machine with 4 GPUs.
You can use the following commands to run with seed 0.

Single-task learning experiment:
```bash
python -m torch.distributed.launch --nproc_per_node=4 script/run_single.py -c config/single_task/$model/$yaml_config --seed 0
```

Multi-task learning experiment:
```bash
python -m torch.distributed.launch --nproc_per_node=4 script/run_multi.py -c config/multi_task/$model/$yaml_config --seed 0
```

**Multi-Machine.** By setting `gpus: [0,1,2,3,0,1,2,3]`, the experiment runs on 2 machines with 4 GPUs each.
Run the command below on both machines; `torch.distributed.launch` additionally expects `--node_rank` (0 or 1, unique per machine) and a shared `--master_addr`/`--master_port` so the machines can rendezvous.

Single-task learning experiment:
```bash
python -m torch.distributed.launch --nnodes=2 --nproc_per_node=4 script/run_single.py -c config/single_task/$model/$yaml_config --seed 0
```

Multi-task learning experiment:
```bash
python -m torch.distributed.launch --nnodes=2 --nproc_per_node=4 script/run_multi.py -c config/multi_task/$model/$yaml_config --seed 0
```

## Benchmark Results ##

At the [website of TorchProtein], we maintain a leaderboard for each benchmark task.
We also maintain an **integrated leaderboard** across methods, using mean reciprocal rank (MRR) as the metric.
In the future, we will open the entrance to receive new benchmark results of new methods from the community.

[website of TorchProtein]: https://torchprotein.ai/benchmark

## License ##

This codebase is released under the Apache License 2.0, as in the [LICENSE](LICENSE) file.

## Citation ##

If you find this codebase helpful in your research, please cite the following paper.
```
@article{xu2022peer,
  title={PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding},
  author={Xu, Minghao and Zhang, Zuobai and Lu, Jiarui and Zhu, Zhaocheng and Zhang, Yangtian and Ma, Chang and Liu, Runcheng and Tang, Jian},
  journal={arXiv preprint arXiv:2206.02096},
  year={2022}
}
```
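For reference, the mean reciprocal rank used by the integrated leaderboard averages the reciprocal of a method's per-task rank (rank 1 = best). A minimal sketch, with hypothetical ranks:

```python
from statistics import mean

def mean_reciprocal_rank(ranks):
    """MRR over a method's per-task ranks (rank 1 = best)."""
    return mean(1.0 / r for r in ranks)

# Hypothetical ranks of one method across four tasks.
print(mean_reciprocal_rank([1, 2, 1, 4]))  # -> 0.6875
```

A method that ranks first on every task attains the maximum MRR of 1.0; higher is better.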