{"id":13958574,"url":"https://github.com/salesforce/provis","last_synced_at":"2025-04-07T10:26:37.900Z","repository":{"id":37730264,"uuid":"272459149","full_name":"salesforce/provis","owner":"salesforce","description":"Official code repository of \"BERTology Meets Biology: Interpreting Attention in Protein Language Models.\"","archived":false,"fork":false,"pushed_at":"2023-06-12T21:27:47.000Z","size":7536,"stargazers_count":303,"open_issues_count":3,"forks_count":48,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-03-31T08:09:06.059Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2006.15222","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/salesforce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null}},"created_at":"2020-06-15T14:22:26.000Z","updated_at":"2025-03-16T03:10:05.000Z","dependencies_parsed_at":"2023-09-24T16:57:22.203Z","dependency_job_id":null,"html_url":"https://github.com/salesforce/provis","commit_stats":{"total_commits":21,"total_committers":3,"mean_commits":7.0,"dds":"0.23809523809523814","last_synced_commit":"051fe89190d9ac74865a6f49a3c25fd7b0fcca57"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fprovis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fprovis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fprovis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/salesforce%2Fprovis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/salesforce","download_url":"https://codeload.github.com/salesforce/provis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247634237,"owners_count":20970491,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-08T13:01:45.138Z","updated_at":"2025-04-07T10:26:37.865Z","avatar_url":"https://github.com/salesforce.png","language":"Python","funding_links":[],"categories":["Protein Structure"],"sub_categories":["Web Services_Other"],"readme":"# BERTology Meets Biology: Interpreting Attention in Protein Language Models\n\nThis repository is the official implementation of [BERTology Meets Biology: Interpreting Attention in Protein Language Models](https://arxiv.org/abs/2006.15222). 
\n\n## Table of Contents\n\n- [ProVis Attention Visualizer](#provis-attention-visualizer)\n  * [Installation](#installation)\n  * [Execution](#execution)\n- [Experiments](#experiments)\n  * [Installation](#installation-2)\n  * [Datasets](#datasets)\n  * [Attention Analysis](#attention-analysis)\n    + [Tape BERT Model](#tape-bert-model)\n    + [ProtTrans Models](#prottrans-models)\n  * [Probing Analysis](#probing-analysis)\n    + [Training](#training)\n    + [Reports](#reports)\n- [License](#license)\n- [Acknowledgments](#acknowledgments)\n- [Citation](#citation)\n\n## ProVis Attention Visualizer\n\nThis section provides instructions for generating visualizations of attention projected onto 3D protein structure.\n\n![Image](images/vis3d_binding_sites.png?raw=true)  ![Image](images/vis3d_contact_map.png?raw=true)\n\n### Installation\n**General requirements**:\n* Python \u003e= 3.7\n\n```\npip install biopython==1.77\npip install tape-proteins==0.5\npip install jupyterlab==3.0.14\npip install nglview\njupyter-nbextension enable nglview --py --sys-prefix\n```\n\nIf you run into problems installing nglview, please refer to their \n[installation instructions](https://github.com/arose/nglview#released-version) for additional installation details\n and options.\n\n\n### Execution\n\n```\ncd \u003cproject_root\u003e/notebooks\njupyter notebook provis.ipynb\n```\n\nIf you get an error running the notebook, you may need to execute the notebook as follows:\n\n```\njupyter notebook --NotebookApp.iopub_data_rate_limit=10000000\n```\nSee nglview [installation instructions](https://github.com/arose/nglview#released-version) for more details.\n\nYou may edit the notebook to choose other proteins, attention heads, etc. 
The visualization tool is based on the\nexcellent [nglview](https://github.com/arose/nglview) library.\n\n---\n\n## Experiments\n\nThis section describes how to reproduce the experiments in the paper.\n\n### Installation\n\n```setup\ncd \u003cproject_root\u003e\npython setup.py develop\n```\n\n### Datasets\n\nTo download additional required datasets from [TAPE](https://github.com/songlab-cal/tape):\n\n```setup\ncd \u003cproject_root\u003e/data\nwget http://s3.amazonaws.com/songlabdata/proteindata/data_pytorch/secondary_structure.tar.gz\ntar -xvf secondary_structure.tar.gz \u0026\u0026 rm secondary_structure.tar.gz\nwget http://s3.amazonaws.com/songlabdata/proteindata/data_pytorch/proteinnet.tar.gz\ntar -xvf proteinnet.tar.gz \u0026\u0026 rm proteinnet.tar.gz\n```\n\n### Attention Analysis\n\nThe following steps will reproduce the attention analysis experiments and generate the reports currently found in\n \u003cproject_root\u003e/reports/attention_analysis. This includes all experiments except the probing experiments\n  (see [Probing Analysis](#probing-analysis)).\n\nBefore performing these steps, navigate to the appropriate directory:\n```\ncd \u003cproject_root\u003e/protein_attention/attention_analysis\n```\n\n#### Tape BERT Model\n\nThe following command runs the attention analysis (may run for several hours):\n```\nsh scripts/compute_all_features_tape_bert.sh\n```\nThe above script creates a set of extract files in \u003cproject_root\u003e/data/cache corresponding to the various properties\nbeing analyzed. You may edit the script files to remove properties that you are not interested in. 
If you wish to run the\n analysis without a GPU, you must specify the `--no_cuda` flag.\n\nThe following generates reports based on the files created in the previous step:\n```\nsh scripts/report_all_features_tape_bert.sh\n```\nIf you removed steps from the analysis script above, you will need to update the reporting script accordingly.\n\n\n#### ProtTrans Models\n\nTo generate reports for the ProtTrans models, follow the same instructions as for the Tape BERT\n model above, but substitute the following commands:\u003cbr\u003e\n\n**ProtBert:**\u003cbr/\u003e\n```\nsh scripts/compute_all_features_prot_bert.sh\nsh scripts/report_all_features_prot_bert.sh\n```\n \n**ProtBertBFD:**\u003cbr/\u003e\n```\nsh scripts/compute_all_features_prot_bert_bfd.sh\nsh scripts/report_all_features_prot_bert_bfd.sh\n```\n\n**ProtAlbert:**\u003cbr/\u003e\n```\nsh scripts/compute_all_features_prot_albert.sh\nsh scripts/report_all_features_prot_albert.sh\n```\n\n**ProtXLNet:**\u003cbr/\u003e\n```\nsh scripts/compute_all_features_prot_xlnet.sh\nsh scripts/report_all_features_prot_xlnet.sh\n```\n\n### Probing Analysis\n\nThe following steps will recreate the figures from the probing analysis, currently found in \u003cproject_root\u003e/reports/probing.\n\nNavigate to the directory:\n```\ncd \u003cproject_root\u003e/protein_attention/probing\n```\n\n#### Training\nTrain diagnostic classifiers. Each script will write out an extract file with evaluation results. 
Note: each of these scripts may run for several hours.\n```\nsh scripts/probe_ss4_0_all\nsh scripts/probe_ss4_1_all\nsh scripts/probe_ss4_2_all\nsh scripts/probe_sites.sh\nsh scripts/probe_contacts.sh\n```\n#### Reports\n```\npython report.py\n```\n\n## License\n\nThis project is licensed under the BSD 3-Clause License; see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\nThis project incorporates code from the following repository:\n* https://github.com/songlab-cal/tape\n\n## Citation\n\nWhen referencing this repository, please cite [this paper](https://arxiv.org/abs/2006.15222).\n\n```\n@misc{vig2020bertology,\n    title={BERTology Meets Biology: Interpreting Attention in Protein Language Models},\n    author={Jesse Vig and Ali Madani and Lav R. Varshney and Caiming Xiong and Richard Socher and Nazneen Fatema Rajani},\n    year={2020},\n    eprint={2006.15222},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL},\n    url={https://arxiv.org/abs/2006.15222}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fprovis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsalesforce%2Fprovis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fprovis/lists"}