{"id":13561066,"url":"https://github.com/agemagician/ProtTrans","last_synced_at":"2025-04-03T16:32:08.189Z","repository":{"id":37729520,"uuid":"263028569","full_name":"agemagician/ProtTrans","owner":"agemagician","description":"ProtTrans is providing state of the art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformers Models.","archived":false,"fork":false,"pushed_at":"2025-01-22T16:20:32.000Z","size":58304,"stargazers_count":1190,"open_issues_count":16,"forks_count":158,"subscribers_count":31,"default_branch":"master","last_synced_at":"2025-03-31T14:06:10.898Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"afl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/agemagician.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-11T11:55:08.000Z","updated_at":"2025-03-26T07:22:47.000Z","dependencies_parsed_at":"2023-02-16T22:01:38.092Z","dependency_job_id":"cb54df1e-6d05-4d4b-a303-b7f4f722824e","html_url":"https://github.com/agemagician/ProtTrans","commit_stats":{"total_commits":241,"total_committers":7,"mean_commits":34.42857142857143,"dds":"0.24481327800829877","last_synced_commit":"05efcb781b2f54ec28108fca52977e99c99db1f5"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agemagician%2FProtTrans","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agemagician%2FProtTrans/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agemagician%2FProtTrans/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agemagician%2FProtTrans/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/agemagician","download_url":"https://codeload.github.com/agemagician/ProtTrans/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247037107,"owners_count":20873097,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:00:52.181Z","updated_at":"2025-04-03T16:32:03.181Z","avatar_url":"https://github.com/agemagician.png","language":"Jupyter Notebook","readme":"\u003cbr/\u003e\n\u003ch1 align=\"center\"\u003eProtTrans\u003c/h1\u003e\n\u003cbr/\u003e\n\n\u003cbr/\u003e\n\n[ProtTrans](https://github.com/agemagician/ProtTrans/) provides **state-of-the-art pre-trained models for proteins**. ProtTrans was trained on **thousands of GPUs from Summit** and **hundreds of Google TPUs** using various **Transformer models**.\n\nHave a look at our paper [ProtTrans: cracking the language of life’s code through self-supervised deep learning and high performance computing](https://doi.org/10.1109/TPAMI.2021.3095381) for more information about our work. 
\n\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n    \u003cimg width=\"70%\" src=\"https://github.com/agemagician/ProtTrans/raw/master/images/transformers_attention.png\" alt=\"ProtTrans Attention Visualization\"\u003e\n\u003c/p\u003e\n\u003cbr/\u003e\n\n\nThis repository will be updated regularly with **new pre-trained models for proteins** as part of supporting the **bioinformatics** community in general, and **Covid-19 research** specifically, through our [Accelerate SARS-CoV-2 research with transfer learning using pre-trained language modeling models](https://covid19-hpc-consortium.org/projects/5ed56e51a21132007ebf57bf) project.\n\nTable of Contents\n=================\n* [ ⌛️\u0026nbsp; News](#news)\n* [ 🚀\u0026nbsp; Installation](#install)\n* [ 🚀\u0026nbsp; Quick Start](#quick)\n* [ ⌛️\u0026nbsp; Models Availability](#models)\n* [ ⌛️\u0026nbsp; Dataset Availability](#datasets)\n* [ 🚀\u0026nbsp; Usage ](#usage)\n  * [ 🧬\u0026nbsp; Feature Extraction (FE)](#feature-extraction)\n  * [ 🚀\u0026nbsp; Logits extraction](#logits-extraction)\n  * [ 💥\u0026nbsp; Fine Tuning (FT)](#fine-tuning)\n  * [ 🧠\u0026nbsp; Prediction](#prediction)\n  * [ ⚗️\u0026nbsp; Protein Sequences Generation ](#protein-generation)\n  * [ 🧐\u0026nbsp; Visualization ](#visualization)\n  * [ 📈\u0026nbsp; Benchmark ](#benchmark)\n* [ 📊\u0026nbsp; Original downstream Predictions  ](#results)\n* [ 📊\u0026nbsp; Followup use-cases  ](#inaction)\n* [ 📊\u0026nbsp; Comparisons to other tools ](#comparison)\n* [ ❤️\u0026nbsp; Community and Contributions ](#community)\n* [ 📫\u0026nbsp; Have a question? ](#question)\n* [ 🤝\u0026nbsp; Found a bug? 
](#bug)\n* [ ✅\u0026nbsp; Requirements ](#requirements)\n* [ 🤵\u0026nbsp; Team ](#team)\n* [ 💰\u0026nbsp; Sponsors ](#sponsors)\n* [ 📘\u0026nbsp; License ](#license)\n* [ ✏️\u0026nbsp; Citation ](#citation)\n\n\n\u003ca name=\"news\"\u003e\u003c/a\u003e\n## ⌛️\u0026nbsp; News\n* **2023/07/14: [FineTuning with LoRA]( https://github.com/agemagician/ProtTrans/tree/master/Fine-Tuning) provides notebooks on how to fine-tune ProtT5 on both per-residue and per-protein tasks, using Low-Rank Adaptation (LoRA) for efficient finetuning (thanks @0syrys !).**\n* 2022/11/18: Availability: [LambdaPP](https://embed.predictprotein.org/) offers a simple web-service to access ProtT5-based predictions, and UniProt now offers [pre-computed ProtT5 embeddings](https://www.uniprot.org/help/embeddings) for download for a subset of selected organisms. \n\n\u003ca name=\"install\"\u003e\u003c/a\u003e\n## 🚀\u0026nbsp; Installation\nAll our models are available via huggingface/transformers:\n```console\npip install torch\npip install transformers\npip install sentencepiece\n```\nFor more details, please follow the [transformers installation instructions](https://huggingface.co/docs/transformers/installation).\n\nA recently introduced [change in the T5-tokenizer](https://github.com/huggingface/transformers/pull/24565) results in `UnboundLocalError: cannot access local variable 'sentencepiece_model_pb2'` and can be fixed either by installing [this PR](https://github.com/huggingface/transformers/pull/25684) or by manually installing:\n```console\npip install protobuf\n```\nIf you are using a transformers version after [this PR](https://github.com/huggingface/transformers/pull/24565), you will see [this warning](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/tokenization_t5.py#L167).\nExplicitly setting `legacy=True` will result in the expected behavior and will avoid the warning. 
You can also safely ignore the warning as `legacy=True` is [the default](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/tokenization_t5.py#L175).\n\n\u003ca name=\"quick\"\u003e\u003c/a\u003e\n## 🚀\u0026nbsp; Quick Start\nExample of how to derive embeddings from our best-performing protein language model, ProtT5-XL-U50 (aka ProtT5); also available as a [colab](https://colab.research.google.com/drive/1h7F5v5xkE_ly-1bTQSu-1xaLtTP2TnLF?usp=sharing):\n```python\nfrom transformers import T5Tokenizer, T5EncoderModel\nimport torch\nimport re\n\ndevice = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n\n# Load the tokenizer\ntokenizer = T5Tokenizer.from_pretrained('Rostlab/prot_t5_xl_half_uniref50-enc', do_lower_case=False)\n\n# Load the model\nmodel = T5EncoderModel.from_pretrained(\"Rostlab/prot_t5_xl_half_uniref50-enc\").to(device)\n\n# only GPUs support half-precision currently; if you want to run on CPU use full-precision (not recommended, much slower)\nif device == torch.device(\"cpu\"):\n    model.to(torch.float32)\n\n# prepare your protein sequences as a list\nsequence_examples = [\"PRTEINO\", \"SEQWENCE\"]\n\n# replace all rare/ambiguous amino acids by X and introduce white-space between all amino acids\nsequence_examples = [\" \".join(list(re.sub(r\"[UZOB]\", \"X\", sequence))) for sequence in sequence_examples]\n\n# tokenize sequences and pad up to the longest sequence in the batch\nids = tokenizer(sequence_examples, add_special_tokens=True, padding=\"longest\")\n\ninput_ids = torch.tensor(ids['input_ids']).to(device)\nattention_mask = torch.tensor(ids['attention_mask']).to(device)\n\n# generate embeddings\nwith torch.no_grad():\n    embedding_repr = model(input_ids=input_ids, attention_mask=attention_mask)\n\n# extract residue embeddings for the first ([0,:]) sequence in the batch and remove padded \u0026 special tokens ([0,:7]) \nemb_0 = embedding_repr.last_hidden_state[0,:7] # shape (7 x 1024)\n# same for the second 
([1,:]) sequence but taking into account different sequence lengths ([1,:8])\nemb_1 = embedding_repr.last_hidden_state[1,:8] # shape (8 x 1024)\n\n# if you want to derive a single representation (per-protein embedding) for the whole protein\nemb_0_per_protein = emb_0.mean(dim=0) # shape (1024)\n```\n\n\nWe also have a [script](https://github.com/agemagician/ProtTrans/blob/master/Embedding/prott5_embedder.py) which simplifies deriving per-residue and per-protein embeddings from ProtT5 for a given FASTA file:\n```\npython prott5_embedder.py --input sequences/some.fasta --output embeddings/residue_embeddings.h5\npython prott5_embedder.py --input sequences/some.fasta --output embeddings/protein_embeddings.h5 --per_protein 1\n```\n\n\u003ca name=\"models\"\u003e\u003c/a\u003e\n## ⌛️\u0026nbsp; Models Availability\n\n|          Model                |                              Hugging Face                                  |                         Zenodo                | Colab |\n| ----------------------------- | :------------------------------------------------------------------------: |:---------------------------------------------:|---------------------------------------------:|\n| ProtT5-XL-UniRef50 (also **ProtT5-XL-U50**)            |  [Download](https://huggingface.co/Rostlab/prot_t5_xl_uniref50/tree/main)  | [Download](https://zenodo.org/record/4644188) | [**Colab**](https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing)|\n| ProtT5-XL-BFD                 |  [Download](https://huggingface.co/Rostlab/prot_t5_xl_bfd/tree/main)       | [Download](https://zenodo.org/record/4633924) |\n| ProtT5-XXL-UniRef50           |  [Download](https://huggingface.co/Rostlab/prot_t5_xxl_uniref50/tree/main) | [Download](https://zenodo.org/record/4652717) |\n| ProtT5-XXL-BFD                |  [Download](https://huggingface.co/Rostlab/prot_t5_xxl_bfd/tree/main)      | [Download](https://zenodo.org/record/4635302) |\n| ProtBert-BFD                  |  
[Download](https://huggingface.co/Rostlab/prot_bert_bfd/tree/main)        | [Download](https://zenodo.org/record/4633647) |\n| ProtBert                      |  [Download](https://huggingface.co/Rostlab/prot_bert/tree/main)            | [Download](https://zenodo.org/record/4633691) |\n| ProtAlbert                    |  [Download](https://huggingface.co/Rostlab/prot_albert/tree/main)          | [Download](https://zenodo.org/record/4633687) |\n| ProtXLNet                     |  [Download](https://huggingface.co/Rostlab/prot_xlnet/tree/main)           | [Download](https://zenodo.org/record/4633987) |\n| ProtElectra-Generator-BFD     |  [Download](https://huggingface.co/Rostlab/prot_electra_generator_bfd/tree/main)           | [Download](https://zenodo.org/record/4633813) |\n| ProtElectra-Discriminator-BFD |  [Download](https://huggingface.co/Rostlab/prot_electra_discriminator_bfd/tree/main)           | [Download](https://zenodo.org/record/4633717) |\n\n\n\u003ca name=\"datasets\"\u003e\u003c/a\u003e\n## ⌛️\u0026nbsp; Datasets Availability\n|          Dataset              |                                    Dropbox                                    |  \n| ----------------------------- | :---------------------------------------------------------------------------: |\n|\tNEW364\t\t\t|      [Download](https://www.dropbox.com/s/g49lb352ij4cnt7/NEW364.csv?dl=1)    |\n|\tNetsurfp2       \t| [Download](https://www.dropbox.com/s/98hovta9qjmmiby/Train_HHblits.csv?dl=1)  |\n|\tCASP12\t\t\t| [Download](https://www.dropbox.com/s/te0vn0t7ocdkra7/CASP12_HHblits.csv?dl=1) |\n|\tCB513\t\t\t| [Download](https://www.dropbox.com/s/9mat2fqqkcvdr67/CB513_HHblits.csv?dl=1) |\n|\tTS115\t\t\t| [Download](https://www.dropbox.com/s/68pknljl9la8ax3/TS115_HHblits.csv?dl=1) |\n|\tDeepLoc Train\t\t| [Download](https://www.dropbox.com/s/vgdqcl4vzqm9as0/deeploc_per_protein_train.csv?dl=1) |\n|\tDeepLoc Test\t\t| [Download](https://www.dropbox.com/s/jfzuokrym7nflkp/deeploc_per_protein_test.csv?dl=1) 
|\n\n\u003ca name=\"usage\"\u003e\u003c/a\u003e\n## 🚀\u0026nbsp; Usage  \n\nHow to use ProtTrans:\n\n\u003ca name=\"feature-extraction\"\u003e\u003c/a\u003e\n * \u003cb\u003e🧬\u0026nbsp; Feature Extraction (FE):\u003c/b\u003e\u003cbr/\u003e\n Please check:\n [Embedding Section](https://github.com/agemagician/ProtTrans/tree/master/Embedding). [Colab](https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing) example for feature extraction via ProtT5-XL-U50 \n\n\u003ca name=\"logits-extraction\"\u003e\u003c/a\u003e\n * \u003cb\u003e🚀\u0026nbsp; Logits Extraction:\u003c/b\u003e\u003cbr/\u003e\n For ProtT5-logits extraction, please check:\n [VESPA logits script](https://github.com/Rostlab/VESPA#step-3-log-odds-ratio-of-masked-marginal-probabilities). \n\n\u003ca name=\"fine-tuning\"\u003e\u003c/a\u003e\n * \u003cb\u003e💥\u0026nbsp; Fine Tuning (FT):\u003c/b\u003e\u003cbr/\u003e\n Please check:\n [Fine Tuning Section](https://github.com/agemagician/ProtTrans/tree/master/Fine-Tuning). More information coming soon.\n\n\u003ca name=\"prediction\"\u003e\u003c/a\u003e\n * \u003cb\u003e🧠\u0026nbsp; Prediction:\u003c/b\u003e\u003cbr/\u003e\n Please check:\n [Prediction Section](https://github.com/agemagician/ProtTrans/tree/master/Prediction). [Colab](https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing) example for secondary structure prediction via ProtT5-XL-U50 and [Colab](https://colab.research.google.com/drive/1W5fI20eKLtHpaeeGDcKuXsgeiwujeczX?usp=sharing) example for subcellular localization prediction as well as differentiation between membrane-bound and water-soluble proteins via ProtT5-XL-U50.\n  \n\u003ca name=\"protein-generation\"\u003e\u003c/a\u003e\n * \u003cb\u003e⚗️\u0026nbsp; Protein Sequences Generation:\u003c/b\u003e\u003cbr/\u003e\n Please check:\n [Generate Section](https://github.com/agemagician/ProtTrans/tree/master/Generate). 
More information coming soon.\n \n\u003ca name=\"visualization\"\u003e\u003c/a\u003e\n* \u003cb\u003e🧐\u0026nbsp; Visualization:\u003c/b\u003e\u003cbr/\u003e \nPlease check:\n [Visualization Section](https://github.com/agemagician/ProtTrans/tree/master/Visualization). More information coming soon.\n \n\u003ca name=\"benchmark\"\u003e\u003c/a\u003e\n* \u003cb\u003e📈\u0026nbsp; Benchmark:\u003c/b\u003e\u003cbr/\u003e \nPlease check:\n [Benchmark Section](https://github.com/agemagician/ProtTrans/tree/master/Benchmark). More information coming soon.\n\n\u003ca name=\"results\"\u003e\u003c/a\u003e\n## 📊\u0026nbsp; Original downstream Predictions \n\n\u003ca name=\"q3\"\u003e\u003c/a\u003e\n * \u003cb\u003e🧬\u0026nbsp; Secondary Structure Prediction (Q3):\u003c/b\u003e\u003cbr/\u003e\n \n|          Model             |       CASP12       |       TS115      |       CB513      |\n| -------------------------- | :----------------: | :-------------:  | :-------------:  |\n| ProtT5-XL-UniRef50         |         81         |        87        |        86        |\n| ProtT5-XL-BFD              |         77         |        85        |        84        |\n| ProtT5-XXL-UniRef50        |         79         |        86        |        85        |\n| ProtT5-XXL-BFD             |         78         |        85        |        83        |\n| ProtBert-BFD               |         76         |        84        |        83        |\n| ProtBert                   |         75         |        83        |        81        |\n| ProtAlbert                 |         74         |        82        |        79        |\n| ProtXLNet                  |         73         |        81        |        78        |\n| ProtElectra-Generator      |         73         |        78        |        76        |\n| ProtElectra-Discriminator  |         74         |        81        |        79        |\n| ProtTXL                    |         71         |        76        |        74        |\n| ProtTXL-BFD           
     |         72         |        75        |        77        |\n\n🆕 Predict your sequence live on [predictprotein.org](https://predictprotein.org).\n\n\u003ca name=\"q8\"\u003e\u003c/a\u003e\n * \u003cb\u003e🧬\u0026nbsp; Secondary Structure Prediction (Q8):\u003c/b\u003e\u003cbr/\u003e\n \n|          Model             |       CASP12       |       TS115      |       CB513      |\n| -------------------------- | :----------------: | :-------------:  | :-------------:  |\n| ProtT5-XL-UniRef50         |         70         |        77        |        74        |\n| ProtT5-XL-BFD              |         66         |        74        |        71        |\n| ProtT5-XXL-UniRef50        |         68         |        75        |        72        |\n| ProtT5-XXL-BFD             |         66         |        73        |        70        |\n| ProtBert-BFD               |         65         |        73        |        70        |\n| ProtBert                   |         63         |        72        |        66        |\n| ProtAlbert                 |         62         |        70        |        65        |\n| ProtXLNet                  |         62         |        69        |        63        |\n| ProtElectra-Generator      |         60         |        66        |        61        |\n| ProtElectra-Discriminator  |         62         |        69        |        65        |\n| ProtTXL                    |         59         |        64        |        59        |\n| ProtTXL-BFD                |         60         |        65        |        60        |\n\n🆕 Predict your sequence live on [predictprotein.org](https://predictprotein.org).\n\n\u003ca name=\"q2\"\u003e\u003c/a\u003e\n * \u003cb\u003e🧬\u0026nbsp; Membrane-bound vs Water-soluble (Q2):\u003c/b\u003e\u003cbr/\u003e\n \n|          Model             |    DeepLoc         |\n| -------------------------- | :----------------: |\n| ProtT5-XL-UniRef50         |         91         |\n| ProtT5-XL-BFD              |         91    
     |\n| ProtT5-XXL-UniRef50        |         89         |\n| ProtT5-XXL-BFD             |         90         |\n| ProtBert-BFD               |         89         |\n| ProtBert                   |         89         |\n| ProtAlbert                 |         88         |\n| ProtXLNet                  |         87         |\n| ProtElectra-Generator      |         85         |\n| ProtElectra-Discriminator  |         86         |\n| ProtTXL                    |         85         |\n| ProtTXL-BFD                |         86         |\n\n\n\u003ca name=\"q10\"\u003e\u003c/a\u003e\n * \u003cb\u003e🧬\u0026nbsp; Subcellular Localization (Q10):\u003c/b\u003e\u003cbr/\u003e\n \n|          Model             |    DeepLoc         |\n| -------------------------- | :----------------: |\n| ProtT5-XL-UniRef50         |         81         |\n| ProtT5-XL-BFD              |         77         |\n| ProtT5-XXL-UniRef50        |         79         |\n| ProtT5-XXL-BFD             |         77         |\n| ProtBert-BFD               |         74         |\n| ProtBert                   |         74         |\n| ProtAlbert                 |         74         |\n| ProtXLNet                  |         68         |\n| ProtElectra-Generator      |         59         |\n| ProtElectra-Discriminator  |         70         |\n| ProtTXL                    |         66         |\n| ProtTXL-BFD                |         65         |\n\n\n\u003ca name=\"inaction\"\u003e\u003c/a\u003e\n## 📊\u0026nbsp; Use-cases \n| Level | Type  | Tool |  Task | Manuscript | Webserver |\n| ----- |  ---- | -- | -- | -- | -- |\n| Protein | Function | Light Attention | Subcellular localization | [Light attention predicts protein location from the language of life](https://doi.org/10.1093/bioadv/vbab035) | ([Web-server](https://embed.protein.properties/)) |\n| Residue | Function | bindEmbed21 | Binding Residues | [Protein embeddings and deep learning predict binding residues for various ligand 
classes](https://www.nature.com/articles/s41598-021-03431-4) | (Coming soon)  |\n| Residue | Function | VESPA           | Conservation \u0026 effect of Single Amino Acid Variants (SAVs) | [Embeddings from protein language models predict conservation and variant effects](https://rdcu.be/cD7q5) | (coming soon) |\n| Protein | Structure | ProtTucker      | Protein 3D structure similarity prediction                 | [Contrastive learning on protein embeddings enlightens midnight zone at lightning speed](https://www.biorxiv.org/content/10.1101/2021.11.14.468528v2) |  |\n| Residue | Structure | ProtT5dst       | Protein 3D structure prediction                            | [Protein language model embeddings for fast, accurate, alignment-free protein structure prediction](https://www.biorxiv.org/content/10.1101/2021.07.31.454572v1.abstract) |  |\n\n\u003ca name=\"comparison\"\u003e\u003c/a\u003e\n## 📊\u0026nbsp; Comparison to other protein language models (pLMs)\nWhile developing the [use-cases](#inaction), we compared ProtTrans models to other protein language models, for instance the [ESM](https://github.com/facebookresearch/esm) models. To focus on the effect of changing input representations, the following comparisons use the same architectures on top of different embedding inputs.\n\n|          Task/Model             |  ProtBERT-BFD      | ProtT5-XL-U50    |       ESM-1b    |       ESM-1v      | Metric | Reference |\n| -------------------------- | :--------------:   | :--------------: | :-----------:   | :-----------:  | :-----------: | :-----------: |\n| Subcell. loc. (setDeepLoc) |  80    | \u003cb\u003e86\u003c/b\u003e    |   83        |    -         | Accuracy |  [Light-attention](https://academic.oup.com/view-large/figure/321379865/vbab035f2.tif) |\n| Subcell. loc. 
(setHard)    |  58    | \u003cb\u003e65\u003c/b\u003e    |   62        |    -         | Accuracy |  [Light-attention](https://academic.oup.com/view-large/figure/321379865/vbab035f2.tif) |\n| Conservation (ConSurf-DB)  |  0.540 | \u003cb\u003e0.596\u003c/b\u003e |   0.563     |    -         | MCC      | [ConsEmb](https://rdcu.be/cD7q5) | \n| Variant effect (DMS-data)  |  -     | \u003cb\u003e0.53\u003c/b\u003e  |   -         |    0.49      | Spearman (Mean) | [VESPA](https://rdcu.be/cD7q5) |\n| Variant effect (DMS-data)  |  -     | \u003cb\u003e0.53\u003c/b\u003e  |   -         | \u003cb\u003e0.53\u003c/b\u003e  | Spearman (Median) | [VESPA](https://rdcu.be/cD7q5) |\n| CATH superfamily (unsup.)  |  18    | \u003cb\u003e64\u003c/b\u003e    |   57        |    -         | Accuracy | [ProtTucker](https://www.biorxiv.org/content/10.1101/2021.11.14.468528v1) |\n| CATH superfamily (sup.)    |  39    | \u003cb\u003e76\u003c/b\u003e    |   70        |    -         | Accuracy | [ProtTucker](https://www.biorxiv.org/content/10.1101/2021.11.14.468528v1) |\n| Binding residues           |  -     | \u003cb\u003e39\u003c/b\u003e    |   32        |    -        | F1 | [bindEmbed21](https://www.nature.com/articles/s41598-021-03431-4) |\n\nImportant note on ProtT5-XL-UniRef50 (dubbed ProtT5-XL-U50): all performances were measured using only embeddings extracted from the encoder-side of the underlying T5 model as described [here](https://github.com/agemagician/ProtTrans/blob/master/Embedding/PyTorch/Advanced/ProtT5-XL-UniRef50.ipynb). Also, experiments were run in half-precision mode (model.half()) to speed up embedding generation. No performance degradation could be observed in any of the experiments when running in half-precision.\n\n\u003ca name=\"community\"\u003e\u003c/a\u003e\n## ❤️\u0026nbsp; Community and Contributions\n\nThe ProtTrans project is an **open source project** supported by various partner companies and research institutions. 
We are committed to **sharing all our pre-trained models and knowledge**. We are more than happy if you could help us by sharing new pretrained models, fixing bugs, proposing new features, improving our documentation, spreading the word, or supporting our project.\n\n\u003ca name=\"question\"\u003e\u003c/a\u003e\n## 📫\u0026nbsp; Have a question?\n\nWe are happy to answer your questions on the [ProtTrans issues page](https://github.com/agemagician/ProtTrans/issues)! Of course, if you have a private question or want to cooperate with us, you can always **reach out to us directly** via our [RostLab email](mailto:assistant@rostlab.org?subject=[GitHub]ProtTrans).\n\n\u003ca name=\"bug\"\u003e\u003c/a\u003e\n## 🤝\u0026nbsp; Found a bug?\n\nFeel free to **file a new issue** with a descriptive title and description on the [ProtTrans](https://github.com/agemagician/ProtTrans/issues) repository. If you have already found a solution to your problem, **we would love to review your pull request**!\n\n\u003ca name=\"requirements\"\u003e\u003c/a\u003e\n## ✅\u0026nbsp; Requirements\n\nFor protein feature extraction or fine-tuning our pre-trained models, [Pytorch](https://github.com/pytorch/pytorch) and the [Transformers](https://github.com/huggingface/transformers) library from huggingface are needed. 
For model visualization, you need to install [BertViz](https://github.com/jessevig/bertviz) library.\n\n\u003ca name=\"team\"\u003e\u003c/a\u003e\n## 🤵\u0026nbsp; Team\n\n * \u003cb\u003eTechnical University of Munich:\u003c/b\u003e\u003cbr/\u003e\n \n| Ahmed Elnaggar       |      Michael Heinzinger  |  Christian Dallago | Ghalia Rehawi | Burkhard Rost |\n|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|\n| \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/ElnaggarAhmend.jpg?raw=true\"\u003e | \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/MichaelHeinzinger-2.jpg?raw=true\"\u003e | \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/christiandallago.png?raw=true\"\u003e | \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/female.png?raw=true\"\u003e | \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/B.Rost.jpg?raw=true\"\u003e |\n\n * \u003cb\u003eMed AI Technology:\u003c/b\u003e\u003cbr/\u003e\n\n| Yu Wang       |\n|:-------------------------:|\n| \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/yu-wang.jpeg?raw=true\"\u003e |\n\n* \u003cb\u003eGoogle:\u003c/b\u003e\u003cbr/\u003e\n\n| Llion Jones       |\n|:-------------------------:|\n| \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/Llion-Jones.jpg?raw=true\"\u003e |\n\n* \u003cb\u003eNvidia:\u003c/b\u003e\u003cbr/\u003e\n\n| Tom Gibbs       | Tamas Feher | Christoph Angerer |\n|:-------------------------:|:-------------------------:|:-------------------------:|\n| \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/Tom-Gibbs.png?raw=true\"\u003e | \u003cimg width=120/ 
src=\"https://github.com/agemagician/ProtTrans/blob/master/images/Tamas-Feher.jpeg?raw=true\"\u003e | \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/Christoph-Angerer.jpg?raw=true\"\u003e |\n\n* \u003cb\u003eSeoul National University:\u003c/b\u003e\u003cbr/\u003e\n\n| Martin Steinegger       |\n|:-------------------------:|\n| \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/raw/master/images/Martin-Steinegger.png\"\u003e |\n\n\n* \u003cb\u003eORNL:\u003c/b\u003e\u003cbr/\u003e\n\n| Debsindhu Bhowmik       |\n|:-------------------------:|\n| \u003cimg width=120/ src=\"https://github.com/agemagician/ProtTrans/blob/master/images/Debsindhu-Bhowmik.jpg?raw=true\"\u003e |\n\n\u003ca name=\"sponsors\"\u003e\u003c/a\u003e\n## 💰\u0026nbsp; Sponsors\n\n\u003c!--\n\u003cdiv id=\"banner\" style=\"overflow: hidden;justify-content:space-around;display:table-cell; vertical-align:middle; text-align:center\"\u003e\n  \u003cdiv class=\"\" style=\"max-width: 20%;max-height: 20%;display: inline-block;\"\u003e\n      \u003cimg width=\"14%\" src=\"https://github.com/agemagician/ProtTrans/blob/master/images/1200px-Nvidia_image_logo.svg.png?raw=true\" alt=\"nvidia logo\"\u003e\n  \u003c/div\u003e\n\n  \u003cdiv class=\"\" style=\"max-width: 20%;max-height: 20%;display: inline-block;\"\u003e\n      \u003cimg width=\"22%\" src=\"https://github.com/agemagician/ProtTrans/blob/master/images/Google-Logo.jpg?raw=true\" alt=\"google cloud logo\"\u003e\n  \u003c/div\u003e\n\n  \u003cdiv class=\"\" style=\"max-width: 20%;max-height: 20%;display: inline-block;\"\u003e\n      \u003cimg width=\"20%\" src=\"https://github.com/agemagician/ProtTrans/blob/master/images/Oak_Ridge_National_Laboratory_logo.svg.png?raw=true\" alt=\"ornl logo\"\u003e\n  \u003c/div\u003e\n  \n  \u003cdiv class=\"\" style=\"max-width: 20%;max-height: 20%;display: inline-block;\"\u003e\n      \u003cimg width=\"12%\" 
src=\"https://github.com/agemagician/ProtTrans/blob/master/images/SOFTWARE_CAMPUS_logo_cmyk.jpg?raw=true\" alt=\"software campus logo\"\u003e\n  \u003c/div\u003e\n  \n\u003c/div\u003e\n--\u003e\n\nNvidia       |      Google  |      Google  | ORNL | Software Campus\n:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:\n![](https://github.com/agemagician/ProtTrans/blob/master/images/1200px-Nvidia_image_logo.svg.png?raw=true) | ![](https://github.com/agemagician/ProtTrans/blob/master/images/google-cloud-logo.jpg?raw=true) | ![](https://github.com/agemagician/ProtTrans/blob/master/images/tfrc.png?raw=true) | ![](https://github.com/agemagician/ProtTrans/blob/master/images/Oak_Ridge_National_Laboratory_logo.svg.png?raw=true) | ![](https://github.com/agemagician/ProtTrans/blob/master/images/SOFTWARE_CAMPUS_logo_cmyk.jpg?raw=true)\n\n\u003ca name=\"license\"\u003e\u003c/a\u003e\n## 📘\u0026nbsp; License\nThe ProtTrans pretrained models are released under the terms of the [Academic Free License v3.0](https://choosealicense.com/licenses/afl-3.0/).\n\n\u003ca name=\"citation\"\u003e\u003c/a\u003e\n## ✏️\u0026nbsp; Citation\nIf you use this code or our pretrained models for your publication, please cite the original paper:\n```\n@ARTICLE{9477085,\nauthor={Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Yu, Wang and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and Bhowmik, Debsindhu and Rost, Burkhard},\njournal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\ntitle={ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing},\nyear={2021},\nvolume={},\nnumber={},\npages={1-1},\ndoi={10.1109/TPAMI.2021.3095381}}\n```\n","funding_links":[],"categories":["Libraries on Molecule AI","By Methodology","General protein 
language models","Protein Structure","📊 Sequence Analysis \u0026 Language Models","Machine Learning Tasks and Models"],"sub_categories":["3D","Deep Learning \u0026 Protein Language Models","Web Services_Other","Protein Language Models","Foundation Models"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagemagician%2FProtTrans","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagemagician%2FProtTrans","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagemagician%2FProtTrans/lists"}