{"id":15601064,"url":"https://github.com/lucidrains/protein-bert-pytorch","last_synced_at":"2025-04-09T21:23:37.191Z","repository":{"id":48110551,"uuid":"371130526","full_name":"lucidrains/protein-bert-pytorch","owner":"lucidrains","description":"Implementation of ProteinBERT in Pytorch","archived":false,"fork":false,"pushed_at":"2021-08-10T18:27:38.000Z","size":38,"stargazers_count":157,"open_issues_count":2,"forks_count":22,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-02T20:11:13.792Z","etag":null,"topics":["artificial-intelligence","deep-learning","protein-sequences","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-05-26T18:19:08.000Z","updated_at":"2025-03-19T08:01:52.000Z","dependencies_parsed_at":"2022-08-12T18:50:21.847Z","dependency_job_id":null,"html_url":"https://github.com/lucidrains/protein-bert-pytorch","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fprotein-bert-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fprotein-bert-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fprotein-bert-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fprotein-bert-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucidrains","downloa
d_url":"https://codeload.github.com/lucidrains/protein-bert-pytorch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248113147,"owners_count":21049791,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","protein-sequences","unsupervised-learning"],"created_at":"2024-10-03T02:13:36.529Z","updated_at":"2025-04-09T21:23:37.166Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## ProteinBERT - Pytorch (wip)\n\nImplementation of \u003ca href=\"https://www.biorxiv.org/content/10.1101/2021.05.24.445464v1\"\u003eProteinBERT\u003c/a\u003e in Pytorch.\n\n\u003ca href=\"https://github.com/nadavbra/protein_bert\"\u003eOriginal Repository\u003c/a\u003e\n\n## Install\n\n```bash\n$ pip install protein-bert-pytorch\n```\n\n## Usage\n\n```python\nimport torch\nfrom protein_bert_pytorch import ProteinBERT\n\nmodel = ProteinBERT(\n    num_tokens = 21,\n    num_annotation = 8943,\n    dim = 512,\n    dim_global = 256,\n    depth = 6,\n    narrow_conv_kernel = 9,\n    wide_conv_kernel = 9,\n    wide_conv_dilation = 5,\n    attn_heads = 8,\n    attn_dim_head = 64\n)\n\nseq = torch.randint(0, 21, (2, 2048))\nmask = torch.ones(2, 2048).bool()\nannotation = torch.randint(0, 1, (2, 8943)).float()\n\nseq_logits, annotation_logits = model(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)\n```\n\nTo use for pretraining\n\n```python\nimport torch\nfrom protein_bert_pytorch import ProteinBERT, 
PretrainingWrapper\n\nmodel = ProteinBERT(\n    num_tokens = 21,\n    num_annotation = 8943,\n    dim = 512,\n    dim_global = 256,\n    depth = 6,\n    narrow_conv_kernel = 9,\n    wide_conv_kernel = 9,\n    wide_conv_dilation = 5,\n    attn_heads = 8,\n    attn_dim_head = 64,\n    local_to_global_attn = False,\n    local_self_attn = True,\n    num_global_tokens = 2,\n    glu_conv = False\n)\n\nlearner = PretrainingWrapper(\n    model,\n    random_replace_token_prob = 0.05,    # what percentage of the tokens to replace with a random one, defaults to 5% as in paper\n    remove_annotation_prob = 0.25,       # what percentage of annotations to remove, defaults to 25%\n    add_annotation_prob = 0.01,          # probability to add an annotation randomly, defaults to 1%\n    remove_all_annotations_prob = 0.5,   # what percentage of batch items to remove annotations for completely, defaults to 50%\n    seq_loss_weight = 1.,                # weight on loss of sequence\n    annotation_loss_weight = 1.,         # weight on loss of annotation\n    exclude_token_ids = (0, 1, 2)        # for excluding padding, start, and end tokens from being masked\n)\n\n# do the following in a loop for a lot of sequences and annotations\n\nseq        = torch.randint(0, 21, (2, 2048))\nannotation = torch.randint(0, 1, (2, 8943)).float()\nmask       = torch.ones(2, 2048).bool()\n\nloss = learner(seq, annotation, mask = mask) # scalar loss combining the weighted sequence and annotation terms\nloss.backward()\n\n# save your model and evaluate it\n\ntorch.save(model, './improved-protein-bert.pt')\n```\n\n## Citations\n\n```bibtex\n@article {Brandes2021.05.24.445464,\n    author      = {Brandes, Nadav and Ofer, Dan and Peleg, Yam and Rappoport, Nadav and Linial, Michal},\n    title       = {ProteinBERT: A universal deep-learning model of protein sequence and function},\n    year        = {2021},\n    doi         = {10.1101/2021.05.24.445464},\n    publisher   = {Cold Spring Harbor Laboratory},\n    URL         = 
{https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464},\n    eprint      = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464.full.pdf},\n    journal     = {bioRxiv}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fprotein-bert-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Fprotein-bert-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fprotein-bert-pytorch/lists"}