{"id":13529930,"url":"https://github.com/lucidrains/alphafold2","last_synced_at":"2025-05-14T23:07:09.009Z","repository":{"id":39257266,"uuid":"317440833","full_name":"lucidrains/alphafold2","owner":"lucidrains","description":"To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released","archived":false,"fork":false,"pushed_at":"2022-10-29T00:34:53.000Z","size":20253,"stargazers_count":1598,"open_issues_count":21,"forks_count":263,"subscribers_count":64,"default_branch":"main","last_synced_at":"2025-04-13T19:50:05.740Z","etag":null,"topics":["artificial-intelligence","attention-mechanism","deep-learning","protein-folding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-01T06:01:44.000Z","updated_at":"2025-03-31T03:35:24.000Z","dependencies_parsed_at":"2022-07-10T01:32:23.735Z","dependency_job_id":null,"html_url":"https://github.com/lucidrains/alphafold2","commit_stats":null,"previous_names":[],"tags_count":139,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Falphafold2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Falphafold2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Falphafold2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Falphafold2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/lucidrains","download_url":"https://codeload.github.com/lucidrains/alphafold2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254243362,"owners_count":22038046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","attention-mechanism","deep-learning","protein-folding"],"created_at":"2024-08-01T07:00:40.921Z","updated_at":"2025-05-14T23:07:03.999Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":["Python","Uncategorized"],"sub_categories":["Uncategorized"],"readme":"\u003cimg src=\"./images/alphafold2.png\" width=\"600px\"\u003e\u003c/img\u003e\n\n## Alphafold2 - Pytorch (wip)\n\nTo eventually become an unofficial working Pytorch implementation of \u003ca href=\"https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology\"\u003eAlphafold2\u003c/a\u003e, the breathtaking attention network that solved CASP14. Will be gradually implemented as more details of the architecture are released.\n\nOnce this is replicated, I intend to fold all available amino acid sequences out there in-silico and release the results as an academic torrent, to further science. If you are interested in replication efforts, please drop by #alphafold at this \u003ca href=\"https://discord.gg/GgDBFP8ZEt\"\u003eDiscord channel\u003c/a\u003e\n\nUpdate: Deepmind has open sourced the official \u003ca href=\"https://github.com/deepmind/alphafold\"\u003ecode\u003c/a\u003e in Jax, along with the weights 🙏! 
This repository will now be geared towards a straight pytorch translation with some improvements on positional encoding\n\n\u003ca href=\"https://www.youtube.com/watch?v=nGVFbPKrRWQ\"\u003eArxivInsights video\u003c/a\u003e\n\n## Install\n\n```bash\n$ pip install alphafold2-pytorch\n```\n\n## Status\n\n\u003ca href=\"https://github.com/lhatsk\"\u003elhatsk\u003c/a\u003e has reported training a modified trunk of this repository, using the same setup as trRosetta, with competitive results\n\n\u003cimg src=\"./images/axial_attention_vs_trrosetta.jpg\" width=\"400px\"\u003e\u003c/img\u003e\n\n`blue used the trRosetta input (MSA -\u003e potts -\u003e axial attention), green used the ESM embedding (only sequence) -\u003e tiling -\u003e axial attention` - lhatsk\n\n## Usage\n\nPredicting a distogram, like Alphafold-1, but with attention\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    reversible = False  # set this to True for fully reversible self / cross attention for the trunk\n).cuda()\n\nseq = torch.randint(0, 21, (1, 128)).cuda()      # AA length of 128\nmsa = torch.randint(0, 21, (1, 5, 120)).cuda()   # MSA doesn't have to be the same length as primary sequence\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (1, 128, 128, 37)\n```\n\nYou can also turn on prediction for the angles by passing `predict_angles = True` on init. 
The below example would be equivalent to \u003ca href=\"https://github.com/lucidrains/tr-rosetta-pytorch\"\u003etrRosetta\u003c/a\u003e but with self / cross attention.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    predict_angles = True   # set this to True\n).cuda()\n\nseq = torch.randint(0, 21, (1, 128)).cuda()\nmsa = torch.randint(0, 21, (1, 5, 120)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ndistogram, theta, phi, omega = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n)\n\n# distogram - (1, 128, 128, 37),\n# theta     - (1, 128, 128, 25),\n# phi       - (1, 128, 128, 13),\n# omega     - (1, 128, 128, 25)\n```\n\n## Predicting Coordinates\n\nFabian's \u003ca href=\"https://arxiv.org/abs/2102.13419\"\u003erecent paper\u003c/a\u003e suggests iteratively feeding the coordinates back into SE3 Transformer, weight shared, may work. I have decided to execute based on this idea, even though it is still up in the air how it actually works.\n\nYou can also use \u003ca href=\"https://github.com/lucidrains/En-transformer\"\u003eE(n)-Transformer\u003c/a\u003e or \u003ca href=\"https://github.com/lucidrains/egnn-pytorch\"\u003eEGNN\u003c/a\u003e for structural refinement.\n\nUpdate: Baker's lab have shown that an end-to-end architecture from sequence and MSA embeddings to SE3 Transformers can best trRosetta and close the gap to Alphafold2. We will be using the \u003ca href=\"https://github.com/lucidrains/graph-transformer-pytorch\"\u003eGraph Transformer\u003c/a\u003e, which acts on the trunk embeddings, to generate the initial set of coordinates to be sent to the equivariant network. 
(This is further corroborated by Costa et al in their work teasing out 3d coordinates from MSA Transformer embeddings in a paper predating Baker lab's)\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    predict_coords = True,\n    structure_module_type = 'se3',          # use SE3 Transformer - can also be 'egnn' or 'en' (E(n)-Transformer)\n    structure_module_dim = 4,               # se3 transformer dimension\n    structure_module_depth = 1,             # depth\n    structure_module_heads = 1,             # heads\n    structure_module_dim_head = 16,         # dimension of heads\n    structure_module_refinement_iters = 2,  # number of equivariant coordinate refinement iterations\n    structure_num_global_nodes = 1          # number of global nodes for the structure module, only works with SE3 transformer\n).cuda()\n\nseq = torch.randint(0, 21, (2, 64)).cuda()\nmsa = torch.randint(0, 21, (2, 5, 60)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ncoords = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (2, 64 * 3, 3)  \u003c-- 3 atoms per residue\n```\n\n## Atoms\n\nThe underlying assumption is that the trunk works at the residue level, and its output is then expanded to the atomic level for the structure module, whether it be SE3 Transformers, E(n)-Transformer, or EGNN doing the refinement. 
This library defaults to the 3 backbone atoms (C, Ca, N), but you can configure it to include any other atom you like, including Cb and the sidechains.\n\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    predict_coords = True,\n    atoms = 'backbone-with-cbeta'\n).cuda()\n\nseq = torch.randint(0, 21, (2, 64)).cuda()\nmsa = torch.randint(0, 21, (2, 5, 60)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ncoords = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (2, 64 * 4, 3)  \u003c-- 4 atoms per residue (C, Ca, N, Cb)\n```\n\nValid choices for `atoms` include:\n\n- `backbone` - 3 backbone atoms (C, Ca, N) [default]\n- `backbone-with-cbeta` - 3 backbone atoms and C beta\n- `backbone-with-oxygen` - 3 backbone atoms and oxygen from carboxyl\n- `backbone-with-cbeta-and-oxygen` - 3 backbone atoms with C beta and oxygen\n- `all` - backbone and all other atoms from sidechain\n\nYou can also pass in a tensor of shape (14,) defining which atoms you would like to include\n\nex.\n\n```python\natoms = torch.tensor([1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1])\n```\n\n## MSA, ESM, or ProtTrans Embeddings\n\nThis repository offers an easy way to supplement the network with pre-trained embeddings from \u003ca href=\"https://github.com/facebookresearch/esm\"\u003eFacebook AI\u003c/a\u003e. It contains wrappers for the pre-trained \u003ca href=\"https://www.biorxiv.org/content/10.1101/622803v1.full\"\u003eESM\u003c/a\u003e, \u003ca href=\"https://www.biorxiv.org/content/10.1101/2021.02.12.430858v1\"\u003eMSA Transformers\u003c/a\u003e or \u003ca href=\"https://www.biorxiv.org/content/early/2021/05/04/2020.07.12.199554\"\u003eProtTrans\u003c/a\u003e.\n\nThere are some prerequisites. 
You will need to make sure that you have Nvidia's \u003ca href=\"https://github.com/NVIDIA/apex#linux\"\u003eapex\u003c/a\u003e library installed, as the pretrained transformers make use of some fused operations.\n\nOr you can try running the script below\n\n```bash\ngit clone https://github.com/NVIDIA/apex\ncd apex\npip install -v --disable-pip-version-check --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\n```\n\nNext, simply import and wrap your `Alphafold2` instance with an `ESMEmbedWrapper`, `MSAEmbedWrapper`, or `ProtTranEmbedWrapper`, and it will take care of embedding both the sequence and the multiple-sequence alignments for you (and projecting them to the dimensions specified on your model). Nothing needs to be changed save for adding the wrapper.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\nfrom alphafold2_pytorch.embeds import MSAEmbedWrapper\n\nalphafold2 = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64\n)\n\nmodel = MSAEmbedWrapper(\n    alphafold2 = alphafold2\n).cuda()\n\nseq = torch.randint(0, 21, (2, 16)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\n\nmsa = torch.randint(0, 21, (2, 5, 16)).cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n)\n```\n\nBy default, even if the wrapper supplies the trunk with the sequence and MSA embeddings, they would be summed with the usual token embeddings. 
If you want to train Alphafold2 without token embeddings (only rely on pretrained embeddings), you would need to set `disable_token_embed` to `True` on `Alphafold2` init.\n\n```python\nalphafold2 = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    disable_token_embed = True\n)\n```\n\n## Real-Value Distance Prediction\n\nA \u003ca href=\"https://www.biorxiv.org/content/10.1101/2020.11.26.400523v1.full.pdf\"\u003epaper\u003c/a\u003e by Jinbo Xu suggests that one doesn't need to bin the distances, and can instead predict the mean and standard deviation directly. You can use this by turning on one flag `predict_real_value_distances`, in which case, the distance prediction returned will have a dimension of `2` for the mean and standard deviation respectively.\n\nIf `predict_coords` is also turned on, then the MDS will accept the mean and standard deviation predictions directly without having to calculate that from the distogram bins.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    predict_coords = True,\n    predict_real_value_distances = True,      # set this to True\n    structure_module_type = 'se3',\n    structure_module_dim = 4,\n    structure_module_depth = 1,\n    structure_module_heads = 1,\n    structure_module_dim_head = 16,\n    structure_module_refinement_iters = 2\n).cuda()\n\nseq = torch.randint(0, 21, (2, 64)).cuda()\nmsa = torch.randint(0, 21, (2, 5, 60)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ncoords = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (2, 64 * 3, 3)  \u003c-- 3 atoms per residue\n```\n\n## Convolutions\n\nYou can add convolutional blocks, for both the primary sequence as well as the MSA, by simply setting one extra keyword argument `use_conv = True`\n\n```python\nimport torch\nfrom alphafold2_pytorch 
import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    use_conv = True # set this to True\n).cuda()\n\nseq = torch.randint(0, 21, (1, 128)).cuda()\nmsa = torch.randint(0, 21, (1, 5, 120)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (1, 128, 128, 37)\n```\n\nThe convolutional kernels follow the lead of \u003ca href=\"https://www.biorxiv.org/content/early/2021/05/11/2021.05.10.443415\"\u003ethis paper\u003c/a\u003e, combining 1d and 2d kernels in one resnet-like block. You can fully customize the kernels as such.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    use_conv = True, # set this to True\n    conv_seq_kernels = ((9, 1), (1, 9), (3, 3)), # kernels for N x N primary sequence\n    conv_msa_kernels = ((1, 9), (3, 3)), # kernels for {num MSAs} x N MSAs\n).cuda()\n\nseq = torch.randint(0, 21, (1, 128)).cuda()\nmsa = torch.randint(0, 21, (1, 5, 120)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (1, 128, 128, 37)\n```\n\nYou can also do cycle dilation with one extra keyword argument. 
Default dilation is `1` for all layers.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    use_conv = True, # set this to True\n    dilations = (1, 3, 5) # cycle between dilations of 1, 3, 5\n).cuda()\n\nseq = torch.randint(0, 21, (1, 128)).cuda()\nmsa = torch.randint(0, 21, (1, 5, 120)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (1, 128, 128, 37)\n```\n\nFinally, instead of following the pattern of convolutions, self-attention, cross-attention per depth repeating, you can customize any order you wish with the `custom_block_types` keyword\n\nex. A network where you do predominately convolutions first, followed by self-attention + cross-attention blocks\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    heads = 8,\n    dim_head = 64,\n    custom_block_types = (\n        *(('conv',) * 6),\n        *(('self', 'cross') * 6)\n    )\n).cuda()\n\nseq = torch.randint(0, 21, (1, 128)).cuda()\nmsa = torch.randint(0, 21, (1, 5, 120)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask\n) # (1, 128, 128, 37)\n```\n\n## Sparse Attention\n\nYou can train with Microsoft Deepspeed's \u003ca href=\"https://www.deepspeed.ai/news/2020/09/08/sparse-attention.html\"\u003eSparse Attention\u003c/a\u003e, but you will have to endure the installation process. 
It takes two steps.\n\nFirst, you need to install Deepspeed with Sparse Attention\n\n```bash\n$ sh install_deepspeed.sh\n```\n\nNext, you need to install the pip package `triton`\n\n```bash\n$ pip install triton\n```\n\nIf both of the above succeeded, you can now train with Sparse Attention!\n\nSadly, sparse attention is only supported for self attention, and not cross attention. I will bring in a different solution for making cross attention performant.\n\n```python\nmodel = Alphafold2(\n    dim = 256,\n    depth = 12,\n    heads = 8,\n    dim_head = 64,\n    max_seq_len = 2048,                   # the maximum sequence length, this is required for sparse attention. the input cannot exceed what is set here\n    sparse_self_attn = (True, False) * 6  # interleave sparse and full attention for all 12 layers\n).cuda()\n```\n\n## Linear Attention\n\nI have also added one of the best \u003ca href=\"https://github.com/lucidrains/performer-pytorch\"\u003elinear attention\u003c/a\u003e variants, in the hope of lessening the burden of cross attending. 
I personally have not found Performer to work that well, but since in the paper they reported some ok numbers for protein benchmarks, I thought I'd include it and allow others to experiment.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    cross_attn_linear = True # simply set this to True to use Performer for all cross attention\n).cuda()\n```\n\nYou can also specify the exact layers you wish to use linear attention by passing in a tuple of the same length as the depth\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 6,\n    heads = 8,\n    dim_head = 64,\n    cross_attn_linear = (True, False) * 3 # interleave linear and full attention\n).cuda()\n```\n\n## Kronecker Attention for Cross Attention\n\nThis \u003ca href=\"https://arxiv.org/abs/2007.08442\"\u003epaper\u003c/a\u003e suggests that if you have queries or contexts that have defined axials (say an image), you can reduce the amount of attention needed by averaging across those axials (height and width) and concatenating the averaged axials into one sequence. 
You can turn this on as a memory saving technique for the cross attention, specifically for the primary sequence.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 6,\n    heads = 8,\n    dim_head = 64,\n    cross_attn_kron_primary = True # make sure primary sequence undergoes the kronecker operator during cross attention\n).cuda()\n```\n\nYou can also apply the same operator to the MSAs during cross attention with the `cross_attn_kron_msa` flag, if your MSAs are aligned and of the same width.\n\nTodo\n\n- [ ] offer masked mean reduction method\n- [ ] rotary embeddings\n\n## Memory Compressed Attention\n\nTo save on memory for cross attention, you can set a compression ratio for the key / values, following the scheme laid out in \u003ca href=\"https://arxiv.org/abs/1801.10198\"\u003ethis paper\u003c/a\u003e. A compression ratio of 2-4 is usually acceptable.\n\n```python\nmodel = Alphafold2(\n    dim = 256,\n    depth = 12,\n    heads = 8,\n    dim_head = 64,\n    cross_attn_compress_ratio = 3\n).cuda()\n```\n\n## MSA processing in Trunk\n\n\u003cimg src=\"./images/msa-transformer-diagram.png\" width=\"500px\"\u003e\u003c/img\u003e\n\nA \u003ca href=\"https://www.biorxiv.org/content/10.1101/2021.02.12.430858v1\"\u003enew paper\u003c/a\u003e by \u003ca href=\"https://github.com/rmrao\"\u003eRoshan Rao\u003c/a\u003e proposes using axial attention for pretraining on MSA's. Given the strong results, this repository will use the same scheme in the trunk, specifically for the MSA self-attention.\n\nYou can also tie the row attentions of the MSA with the `msa_tie_row_attn = True` setting on initialization of `Alphafold2`. 
However, in order to use this, you must make sure that if you have uneven number of MSAs per primary sequence, that the MSA mask is properly set to `False` for the rows not in use.\n\n```python\nmodel = Alphafold2(\n    dim = 256,\n    depth = 2,\n    heads = 8,\n    dim_head = 64,\n    msa_tie_row_attn = True # just set this to true\n)\n```\n\n## Template processing in Trunk\n\nTemplate processing is also largely done with axial attention, with cross attention done along the number of templates dimension. This largely follows the same scheme as in the recent all-attention approach to video classification as shown \u003ca href=\"https://github.com/lucidrains/TimeSformer-pytorch\"\u003ehere\u003c/a\u003e.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 5,\n    heads = 8,\n    dim_head = 64,\n    reversible = True,\n    sparse_self_attn = False,\n    max_seq_len = 256,\n    cross_attn_compress_ratio = 3\n).cuda()\n\nseq = torch.randint(0, 21, (1, 16)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\n\nmsa = torch.randint(0, 21, (1, 10, 16)).cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ntemplates_seq = torch.randint(0, 21, (1, 2, 16)).cuda()\ntemplates_coors = torch.randint(0, 37, (1, 2, 16, 3)).cuda()\ntemplates_mask = torch.ones_like(templates_seq).bool().cuda()\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask,\n    templates_seq = templates_seq,\n    templates_coors = templates_coors,\n    templates_mask = templates_mask\n)\n```\n\nIf sidechain information is also present, in the form of the unit vector between the C and C-alpha coordinates of each residue, you can also pass it in as follows.\n\n```python\nimport torch\nfrom alphafold2_pytorch import Alphafold2\n\nmodel = Alphafold2(\n    dim = 256,\n    depth = 5,\n    heads = 8,\n    dim_head = 64,\n    reversible = True,\n    sparse_self_attn = False,\n    max_seq_len = 256,\n    
cross_attn_compress_ratio = 3\n).cuda()\n\nseq = torch.randint(0, 21, (1, 16)).cuda()\nmask = torch.ones_like(seq).bool().cuda()\n\nmsa = torch.randint(0, 21, (1, 10, 16)).cuda()\nmsa_mask = torch.ones_like(msa).bool().cuda()\n\ntemplates_seq = torch.randint(0, 21, (1, 2, 16)).cuda()\ntemplates_coors = torch.randn(1, 2, 16, 3).cuda()\ntemplates_mask = torch.ones_like(templates_seq).bool().cuda()\n\ntemplates_sidechains = torch.randn(1, 2, 16, 3).cuda() # unit vectors of difference of C and C-alpha coordinates\n\ndistogram = model(\n    seq,\n    msa,\n    mask = mask,\n    msa_mask = msa_mask,\n    templates_seq = templates_seq,\n    templates_mask = templates_mask,\n    templates_coors = templates_coors,\n    templates_sidechains = templates_sidechains\n)\n```\n\n## Equivariant Attention\n\nI have prepared a reimplementation of SE3 Transformer, as explained by Fabian Fuchs in a \u003ca href=\"https://fabianfuchsml.github.io/alphafold2/\"\u003especulatory blogpost\u003c/a\u003e.\n\nIn addition, a \u003ca href=\"https://arxiv.org/abs/2102.09844\"\u003enew paper\u003c/a\u003e from Victor and Welling uses invariant features for E(n) equivariance, reaching SOTA and outperforming SE3 Transformer at a number of benchmarks, while being much faster. 
I have taken the main ideas from this paper and modified them to become a transformer (added attention to both features and coordinate updates).\n\nAll three of the equivariant networks above have been integrated and are available for use in the repository for atomic coordinate refinement by simply setting one hyperparameter `structure_module_type`.\n\n- `se3` \u003ca href=\"https://github.com/lucidrains/se3-transformer-pytorch\"\u003eSE3 Transformer\u003c/a\u003e\n\n- `egnn` \u003ca href=\"https://github.com/lucidrains/egnn-pytorch\"\u003eEGNN\u003c/a\u003e\n\n- `en` \u003ca href=\"https://github.com/lucidrains/En-transformer\"\u003eE(n)-Transformer\u003c/a\u003e\n\nOf interest to readers, each of the three frameworks has also been validated by researchers on related problems.\n\n## Testing\n\n```bash\n$ python setup.py test\n```\n\n## Data\n\nThis library will use the awesome work by \u003ca href=\"http://github.com/jonathanking\"\u003eJonathan King\u003c/a\u003e at \u003ca href=\"https://github.com/jonathanking/sidechainnet\"\u003ethis repository\u003c/a\u003e. Thank you Jonathan 🙏!\n\nWe also have the MSA data, all ~3.5 TB worth, downloaded and hosted by Archivist, who owns \u003ca href=\"https://the-eye.eu/\"\u003eThe-Eye\u003c/a\u003e project. 
(They also host the data and models for \u003ca href=\"https://www.eleuther.ai/\"\u003eEleuther AI\u003c/a\u003e) Please consider a donation if you find them helpful.\n\n```bash\n$ curl -s https://the-eye.eu/eleuther_staging/globus_stuffs/tree.txt\n```\n\n## Speculation\n\nhttps://xukui.cn/alphafold2.html\n\nhttps://moalquraishi.wordpress.com/2020/12/08/alphafold2-casp14-it-feels-like-ones-child-has-left-home/\n\n\u003cimg src=\"./images/science.png\"\u003e\u003c/img\u003e\n\n\u003cimg src=\"./images/reddit.png\"\u003e\u003c/img\u003e\n\n## Recent works by competing labs\n\nhttps://www.biorxiv.org/content/10.1101/2020.12.10.419994v1.full.pdf\n\nhttps://pubmed.ncbi.nlm.nih.gov/33637700/\n\n\u003ca href=\"./images/tFold.pdf\"\u003etFold presentation, from Tencent AI labs\u003c/a\u003e\n\n## External packages\n\n* **Final step** - \u003ca href=\"https://graylab.jhu.edu/PyRosetta.documentation/pyrosetta.rosetta.protocols.relax.html#pyrosetta.rosetta.protocols.relax.FastRelax\"\u003eFast Relax\u003c/a\u003e - **Installation Instructions**:\n    * Download the pyrosetta wheel from: http://www.pyrosetta.org/dow (select the appropriate version) - beware the file is large (approx 1.2 GB)\n        * The download should be free for anyone with an academic email\n    * Bash \u003e `cd downloads_folder` \u003e `pip install pyrosetta_wheel_filename.whl`\n\n\u003ca href=\"https://parmed.github.io/ParmEd/html/omm_amber.html\"\u003eOpenMM Amber\u003c/a\u003e\n\n## Citations\n\n```bibtex\n@misc{unpublished2021alphafold2,\n    title   = {Alphafold2},\n    author  = {John Jumper},\n    year    = {2020},\n    archivePrefix = {arXiv},\n    primaryClass = {q-bio.BM}\n}\n```\n\n```bibtex\n@article{Rao2021.02.12.430858,\n    author  = {Rao, Roshan and Liu, Jason and Verkuil, Robert and Meier, Joshua and Canny, John F. 
and Abbeel, Pieter and Sercu, Tom and Rives, Alexander},\n    title   = {MSA Transformer},\n    year    = {2021},\n    publisher = {Cold Spring Harbor Laboratory},\n    URL     = {https://www.biorxiv.org/content/early/2021/02/13/2021.02.12.430858},\n    journal = {bioRxiv}\n}\n```\n\n```bibtex\n@article {Rives622803,\n    author  = {Rives, Alexander and Goyal, Siddharth and Meier, Joshua and Guo, Demi and Ott, Myle and Zitnick, C. Lawrence and Ma, Jerry and Fergus, Rob},\n    title   = {Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences},\n    year    = {2019},\n    doi     = {10.1101/622803},\n    publisher = {Cold Spring Harbor Laboratory},\n    journal = {bioRxiv}\n}\n```\n\n```bibtex\n@article {Elnaggar2020.07.12.199554,\n    author  = {Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Wang, Yu and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and BHOWMIK, DEBSINDHU and Rost, Burkhard},\n    title   = {ProtTrans: Towards Cracking the Language of Life{\\textquoteright}s Code Through Self-Supervised Deep Learning and High Performance Computing},\n    elocation-id = {2020.07.12.199554},\n    year    = {2021},\n    doi     = {10.1101/2020.07.12.199554},\n    publisher = {Cold Spring Harbor Laboratory},\n    URL     = {https://www.biorxiv.org/content/early/2021/05/04/2020.07.12.199554},\n    eprint  = {https://www.biorxiv.org/content/early/2021/05/04/2020.07.12.199554.full.pdf},\n    journal = {bioRxiv}\n}\n```\n\n```bibtex\n@misc{king2020sidechainnet,\n    title   = {SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning}, \n    author  = {Jonathan E. 
King and David Ryan Koes},\n    year    = {2020},\n    eprint  = {2010.08162},\n    archivePrefix = {arXiv},\n    primaryClass = {q-bio.BM}\n}\n```\n\n```bibtex\n@misc{alquraishi2019proteinnet,\n    title   = {ProteinNet: a standardized data set for machine learning of protein structure}, \n    author  = {Mohammed AlQuraishi},\n    year    = {2019},\n    eprint  = {1902.00249},\n    archivePrefix = {arXiv},\n    primaryClass = {q-bio.BM}\n}\n```\n\n```bibtex\n@misc{gomez2017reversible,\n    title     = {The Reversible Residual Network: Backpropagation Without Storing Activations}, \n    author    = {Aidan N. Gomez and Mengye Ren and Raquel Urtasun and Roger B. Grosse},\n    year      = {2017},\n    eprint    = {1707.04585},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.CV}\n}\n```\n\n```bibtex\n@misc{fuchs2021iterative,\n    title   = {Iterative SE(3)-Transformers},\n    author  = {Fabian B. Fuchs and Edward Wagstaff and Justas Dauparas and Ingmar Posner},\n    year    = {2021},\n    eprint  = {2102.13419},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.LG}\n}\n```\n\n```bibtex\n@misc{satorras2021en,\n    title   = {E(n) Equivariant Graph Neural Networks}, \n    author  = {Victor Garcia Satorras and Emiel Hoogeboom and Max Welling},\n    year    = {2021},\n    eprint  = {2102.09844},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.LG}\n}\n```\n\n```bibtex\n@misc{su2021roformer,\n    title   = {RoFormer: Enhanced Transformer with Rotary Position Embedding},\n    author  = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},\n    year    = {2021},\n    eprint  = {2104.09864},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.CL}\n}\n```\n\n```bibtex\n@article{Gao_2020,\n    title   = {Kronecker Attention Networks},\n    ISBN    = {9781450379984},\n    url     = {http://dx.doi.org/10.1145/3394486.3403065},\n    DOI     = {10.1145/3394486.3403065},\n    journal = {Proceedings of the 26th ACM SIGKDD International Conference on 
Knowledge Discovery \u0026 Data Mining},\n    publisher = {ACM},\n    author  = {Gao, Hongyang and Wang, Zhengyang and Ji, Shuiwang},\n    year    = {2020},\n    month   = {Jul}\n}\n```\n\n```bibtex\n@article {Si2021.05.10.443415,\n    author  = {Si, Yunda and Yan, Chengfei},\n    title   = {Improved protein contact prediction using dimensional hybrid residual networks and singularity enhanced loss function},\n    elocation-id = {2021.05.10.443415},\n    year    = {2021},\n    doi     = {10.1101/2021.05.10.443415},\n    publisher = {Cold Spring Harbor Laboratory},\n    URL     = {https://www.biorxiv.org/content/early/2021/05/11/2021.05.10.443415},\n    eprint  = {https://www.biorxiv.org/content/early/2021/05/11/2021.05.10.443415.full.pdf},\n    journal = {bioRxiv}\n}\n```\n\n```bibtex\n@article {Costa2021.06.02.446809,\n    author  = {Costa, Allan and Ponnapati, Manvitha and Jacobson, Joseph M. and Chatterjee, Pranam},\n    title   = {Distillation of MSA Embeddings to Folded Protein Structures with Graph Transformers},\n    year    = {2021},\n    doi     = {10.1101/2021.06.02.446809},\n    publisher = {Cold Spring Harbor Laboratory},\n    URL     = {https://www.biorxiv.org/content/early/2021/06/02/2021.06.02.446809},\n    eprint  = {https://www.biorxiv.org/content/early/2021/06/02/2021.06.02.446809.full.pdf},\n    journal = {bioRxiv}\n}\n```\n\n```bibtex\n@article {Baek2021.06.14.448402,\n    author  = {Baek, Minkyung and DiMaio, Frank and Anishchenko, Ivan and Dauparas, Justas and Ovchinnikov, Sergey and Lee, Gyu Rie and Wang, Jue and Cong, Qian and Kinch, Lisa N. and Schaeffer, R. Dustin and Mill{\\'a}n, Claudia and Park, Hahnbeom and Adams, Carson and Glassman, Caleb R. and DeGiovanni, Andy and Pereira, Jose H. and Rodrigues, Andria V. and van Dijk, Alberdina A. and Ebrecht, Ana C. and Opperman, Diederik J. 
and Sagmeister, Theo and Buhlheller, Christoph and Pavkov-Keller, Tea and Rathinaswamy, Manoj K and Dalwadi, Udit and Yip, Calvin K and Burke, John E and Garcia, K. Christopher and Grishin, Nick V. and Adams, Paul D. and Read, Randy J. and Baker, David},\n    title   = {Accurate prediction of protein structures and interactions using a 3-track network},\n    year    = {2021},\n    doi     = {10.1101/2021.06.14.448402},\n    publisher = {Cold Spring Harbor Laboratory},\n    URL     = {https://www.biorxiv.org/content/early/2021/06/15/2021.06.14.448402},\n    eprint  = {https://www.biorxiv.org/content/early/2021/06/15/2021.06.14.448402.full.pdf},\n    journal = {bioRxiv}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Falphafold2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Falphafold2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Falphafold2/lists"}