{"id":15601067,"url":"https://github.com/lucidrains/equiformer-pytorch","last_synced_at":"2025-05-15T17:05:37.347Z","repository":{"id":63254719,"uuid":"559339596","full_name":"lucidrains/equiformer-pytorch","owner":"lucidrains","description":"Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and adopted for use by EquiFold for protein folding","archived":false,"fork":false,"pushed_at":"2024-12-17T14:28:50.000Z","size":18306,"stargazers_count":270,"open_issues_count":6,"forks_count":27,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-05-15T17:05:30.460Z","etag":null,"topics":["artificial-intelligence","attention-mechanisms","deep-learning","equivariance","molecules","protein-folding","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-29T20:03:30.000Z","updated_at":"2025-05-14T08:32:21.000Z","dependencies_parsed_at":"2025-01-05T01:15:30.588Z","dependency_job_id":null,"html_url":"https://github.com/lucidrains/equiformer-pytorch","commit_stats":{"total_commits":164,"total_committers":5,"mean_commits":32.8,"dds":"0.060975609756097615","last_synced_commit":"0d6f5074899c7f3e3e5e4db2eb703c619f5d4090"},"previous_names":[],"tags_count":68,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fequiformer-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/lucidrains%2Fequiformer-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fequiformer-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fequiformer-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucidrains","download_url":"https://codeload.github.com/lucidrains/equiformer-pytorch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254384988,"owners_count":22062422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","attention-mechanisms","deep-learning","equivariance","molecules","protein-folding","transformers"],"created_at":"2024-10-03T02:13:39.659Z","updated_at":"2025-05-15T17:05:32.340Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"./equiformer.png\" width=\"450px\"\u003e\u003c/img\u003e\n\n## Equiformer - Pytorch (wip)\n\nImplementation of the \u003ca href=\"https://arxiv.org/abs/2206.11990\"\u003eEquiformer\u003c/a\u003e, SE3/E3 equivariant attention network that reaches new SOTA, and adopted for use by \u003ca href=\"https://www.biorxiv.org/content/10.1101/2022.10.07.511322v1\"\u003eEquiFold (Prescient Design)\u003c/a\u003e for protein folding\n\nThe design of this seems to build off of \u003ca href=\"https://arxiv.org/abs/2006.10503\"\u003eSE3 Transformers\u003c/a\u003e, with the dot product attention replaced with MLP 
Attention and non-linear message passing from \u003ca href=\"https://arxiv.org/abs/2105.14491\"\u003eGATv2\u003c/a\u003e. It also does a depthwise tensor product for a bit more efficiency. If you think I am mistaken, please feel free to email me.\n\nUpdate: There has been a new development that makes scaling the number of degrees for SE3 equivariant networks dramatically better! \u003ca href=\"https://arxiv.org/abs/2206.14331\"\u003eThis paper\u003c/a\u003e first noted that by aligning the representations along the z-axis (or y-axis by some other convention), the spherical harmonics become sparse. This removes the m\u003csub\u003ef\u003c/sub\u003e dimension from the equation. \u003ca href=\"https://arxiv.org/abs/2302.03655\"\u003eA follow-up paper\u003c/a\u003e from Passaro et al. noted the Clebsch-Gordan matrix has also become sparse, leading to removal of m\u003csub\u003ei\u003c/sub\u003e and l\u003csub\u003ef\u003c/sub\u003e. They also made the connection that the problem has been reduced from SO(3) to SO(2) after aligning the reps to one axis. \u003ca href=\"https://arxiv.org/abs/2306.12059\"\u003eEquiformer v2\u003c/a\u003e (\u003ca href=\"https://github.com/atomicarchitects/equiformer_v2\"\u003eOfficial repository\u003c/a\u003e) leverages this in a transformer-like framework to reach new SOTA.\n\nWill definitely be putting more work / exploration into this. For now, I've incorporated the tricks from the first two papers for Equiformer v1, save for complete conversion into SO(2).\n\nUpdate 2: There appears to be a new SOTA without any interaction between higher degree reps (in other words, all tensor product / Clebsch-Gordan math goes away). 
[GotenNet](https://github.com/lucidrains/gotennet-pytorch), which seems to be a transformer rendition of [HEGNN](https://github.com/GLAD-RUC/HEGNN)\n\n## Install\n\n```bash\n$ pip install equiformer-pytorch\n```\n\n## Usage\n\n```python\nimport torch\nfrom equiformer_pytorch import Equiformer\n\nmodel = Equiformer(\n    num_tokens = 24,\n    dim = (4, 4, 2),               # dimensions per type, ascending, length must match number of degrees (num_degrees)\n    dim_head = (4, 4, 4),          # dimension per attention head\n    heads = (2, 2, 2),             # number of attention heads\n    num_linear_attn_heads = 0,     # number of global linear attention heads, can see all the neighbors\n    num_degrees = 3,               # number of degrees\n    depth = 4,                     # depth of equivariant transformer\n    attend_self = True,            # attending to self or not\n    reduce_dim_out = True,         # whether to reduce out to dimension of 1, say for predicting new coordinates for type 1 features\n    l2_dist_attention = False      # set to False to try out MLP attention\n).cuda()\n\nfeats = torch.randint(0, 24, (1, 128)).cuda()\ncoors = torch.randn(1, 128, 3).cuda()\nmask  = torch.ones(1, 128).bool().cuda()\n\nout = model(feats, coors, mask) # (1, 128)\n\nout.type0 # invariant type 0    - (1, 128)\nout.type1 # equivariant type 1  - (1, 128, 3)\n```\n\nThis repository also includes a way to decouple memory usage from depth using \u003ca href=\"https://arxiv.org/abs/1707.04585\"\u003ereversible networks\u003c/a\u003e. 
In other words, if you increase depth, the memory cost will stay constant at the usage of one equiformer transformer block (attention and feedforward).\n\n```python\nimport torch\nfrom equiformer_pytorch import Equiformer\n\nmodel = Equiformer(\n    num_tokens = 24,\n    dim = (4, 4, 2),\n    dim_head = (4, 4, 4),\n    heads = (2, 2, 2),\n    num_degrees = 3,\n    depth = 48,          # depth of 48 - just to show that it runs - in reality, seems to be quite unstable at higher depths, so architecture still needs more work\n    reversible = True,   # just set this to True to use https://arxiv.org/abs/1707.04585\n).cuda()\n\nfeats = torch.randint(0, 24, (1, 128)).cuda()\ncoors = torch.randn(1, 128, 3).cuda()\nmask  = torch.ones(1, 128).bool().cuda()\n\nout = model(feats, coors, mask)\n\nout.type0.sum().backward()\n```\n\n## Edges\n\nwith edges, e.g. atomic bonds\n\n```python\nimport torch\nfrom equiformer_pytorch import Equiformer\n\nmodel = Equiformer(\n    num_tokens = 28,\n    dim = 64,\n    num_edge_tokens = 4,       # number of edge types, say 4 bond types\n    edge_dim = 16,             # dimension of edge embedding\n    depth = 2,\n    input_degrees = 1,\n    num_degrees = 3,\n    reduce_dim_out = True\n)\n\natoms = torch.randint(0, 28, (2, 32))\nbonds = torch.randint(0, 4, (2, 32, 32))\ncoors = torch.randn(2, 32, 3)\nmask  = torch.ones(2, 32).bool()\n\nout = model(atoms, coors, mask, edges = bonds)\n\nout.type0 # (2, 32)\nout.type1 # (2, 32, 3)\n```\n\nwith adjacency matrix\n\n```python\nimport torch\nfrom equiformer_pytorch import Equiformer\n\nmodel = Equiformer(\n    dim = 32,\n    heads = 8,\n    depth = 1,\n    dim_head = 64,\n    num_degrees = 2,\n    valid_radius = 10,\n    reduce_dim_out = True,\n    attend_sparse_neighbors = True,  # this must be set to True, in which case it will assert that you pass in the adjacency matrix\n    num_neighbors = 0,               # if you set this to 0, it will only consider the connected neighbors as defined by the 
adjacency matrix. But if you set a value greater than 0, it will continue to fetch the closest points up to this many, excluding the ones already specified by the adjacency matrix\n    num_adj_degrees_embed = 2,       # this will derive the second degree connections and embed it correctly\n    max_sparse_neighbors = 8         # you can cap the number of neighbors, sampled from within your sparse set of neighbors as defined by the adjacency matrix, if specified\n)\n\nfeats = torch.randn(1, 128, 32)\ncoors = torch.randn(1, 128, 3)\nmask  = torch.ones(1, 128).bool()\n\n# placeholder adjacency matrix\n# naively assuming the sequence is one long chain (128, 128)\n\ni = torch.arange(128)\nadj_mat = (i[:, None] \u003c= (i[None, :] + 1)) \u0026 (i[:, None] \u003e= (i[None, :] - 1))\n\nout = model(feats, coors, mask, adj_mat = adj_mat)\n\nout.type0 # (1, 128)\nout.type1 # (1, 128, 3)\n```\n\n## Appreciation\n\n- \u003ca href=\"https://stability.ai/\"\u003eStabilityAI\u003c/a\u003e for the generous sponsorship, as well as my other sponsors out there\n\n## Testing\n\nTests for equivariance etc.\n\n```bash\n$ python setup.py test\n```\n\n## Example\n\nFirst install `sidechainnet`\n\n```bash\n$ pip install sidechainnet\n```\n\nThen run the protein backbone denoising task\n\n```bash\n$ python denoise.py\n```\n\n## Todo\n\n- [x] move xi and xj separate project and sum logic into Conv class\n- [x] move self interacting key / value production into Conv, fix no pooling in conv with self interaction\n- [x] go with a naive way to split up contribution from input degrees for DTP\n- [x] for dot product attention in higher types, try Euclidean distance\n- [x] consider an all-neighbors attention layer just for type0, using linear attention\n\n- [ ] integrate the new finding from spherical channels paper, followed up by so(3) -\u003e so(2) paper, which reduces the computation from O(L^6) -\u003e O(L^3)!\n    - [x] add rotation matrix -\u003e ZYZ Euler angles\n    - [x] function for deriving 
rotation matrix for r_ij -\u003e (0, 1, 0)\n    - [x] prepare get_basis to return D for rotating representations to (0, 1, 0) to greatly simplify spherical harmonics\n    - [x] add tests for batch rotating vectors to align with another - handle edge cases (0, 0, 0)?\n    - [x] redo get_basis to only calculate spherical harmonics Y for (0, 1, 0) and cache\n    - [x] do the further optimization to remove clebsch gordan (since m_i only depends on m_o), as noted in eSCN paper\n    - [x] validate one can train at higher degrees\n    - [x] figure out the whole linear bijection argument in appendix of eSCN and why parameterized lf can be removed\n    - [x] figure out why training NaNs with float32\n    - [ ] refactor into full so3 -\u003e so2 linear layer, as proposed in eSCN paper\n    - [ ] add equiformer v2, and start looking into equivariant protein backbone diffusion again\n\n## Citations\n\n```bibtex\n@article{Liao2022EquiformerEG,\n    title   = {Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs},\n    author  = {Yi Liao and Tess E. Smidt},\n    journal = {ArXiv},\n    year    = {2022},\n    volume  = {abs/2206.11990}\n}\n```\n\n```bibtex\n@article {Lee2022.10.07.511322,\n    author  = {Lee, Jae Hyeon and Yadollahpour, Payman and Watkins, Andrew and Frey, Nathan C. 
and Leaver-Fay, Andrew and Ra, Stephen and Cho, Kyunghyun and Gligorijevic, Vladimir and Regev, Aviv and Bonneau, Richard},\n    title   = {EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation},\n    elocation-id = {2022.10.07.511322},\n    year    = {2022},\n    doi     = {10.1101/2022.10.07.511322},\n    publisher = {Cold Spring Harbor Laboratory},\n    URL     = {https://www.biorxiv.org/content/early/2022/10/08/2022.10.07.511322},\n    eprint  = {https://www.biorxiv.org/content/early/2022/10/08/2022.10.07.511322.full.pdf},\n    journal = {bioRxiv}\n}\n```\n\n```bibtex\n@article{Shazeer2019FastTD,\n    title   = {Fast Transformer Decoding: One Write-Head is All You Need},\n    author  = {Noam M. Shazeer},\n    journal = {ArXiv},\n    year    = {2019},\n    volume  = {abs/1911.02150}\n}\n```\n\n```bibtex\n@misc{ding2021cogview,\n    title   = {CogView: Mastering Text-to-Image Generation via Transformers},\n    author  = {Ming Ding and Zhuoyi Yang and Wenyi Hong and Wendi Zheng and Chang Zhou and Da Yin and Junyang Lin and Xu Zou and Zhou Shao and Hongxia Yang and Jie Tang},\n    year    = {2021},\n    eprint  = {2105.13290},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.CV}\n}\n```\n\n```bibtex\n@inproceedings{Kim2020TheLC,\n    title   = {The Lipschitz Constant of Self-Attention},\n    author  = {Hyunjik Kim and George Papamakarios and Andriy Mnih},\n    booktitle = {International Conference on Machine Learning},\n    year    = {2020}\n}\n```\n\n```bibtex\n@article{Zitnick2022SphericalCF,\n    title   = {Spherical Channels for Modeling Atomic Interactions},\n    author  = {C. Lawrence Zitnick and Abhishek Das and Adeesh Kolluru and Janice Lan and Muhammed Shuaibi and Anuroop Sriram and Zachary W. Ulissi and Brandon C. 
Wood},\n    journal = {ArXiv},\n    year    = {2022},\n    volume  = {abs/2206.14331}\n}\n```\n\n```bibtex\n@article{Passaro2023ReducingSC,\n  title     = {Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs},\n  author    = {Saro Passaro and C. Lawrence Zitnick},\n  journal   = {ArXiv},\n  year      = {2023},\n  volume    = {abs/2302.03655}\n}\n```\n\n```bibtex\n@inproceedings{Gomez2017TheRR,\n    title   = {The Reversible Residual Network: Backpropagation Without Storing Activations},\n    author  = {Aidan N. Gomez and Mengye Ren and Raquel Urtasun and Roger Baker Grosse},\n    booktitle = {NIPS},\n    year    = {2017}\n}\n```\n\n```bibtex\n@article{Bondarenko2023QuantizableTR,\n    title   = {Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing},\n    author  = {Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort},\n    journal = {ArXiv},\n    year    = {2023},\n    volume  = {abs/2306.12929},\n    url     = {https://api.semanticscholar.org/CorpusID:259224568}\n}\n```\n\n```bibtex\n@inproceedings{Arora2023ZoologyMA,\n  title   = {Zoology: Measuring and Improving Recall in Efficient Language Models},\n  author  = {Simran Arora and Sabri Eyuboglu and Aman Timalsina and Isys Johnson and Michael Poli and James Zou and Atri Rudra and Christopher R'e},\n  year    = {2023},\n  url     = {https://api.semanticscholar.org/CorpusID:266149332}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fequiformer-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Fequiformer-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fequiformer-pytorch/lists"}