{"id":19156219,"url":"https://github.com/kyegomez/alphafold3","last_synced_at":"2025-05-14T15:09:35.422Z","repository":{"id":238889746,"uuid":"797875909","full_name":"kyegomez/AlphaFold3","owner":"kyegomez","description":"Implementation of Alpha Fold 3 from the paper: \"Accurate structure prediction of biomolecular interactions with AlphaFold3\" in PyTorch","archived":false,"fork":false,"pushed_at":"2025-04-04T12:57:41.000Z","size":2305,"stargazers_count":783,"open_issues_count":3,"forks_count":105,"subscribers_count":22,"default_branch":"main","last_synced_at":"2025-04-06T10:04:26.941Z","etag":null,"topics":["ai","alphafold","artificial-intelligence","bio","biology","biology-ai","geneml","ml"],"latest_commit_sha":null,"homepage":"https://discord.gg/7VckQVxvKk","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kyegomez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["kyegomez"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2024-05-08T16:54:21.000Z","updated_at":"2025-03-31T10:17:50.000Z","dependencies_parsed_at":"2024-05-15T17:24:56.633Z","dependency_job_id":"aa2c0ed1-dca5-4aed-8c00-ae0c559424ef","html_url":"https://github.com/kyegomez/AlphaFold3","commit_stats":{"total_commits":38,"total_committers":7,"mean_commits":5.428571428571429,"dds":"0.42105263157894735","last_synced_commit":"addc147e798b8108dd585d1625548f6aa5e18263"},"previous_names":["k
yegomez/alphafold3"],"tags_count":0,"template":false,"template_full_name":"kyegomez/Python-Package-Template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FAlphaFold3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FAlphaFold3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FAlphaFold3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kyegomez%2FAlphaFold3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kyegomez","download_url":"https://codeload.github.com/kyegomez/AlphaFold3/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717237,"owners_count":21150389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","alphafold","artificial-intelligence","bio","biology","biology-ai","geneml","ml"],"created_at":"2024-11-09T08:33:38.621Z","updated_at":"2025-04-13T13:13:35.334Z","avatar_url":"https://github.com/kyegomez.png","language":"Python","funding_links":["https://github.com/sponsors/kyegomez"],"categories":[],"sub_categories":[],"readme":"[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)\n\n# AlphaFold3\nImplementation of Alpha Fold 3 from the paper: \"Accurate structure prediction of biomolecular interactions with AlphaFold3\" in PyTorch\n\n\n## install\n`$ pip install alphafold3`\n\n## Input Tensor Size Example\n\n```python\nimport torch\n\n# Define the batch size, number of nodes, and number of features\nbatch_size 
= 1\nnum_nodes = 5\nnum_features = 64\n\n# Generate random pair representations using torch.randn\n# Shape: (batch_size, num_nodes, num_nodes, num_features)\npair_representations = torch.randn(\n    batch_size, num_nodes, num_nodes, num_features\n)\n\n# Generate random single representations using torch.randn\n# Shape: (batch_size, num_nodes, num_features)\nsingle_representations = torch.randn(\n    batch_size, num_nodes, num_features\n)\n```\n\n## Genetic Diffusion\nNeed review but basically it operates on atomic coordinates.\n\n```python\nimport torch\nfrom alphafold3.diffusion import GeneticDiffusion\n\n# Create an instance of the GeneticDiffusionModuleBlock\nmodel = GeneticDiffusion(channels=3, training=True)\n\n# Generate random input coordinates\ninput_coords = torch.randn(10, 100, 100, 3)\n\n# Generate random ground truth coordinates\nground_truth = torch.randn(10, 100, 100, 3)\n\n# Pass the input coordinates and ground truth coordinates through the model\noutput_coords, loss = model(input_coords, ground_truth)\n\n# Print the output coordinates\nprint(output_coords)\n\n# Print the loss value\nprint(loss)\n```\n\n## Full Model Example Forward pass\n\n```python\nimport torch \nfrom alphafold3 import AlphaFold3\n\n# Create random tensors\nx = torch.randn(1, 5, 5, 64)  # Shape: (batch_size, seq_len, seq_len, dim)\ny = torch.randn(1, 5, 64)  # Shape: (batch_size, seq_len, dim)\n\n# Initialize AlphaFold3 model\nmodel = AlphaFold3(\n    dim=64,\n    seq_len=5,\n    heads=8,\n    dim_head=64,\n    attn_dropout=0.0,\n    ff_dropout=0.0,\n    global_column_attn=False,\n    pair_former_depth=48,\n    num_diffusion_steps=1000,\n    diffusion_depth=30,\n)\n\n# Forward pass through the model\noutput = model(x, y)\n\n# Print the shape of the output tensor\nprint(output.shape)\n```\n\n# Docker\nA basic PyTorch image is provided that includes the dependencies to run this code.\n\n```bash\n## Build the image\ndocker build -t af3 .\n\n## Run the image (with GPUs)\ndocker run  
--gpus all -it af3\n```\n\n# Citation\n```bibtex\n@article{Abramson2024-fj,\n  title    = \"Accurate structure prediction of biomolecular interactions with\n              {AlphaFold} 3\",\n  author   = \"Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans,\n              Richard and Green, Tim and Pritzel, Alexander and Ronneberger,\n              Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick,\n              Joshua and Bodenstein, Sebastian W and Evans, David A and Hung,\n              Chia-Chun and O'Neill, Michael and Reiman, David and\n              Tunyasuvunakool, Kathryn and Wu, Zachary and {\\v Z}emgulyt{\\.e},\n              Akvil{\\.e} and Arvaniti, Eirini and Beattie, Charles and\n              Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and\n              Congreve, Miles and Cowen-Rivers, Alexander I and Cowie, Andrew\n              and Figurnov, Michael and Fuchs, Fabian B and Gladman, Hannah and\n              Jain, Rishub and Khan, Yousuf A and Low, Caroline M R and Perlin,\n              Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and\n              Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine\n              and Yakneen, Sergei and Zhong, Ellen D and Zielinski, Michal and\n              {\\v Z}{\\'\\i}dek, Augustin and Bapst, Victor and Kohli, Pushmeet\n              and Jaderberg, Max and Hassabis, Demis and Jumper, John M\",\n  journal  = \"Nature\",\n  month    =  \"May\",\n  year     =  2024\n}\n```\n\n\n\n# Notes\n-\u003e pairwise representation -\u003e explicit atomic positions\n\n-\u003e within the trunk, msa processing is de emphasized with a simpler MSA block, 4 blocks\n\n-\u003e msa processing -\u003e pair weighted averaging \n\n-\u003e pairformer: replaces evoformer, operates on pair representation and single representation\n\n-\u003e pairformer 48 blocks\n\n-\u003e pair and single representation together with the input representation are passed to the diffusion 
module\n\n-\u003e diffusion takes in 3 tensors [pair, single representation, with new pairformer representation]\n\n-\u003e diffusion module operates directly on raw atom coordinates\n\n-\u003e standard diffusion approach, model is trained to receive noised atomic coordinates then predict the true coordinates\n\n-\u003e the network learns protein structure at a variety of length scales, where the denoising task at small noise emphasizes local stereochemistry and the denoising task at high noise emphasizes large scale structure of the system.\n\n-\u003e at inference time, random noise is sampled and then recurrently denoised to produce a final structure\n\n-\u003e diffusion module produces a distribution of answers\n\n-\u003e for each answer the local structure will be sharply defined\n\n-\u003e diffusion models are prone to hallucination where the model may hallucinate plausible looking structures\n\n-\u003e to counteract hallucination, they use a novel cross distillation method where they enrich the training data with AlphaFold-Multimer v2.3 predicted structures. 
\n\n-\u003e confidence measures predict the atom level and pairwise errors in final structures; this is done by regressing the error in the output of the structure module in training\n\n-\u003e Utilizes a diffusion rollout procedure for the full structure generation during training (using a larger step size than normal)\n\n-\u003e diffused predicted structure is used to permute the ground truth and ligands to compute metrics to train the confidence head.\n\n-\u003e confidence head uses the pairwise representation to predict the lDDT (pLDDT) and a predicted aligned error (PAE) matrix as used in AlphaFold 2, as well as a distance error matrix (PDE), which is the error in the distance matrix of the predicted structure as compared to the true structure\n\n-\u003e confidence measures also predict atom level and pairwise errors\n\n-\u003e early stopping using a weighted average of all the above metrics\n\n-\u003e AF3 can predict structures from input polymer sequences, residue modifications, and ligand SMILES\n\n-\u003e uses structures below 1000 residues\n\n-\u003e AlphaFold3 is able to predict protein-nucleic acid structures with thousands of residues\n\n-\u003e Covalent modifications (bonded ligands, glycosylation, and modified protein residues and nucleic acid bases) are also accurately predicted by AF\n\n-\u003e distills AlphaFold2 predictions\n\n-\u003e a key limitation of protein structure prediction models is that they predict static structures and not the dynamical behavior\n\n-\u003e multiple random seeds for either the diffusion head or network do not produce an approximation of the solution ensemble\n\n-\u003e in future: generate large number of predictions and rank them\n\n-\u003e inference: top confidence sample from 5 seed runs and 5 diffusion samples per model seed for a total of 25 samples\n\n-\u003e interface accuracy via interface lDDT, which is calculated from distances between atoms across different chains in the interface\n\n-\u003e uses an lDDT to polymer metric which considers differences 
from each atom of an entity to any C-alpha or C1' polymer atom within a radius\n\n\n# Todo\n\n## Model Architecture\n- Implement the input embedder from the AlphaFold2 OpenFold implementation [LINK](https://github.com/aqlaboratory/openfold)\n\n- Implement the template module from openfold [LINK](https://github.com/aqlaboratory/openfold)\n\n- Implement the MSA embedding from openfold [LINK](https://github.com/aqlaboratory/openfold)\n\n- Fix residuals and make sure pair representation and generated output go into the diffusion model\n\n- Implement recycling to fix residuals\n\n\n## Training pipeline\n- Get all datasets pushed to huggingface\n\n# Resources\n- [EvoFormer Paper](https://www.nature.com/articles/s41586-021-03819-2)\n- [Pairformer](https://arxiv.org/pdf/2311.03583)\n- [AlphaFold 3 Paper](https://www.nature.com/articles/s41586-024-07487-w)\n\n- [OpenFold](https://github.com/aqlaboratory/openfold)\n\n\n## Datasets\nSmaller, start here\n- [Protein data bank](https://www.rcsb.org/)\n- [Working with pdb data](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/dealing-with-coordinates)\n- [PDB ligands](https://huggingface.co/datasets/jglaser/pdb_protein_ligand_complexes)\n- [AlphaFold Protein Structure Database](https://alphafold.ebi.ac.uk/)\n- [Colab notebook for AlphaFold search](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb)\n\n## Benchmarks\n\n- [RoseTTAFold](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1)(https://www.ipd.uw.edu/2021/07/rosettafold-accurate-protein-structure-prediction-accessible-to-all/0)\n\n## Related Projects\n\n- [NeuroFold](https://www.biorxiv.org/content/10.1101/2024.03.12.584504v1)\n\n## Tools\n\n- [PyMol](https://pymol.org/)\n- [ChimeraX](https://www.cgl.ucsf.edu/chimerax/download.html)\n\n## Community\n\n- [Agora](https://discord.gg/BAThAeeg)\n\n## Books\n\n- [Thinking in Systems](https://www.chelseagreen.com/product/thinking-in-systems/)\n\n\n## 
Citations\n\n```bibtex\n@article{Abramson2024-fj,\n  title    = \"Accurate structure prediction of biomolecular interactions with\n              {AlphaFold} 3\",\n  author   = \"Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans,\n              Richard and Green, Tim and Pritzel, Alexander and Ronneberger,\n              Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick,\n              Joshua and Bodenstein, Sebastian W and Evans, David A and Hung,\n              Chia-Chun and O'Neill, Michael and Reiman, David and\n              Tunyasuvunakool, Kathryn and Wu, Zachary and {\\v Z}emgulyt{\\.e},\n              Akvil{\\.e} and Arvaniti, Eirini and Beattie, Charles and\n              Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and\n              Congreve, Miles and Cowen-Rivers, Alexander I and Cowie, Andrew\n              and Figurnov, Michael and Fuchs, Fabian B and Gladman, Hannah and\n              Jain, Rishub and Khan, Yousuf A and Low, Caroline M R and Perlin,\n              Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and\n              Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine\n              and Yakneen, Sergei and Zhong, Ellen D and Zielinski, Michal and\n              {\\v Z}{\\'\\i}dek, Augustin and Bapst, Victor and Kohli, Pushmeet\n              and Jaderberg, Max and Hassabis, Demis and Jumper, John M\",\n  journal  = \"Nature\",\n  month    = \"May\",\n  year     =  2024\n}\n```\n\n```bibtex\n@inproceedings{Darcet2023VisionTN,\n    title   = {Vision Transformers Need Registers},\n    author  = {Timoth'ee Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski},\n    year    = {2023},\n    url     = {https://api.semanticscholar.org/CorpusID:263134283}\n}\n```\n\n```bibtex\n@article{Arora2024SimpleLA,\n    title   = {Simple linear attention language models balance the recall-throughput tradeoff},\n    author  = {Simran Arora and Sabri Eyuboglu and Michael Zhang and Aman 
Timalsina and Silas Alberti and Dylan Zinsley and James Zou and Atri Rudra and Christopher R'e},\n    journal = {ArXiv},\n    year    = {2024},\n    volume  = {abs/2402.18668},\n    url     = {https://api.semanticscholar.org/CorpusID:268063190}\n}\n```\n\n```bibtex\n@article{Puny2021FrameAF,\n    title   = {Frame Averaging for Invariant and Equivariant Network Design},\n    author  = {Omri Puny and Matan Atzmon and Heli Ben-Hamu and Edward James Smith and Ishan Misra and Aditya Grover and Yaron Lipman},\n    journal = {ArXiv},\n    year    = {2021},\n    volume  = {abs/2110.03336},\n    url     = {https://api.semanticscholar.org/CorpusID:238419638}\n}\n```\n\n```bibtex\n@article{Duval2023FAENetFA,\n    title   = {FAENet: Frame Averaging Equivariant GNN for Materials Modeling},\n    author  = {Alexandre Duval and Victor Schmidt and Alex Hernandez Garcia and Santiago Miret and Fragkiskos D. Malliaros and Yoshua Bengio and David Rolnick},\n    journal = {ArXiv},\n    year    = {2023},\n    volume  = {abs/2305.05577},\n    url     = {https://api.semanticscholar.org/CorpusID:258564608}\n}\n```\n\n```bibtex\n@article{Wang2022DeepNetST,\n    title   = {DeepNet: Scaling Transformers to 1, 000 Layers},\n    author  = {Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Dongdong Zhang and Furu Wei},\n    journal = {ArXiv},\n    year    = {2022},\n    volume  = {abs/2203.00555},\n    url     = {https://api.semanticscholar.org/CorpusID:247187905}\n}\n```\n\n```bibtex\n@inproceedings{Ainslie2023CoLT5FL,\n    title   = {CoLT5: Faster Long-Range Transformers with Conditional Computation},\n    author  = {Joshua Ainslie and Tao Lei and Michiel de Jong and Santiago Ontan'on and Siddhartha Brahma and Yury Zemlyanskiy and David Uthus and Mandy Guo and James Lee-Thorp and Yi Tay and Yun-Hsuan Sung and Sumit Sanghai},\n    year    = 
{2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Falphafold3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyegomez%2Falphafold3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyegomez%2Falphafold3/lists"}