{"id":13530014,"url":"https://github.com/yangkky/Machine-learning-for-proteins","last_synced_at":"2025-04-01T17:31:40.339Z","repository":{"id":36070604,"uuid":"185913809","full_name":"yangkky/Machine-learning-for-proteins","owner":"yangkky","description":"Listing of papers about machine learning for proteins. ","archived":false,"fork":false,"pushed_at":"2024-05-31T14:43:27.000Z","size":654,"stargazers_count":1594,"open_issues_count":19,"forks_count":213,"subscribers_count":150,"default_branch":"master","last_synced_at":"2025-03-27T00:22:32.594Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yangkky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-10T03:39:15.000Z","updated_at":"2025-03-21T19:24:36.000Z","dependencies_parsed_at":"2023-02-10T16:01:21.933Z","dependency_job_id":"c7b112c8-94e7-4c7e-80bd-0e34c4795698","html_url":"https://github.com/yangkky/Machine-learning-for-proteins","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangkky%2FMachine-learning-for-proteins","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangkky%2FMachine-learning-for-proteins/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangkky%2FMachine-learning-for-proteins/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangkky%2FMachine-learning-for-protei
ns/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yangkky","download_url":"https://codeload.github.com/yangkky/Machine-learning-for-proteins/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246680321,"owners_count":20816676,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T07:00:42.149Z","updated_at":"2025-04-01T17:31:40.002Z","avatar_url":"https://github.com/yangkky.png","language":null,"funding_links":[],"categories":["Uncategorized","Other lists","Related Resources","Others"],"sub_categories":["Uncategorized","\u003ccode\u003eRelated Repositories\u003c/code\u003e"],"readme":"## Papers on machine learning for proteins\n\n### Background\n\nWe recently released a [review](https://arxiv.org/abs/1811.10775) of machine learning methods in protein engineering, but the field changes so fast and there are so many new papers that any static document will inevitably be missing important work. This format also allows us to broaden the scope beyond engineering-specific applications. We hope that this will be a useful resource for people interested in the field.\n\nTo the best of our knowledge, this is the first public, collaborative list of machine learning papers on protein applications. We try to classify papers based on a combination of their applications and model type. 
If you have suggestions for other papers or categories, please make a pull request or issue!\n\n### Format\n\nWithin each category, papers are listed in reverse chronological order (newest first). Where possible, a link should be provided.\n\n### Categories\n\n[Reviews](#reviews)  \n[Tools and datasets](#tools-and-datasets)  \n[Machine-learning guided directed evolution](#machine-learning-guided-directed-evolution)  \n[Representation learning](#representation-learning)  \n[Unsupervised variant prediction](#unsupervised-variant-prediction)  \n[Generative models](#generative-models)  \n[Biophysics](#biophysics)  \n[Predicting stability](#predicting-stability)  \n[Predicting structure from sequence](#predicting-structure-from-sequence)  \n[Predicting sequence from structure](#predicting-sequence-from-structure)  \n[Classification, annotation, search, and alignments](#classification-annotation-search-and-alignments)  \n[Predicting interactions with other molecules](#predicting-interactions-with-other-molecules)  \n[Other supervised learning](#other-supervised-learning)\n\n### Reviews\n\n**Machine Learning for Protein Engineering.**  \nKadina E. Johnston, Clara Fannjiang, Bruce J. Wittmann, Brian L. Hie, Kevin K. Yang \u0026 Zachary Wu.  \n*Machine Learning in Molecular Sciences, October 2023.*  \n[[10.1007/978-3-031-37196-7_9](https://doi.org/10.1007/978-3-031-37196-7_9)]\n\n**Harnessing Generative AI to Decode Enzyme Catalysis and Evolution for Enhanced Engineering.**  \nWen Jun Xie, Arieh Warshel.  \n*Preprint, October 2023.*  \n[[10.1101/2023.10.10.561808](https://doi.org/10.1101/2023.10.10.561808)]\n\n**Machine Learning-Guided Protein Engineering.**  \nPetr Kouba, Pavel Kohout, Faraneh Haddadi, Anton Bushuiev, Raman Samusevich, Jiri Sedlar, Jiri Damborsky, Tomas Pluskal, Josef Sivic, and Stanislav Mazurenko.  
\n*ACS Catalysis, October 2023.*  \n[[10.1021/acscatal.3c02743](https://doi.org/10.1021/acscatal.3c02743)]\n\n**Generative artificial intelligence for de novo protein design.**  \nAdam Winnifrith, Carlos Outeiral, Brian Hie.  \n*Preprint, October 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.09685)]\n\n**Growing ecosystem of deep learning methods for modeling protein.**  \nJulia R. Rogers, Gergő Nikolényi, Mohammed AlQuraishi.  \n*Preprint, October 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.06725)]\n\n**Exploring the Protein Sequence Space with Global Generative Models.**  \nSergio Romero-Romero, Sebastian Lindner, Noelia Ferruz.  \n*Preprint, May 2023.*  \n[[arxiv](https://arxiv.org/abs/2305.01941)]\n\n**Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action.**  \nZhiye Guo, Jian Liu, Yanli Wang, Mengrui Chen, Duolin Wang, Dong Xu, Jianlin Cheng.  \n*Preprint, February 2023.*  \n[[arxiv](https://arxiv.org/abs/2302.10907)]\n\n**Learning Epistasis and Residue Coevolution Patterns: Current Trends and Future Perspectives for Advancing Enzyme Engineering.**  \nMarcel Wittmund, Frederic Cadet and Mehdi D. Davari.  \n*ACS Catalysis, November 2022.*  \n[[10.1021/acscatal.2c01426](https://doi.org/10.1021/acscatal.2c01426)]\n\n**From sequence to function through structure: deep learning for protein design.**  \nNoelia Ferruz, Michael Heinzinger, Mehmet Akdel, Alexander Goncearenco, Luca Naef, Christian Dallago.  \n*Preprint, September 2022.*  \n[[10.1101/2022.08.31.505981](https://doi.org/10.1101/2022.08.31.505981)]\n\n**Computational protein design with evolutionary-based and physics-inspired modeling: current and future synergies.**  \nCyril Malbranke, David Bikard, Simona Cocco, Rémi Monasson, Jérôme Tubiana.  \n*Preprint, August 2022.*  \n[[arxiv](https://arxiv.org/abs/2208.13616)]\n\n**Deep learning approaches for conformational flexibility and switching properties in protein design.**  \nLucas S. P. Rudde, Mahdi Hijazi, Patrick Barth.  
\n*Front. Mol. Biosci., August 2022.*  \n[[10.3389/fmolb.2022.928534](https://doi.org/10.3389/fmolb.2022.928534)]\n\n**Controllable protein design with language models.**  \nNoelia Ferruz, Birte Höcker.  \n*Nature Machine Intelligence, June 2022.*  \n[[10.1038/s42256-022-00499-z](https://doi.org/10.1038/s42256-022-00499-z)]\n\n**The road to fully programmable protein catalysis.**  \nSarah L. Lovelock, Rebecca Crawshaw, Sophie Basler, Colin Levy, David Baker, Donald Hilvert, Anthony P. Green.  \n*Nature, June 2022.*  \n[[10.1038/s41586-022-04456-z](https://doi.org/10.1038/s41586-022-04456-z)]\n\n**Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design.**  \nBen E. Clifton, Dan Kozome, and Paola Laurino.  \n*Biochemistry, March 2022.*  \n[[10.1021/acs.biochem.1c00757](https://doi.org/10.1021/acs.biochem.1c00757)]\n\n**Learning functional properties of proteins with language models.**  \nSerbulent Unsal, Heval Atas, Muammer Albayrak, Kemal Turhan, Aybar C. Acar \u0026 Tunca Doğan.  \n*Nature Machine Intelligence, March 2022.*  \n[[10.1038/s42256-022-00457-9](https://doi.org/10.1038/s42256-022-00457-9)]\n\n**Applications of artificial intelligence to enzyme and pathway design for metabolic engineering.**  \nWoo Dae Jang, Gi Bae Kim, Yeji Kim, Sang Yup Lee.  \n*Current Opinion in Biotechnology, February 2022.*  \n[[10.1016/j.copbio.2021.07.024](https://doi.org/10.1016/j.copbio.2021.07.024)]\n\n**Adaptive machine learning for protein engineering.**  \nBrian L. Hie, Kevin K. Yang.  \n*Current Opinion in Structural Biology, February 2022.*   \n[[10.1016/j.sbi.2021.11.002](https://doi.org/10.1016/j.sbi.2021.11.002)]\n\n**Protein sequence design with deep generative models.**  \nZachary Wu, Kadina E. Johnston, Frances H. Arnold, Kevin K. Yang.  \n*Current Opinion in Chemical Biology, December 2021.*  
\n[[10.1016/j.cbpa.2021.04.004](https://doi.org/10.1016/j.cbpa.2021.04.004)]\n\n**AI challenges for predicting the impact of mutations on protein stability.**  \nFabrizio Pucci, Martin Schwersensky, Marianne Rooman.  \n*Preprint, November 2021.*  \n[[arxiv](https://arxiv.org/abs/2111.04208v1)]\n\n**Advances in machine learning for directed evolution.**  \nBruce J Wittmann, Kadina E Johnston, Zachary Wu, Frances H Arnold.  \n*Current Opinion in Structural Biology, August 2021.*  \n[[10.1016/j.sbi.2021.01.008](https://doi.org/10.1016/j.sbi.2021.01.008)]\n\n**A Brief Review of Machine Learning Techniques for Protein Phosphorylation Sites Prediction.**  \nFarzaneh Esmaili, Mahdi Pourmirzaei, Shahin Ramazi, Elham Yavari.  \n*Preprint, August 2021.*  \n[[arxiv](https://arxiv.org/abs/2108.04951v1)]\n\n**Learning the protein language: Evolution, structure, and function.**  \nTristan Bepler, Bonnie Berger.  \n*Cell Systems, June 2021.*  \n[[10.1016/j.cels.2021.05.017](https://doi.org/10.1016/j.cels.2021.05.017)]\n\n**Representation learning applications in biological sequence analysis.**  \nHitoshi Iuchi, Taro Matsutani, Keisuke Yamada, Natsuki Iwano, Shunsuke Sumi, Shion Hosoda, Shitao Zhao, Tsukasa Fukunaga, Michiaki Hamada.  \n*Computational and Structural Biotechnology Journal, May 2021.*  \n[[10.1016/j.csbj.2021.05.039](https://doi.org/10.1016/j.csbj.2021.05.039)]\n\n**Data-driven computational protein design.**  \nVincent Frappier, Amy E. Keating.  \n*Current Opinion in Structural Biology, May 2021.*  \n[[10.1016/j.sbi.2021.03.009](https://doi.org/10.1016/j.sbi.2021.03.009)]\n\n**Machine learning in protein structure prediction.**  \nMohammed AlQuraishi.  \n*Current Opinion in Chemical Biology, May 2021.*  \n[[10.1016/j.cbpa.2021.04.005](https://doi.org/10.1016/j.cbpa.2021.04.005)]\n\n**Protein sequence-to-structure learning: Is this the end(-to-end revolution)?.**  \nElodie Laine, Stephan Eismann, Arne Elofsson, Sergei Grudinin.  
\n*Preprint, May 2021.*  \n[[arxiv](https://arxiv.org/abs/2105.07407v1)]\n\n**Revolutionizing enzyme engineering through artificial intelligence and machine learning.**  \nNitu Singh, Sunny Malik, Anvita Gupta, Kinshuk Raj Srivastava.  \n*Emerging topics in life sciences, April 2021.*  \n[[10.1042/ETLS20200257](https://doi.org/10.1042/ETLS20200257)]\n\n**The language of proteins: NLP, machine learning \u0026 protein sequences.**  \nDan Ofer, Nadav Brandes, Michal Linial.  \n*Computational and Structural Biotechnology Journal, January 2021.*  \n[[10.1016/j.csbj.2021.03.022](https://doi.org/10.1016/j.csbj.2021.03.022)]\n\n**Chapter Twelve - Machine learning-assisted enzyme engineering.**  \nNiklas E. Siedhoff, Ulrich Schwaneberg and Mehdi D. Davari.  \n*Methods in Enzymology, November 2020.*  \n[[10.1016/bs.mie.2020.05.005](https://doi.org/10.1016/bs.mie.2020.05.005)]\n\n**Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition.**  \nSebastian Raschka, Benjamin Kaufman.  \n*Preprint, January 2020.*  \n[[arXiv](https://arxiv.org/abs/2001.06545v2)]\n\n**Machine Learning in Enzyme Engineering.**   \nStanislav Mazurenko, Zbynek Prokop, Jiri Damborsky.   \n*ACS Catalysis, December 2019.*   \n[[10.1021/acscatal.9b04321](https://doi.org/10.1021/acscatal.9b04321)]  \n\n**Machine learning-guided directed evolution for protein engineering.**  \nKevin K. Yang, Zachary Wu, Frances H. Arnold.   \n*Nature Methods, July 2019.*   \n[[10.1038/s41592-019-0496-6](https://doi.org/10.1038/s41592-019-0496-6)]  \nPreprint available on [arxiv](https://arxiv.org/abs/1811.10775). \n\n**Evaluating Protein Transfer Learning with TAPE.**   \nRoshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song.   \n*Preprint, June 2019.*   \n[[arxiv](https://arxiv.org/abs/1906.08230)]\n\n**Can Machine Learning Revolutionize Directed Evolution of Selective Enzymes?**  \nGuangyue Li, Yijie Dong, Manfred T. Reetz.  
\n*Advanced Synthesis \u0026 Catalysis, March 2019.*  \n[[10.1002/adsc.201900149](https://doi.org/10.1002/adsc.201900149)]\n\n### Tools and datasets\n\n**Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework.**  \nZhuoqi Zheng, Bo Zhang, Bozitao Zhong, Kexin Liu, Zhengxin Li, Junjie Zhu, Jinyu Yu, Ting Wei, Hai-Feng Chen.  \n*Preprint, May 2024.*  \n[[10.1101/2024.02.10.579743](https://doi.org/10.1101/2024.02.10.579743)]\n\n**Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.**  \nSean R. Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, Kevin K. Yang.  \n*Nature Biotechnology, April 2024.*  \n[[10.1038/s41587-024-02214-2](https://doi.org/10.1038/s41587-024-02214-2)]\n\n**Deep indel mutagenesis reveals the impact of insertions and deletions on protein stability and function.**  \nMagdalena Topolska, Antoni Beltran, Ben Lehner.  \n*Preprint, October 2023.*  \n[[10.1101/2023.10.06.561180](https://doi.org/10.1101/2023.10.06.561180)]\n\n**OpenProteinSet: Training data for structural biology at scale.**  \nGustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, Ian Fisk, Andrew M. Watkins, Stephen Ra, Richard Bonneau, Mohammed AlQuraishi.  \n*Preprint, August 2023.*  \n[[arxiv](https://arxiv.org/abs/2308.05326)]\n\n**Mega-scale experimental analysis of protein folding stability in biology and design.**  \nKotaro Tsuboyama, Justas Dauparas, Jonathan Chen, Elodie Laine, Yasser Mohseni Behbahani, Jonathan J. Weinstein, Niall M. Mangan, Sergey Ovchinnikov \u0026 Gabriel J. Rocklin.  \n*Nature, July 2023.*  \n[[10.1038/s41586-023-06328-6](https://doi.org/10.1038/s41586-023-06328-6)]\n\n**FLOP: Tasks for Fitness Landscapes Of Protein wildtypes.**  \nPeter Mørch Groth, Richard Michael, Jesper Salomon, Pengfei Tian, Wouter Boomsma.  
\n*Preprint, June 2023.*  \n[[10.1101/2023.06.21.545880](https://doi.org/10.1101/2023.06.21.545880)]\n\n**PDBench: Evaluating Computational Methods for Protein-Sequence Design.**  \nLeonardo V Castorina, Rokas Petrenas, Kartic Subr, Christopher W Wood.  \n*Bioinformatics, January 2023.*  \n[[10.1093/bioinformatics/btad027](https://doi.org/10.1093/bioinformatics/btad027)]\n\n**The energetic and allosteric landscape for KRAS inhibition.**  \nChenchun Weng, Andre J. Faure, Ben Lehner.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.06.519122](https://doi.org/10.1101/2022.12.06.519122)]\n\n**ManyFold: an efficient and flexible library for training and validating protein folding models.**  \nAmelia Villegas-Morcillo, Louis Robinson, Arthur Flajolet, Thomas D Barrett.  \n*Bioinformatics, December 2022.*  \n[[10.1093/bioinformatics/btac773](https://doi.org/10.1093/bioinformatics/btac773)]\n\n**Mega-scale experimental analysis of protein folding stability in biology and protein design.**  \nKotaro Tsuboyama, Justas Dauparas, Jonathan Chen, Elodie Laine, Yasser Mohseni Behbahani, Jonathan J. Weinstein, Niall M. Mangan, Sergey Ovchinnikov, Gabriel J. Rocklin.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.06.519132](https://doi.org/10.1101/2022.12.06.519132)]\n\n**Tuned Fitness Landscapes for Benchmarking Model-Guided Protein Design.**  \nNeil Thomas, Atish Agarwala, David Belanger, Yun S. Song, Lucy J. Colwell.  \n*Preprint, October 2022.*  \n[[10.1101/2022.10.28.514293](https://doi.org/10.1101/2022.10.28.514293)]\n\n**Deep mutational scanning and machine learning reveal structural and molecular rules governing allosteric hotspots in homologous proteins.**  \nMegan Leander, Zhuang Liu, Qiang Cui, Srivatsan Raman.  \n*eLife, October 2022.*  \n[[10.7554/eLife.79932](https://doi.org/10.7554/eLife.79932)]\n\n**Randomized gates eliminate bias in sort-seq assays.**  \nBrian L. Trippe, Buwei Huang, Erika A. DeBenedictis, Brian Coventry, Nicholas Bhattacharya, Kevin K. 
Yang, David Baker, Lorin Crawford.  \n*Protein Science, August 2022.*   \n[[10.1002/pro.4401](https://doi.org/10.1002/pro.4401)]\n\n**Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold.**  \nZiyao Li, Xuyang Liu, Weijie Chen, Fan Shen, Hangrui Bi, Guolin Ke, Linfeng Zhang.  \n*Preprint, August 2022.*  \n[[10.1101/2022.08.04.502811](https://doi.org/10.1101/2022.08.04.502811)]\n\n**PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding.**  \nMinghao Xu, Zuobai Zhang, Jiarui Lu, Zhaocheng Zhu, Yangtian Zhang, Chang Ma, Runcheng Liu, Jian Tang.  \n*Preprint, June 2022.*  \n[[arxiv](https://arxiv.org/abs/2206.02096)]\n\n**FLIP: Benchmark tasks in fitness landscape inference for proteins.**  \nChristian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, Kevin K. Yang.  \n*NeurIPS 2021 Datasets and Benchmarks Track, December 2021.*  \n[[10.1101/2021.11.09.467890](https://doi.org/10.1101/2021.11.09.467890)]\n\n**evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library.**  \nBruce J. Wittmann, Kadina E. Johnston, Patrick J. Almhjell, Frances H. Arnold.  \n*Preprint, November 2021.*  \n[[10.1101/2021.11.18.469179](https://doi.org/10.1101/2021.11.18.469179)]\n\n**The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires.**  \nMilena Pavlović, Lonneke Scheffer, Keshav Motwani, Chakravarthi Kanduri, Radmila Kompova, Nikolay Vazov, Knut Waagan, Fabian L. M. Bernal, Alexandre Almeida Costa, Brian Corrie, Rahmad Akbar, Ghadi S. Al Hajj, Gabriel Balaban, Todd M. Brusko, Maria Chernigovskaya, Scott Christley, Lindsay G. Cowell, Robert Frank, Ivar Grytten, Sveinung Gundersen, Ingrid Hobæk Haff, Eivind Hovig, Ping-Han Hsieh, Günter Klambauer, Marieke L. Kuijjer, Christin Lund-Andersen, Antonio Martini, Thomas Minotto, Johan Pensar, Knut Rand, Enrico Riccardi, Philippe A. 
Robert, Artur Rocha, Andrei Slabodkin, Igor Snapkov, Ludvig M. Sollid, Dmytro Titov, Cédric R. Weber, Michael Widrich, Gur Yaari, Victor Greiff \u0026 Geir Kjetil Sandve.  \n*Nature Machine Intelligence, November 2021.*  \n[[10.1038/s42256-021-00413-z](https://doi.org/10.1038/s42256-021-00413-z)]\n\n**Learned embeddings from deep learning to visualize and predict protein sets.**  \nChristian Dallago, Konstantin Schütze, Michael Heinzinger, Tobias Olenyi, Maria Littmann, Amy X Lu, Kevin K Yang, Seonwoo Min, Sungroh Yoon, James T Morton, Burkhard Rost.  \n*Current Protocols, May 2021.*  \n[[10.1002/cpz1.113](https://doi.org/10.1002/cpz1.113)]\n\n**Population-Based Black-Box Optimization for Biological Sequence Design.**  \nChristof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley.  \n*ICML, July 2020.*  \n[[ICML](https://proceedings.icml.cc/static/paper_files/icml/2020/6338-Paper.pdf)]\n\n**Selene: a PyTorch-based deep learning library for sequence data.**  \nKathleen M. Chen, Evan M. Cofer, Jian Zhou, Olga G. Troyanskaya.  \n*Nature Methods, March 2019.*  \n[[10.1038/s41592-019-0360-8](https://doi.org/10.1038/s41592-019-0360-8)]\n\n### Machine-learning guided directed evolution\n\n**Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning.**  \nTobias Vornholt, Mojmír Mutný, Gregor W. Schmidt, Christian Schellhaas, Ryo Tachibana, Sven Panke, Thomas R. Ward, Andreas Krause, and Markus Jeschek.  \n*ACS Central Science, May 2024.*  \n[[10.1021/acscentsci.4c00258](https://doi.org/10.1021/acscentsci.4c00258)]\n\n**Machine Learning and Directed Evolution of Base Editing Enzymes.**  \nRamiro M. Perrotta, Svenja Vinke, Raphaël Ferreira, Michaël Moret, Ahmed Mahas, Anush Chiappino-Pepe, Lisa M. Riedmayr, Anna-Thérèse Mehra, Louisa S. Lehmann, George M. Church.  
\n*Preprint, May 2024.*  \n[[10.1101/2024.05.17.594556](https://doi.org/10.1101/2024.05.17.594556)]\n\n**Aligning protein generative models with experimental fitness via Direct Preference Optimization.**  \nTalal Widatalla, Rafael Rafailov, Brian Hie.  \n*Preprint, May 2024.*  \n[[10.1101/2024.05.20.595026](https://doi.org/10.1101/2024.05.20.595026)]\n\n**Computational stabilization of a non-heme iron enzyme enables efficient evolution of new function.**  \nBrianne R. King, Kiera H. Sumida, Jessica L. Caruso, David Baker, Jesse G. Zalatan.  \n*Preprint, May 2024.*  \n[[10.1101/2024.04.18.590141](https://doi.org/10.1101/2024.04.18.590141)]\n\n**Microdroplet screening rapidly profiles a biocatalyst to enable its AI-assisted engineering.**  \nMaximilian Gantz, Simon V. Mathis, Friederike E. H. Nintzel, Paul J. Zurek, Tanja Knaus, Elie Patel, Daniel Boros, Friedrich-Maximilian Weberling, Matthew R. A. Kenneth, Oskar J. Klein, Elliot J. Medcalf, Jacob Moss, Michael Herger, Tomasz S. Kaminski, Francesco G. Mutti, Pietro Lio, Florian Hollfelder.  \n*Preprint, April 2024.*   \n[[10.1101/2024.04.08.588565](https://doi.org/10.1101/2024.04.08.588565)]\n\n**Automated in vivo enzyme engineering accelerates biocatalyst optimization.**  \nEnrico Orsi, Lennart Schada von Borzyskowski, Stephan Noack, Pablo I. Nikel \u0026 Steffen N. Lindner.  \n*Nature Communications, April 2024.*  \n[[10.1038/s41467-024-46574-4](https://doi.org/10.1038/s41467-024-46574-4)]\n\n**Engineering highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening.**  \nNeil Thomas, David Belanger, Chenling Xu, Hanson Lee, Kathleen Hirano, Kosuke Iwai, Vanja Polic, Kendra D Nyberg, Kevin Hoff, Lucas Frenz, Charlie A Emrich, Jun W Kim, Mariya Chavarha, Abi Ramanan, Jeremy J Agresti, Lucy J Colwell.  
\n*Preprint, April 2024.*  \n[[10.1101/2024.03.21.585615](https://doi.org/10.1101/2024.03.21.585615)]\n\n**Interpretable and explainable predictive machine learning models for data-driven protein engineering.**  \nDavid Medina-Ortiz, Ashkan Khalifeh, Hoda Anvari-Kazemabad and Mehdi D. Davari.  \n*Preprint, March 2024.*  \n[[10.1101/2024.02.18.580860](https://doi.org/10.1101/2024.02.18.580860)]\n\n**Machine Learning-Assisted Engineering of Light, Oxygen, Voltage Photoreceptor Adduct Lifetime.**  \nStefanie Hemmer, Niklas Erik Siedhoff, Sophia Werner, Gizem Ölçücü, Ulrich Schwaneberg, Karl-Erich Jaeger, Mehdi D. Davari, and Ulrich Krauss.  \n*JACS Au, November 2023.*  \n[[10.1021/jacsau.3c00440](https://doi.org/10.1021/jacsau.3c00440)]\n\n**Biophysics-based protein language models for protein engineering.**  \nSam Gelman, Bryce Johnson, Chase Freschlin, Sameer D'Costa, Anthony Gitter, Philip A. Romero.  \n*Preprint, March 2024.*  \n[[10.1101/2024.03.15.585128](https://doi.org/10.1101/2024.03.15.585128)]\n\n**Improving protein expression, stability, and function with ProteinMPNN.**  \nKiera H. Sumida, Reyes Núñez-Franco, Indrek Kalvet, Samuel J. Pellock, Basile I. M. Wicky, Lukas F. Milles, Justas Dauparas, Jue Wang, Yakov Kipnis, Noel Jameson, Alex Kang, Joshmyn De La Cruz, Banumathi Sankaran, Asim K. Bera, Gonzalo Jiménez-Osés, David Baker.  \n*Preprint, October 2023.*  \n[[10.1101/2023.10.03.560713](https://doi.org/10.1101/2023.10.03.560713)]\n\n**Deploying synthetic coevolution and machine learning to engineer protein-protein interactions.**  \nAerin Yang, Kevin M Jude, Ben Lai, Mason Minot, Anna M Kocyla, Caleb R Glassman, Daisuke Nishimiya, Yoon Seok Kim, Sai T Reddy, Aly A Khan, K Christopher Garcia.  \n*Science, July 2023.*  \n[[10.1126/science.adh1720](https://doi.org/10.1126/science.adh1720)]\n\n**Bidirectional Learning for Offline Model-based Biological Sequence Design.**  \nCan Chen, Yingxue Zhang, Xue Liu, Mark Coates.  
\n*Preprint, January 2023.*  \n[[arxiv](https://arxiv.org/abs/2301.02931)]\n\n**Plug \u0026 Play Directed Evolution of Proteins with Gradient-based Discrete MCMC.**  \nPatrick Emami, Aidan Perreault, Jeffrey Law, David Biagioni, Peter C. St. John.  \n*Preprint, December 2022.*  \n[[arxiv](https://arxiv.org/abs/2212.09925)]\n\n**Combinatorial assembly and design of enzymes.**  \nRosalie Lipsh-Sokolik, Olga Khersonsky, Sybrin P. Schröder, Casper de Boer, Shlomo-Yakir Hoch, Gideon J. Davies, Hermen S. Overkleeft, Sarel J. Fleishman.  \n*Preprint, December 2022.*  \n[[10.1101/2022.09.17.508230](https://doi.org/10.1101/2022.09.17.508230)]\n\n**Forecasting labels under distribution-shift for machine-guided sequence design.**  \nLauren Berk Wheelock, Stephen Malina, Jeffrey Gerold, Sam Sinai.  \n*Preprint, November 2022.*  \n[[arxiv](https://arxiv.org/abs/2211.10422)]\n\n**PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design.**  \nJi Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho.  \n*Preprint, October 2022.*  \n[[arxiv](https://arxiv.org/abs/2210.04096)]\n\n**Designed active-site library reveals thousands of functional GFP variants.**  \nJonathan Yaacov Weinstein, Carlos Marti Gomez Aldaravi, Rosalie Lipsh-Sokolik, Shlomo Yakir Hoch, Demian Liebermann, Reinat Nevo, Haim Weissman, Ekaterina Petrovich-Kopitman, David Margulies, Dmitry Ivankov, David McCandlish, Sarel Jacob Fleishman.  \n*Preprint, October 2022.*  \n[[10.1101/2022.10.11.511732](https://doi.org/10.1101/2022.10.11.511732)]\n\n**Accelerated rational PROTAC design via deep learning and molecular simulations.**  \nShuangjia Zheng, Youhai Tan, Zhenyu Wang, Chengtao Li, Zhiqing Zhang, Xu Sang, Hongming Chen \u0026 Yuedong Yang.  
\n*Nature Machine Intelligence, September 2022.*  \n[[10.1038/s42256-022-00527-y](https://doi.org/10.1038/s42256-022-00527-y)]\n\n**Inferring protein fitness landscapes from laboratory evolution experiments.**  \nSameer D’Costa, Emily C. Hinds, Chase R. Freschlin, Hyebin Song, Philip A. Romero.  \n*Preprint, September 2022.*  \n[[10.1101/2022.09.01.506224](https://doi.org/10.1101/2022.09.01.506224)]\n\n**Antibody optimization enabled by artificial intelligence predictions of binding affinity and naturalness.**  \nSharrol Bachas, Goran Rakocevic, David Spencer, Anand V. Sastry, Robel Haile, John M. Sutton, George Kasun, Andrew Stachyra, Jahir M. Gutierrez, Edriss Yassine, Borka Medjo, Vincent Blay, Christa Kohnert, Jennifer T. Stanton, Alexander Brown, Nebojsa Tijanic, Cailen McCloskey, Rebecca Viazzo, Rebecca Consbruck, Hayley Carter, Simon Levine, Shaheed Abdulhaqq, Jacob Shaul, Abigail B. Ventura, Randal S. Olson, Engin Yapici, Joshua Meier, Sean McClain, Matthew Weinstock, Gregory Hannum, Ariel Schwartz, Miles Gander, Roberto Spreafico.  \n*Preprint, August 2022.*  \n[[10.1101/2022.08.16.504181](https://doi.org/10.1101/2022.08.16.504181)]\n\n**Co-optimization of therapeutic antibody affinity and specificity using machine learning models that generalize to novel mutational space.**  \nEmily K. Makowski, Patrick C. Kinnunen, Jie Huang, Lina Wu, Matthew D. Smith, Tiexin Wang, Alec A. Desai, Craig N. Streu, Yulei Zhang, Jennifer M. Zupancic, John S. Schardt, Jennifer J. Linderman, Peter M. Tessier.  \n*Nature Communications, July 2022.*  \n[[10.1038/s41467-022-31457-3](https://doi.org/10.1038/s41467-022-31457-3)]\n\n**A hybrid model combining evolutionary probability and machine learning leverages data-driven protein engineering.**  \nAlexander-Maurice Illig, Niklas E. Siedhoff, Ulrich Schwaneberg and Mehdi D. Davari.  
\n*Preprint, June 2022.*  \n[[10.1101/2022.06.07.495081](https://doi.org/10.1101/2022.06.07.495081)]\n\n**Heterogeneity of the GFP fitness landscape and data-driven protein design.**  \nLouisa Gonzalez Somermeyer, Aubin Fleiss, Alexander S Mishin, Nina G Bozhanova, Anna A Igolkina, Jens Meiler, Maria-Elisenda Alaball Pujol, Ekaterina V Putintseva, Karen S Sarkisyan.  \n*eLife, May 2022.*  \n[[10.7554/eLife.75842](https://doi.org/10.7554/eLife.75842)]\n\n**De novo protein design by deep network hallucination.**  \nIvan Anishchenko, Samuel J. Pellock, Tamuka M. Chidyausiku, Theresa A. Ramelot, Sergey Ovchinnikov, Jingzhou Hao, Khushboo Bafna, Christoffer Norn, Alex Kang, Asim K. Bera, Frank DiMaio, Lauren Carter, Cameron M. Chow, Gaetano T. Montelione \u0026 David Baker.  \n*Nature, December 2021.*  \n[[10.1038/s41586-021-04184-w](https://doi.org/10.1038/s41586-021-04184-w)]\n\n**Informed training set design enables efficient machine learning-assisted directed protein evolution.**  \nBruce J. Wittmann, Yisong Yue, Frances H. Arnold.  \n*Cell Systems, November 2021.*  \n[[10.1016/j.cels.2021.07.008](https://doi.org/10.1016/j.cels.2021.07.008)]\n\n**Machine learning-based library design improves packaging and diversity of adeno-associated virus (AAV) libraries.**  \nDanqing Zhu, David H. Brookes, Akosua Busia, Ana Carneiro, Clara Fannjiang, Galina Popova, David Shin, Edward F. Chang, Tomasz J. Nowakowski, Jennifer Listgarten, David V. Schaffer.  \n*Preprint, November 2021.*  \n[[10.1101/2021.11.02.467003](https://doi.org/10.1101/2021.11.02.467003)]\n\n**Optimal Design of Stochastic DNA Synthesis Protocols based on Generative Sequence Models.**  \nEli N. Weinstein, Alan N. Amin, Will Grathwohl, Daniel Kassler, Jean Disset, Debora S. Marks.  \n*Preprint, October 2021.*  \n[[10.1101/2021.10.28.466307](https://doi.org/10.1101/2021.10.28.466307)]\n\n**Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond.**  \nDinghuai Zhang, Jie Fu, Yoshua Bengio, Aaron Courville. 
 \n*Preprint, October 2021.*  \n[[arxiv](https://arxiv.org/abs/2110.03372)]\n\n**Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity.**  \nEric J. Ma, Elina Siirola, Charles Moore, Arkadij Kummer, Markus Stoeckli, Michael Faller, Caroline Bouquet, Fabian Eggimann, Mathieu Ligibel, Dan Huynh, Geoffrey Cutler, Luca Siegrist, Richard A. Lewis, Anne-Christine Acker, Ernst Freund, Elke Koch, Markus Vogel, Holger Schlingensiepen, Edward J. Oakeley, and Radka Snajdrova.  \n*ACS Catalysis, September 2021.*  \n[[10.1021/acscatal.1c02786](https://doi.org/10.1021/acscatal.1c02786)]\n\n**PyPEF—An Integrated Framework for Data-Driven Protein Engineering.**  \nNiklas E. Siedhoff, Alexander-Maurice Illig, Ulrich Schwaneberg, and Mehdi D. Davari.  \n*J. Chem. Inf. Model, July 2021.*  \n[[10.1021/acs.jcim.1c00099](https://doi.org/10.1021/acs.jcim.1c00099)]\n\n**Conservative Objective Models for Effective Offline Model-Based Optimization.**  \nBrandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine.  \n*Preprint, July 2021.*  \n[[arxiv](https://arxiv.org/abs/2107.06882v1)]\n\n**Deep Extrapolation for Attribute-Enhanced Generation.**  \nAlvin Chan, Ali Madani, Ben Krause, Nikhil Naik.  \n*Preprint, July 2021.*  \n[[arxiv](https://arxiv.org/abs/2107.02968)]\n\n**Effective Surrogate Models for Protein Design with Bayesian Optimization.**  \nNate Gruver, Samuel Stanton, Polina Kirichenko, Marc Finzi, Phillip Maffettone, Vivek Myers, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson.  \n*2021 ICML Workshop on Computational Biology, July 2021.*  \n[[pdf](https://icml-compbio.github.io/2021/papers/WCBICML2021_paper_61.pdf)]\n\n**Bayesian optimization with evolutionary and structure-based regularization for directed protein evolution.**  \nTrevor S. Frisby, Christopher James Langmead.  
\n*Algorithms for Molecular Biology, July 2021.*  \n[[10.1186/s13015-021-00195-4](https://doi.org/10.1186/s13015-021-00195-4)]\n\n**Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design.**  \nAdam Foster, Desi R. Ivanova, Ilyas Malik, Tom Rainforth.  \n*Preprint, July 2021.*  \n[[arxiv](https://arxiv.org/abs/2103.02438)]\n\n**In silico proof of principle of machine learning-based antibody design at unconstrained scale.**  \nRahmad Akbar, Philippe A. Robert, Cédric R. Weber, Michael Widrich, Robert Frank, Milena Pavlović, Lonneke Scheffer, Maria Chernigovskaya, Igor Snapkov, Andrei Slabodkin, Brij Bhushan Mehta, Enkelejda Miho, Fridtjof Lund-Johansen, Jan Terje Andersen, Sepp Hochreiter, Ingrid Hobæk Haff, Günter Klambauer, Geir Kjetil Sandve, Victor Greiff.  \n*Preprint, July 2021.*  \n[[10.1101/2021.07.08.451480](https://doi.org/10.1101/2021.07.08.451480)]\n\n**Deep diversification of an AAV capsid protein by machine learning.**  \nDrew H. Bryant, Ali Bashir, Sam Sinai, Nina K. Jain, Pierce J. Ogden, Patrick F. Riley, George M. Church, Lucy J. Colwell \u0026 Eric D. Kelsic.  \n*Nature Biotechnology, February 2021.*  \n[[10.1038/s41587-020-00793-4](https://doi.org/10.1038/s41587-020-00793-4)]\n\n**Deep Uncertainty and the Search for Proteins.**  \nZelda Mariet, Ghassen Jerfel, Zi Wang, Christof Angermüller, David Belanger, Suhani Vora, Maxwell Bileschi, Lucy Colwell, D Sculley, Dustin Tran, Jasper Snoek.  \n*NeurIPS 2020 ML for Molecules Workshop, December 2020.*  \n[[pdf](https://ml4molecules.github.io/papers2020/ML4Molecules_2020_paper_23.pdf)]\n\n**Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production.**  \nJonathan C. Greenhalgh, Sarah A. Fahlberg, Brian F. Pfleger, Philip A. Romero.  \n*Preprint, May 2021.*  \n[[10.1101/2021.05.21.445192](https://doi.org/10.1101/2021.05.21.445192)]\n\n**Large-scale design and refinement of stable proteins using sequence-only models.**  \nJedediah M. 
Singer, Scott Novotney, Devin Strickland, Hugh K. Haddox, Nicholas Leiby, Gabriel J. Rocklin, Cameron M. Chow, Anindya Roy, Asim K. Bera, Francis C. Motta, … Eric Klavins.  \n*Preprint, March 2021.*  \n[[10.1101/2021.03.12.435185](https://doi.org/10.1101/2021.03.12.435185)]\n\n**AdaLead: A simple and robust adaptive greedy search algorithm for sequence design.**  \nSam Sinai, Richard Wang, Alexander Whatley, Stewart Slocum, Elina Locane, Eric D. Kelsic.  \n*Preprint, October 2020.*  \n[[arxiv](http://arxiv.org/abs/2010.02141)]\n\n**The NK Landscape as a Versatile Benchmark for Machine Learning Driven Protein Engineering.**  \nAdam C. Mater, Mahakaran Sandhu, Colin Jackson.  \n*Preprint, October 2020.*  \n[[10.1101/2020.09.30.319780](https://doi.org/10.1101/2020.09.30.319780)]\n\n**Learning with uncertainty for biological discovery and design.**  \nBrian Hie, Bryan Bryson, Bonnie Berger.  \n*Preprint, August 2020.*  \n[[10.1101/2020.08.11.247072](https://doi.org/10.1101/2020.08.11.247072)]\n\n**Population-Based Black-Box Optimization for Biological Sequence Design.**  \nChristof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley.  \n*ICML, July 2020.*  \n[[ICML](https://proceedings.icml.cc/static/paper_files/icml/2020/6338-Paper.pdf)]\n\n**Autofocused oracles for model-based design.**  \nClara Fannjiang, Jennifer Listgarten.  \n*Preprint, June 2020.*  \n[[arxiv](https://arxiv.org/abs/2006.08052)]\n\n**Domain Extrapolation via Regret Minimization.**  \nWengong Jin, Regina Barzilay, Tommi Jaakkola.  \n*Preprint, June 2020.*  \n[[arxiv](https://arxiv.org/abs/2006.03908)]\n\n**Fast differentiable DNA and protein sequence optimization for molecular design.**  \nJohannes Linder, Georg Seelig.  
\n*Preprint, May 2020.*  \n[[arxiv](https://arxiv.org/abs/2005.11275)]\n\n**A Deep Dive into Machine Learning Models for Protein Engineering.**  \nYuting Xu, Deeptak Verma, Robert P Sheridan, Andy Liaw, Junshui Ma, Nicholas Marshall, John McIntosh, Edward C. Sherer, Vladimir Svetnik, Jennifer Johnston.  \n*Journal of Chemical Information and Modeling, April 2020.*  \n[[10.1021/acs.jcim.0c00073](https://doi.org/10.1021/acs.jcim.0c00073)]\n\n**Evolutionary context-integrated deep sequence modeling for protein engineering.**  \nYunan Luo, Lam Vo, Hantian Ding, Yufeng Su, Yang Liu, Wesley Wei Qian, Huimin Zhao, Jian Peng.  \n*Preprint, January 2020.*  \n[[10.1101/2020.01.16.908509](https://doi.org/10.1101/2020.01.16.908509)]\n\n**Biological Sequence Design using Batched Bayesian Optimization.**  \nDavid Belanger, Suhani Vora, Zelda Mariet, Ramya Deshpande, David Dohan, Christof Angermueller, Kevin Murphy, Olivier Chapelle, Lucy Colwell.  \n*NeurIPS Workshop on Machine Learning and the Physical Sciences, December 2019.*  \n[[ML4PS](https://ml4physicalsciences.github.io/files/NeurIPS_ML4PS_2019_141.pdf)]\n\n**Model Inversion Networks for Model-Based Optimization.**  \nAviral Kumar, Sergey Levine.  \n*Preprint, December 2019.*  \n[[arxiv](https://arxiv.org/abs/1912.13464v1)]\n\n**Interpreting mutational effects predictions, one substitution at a time.**  \nC. K. Sruthi, Meher K. Prakash.  \n*bioRxiv, December 2019.*  \n[[10.1101/867812](https://doi.org/10.1101/867812)]\n\n**A structure-based deep learning framework for protein engineering.**  \nRaghav Shroff, Austin W. Cole, Barrett R. Morrow, Daniel J. Diaz, Isaac Donnell, Jimmy Gollihar, Andrew D. Ellington, Ross Thyer.  \n*Preprint, November 2019.*  \n[[10.1101/833905](https://doi.org/10.1101/833905)]\n\n**Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design.**  \nPierce J. Ogden, Eric D. Kelsic, Sam Sinai, George M. Church.  
\n*Science, November 2019.*  \n[[10.1126/science.aaw2900](https://doi.org/10.1126/science.aaw2900)]\n\n**Machine learning-guided channelrhodopsin engineering enables minimally-invasive optogenetics.**  \nClaire N. Bedbrook, Kevin K. Yang, J. Elliott Robinson, Viviana Gradinaru, Frances H Arnold.  \n*Nature Methods, October 2019.*  \n[[10.1038/s41592-019-0583-8](https://doi.org/10.1038/s41592-019-0583-8)]  \nPreprint available on [[bioRxiv](https://www.biorxiv.org/content/10.1101/565606v1)]\n\n**Batched Stochastic Bayesian Optimization via Combinatorial Constraints Design.**  \nKevin K. Yang, Yuxin Chen, Alycia Lee, Yisong Yue.  \n*International Conference on Artificial Intelligence and Statistics (AISTATS), April 2019.*  \n[[arxiv](https://arxiv.org/abs/1904.08102)] [[PMLR](http://proceedings.mlr.press/v89/yang19c.html)]\n\n**Machine learning-assisted directed protein evolution with combinatorial libraries.**  \nZachary Wu, S. B. Jennifer Kan, Russell D. Lewis, Bruce J. Wittmann, Frances H. Arnold.  \n*PNAS, April 2019.*  \n[[10.1073/pnas.1901979116](https://doi.org/10.1073/pnas.1901979116)]\n\n**Conditioning by adaptive sampling for robust design.**  \nDavid H. Brookes, Hahnbeom Park, Jennifer Listgarten.  \n*Preprint, January 2019.*  \n[[arxiv](https://arxiv.org/abs/1901.10060)]\n\n**A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes.**  \nFrédéric Cadet, Nicolas Fontaine, Guangyue Li, Joaquin Sanchis, Matthieu Ng Fuk Chong, Rudy Pandjaitan, Iyanar Vetrivel, Bernard Offmann, Manfred T. Reetz.  \n*Scientific Reports, November 2018.*  \n[[10.1038/s41598-018-35033-y](https://doi.org/10.1038/s41598-018-35033-y)]\n\n**Design by adaptive sampling.**  \nDavid H. Brookes, Jennifer Listgarten.  
\n*Preprint, October 2018.*  \n[[arxiv](https://arxiv.org/abs/1810.03714)]\n\n**Machine-Learning-Guided Mutagenesis for Directed Evolution of Fluorescent Proteins.**  \nYutaka Saito, Misaki Oikawa, Hikaru Nakazawa, Teppei Niide, Tomoshi Kameda, Koji Tsuda, and Mitsuo Umetsu.  \n*ACS Synthetic Biology, August 2018.*  \n[[10.1021/acssynbio.8b00155](https://doi.org/10.1021/acssynbio.8b00155)]\n\n**Toward machine-guided design of proteins.**  \nSurojit Biswas, Gleb Kuznetsov, Pierce J. Ogden, Nicholas J. Conway, Ryan P. Adams, George M. Church.  \n*Preprint, June 2018.*  \n[[10.1101/337154](https://doi.org/10.1101/337154)] [[bioRxiv](https://www.biorxiv.org/content/10.1101/337154v1)]\n\n**Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions.**  \nAnvita Gupta, James Zou.  \n*Preprint, April 2018.*  \n[[arxiv](https://arxiv.org/abs/1804.01694)]\n\n**Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization.**  \nClaire N. Bedbrook, Kevin K. Yang, Austin J. Rice, Viviana Gradinaru, Frances H. Arnold.  \n*PLOS Computational Biology, October 2017.*  \n[[10.1371/journal.pcbi.1005786](https://doi.org/10.1371/journal.pcbi.1005786)]\n\n**Exploring sequence-function space of a poplar glutathione transferase using designed information-rich gene variants.**  \nYaman Musdal, Sridhar Govindarajan, Bengt Mannervik.  \n*Protein Engineering, Design, and Selection, August 2017.*  \n[[10.1093/protein/gzx045](https://doi.org/10.1093/protein/gzx045)]\n\n**Navigating the protein fitness landscape with Gaussian processes.**  \nPhilip A. Romero, Andreas Krause, Frances H. Arnold.  \n*PNAS, January 2013.*  \n[[10.1073/pnas.1215251110](https://doi.org/10.1073/pnas.1215251110)]\n\n**Engineering proteinase K using machine learning and synthetic genes.**  \nJun Liao, Manfred K. Warmuth, Sridhar Govindarajan, Jon E. 
Ness, Rebecca P Wang, Claes Gustafsson, Jeremy Minshull.  \n*BMC Biotechnology, March 2007.*  \n[[10.1186/1472-6750-7-16](https://doi.org/10.1186/1472-6750-7-16)]\n\n**Improving catalytic function by ProSAR-driven enzyme evolution.**  \nRichard J. Fox, S. Christopher Davis, Emily C. Mundorff, Lisa M. Newman, Vesna Gavrilovic, Steven K. Ma, Loleta M. Chung, Charlene Ching, Sarena Tam, Sheela Muley, John Grate, John Gruber, John C. Whitman, Roger A. Sheldon, Gjalt W. Huisman.  \n*Nature Biotechnology, February 2007.*  \n[[Nature Biotechnology](https://www.nature.com/articles/nbt1286)]\n\n### Representation learning\n\n**Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models.**  \nFrancesca-Zhoufan Li, Ava P. Amini, Yisong Yue, Kevin K. Yang, Alex X. Lu. \n*ICML, July 2024.*  \n[[10.1101/2024.02.05.578959](https://doi.org/10.1101/2024.02.05.578959)] \n\n**ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention.**  \nMingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Huiqun Yu, Ziyi Zhou, Wanli Ouyang, Bingxin Zhou, Liang Hong, Pan Tan.  \n*Preprint, May 2024.*  \n[[10.1101/2024.04.15.589672](https://doi.org/10.1101/2024.04.15.589672)]\n\n**Biophysics-based protein language models for protein engineering.**  \nSam Gelman, Bryce Johnson, Chase Freschlin, Sameer D’Costa, Anthony Gitter, Philip A. Romero.  \n*Preprint, March 2024.*  \n[[10.1101/2024.03.15.585128](https://doi.org/10.1101/2024.03.15.585128)]\n\n**Convolutions are competitive with transformers for protein sequence pretraining.**  \nKevin K. Yang, Nicolo Fusi, Alex X. Lu.  \n*Cell Systems, February 2024.*  \n[[10.1016/j.cels.2024.01.008](https://doi.org/10.1016/j.cels.2024.01.008)]\n\n**Two sequence- and two structure-based ML models have learned different aspects of protein biochemistry.**  \nAnastasiya V. Kulikova, Daniel J. Diaz, Tianlong Chen, T. Jeffrey Cole, Andrew D. Ellington \u0026 Claus O. 
\n*Scientific Reports, August 2023.*  \n[[10.1038/s41598-023-40247-w](https://doi.org/10.1038/s41598-023-40247-w)]\n\n**Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations.**  \nNabil Ibtehaz, Yuki Kagaya, Daisuke Kihara.  \n*Preprint, August 2023.*  \n[[10.1101/2023.08.23.554486](https://doi.org/10.1101/2023.08.23.554486)]\n\n**Contextual protein and antibody encodings from equivariant graph transformers.**  \nSai Pooja Mahajan, Jeffrey A. Ruffolo, Jeffrey J. Gray.  \n*Preprint, July 2023.*  \n[[10.1101/2023.07.15.549154](https://doi.org/10.1101/2023.07.15.549154)]\n\n**Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling.**  \nAhmed Elnaggar, Hazem Essam, Wafaa Salah-Eldin, Walid Moustafa, Mohamed Elkerdawy, Charlotte Rochereau, Burkhard Rost.  \n*Preprint, June 2023.*  \n[[arxiv](https://arxiv.org/abs/2301.06568)]\n\n**Structure-aware protein self-supervised learning.**  \nCan (Sam) Chen, Jingbo Zhou, Fan Wang, Xue Liu, Dejing Dou.  \n*Bioinformatics, April 2023.*  \n[[10.1093/bioinformatics/btad189](https://doi.org/10.1093/bioinformatics/btad189)]\n\n**Lightweight Contrastive Protein Structure-Sequence Transformation.**  \nJiangbin Zheng, Ge Wang, Yufei Huang, Bozhen Hu, Siyuan Li, Cheng Tan, Xinwen Fan, Stan Z. Li.  \n*Preprint, March 2023.*  \n[[arxiv](https://arxiv.org/abs/2303.11783)]\n\n**A Systematic Study of Joint Representation Learning on Protein Sequences and Structures.**  \nZuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang.  \n*Preprint, March 2023.*  \n[[arxiv](https://arxiv.org/abs/2303.06275)]\n\n**Structure-informed Language Models Are Protein Designers.**  \nZaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei Ye, Quanquan Gu.  
\n*Preprint, February 2023.*  \n[[10.1101/2023.02.03.526917](https://doi.org/10.1101/2023.02.03.526917)]\n\n**Retrieved Sequence Augmentation for Protein Representation Learning.**  \nChang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, Lingpeng Kong.  \n*Preprint, February 2023.*  \n[[10.1101/2023.02.22.529597](https://doi.org/10.1101/2023.02.22.529597)]\n\n**ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts.**  \nMinghao Xu, Xinyu Yuan, Santiago Miret, Jian Tang.  \n*Preprint, January 2023.*  \n[[arxiv](https://arxiv.org/abs/2301.12040)]\n\n**Codon language embeddings provide strong signals for protein engineering.**  \nCarlos Outeiral, Charlotte M. Deane.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.15.519894](https://doi.org/10.1101/2022.12.15.519894)]\n\n**When Geometric Deep Learning Meets Pretrained Protein Language Models.**  \nFang Wu, Yu Tao, Dragomir Radev, Jinbo Xu.  \n*Preprint, December 2022.*  \n[[arxiv](https://arxiv.org/abs/2212.03447)]\n\n**Contrastive learning of protein representations with graph neural networks for structural and functional annotations.**  \nJiaqi Luo, Yunan Luo.  \n*Preprint, December 2022.*  \n[[10.1101/2022.11.29.518451](https://doi.org/10.1101/2022.11.29.518451)]\n\n**Training self-supervised peptide sequence models on artificially chopped proteins.**  \nGil Sadeh, Zichen Wang, Jasleen Grewal, Huzefa Rangwala, Layne Price.  \n*Preprint, November 2022.*  \n[[arxiv](https://arxiv.org/abs/2211.06428)]\n\n**Masked inverse folding with sequence transfer for protein representation learning.**  \nKevin K. Yang, Niccolò Zanichelli, Hugh Yeh.  
\n*Protein Engineering, Design, and Selection, October 2022.*  \n[[10.1093/protein/gzad015](https://doi.org/10.1093/protein/gzad015)]\n\n**Language models of protein sequences at the scale of evolution enable accurate structure prediction.**  \nZeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.  \n*Preprint, July 2022.*  \n[[10.1101/2022.07.20.500902](https://doi.org/10.1101/2022.07.20.500902)]\n\n**Advancing protein language models with linguistics: a roadmap for improved interpretability.**  \nMai Ha Vu, Rahmad Akbar, Philippe A. Robert, Bartlomiej Swiatczak, Victor Greiff, Geir Kjetil Sandve, Dag Trygve Truslew Haug.  \n*Preprint, July 2022.*  \n[[arxiv](https://arxiv.org/abs/2207.00982)]\n\n**Self-supervised deep learning encodes high-resolution features of protein subcellular localization.**  \nHirofumi Kobayashi, Keith C. Cheveralls, Manuel D. Leonetti \u0026 Loic A. Royer.  \n*Nature Methods, July 2022.*  \n[[10.1038/s41592-022-01541-z](https://doi.org/10.1038/s41592-022-01541-z)]\n\n**COLLAPSE: A representation learning framework for identification and characterization of protein structural sites.**  \nAlexander Derry, Russ B. Altman.  \n*Preprint, July 2022.*  \n[[10.1101/2022.07.20.500713](https://doi.org/10.1101/2022.07.20.500713)]\n\n**CoSP: Co-supervised pretraining of pocket and ligand.**  \nZhangyang Gao, Cheng Tan, Lirong Wu, Stan Z. Li.  \n*Preprint, June 2022.*  \n[[arxiv](https://arxiv.org/abs/2206.12241)]\n\n**Pre-training Protein Models with Molecular Dynamics Simulations for Drug Binding.**  \nWu F, Zhang Q, Radev D, Wang Y, Jin X, Jiang Y, Li SZ, Niu Z.  \n*Preprint, June 2022.*  \n[[10.21203/rs.3.rs-1566483/v1](https://doi.org/10.21203/rs.3.rs-1566483/v1)]\n\n**Exploring evolution-based \u0026-free protein language models as protein function predictors.**  \nMingyang Hu, Fajie Yuan, Kevin K. 
Yang, Fusong Ju, Jin Su, Hui Wang, Fei Yang, Qiuyang Ding.  \n*Preprint, June 2022.*  \n[[arxiv](https://arxiv.org/abs/2206.06583)]\n\n**Evolutionary velocity with protein language models.**  \nBrian L. Hie, Kevin K. Yang, Peter S. Kim.  \n*Cell Systems, April 2022.*  \n[[10.1016/j.cels.2022.01.003](https://doi.org/10.1016/j.cels.2022.01.003)]\n\n**Identification of Enzymatic Active Sites with Unsupervised Language Modeling.**  \nLoïc Kwate Dassi, Matteo Manica, Daniel Probst, Philippe Schwaller, Yves Gaetan Nana Teukam, Teodoro Laino.  \n*Preprint, November 2021.*  \n[[10.33774/chemrxiv-2021-m20gg](https://doi.org/10.33774/chemrxiv-2021-m20gg)]\n\n**Artificial Intelligence Guided Conformational Mining of Intrinsically Disordered Proteins.**  \nAayush Gupta, Souvik Dey,  Huan-Xiang Zhou.  \n*Preprint, November 2021.*  \n[[10.1101/2021.11.21.469457](https://doi.org/10.1101/2021.11.21.469457)]\n\n**Deciphering the language of antibodies using self-supervised learning.**  \nJinwoo Leem,  Laura S. Mitchell,  James H.R. Farmery,  Justin Barton,  Jacob D. Galson.  \n*Preprint, November 2021.*  \n[[10.1101/2021.11.10.468064](https://doi.org/10.1101/2021.11.10.468064)]\n\n**Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model.**  \nLiang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng, Bin Shao, Tao Qin, Tie-Yan Liu.  \n*Preprint, October 2021.*  \n[[arxiv](https://arxiv.org/abs/2110.15527)]\n\n**Neural Distance Embeddings for Biological Sequences.**  \nGabriele Corso, Rex Ying, Michal Pándy, Petar Veličković, Jure Leskovec, Pietro Liò.  \n*Preprint, September 2021.*  \n[[arxiv](https://arxiv.org/abs/2109.09740)]\n\n**Biologically relevant transfer learning improves transcription factor binding prediction.**  \nGherman Novakovsky, Manu Saraswat, Oriol Fornes, Sara Mostafavi \u0026 Wyeth W. Wasserman.  
\n*Genome Biology, September 2021.*  \n[[10.1186/s13059-021-02499-5](https://doi.org/10.1186/s13059-021-02499-5)]\n\n**Toward More General Embeddings for Protein Design: Harnessing Joint Representations of Sequence and Structure.**  \nSanaa Mansoor, Minkyung Baek, Umesh Madan, Eric Horvitz.  \n*Preprint, September 2021.*  \n[[10.1101/2021.09.01.458592](https://doi.org/10.1101/2021.09.01.458592)]\n\n**Hydrogen bonds meet self-attention: all you need for general-purpose protein structure embedding.**  \nCheng Chen, Yuguo Zha, Daming Zhu, Kang Ning, Xuefeng Cui.  \n*Preprint, August 2021.*  \n[[10.1101/2021.01.31.428935](https://doi.org/10.1101/2021.01.31.428935)]\n\n**Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning.**  \nAlex X Lu, Amy X Lu, Iva Pritišanac, Taraneh Zarin, Julie D Forman-Kay, Alan M Moses.  \n*Preprint, July 2021.*  \n[[10.1101/2021.07.29.454330](https://doi.org/10.1101/2021.07.29.454330)]\n\n**Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs.**  \nDan Rosenbaum, Marta Garnelo, Michal Zielinski, Charlie Beattie, Ellen Clancy, Andrea Huber, Pushmeet Kohli, Andrew W. Senior, John Jumper, Carl Doersch, S. M. Ali Eslami, Olaf Ronneberger, Jonas Adler.  \n*Preprint, June 2021.*  \n[[arxiv](https://arxiv.org/abs/2106.14108)]\n\n**Pretraining model for biological sequence data.**  \nBosheng Song, Zimeng Li, Xuan Lin, Jianmin Wang, Tian Wang, Xiangzheng Fu.  \n*Briefings in Functional Genomics, May 2021.*  \n[[10.1093/bfgp/elab025](https://doi.org/10.1093/bfgp/elab025)]\n\n**ProteinBERT: A universal deep-learning model of protein sequence and function.**  \nNadav Brandes, Dan Ofer, Yam Peleg, Nadav Rappoport, Michal Linial.  \n*Preprint, May 2021.*  \n[[10.1101/2021.05.24.445464](https://doi.org/10.1101/2021.05.24.445464)]\n\n**Random Embeddings and Linear Regression can Predict Protein Function.**  \nTianyu Lu, Alex X. Lu, Alan M. Moses.  
\n*Preprint, April 2021.*  \n[[arxiv](https://arxiv.org/abs/2104.14661)]\n\n**Combining evolutionary and assay-labelled data for protein fitness prediction.**  \nChloe Hsu, Hunter Nisonoff, Clara Fannjiang, Jennifer Listgarten.  \n*Preprint, March 2021.*  \n[[10.1101/2021.03.28.437402](https://doi.org/10.1101/2021.03.28.437402)]\n\n**MSA Transformer.**  \nRoshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives.  \n*Preprint, February 2021.*  \n[[10.1101/2021.02.12.430858](https://doi.org/10.1101/2021.02.12.430858)]\n\n**Improving Generalizability of Protein Sequence Models with Data Augmentations.**  \nHongyu Shen, Layne C. Price, Taha Bahadori, Franziska Seeger.  \n*Preprint, February 2021.*  \n[[10.1101/2021.02.18.431877](https://doi.org/10.1101/2021.02.18.431877)]\n\n**Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures.**  \nDamianos P. Melidis, Wolfgang Nejdl.  \n*Algorithms, January 2021.*  \n[[10.3390/a14010028](https://doi.org/10.3390/a14010028)]\n\n**Adversarial Contrastive Pre-training for Protein Sequences.**  \nMatthew B. A. McDermott, Brendan Yap, Harry Hsu, Di Jin, Peter Szolovits. \n*Preprint, January 2021.*  \n[[arxiv](http://arxiv.org/abs/2102.00466)] \n\n**Fast end-to-end learning on protein surfaces.**  \nFreyr Sverrisson, Jean Feydy, Bruno E. Correia, Michael M. Bronstein.  \n*Preprint, December 2020.*  \n[[10.1101/2020.12.28.424589](https://doi.org/10.1101/2020.12.28.424589)]\n\n**Transformer protein language models are unsupervised structure learners.**  \nRoshan Rao, Sergey Ovchinnikov, Joshua Meier, Alexander Rives, Tom Sercu.  \n*Preprint, December 2020.*  \n[[10.1101/2020.12.15.422761](https://doi.org/10.1101/2020.12.15.422761)]\n\n**Self-Supervised Representation Learning of Protein Tertiary Structures (PtsRep): Protein Engineering as A Case Study.**  \nJunwen Luo, Yi Cai, Jialin Wu, Hongmin Cai, Xiaofeng Yang, Zhanglin Lin.  
\n*Preprint, December 2020.*  \n[[10.1101/2020.12.22.423916](https://doi.org/10.1101/2020.12.22.423916)]\n\n**What is a meaningful representation of protein sequences?**  \nNicki Skafte Detlefsen, Søren Hauberg, Wouter Boomsma.  \n*Preprint, November 2020.*  \n[[arxiv](https://arxiv.org/abs/2012.02679v3)]\n\n**Profile Prediction: An Alignment-Based Pre-Training Task for Protein Sequence Models.**  \nPascal Sturmfels, Jesse Vig, Ali Madani, Nazneen Fatema Rajani.  \n*Preprint, November 2020.*  \n[[arxiv](http://arxiv.org/abs/2012.00195)]\n\n**Fixed-Length Protein Embeddings using Contextual Lenses.**  \nAmir Shanehsazzadeh, David Belanger, David Dohan.  \n*Preprint, October 2020.*  \n[[arxiv](http://arxiv.org/abs/2010.15065)]\n\n**Evaluation of Methods for Protein Representation Learning: A Quantitative Analysis.**  \nSerbulent Unsal, Heval Ataş, Muammer Albayrak, Kemal Turhan, Aybar C. Acar, Tunca Doğan.  \n*Preprint, October 2020.*  \n[[10.1101/2020.10.28.359828](https://doi.org/10.1101/2020.10.28.359828)]\n\n**Self-Supervised Contrastive Learning of Protein Representations By Mutual Information Maximization.**  \nAmy X. Lu, Haoran Zhang, Marzyeh Ghassemi, Alan Moses.  \n*Preprint, September 2020.*  \n[[10.1101/2020.09.04.283929](https://doi.org/10.1101/2020.09.04.283929)]\n\n**ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing.**  \nAhmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, Burkhard Rost.  \n*Preprint, July 2020.*  \n[[10.1101/2020.07.12.199554](https://doi.org/10.1101/2020.07.12.199554)]\n\n**Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function.**  \nAmelia Villegas-Morcillo, Stavros Makrodimitris, Roeland van Ham, Angel M. Gomez, Victoria Sanchez, Marcel Reinders.  
\n*Preprint, April 2020.*  \n[[10.1101/2020.04.07.028373](https://doi.org/10.1101/2020.04.07.028373)]\n\n**Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites.**  \nArnab Bhadra, Kalidas Y.  \n*Preprint, March 2020.*  \n[[arxiv](https://arxiv.org/abs/2003.08149v1)]\n\n**Evolutionary context-integrated deep sequence modeling for protein engineering.**  \nYunan Luo, Lam Vo, Hantian Ding, Yufeng Su, Yang Liu, Wesley Wei Qian, Huimin Zhao, Jian Peng.  \n*Preprint, January 2020.*  \n[[10.1101/2020.01.16.908509](https://doi.org/10.1101/2020.01.16.908509)]\n\n**Sequence representations and their utility for predicting protein-protein interactions.**  \nDhananjay Kimothi, Pravesh Biyani, James M Hogan.  \n*Preprint, December 2019.*  \n[[10.1101/2019.12.31.890699](https://doi.org/10.1101/2019.12.31.890699)]\n\n**Language modelling for biological sequences – curated datasets and baselines.**  \nJose Juan Almagro Armenteros, Alexander Rosenberg Johansen, Ole Winther, Henrik Nielsen.  \n*Preprint, December 2019.*  \n[[alrojo.github.io](https://alrojo.github.io/media/publications/LMProteins/preprint.pdf)]\n\n**Deciphering protein evolution and fitness landscapes with latent space models.**  \nXinqiang Ding, Zhengting Zou, Charles L. Brooks III.  \n*Nature Communications, December 2019.*  \n[[10.1038/s41467-019-13633-0](https://doi.org/10.1038/s41467-019-13633-0)]\n\n**End-to-end multitask learning, from protein language to protein features without alignments.**  \nAhmed Elnaggar, Michael Heinzinger, Christian Dallago, Burkhard Rost.  \n*Preprint, December 2019.*  \n[[10.1101/864405](https://doi.org/10.1101/864405)]\n\n**Unified rational protein engineering with sequence-only deep representation learning.**  \nEthan C. Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi, George M. Church.  
\n*Nature Methods, October 2019.*  \n[[10.1038/s41592-019-0598-1](https://doi.org/10.1038/s41592-019-0598-1)]\n\n**Structure-Based Function Prediction using Graph Convolutional Networks.**  \nVladimir Gligorijevic, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Kyunghyun Cho, Tommi Vatanen, Daniel Berenberg, Bryn Taylor, Ian M. Fisk, Ramnik J. Xavier, Rob Knight, Richard Bonneau.  \n*Preprint, October 2019.*  \n[[10.1101/786236](https://doi.org/10.1101/786236)]\n\n**Modeling the language of life – Deep Learning Protein Sequences.**  \nMichael Heinzinger, Ahmed Elnaggar, Yu Wang, Christian Dallago, Dmitrii Nechaev, Florian Matthes, Burkhard Rost.  \n*Preprint, September 2019.*  \n[[10.1101/614313](https://doi.org/10.1101/614313)]\n\n**Augmenting Protein Network Embeddings with Sequence Information.**  \nHassan Kane, Mohamed K. Coulibali, Pelkins Ajanoh, Ali Abdallah.  \n*Preprint, August 2019.*  \n[[10.1101/730481](https://doi.org/10.1101/730481)]\n\n**Universal Deep Sequence Models for Protein Classification.**  \nNils Strodthoff, Patrick Wagner, Markus Wenzel, Wojciech Samek.  \n*Preprint, July 2019.*  \n[[10.1101/704874](https://doi.org/10.1101/704874)]\n\n**DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences.**  \nEhsaneddin Asgari, Nina Poerner, Alice C. McHardy, Mohammad R.K. Mofrad.  \n*Preprint, July 2019.*  \n[[10.1101/705426](https://doi.org/10.1101/705426)]\n\n**A Self-Consistent Sonification Method to Translate Amino Acid Sequences into Musical Compositions and Application in Protein Design Using Artificial Intelligence.**  \nChi-Hua Yu, Zhao Qin, Francisco J. Martin-Martinez, Markus J. Buehler.  \n*ACS Nano, June 2019.*  \n[[10.1021/acsnano.9b02180](https://doi.org/10.1021/acsnano.9b02180)]\n\n**Evaluating Protein Transfer Learning with TAPE.**  \nRoshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song.  
\n*Preprint, June 2019.*  \n[[arxiv](https://arxiv.org/abs/1906.08230)]\n\n**Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins.**  \nJulius Upmeier zu Belzen, Thore Bürgel, Stefan Holderbach, Felix Bubeck, Lukas Adam, Catharina Gandor, Marita Klein, Jan Mathony, Pauline Pfuderer, Lukas Platz, Moritz Przybilla, Max Schwendemann, Daniel Heid, Mareike Daniela Hoffmann, Michael Jendrusch, Carolin Schmelas, Max Waldhauer, Irina Lehmann, Dominik Niopek, Roland Eils.  \n*Nature Machine Intelligence, May 2019.*  \n[[Nature Machine Intelligence](https://www.nature.com/articles/s42256-019-0049-9)]\n\n**Modeling the Language of Life – Deep Learning Protein Sequences.**   \nMichael Heinzinger, Ahmed Elnaggar, Yu Wang, Christian Dallago, Dmitrii Nechaev, Florian Matthes, Burkhard Rost.  \n*Preprint, May 2019.*  \n[[10.1101/614313](https://doi.org/10.1101/614313)] [[bioRxiv](https://www.biorxiv.org/content/10.1101/614313v2)]\n\n**Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences.**  \nAlexander Rives, Siddharth Goyal, Joshua Meier, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, Rob Fergus.  \n*Preprint, April 2019.*  \n[[10.1101/622803](https://doi.org/10.1101/622803)] [[bioRxiv](https://www.biorxiv.org/content/10.1101/622803v1)]\n\n**Learning protein constitutive motifs from sequence data.**  \nJérôme Tubiana, Simona Cocco, Rémi Monasson.  \n*eLife, March 2019.*  \n[[10.7554/eLife.39397](https://doi.org/10.7554/eLife.39397)]\n\n**Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX).**  \nEhsaneddin Asgari, Alice C. McHardy, Mohammad R. K. Mofrad.  \n*Scientific Reports, March 2019.*  \n[[10.1038/s41598-019-38746-w](https://doi.org/10.1038/s41598-019-38746-w)]\n\n**Learning protein sequence embeddings using information from structure.**  \nTristan Bepler, Bonnie Berger.  
\n*International Conference on Learning Representations, February 2019.*  \n[[ICLR](https://arxiv.org/abs/1902.08661)]\n\n**Application of Fourier transform and proteochemometrics principles to protein engineering.**  \nFrédéric Cadet, Nicolas Fontaine, Iyanar Vetrivel, Matthieu Ng Fuk Chong, Olivier Savriama, Xavier Cadet, Philippe Charton.  \n*BMC Bioinformatics, October 2018.*  \n[[10.1186/s12859-018-2407-8](https://doi.org/10.1186/s12859-018-2407-8)]\n\n**Learned protein embeddings for machine learning.**  \nKevin K Yang, Zachary Wu, Claire N Bedbrook, Frances H Arnold.  \n*Bioinformatics, August 2018.*  \n[[10.1093/bioinformatics/bty178](https://doi.org/10.1093/bioinformatics/bty178)]\n\n**Deep Semantic Protein Representation for Annotation, Discovery, and Engineering.**  \nAriel S Schwartz, Gregory J Hannum, Zach R Dwiel, Michael E Smoot, Ana R Grant, Jason M Knight, Scott A Becker, Jonathan R Eads, Matthew C LaFave, Harini Eavani, Yinyin Liu, Arjun K Bansal, Toby H Richardson.  \n*Preprint, July 2018.*  \n[[10.1101/365965](https://doi.org/10.1101/365965)]\n\n**Improved Descriptors for the Quantitative Structure–Activity Relationship Modeling of Peptides and Proteins.**  \nMark H. Barley, Nicholas J. Turner, Royston Goodacre.  \n*Journal of Chemical Information and Modeling, January 2018.*  \n[[10.1021/acs.jcim.7b00488](https://doi.org/10.1021/acs.jcim.7b00488)]\n\n**Variational auto-encoding of protein sequences.**  \nSam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak.  \n*Preprint, December 2017.*  \n[[arxiv](https://arxiv.org/abs/1712.03346)]\n\n**Predicting Protein Binding Affinity With Word Embeddings and Recurrent Neural Networks.**  \nCarlo Mazzaferro.  
\n*Preprint, April 2017.*  \n[[10.1101/128223](https://doi.org/10.1101/128223)] [[bioRxiv](https://www.biorxiv.org/node/37703.abstract)]\n\n**dna2vec: Consistent vector representations of variable-length k-mers.**  \nPatrick Ng  \n*Preprint, January 2017*  \n[[arxiv](https://arxiv.org/abs/1701.06279)]  \n\n**Distributed Representations for Biological Sequence Analysis.**  \nDhananjay Kimothi, Akshay Soni, Pravesh Biyani, James M. Hogan  \n*Preprint, August 2016*  \n[[arxiv](https://arxiv.org/abs/1608.05949)]  \n\n**ProFET: Feature engineering captures high-level protein functions.**  \nDan Ofer, Michal Linial.  \n*Bioinformatics, June 2015.*  \n[[10.1093/bioinformatics/btv345](https://doi.org/10.1093/bioinformatics/btv345)]\n\n**AAindex: amino acid index database, progress report 2008.**  \nShuichi Kawashima, Piotr Pokarowski, Maria Pokarowska, Andrzej Kolinski, Toshiaki Katayama, Minoru Kanehisa.  \n*Nucleic Acids Research, January 2008.*  \n[[10.1093/nar/gkm998](https://doi.org/10.1093/nar/gkm998)] \n\n### Unsupervised variant prediction\n\n**Predicted mechanistic impacts of human protein missense variants.**  \nJurgen Janes, Marc Muller, Senthil Selvaraj, Diogo Manoel, James Stephenson, Catarina Goncalves, Aleix Lafita, Benjamin Polacco, Kirsten Obernier, Kaur Alasoo, Manuel C Lemos, Nevan Krogan, Maria Martin, Luis R. Saraiva, David Burke, Pedro Beltrao.  \n*Preprint, May 2024.*  \n[[10.1101/2024.05.29.596373](https://doi.org/10.1101/2024.05.29.596373)]\n\n**Decoding molecular mechanisms for loss of function variants in the human proteome.**  \nMatteo Cagiada, Nicolas Jonsson, Kresten Lindorff-Larsen.  \n*Preprint, May 2024.*  \n[[10.1101/2024.05.21.595203](https://doi.org/10.1101/2024.05.21.595203)]\n\n**AlphaFold2 can predict single-mutation effects.**  \nJohn M. McBride, Konstantin Polev, Amirbek Abdirasulov, Vladimir Reinharz, Bartosz A. Grzybowski, Tsvi Tlusty.  
\n*Physical Review Letters, November 2023.*  \n[[10.1101/2022.04.14.488301](https://doi.org/10.1101/2022.04.14.488301)]\n\n**Learning from prepandemic data to forecast viral escape.**  \nNicole N. Thadani, Sarah Gurev, Pascal Notin, Noor Youssef, Nathan J. Rollins, Daniel Ritter, Chris Sander, Yarin Gal \u0026 Debora S. Marks.  \n*Nature, October 2023.*  \n[[10.1038/s41586-023-06617-0](https://doi.org/10.1038/s41586-023-06617-0)]\n\n**Genome-wide prediction of disease variant effects with a deep protein language model.**  \nNadav Brandes, Grant Goldman, Charlotte H. Wang, Chun Jimmie Ye \u0026 Vasilis Ntranos.  \n*Nature Genetics, August 2023.*  \n[[10.1038/s41588-023-01465-0](https://doi.org/10.1038/s41588-023-01465-0)]\n\n**Protein Fitness Prediction is Impacted by the Interplay of Language Models, Ensemble Learning, and Sampling Methods.**  \nMehrsa Mardikoraem, Daniel Woldring.  \n*Preprint, February 2023.*  \n[[10.1101/2023.02.09.527362](https://doi.org/10.1101/2023.02.09.527362)]\n\n**Predicting Immune Escape with Pretrained Protein Language Model Embeddings.**  \nKyle Swanson, Howard Chang, James Zou.  \n*Preprint, December 2022.*  \n[[10.1101/2022.11.30.518466](https://doi.org/10.1101/2022.11.30.518466)]\n\n**Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes.**  \nOnuralp Soylemez, Pablo Cordero.  \n*Preprint, November 2022.*  \n[[arxiv](https://arxiv.org/abs/2211.10000)]\n\n**Updated benchmarking of variant effect predictors using deep mutational scanning.**  \nBenjamin J. Livesey, Joseph A. Marsh.  \n*Preprint, November 2022.*  \n[[10.1101/2022.11.19.517196](https://doi.org/10.1101/2022.11.19.517196)]\n\n**Accurate Mutation Effect Prediction using RoseTTAFold.**  \nSanaa Mansoor, Minkyung Baek, David Juergens, Joseph L. Watson, David Baker.  
\n*Preprint, November 2022.*  \n[[10.1101/2022.11.04.515218](https://doi.org/10.1101/2022.11.04.515218)]\n\n**Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins.**  \nHideki Yamaguchi, Yutaka Saito.  \n*Briefings in Bioinformatics, November 2021.*  \n[[10.1093/bib/bbab234](https://doi.org/10.1093/bib/bbab234)]\n\n**Disease variant prediction with deep generative models of evolutionary data.**  \nJonathan Frazer, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K Min, Kelly Brock, Yarin Gal, Debora S Marks.  \n*Nature, November 2021.*  \n[[10.1038/s41586-021-04043-8](https://doi.org/10.1038/s41586-021-04043-8)]\n\n**Language models enable zero-shot prediction of the effects of mutations on protein function.**  \nJoshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, Alexander Rives.  \n*Preprint, July 2021.*  \n[[10.1101/2021.07.09.450648](https://doi.org/10.1101/2021.07.09.450648)]\n\n**Unsupervised inference of protein fitness landscape from deep mutational scan.**  \nJorge Fernandez-de-Cossio-Diaz, Guido Uguzzoni, Andrea Pagnani.  \n*Preprint, March 2020.*  \n[[10.1101/2020.03.18.996595](https://doi.org/10.1101/2020.03.18.996595)]\n\n**Deep generative models of genetic variation capture the effects of mutations.**  \nAdam J. Riesselman, John B. Ingraham, Debora S. Marks.   \n*Nature Methods, September 2018*  \n[[10.1038/s41592-018-0138-4](https://doi.org/10.1038/s41592-018-0138-4)] \n\n**Variational auto-encoding of protein sequences.**  \nSam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak  \n*Preprint, December 2017*  \n[[arxiv](https://arxiv.org/abs/1712.03346)] \n\n### Generative models\n\n**Cramming Protein Language Model Training in 24 GPU Hours.**  \nNathan C. Frey, Taylor Joren, Aya Abdelsalam Ismail, Allen Goodman, Richard Bonneau, Kyunghyun Cho, Vladimir Gligorijević.  
\n*Preprint, May 2024.*  \n[[10.1101/2024.05.14.594108](https://doi.org/10.1101/2024.05.14.594108)]\n\n**Learning the Language of Protein Structure.**  \nBenoit Gaujac, Jérémie Donà, Liviu Copoiu, Timothy Atkinson, Thomas Pierrot, Thomas D. Barrett.  \n*Preprint, May 2024.*  \n[[arxiv](https://arxiv.org/abs/2405.15840)]\n\n**Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2.**  \nYeqing Lin, Minji Lee, Zhao Zhang, Mohammed AlQuraishi.  \n*Preprint, May 2024.*  \n[[arxiv](https://arxiv.org/abs/2405.15489)]\n\n**ProtMamba: a homology-aware but alignment-free protein state space model.**  \nDamiano Sgarbossa, Cyril Malbranke, Anne-Florence Bitbol.  \n*Preprint, May 2024.*  \n[[10.1101/2024.05.24.595730](https://doi.org/10.1101/2024.05.24.595730)]\n\n**ProtT3: Protein-to-Text Generation for Text-based Protein Understanding.**  \nZhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua.  \n*Preprint, May 2024.*  \n[[arxiv](https://arxiv.org/abs/2405.12564)]\n\n**The Continuous Language of Protein Structure.**  \nLukas Billera, Anton Oresten, Aron Stålmarck, Kenta Sato, Mateusz Kaduk, Ben Murrell.  \n*Preprint, May 2024.*  \n[[10.1101/2024.05.11.593685](https://doi.org/10.1101/2024.05.11.593685)]\n\n**Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences.**  \nJeffrey A. Ruffolo, Stephen Nayfach, Joseph Gallagher, Aadyot Bhatnagar, Joel Beazer, Riffat Hussain, Jordan Russ, Jennifer Yip, Emily Hill, Martin Pacesa, Alexander J. Meeske, Peter Cameron, Ali Madani.  \n*Preprint, April 2024.*  \n[[10.1101/2024.04.22.590591](https://doi.org/10.1101/2024.04.22.590591)]\n\n**Diffusion on language model embeddings for protein sequence generation.**  \nViacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov, Fedor Nikolaev, Nikita Ivanisenko, Olga Kardymon, Dmitry Vetrov.  
\n*Preprint, March 2024.*  \n[[arxiv](https://arxiv.org/abs/2403.03726)]\n\n**Protein structure generation via folding diffusion.**  \nKevin E. Wu, Kevin K. Yang, Rianne van den Berg, James Y. Zou, Alex X. Lu, Ava P. Amini.  \n*Nature Communications, February 2024.*  \n[[10.1038/s41467-024-45051-2](https://doi.org/10.1038/s41467-024-45051-2)]\n\n**Proteus: exploring protein structure generation for enhanced designability and efficiency.**  \nChentong Wang, Yannan Qu, Zhangzhi Peng, Yukai Wang, Hongli Zhu, Dachuan Chen, Longxing Cao.  \n*Preprint, February 2024.*  \n[[10.1101/2024.02.10.579791](https://doi.org/10.1101/2024.02.10.579791)]\n\n**Diffusion Language Models Are Versatile Protein Learners.**  \nXinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu.  \n*Preprint, February 2024.*  \n[[arxiv](https://arxiv.org/abs/2402.18567)]\n\n**Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design.**  \nHannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola.  \n*Preprint, November 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.05764)]\n\n**Fast protein backbone generation with SE(3) flow matching.**  \nJason Yim, Andrew Campbell, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Regina Barzilay, Tommi Jaakkola, Frank Noé.  \n*Preprint, October 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.05297v2)]\n\n**SE(3)-Stochastic Flow Matching for Protein Backbone Generation.**  \nAvishek Joey Bose, Tara Akhound-Sadegh, Kilian Fatras, Guillaume Huguet, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong.  \n*Preprint, October 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.02391)]\n\n**Joint Design of Protein Sequence and Structure based on Motifs.**  \nZhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei Li.  
\n*Preprint, October 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.02546)]\n\n**PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling.**  \nTianlai Chen, Sarah Pertsemlidis, Rio Watson, Venkata Srikar Kavirayuni, Ashley Hsu, Pranay Vure, Rishab Pulugurta, Sophia Vincoff, Lauren Hong, Tian Wang, Vivian Yudistyra, Elena Haarer, Lin Zhao, Pranam Chatterjee.  \n*Preprint, October 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.03842)]\n\n**Enhancing Luciferase Activity and Stability through Generative Modeling of Natural Enzyme Sequences.**  \nWen Jun Xie, Dangliang Liu, Xiaoya Wang, Aoxuan Zhang, Qijia Wei, Ashim Nandi, Suwei Dong, Arieh Warshel.  \n*Preprint, October 2023.*  \n[[10.1101/2023.09.18.558367](https://doi.org/10.1101/2023.09.18.558367)]\n\n**Protein generation with evolutionary diffusion: sequence is all you need.**  \nSarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex Xijie Lu, Nicolo Fusi, Ava Pardis Amini, Kevin K Yang.  \n*Preprint, September 2023.*  \n[[10.1101/2023.09.11.556673](https://doi.org/10.1101/2023.09.11.556673)]\n\n**Efficient and accurate sequence generation with small-scale protein language models.**  \nYaiza Serrano, Sergi Roda, Victor Guallar, Alexis Molina.  \n*Preprint, August 2023.*  \n[[10.1101/2023.08.04.551626](https://doi.org/10.1101/2023.08.04.551626)]\n\n**SE(3) diffusion model with application to protein backbone generation.**  \nJason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola.  \n*ICML, July 2023.*  \n[[ACM](https://dl.acm.org/doi/10.5555/3618408.3620080)]\n\n**De novo design of protein structure and function with RFdiffusion.**  \nJoseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. 
Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Sergey Ovchinnikov, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek \u0026 David Baker.  \n*Nature, July 2023.*  \n[[10.1038/s41586-023-06415-8](https://doi.org/10.1038/s41586-023-06415-8)]\n\n**PoET: A generative model of protein families as sequences-of-sequences.**  \nTimothy F. Truong Jr, Tristan Bepler.  \n*Preprint, June 2023.*  \n[[arxiv](https://arxiv.org/abs/2306.06156)]\n\n**Protein Sequence and Structure Co-Design with Equivariant Translation.**  \nChence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang.  \n*ICLR, May 2023.*  \n[[arxiv](https://arxiv.org/abs/2210.08761)]\n\n**Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model.**  \nBo Ni, David L. Kaplan, Markus J. Buehler.  \n*Cell Chem, April 2023.*  \n[[10.1016/j.chempr.2023.03.020](https://doi.org/10.1016/j.chempr.2023.03.020)]\n\n**ProtWave-VAE: Integrating autoregressive sampling with latent-based inference for data-driven protein design.**  \nNiksa Praljak, Xinran Lian, Rama Ranganathan, Andrew L. Ferguson.  \n*Preprint, April 2023.*  \n[[10.1101/2023.04.23.537971](https://doi.org/10.1101/2023.04.23.537971)]\n\n**ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models.**  \nYouhan Lee, Hasun Yu.  \n*Preprint, March 2023.*  \n[[arxiv](https://arxiv.org/abs/2303.16452)]\n\n**Extrapolative Controlled Sequence Generation via Iterative Refinement.**  \nVishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh.  \n*Preprint, March 2023.*  \n[[arxiv](https://arxiv.org/abs/2303.04562)]\n\n**ProteinVAE: Variational AutoEncoder for Translational Protein Design.**  \nSuyue Lyu, Shahin Sowlati-Hashjin, Michael Garton.  
\n*Preprint, March 2023.*  \n[[10.1101/2023.03.04.531110](https://doi.org/10.1101/2023.03.04.531110)]\n\n**Generative power of a protein language model trained on multiple sequence alignments.**  \nDamiano Sgarbossa, Umberto Lupo, Anne-Florence Bitbol.  \n*eLife, February 2023.*  \n[[10.7554/eLife.79854](https://doi.org/10.7554/eLife.79854)]\n\n**Evaluating Prompt Tuning for Conditional Protein Sequence Generation.**  \nAndrea Nathansen, Kevin Klein, Bernhard Y. Renard, Melania Nowicka, Jakub M. Bartoszewicz.  \n*Preprint, February 2023.*  \n[[10.1101/2023.02.28.530492](https://doi.org/10.1101/2023.02.28.530492)]\n\n**De novo design of luciferases using deep learning.**  \nAndy Hsien-Wei Yeh, Christoffer Norn, Yakov Kipnis, Doug Tischer, Samuel J. Pellock, Declan Evans, Pengchen Ma, Gyu Rie Lee, Jason Z. Zhang, Ivan Anishchenko, Brian Coventry, Longxing Cao, Justas Dauparas, Samer Halabiya, Michelle DeWitt, Lauren Carter, K. N. Houk \u0026 David Baker.  \n*Nature, February 2023.*  \n[[10.1038/s41586-023-05696-3](https://doi.org/10.1038/s41586-023-05696-3)]\n\n**A Text-guided Protein Design Framework.**  \nShengchao Liu, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar.  \n*Preprint, February 2023.*  \n[[arxiv](https://arxiv.org/abs/2302.04611)]\n\n**Large language models generate functional protein sequences across diverse families.**  \nAli Madani, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr., Caiming Xiong, Zachary Z. Sun, Richard Socher, James S. Fraser \u0026 Nikhil Naik.  \n*Nature Biotechnology, January 2023.*  \n[[10.1038/s41587-022-01618-2](https://doi.org/10.1038/s41587-022-01618-2)]\n\n**Unlocking de novo antibody design with generative artificial intelligence.**  \nAmir Shanehsazzadeh, Sharrol Bachas, George Kasun, John M. Sutton, Andrea K. Steiger, Richard Shuai, Christa Kohnert, Alex Morehead, Amber Brown, Chelsea Chung, Breanna K. 
Luton, Nicolas Diaz, Matt McPartlon, Bailey Knight, Macey Radach, Katherine Bateman, David A. Spencer, Jovan Cejovic, Gaelin Kopec-Belliveau, Robel Haile, Edriss Yassine, Cailen McCloskey, Monica Natividad, Dalton Chapman, Luka Stojanovic, Goran Rakocevic, Gregory Hannum, Engin Yapici, Katherine Moran, Rodante Caguiat, Shaheed Abdulhaqq, Zheyuan Guo, Lillian R. Klug, Miles Gander, Joshua Meier.  \n*Preprint, January 2023.*  \n[[10.1101/2023.01.08.523187](https://doi.org/10.1101/2023.01.08.523187)]\n\n**De novo design of high-affinity protein binders to bioactive helical peptides.**  \nSusana Vázquez Torres, Philip J. Y. Leung, Isaac D. Lutz, Preetham Venkatesh, Joseph L. Watson, Fabian Hink, Huu-Hien Huynh, Andy Hsien-Wei Yeh, David Juergens, Nathaniel R. Bennett, Andrew N. Hoofnagle, Eric Huang, Michael J MacCoss, Marc Expòsit, Gyu Rie Lee, Paul M. Levine, Xinting Li, Mila Lamb, Elif Nihal Korkmaz, Jeff Nivala, Lance Stewart, Joseph M. Rogers, David Baker.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.10.519862](https://doi.org/10.1101/2022.12.10.519862)]\n\n**Deep learning-enabled design of synthetic orthologs of a signaling protein.**  \nXinran Lian, Niksa Praljak, Subu K. Subramanian, Sarah Wasinger, Rama Ranganathan, Andrew L. Ferguson.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.21.521443](https://doi.org/10.1101/2022.12.21.521443)]\n\n**A high-level programming language for generative protein design.**  \nBrian Hie, Salvatore Candido, Zeming Lin, Ori Kabeli, Roshan Rao, Nikita Smetanin, Tom Sercu, Alexander Rives.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.21.521526](https://doi.org/10.1101/2022.12.21.521526)]\n\n**Language models generalize beyond natural proteins.**  \nRobert Verkuil, Ori Kabeli, Yilun Du, Basile I. M. Wicky, Lukas F. Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, Alexander Rives.  
\n*Preprint, December 2022.*  \n[[10.1101/2022.12.21.521521](https://doi.org/10.1101/2022.12.21.521521)]\n\n**Deep Generative Design of Epitope-Specific Binding Proteins by Latent Conformation Optimization.**  \nRaphael R. Eguchi, Christian A. Choe, Udit Parekh, Irene S. Khalek, Michael D. Ward, Neha Vithani, Gregory R. Bowman, Joseph G. Jardine, Po-Ssu Huang.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.22.521698](https://doi.org/10.1101/2022.12.22.521698)]\n\n**Illuminating protein space with a programmable generative model.**  \nJohn Ingraham, Max Baranov, Zak Costello, Vincent Frappier, Ahmed Ismail, Shan Tie, Wujie Wang, Vincent Xue, Fritz Obermeyer, Andrew Beam, Gevorg Grigoryan.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.01.518682](https://doi.org/10.1101/2022.12.01.518682)]\n\n**Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models.**  \nJoseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek, David Baker.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.09.519842](https://doi.org/10.1101/2022.12.09.519842)]\n\n**De novo PROTAC design using graph-based deep generative models.**  \nDivya Nori, Connor W. Coley, Rocío Mercado.  \n*Preprint, November 2022.*  \n[[arxiv](https://arxiv.org/abs/2211.02660)]\n\n**Latent Space Diffusion Models of Cryo-EM Structures.**  \nKarsten Kreis, Tim Dockhorn, Zihao Li, Ellen Zhong.  
\n*Preprint, November 2022.*  \n[[arxiv](https://arxiv.org/abs/2211.14169)]\n\n**Protein Sequence and Structure Co-Design with Equivariant Translation.**  \nChence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang.  \n*Preprint, October 2022.*  \n[[arxiv](https://arxiv.org/abs/2210.08761)] \n\n**Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space.**  \nEli J. Draizen, Stella Veretnik, Cameron Mura, Philip E. Bourne.  \n*Preprint, August 2022.*  \n[[10.1101/2022.07.29.501943](https://doi.org/10.1101/2022.07.29.501943)]\n\n**Neural Network-Derived Potts Models for Structure-Based Protein Design using Backbone Atomic Coordinates and Tertiary Motifs.**  \nAlex J. Li, Mindren Lu, Israel Desta, Vikram Sundar, Gevorg Grigoryan, Amy E. Keating.  \n*Preprint, August 2022.*  \n[[10.1101/2022.08.02.501736](https://doi.org/10.1101/2022.08.02.501736)]\n\n**ProtGPT2 is a deep unsupervised language model for protein design.**  \nNoelia Ferruz, Steffen Schmidt \u0026 Birte Höcker.  \n*Nature Communications, July 2022.*  \n[[10.1038/s41467-022-32007-7](https://doi.org/10.1038/s41467-022-32007-7)]\n\n**ProteinSGM: Score-based generative modeling for de novo protein design.**  \nJin Sub Lee, Philip M. Kim.  \n*Preprint, July 2022.*  \n[[10.1101/2022.07.13.499967](https://doi.org/10.1101/2022.07.13.499967)]\n\n**Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models.**  \nShitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, Jianzhu Ma.  \n*Preprint, July 2022.*  \n[[10.1101/2022.07.10.499510](https://doi.org/10.1101/2022.07.10.499510)]\n\n**End-to-End deep structure generative model for protein design.**  \nBoqiao Lai, Matthew McPartlon, Jinbo Xu.  \n*Preprint, July 2022.*  \n[[10.1101/2022.07.09.499440](https://doi.org/10.1101/2022.07.09.499440)]\n\n**Predicting the antigenic evolution of SARS-COV-2 with deep learning.**  \nWenkai Han, Ningning Chen, Shiwei Sun, Xin Gao.  
\n*Preprint, June 2022.*  \n[[10.1101/2022.06.23.497375](https://doi.org/10.1101/2022.06.23.497375)]\n\n**Hallucinating protein assemblies.**  \nB. I. M. Wicky, L. F. Milles, A. Courbet, R. J. Ragotte, J. Dauparas, E. Kinfu, S. Tipps, R. D. Kibler, M. Baek, F. DiMaio, X. Li, L. Carter, A. Kang, H. Nguyen, A. K. Bera, D. Baker.  \n*Preprint, June 2022.*  \n[[10.1101/2022.06.09.493773](https://doi.org/10.1101/2022.06.09.493773)]\n\n**ProGen2: Exploring the Boundaries of Protein Language Models.**  \nErik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani.  \n*Preprint, June 2022.*  \n[[arxiv](https://arxiv.org/abs/2206.13517)]\n\n**DiffMD: A Geometric Diffusion Model for Molecular Dynamics Simulations.**  \nFang Wu, Stan Z. Li.  \n*Preprint, April 2022.*  \n[[arxiv](https://arxiv.org/abs/2204.08672)]\n\n**Fragment-Based Ligand Generation Guided By Geometric Deep Learning On Protein-Ligand Structure.**  \nAlexander S. Powers, Helen H. Yu, Patricia Suriana, Ron O. Dror.  \n*Preprint, March 2022.*  \n[[10.1101/2022.03.17.484653](https://doi.org/10.1101/2022.03.17.484653)]\n\n**Design in the DARK: Learning Deep Generative Models for De Novo Protein Design.**  \nLewis Moffat, Shaun M. Kandathil, David T. Jones.  \n*Preprint, January 2022.*  \n[[10.1101/2022.01.27.478087](https://doi.org/10.1101/2022.01.27.478087)]\n\n**Sampling the conformational landscapes of transporters and receptors with AlphaFold2.**  \nDiego del Alamo, Davide Sala, Hassane S. Mchaourab, Jens Meiler.  \n*Preprint, November 2021.*  \n[[10.1101/2021.11.22.469536](https://doi.org/10.1101/2021.11.22.469536)]\n\n**Benchmarking deep generative models for diverse antibody sequence design.**  \nIgor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano.  
\n*Preprint, November 2021.*  \n[[arxiv](https://arxiv.org/abs/2111.06801v1)]\n\n**Efficient generative modeling of protein sequences using simple autoregressive models.**  \nJeanne Trinquier, Guido Uguzzoni, Andrea Pagnani, Francesco Zamponi \u0026 Martin Weigt.  \n*Nature Communications, October 2021.*  \n[[10.1038/s41467-021-25756-4](https://doi.org/10.1038/s41467-021-25756-4)]\n\n**Navigating the amino acid sequence space between functional proteins using a deep learning framework.**  \nTristan Bitard-Feildel.  \n*PeerJ Computer Science, September 2021.*  \n[[10.7717/peerj-cs.684](https://doi.org/10.7717/peerj-cs.684)]\n\n**BioPhi: A platform for antibody design, humanization and humanness evaluation based on natural antibody repertoires and deep learning.**  \nDavid Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil, Danny A. Bitton.  \n*Preprint, August 2021.*  \n[[10.1101/2021.08.08.455394](https://doi.org/10.1101/2021.08.08.455394)]\n\n**Ancestral Sequence Reconstruction for Co-evolutionary models.**  \nEdwin Rodríguez Horta, Alejandro Lage-Castellanos, Roberto Mulet.  \n*Preprint, August 2021.*  \n[[arxiv](https://arxiv.org/abs/2108.03801)]\n\n**AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational approximated Landscape.**  \nLuca Sesta, Guido Uguzzoni, Jorge Fernandez-de-Cossio Diaz, Andrea Pagnani.  \n*International Journal of Molecular Sciences, August 2021.*  \n[[10.3390/ijms222010908](https://doi.org/10.3390/ijms222010908)]\n\n**Modeling sequence-space exploration and emergence of epistatic signals in protein evolution.**  \nMatteo Bisardi, Juan Rodriguez-Rivas, Francesco Zamponi, Martin Weigt.  \n*Preprint, June 2021.*  \n[[arxiv](https://arxiv.org/abs/2106.02441)]\n\n**Generative AAV capsid diversification by latent interpolation.**  \nSam Sinai, Nina Jain, George M Church, Eric D Kelsic.  
\n*Preprint, April 2021.*  \n[[10.1101/2021.04.16.440236](https://doi.org/10.1101/2021.04.16.440236)]\n\n**Protein design and variant prediction using autoregressive generative models.**  \nJung-Eun Shin, Adam Riesselman, Aaron Kollasch, Conor McMahon, Elana Simon, Chris Sander, Aashish Manglik, Andrew Kruse, Debora Marks.  \n*Nature Communications, April 2021.*  \n[[10.1038/s41467-021-22732-w](https://doi.org/10.1038/s41467-021-22732-w)] \n\n**Expanding functional protein sequence spaces using generative adversarial networks.**  \nDonatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Jan Zrimec, Simona Poviloniene, Irmantas Rokaitis, Audrius Laurynenas, Wissam Abuajwa, Otto Savolainen, Rolandas Meskys, Martin K. M. Engqvist, Aleksej Zelezniak.  \n*Nature Machine Intelligence, March 2021.*  \n[[10.1038/s42256-021-00310-5](https://doi.org/10.1038/s42256-021-00310-5)]\n\n**Generating functional protein variants with variational autoencoders.**  \nAlex Hawkins-Hooker, Florence Depardieu, Sebastien Baur, Guillaume Couairon, Arthur Chen, David Bikard.  \n*PLOS Computational Biology, February 2021.*  \n[[10.1371/journal.pcbi.1008736](https://doi.org/10.1371/journal.pcbi.1008736)]\n\n**Generating novel protein sequences using Gibbs sampling of masked language models.**  \nSean R. Johnson, Sarah Monaco, Kenneth Massie, Zaid Syed.  \n*Preprint, January 2021.*  \n[[10.1101/2021.01.26.428322](https://doi.org/10.1101/2021.01.26.428322)]\n\n**The structure-fitness landscape of pairwise relations in generative sequence models.**  \nDylan Marshall, Haobo Wang, Michael Stiffler, Justas Dauparas, Peter Koo, Sergey Ovchinnikov.  \n*Preprint, November 2020.*  \n[[10.1101/2020.11.29.402875](https://doi.org/10.1101/2020.11.29.402875)]\n\n**De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks.**  \nMostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen.  
\n*Journal of Chemical Information and Modeling, September 2020.*  \n[[10.1021/acs.jcim.0c00593](https://doi.org/10.1021/acs.jcim.0c00593)]\n\n**Deep learning enables the design of functional de novo antimicrobial proteins.**  \nJavier Caceres-Delpiano, Roberto Ibañez, Patricio Alegre, Cynthia Sanhueza, Romualdo Paz-Fiblas, Simon Correa, Pedro Retamal, Juan Cristóbal Jiménez, Leonardo Álvarez.  \n*Preprint, August 2020.*  \n[[10.1101/2020.08.26.266940](https://doi.org/10.1101/2020.08.26.266940)]\n\n**Generative probabilistic biological sequence models that account for mutational variability.**  \nEli N. Weinstein, Debora S. Marks.  \n*Preprint, August 2020.*  \n[[10.1101/2020.07.31.231381](https://doi.org/10.1101/2020.07.31.231381)]\n\n**IG-VAE: Generative Modeling of Immunoglobulin Proteins by Direct 3D Coordinate Generation.**  \nRaphael R. Eguchi, Namrata Anand, Christian A. Choe, Po-Ssu Huang.  \n*Preprint, August 2020.*  \n[[10.1101/2020.08.07.242347](https://doi.org/10.1101/2020.08.07.242347)]\n\n**A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences.**  \nJohannes Linder, Nicholas Bogard, Alexander B. Rosenberg, Georg Seelig.  \n*Cell Systems, July 2020.*  \n[[10.1016/j.cels.2020.05.007](https://doi.org/10.1016/j.cels.2020.05.007)]\n\n**Signal Peptides Generated by Attention-Based Neural Networks.**  \nZachary Wu, Kevin Kaichuang Yang, Michael Liszka, Alycia Lee, Alina Batzilla, David Wernick, David P Weiner, Frances H Arnold.  \n*ACS Synthetic Biology, July 2020.*  \n[[10.1021/acssynbio.0c00219](https://doi.org/10.1021/acssynbio.0c00219)]\n\n**Bio-informed Protein Sequence Generation for Multi-class Virus Mutation Prediction.**  \nYuyang Wang, Prakarsh Yadav, Rishikesh Magar, Amir Barati Farimani.  
\n*Preprint, June 2020.*  \n[[10.1101/2020.06.11.146167](https://doi.org/10.1101/2020.06.11.146167)]\n\n**Designing Feature-Controlled Humanoid Antibody Discovery Libraries Using Generative Adversarial Networks.**  \nTileli Amimeur, Jeremy M. Shaver, Randal R. Ketchem, J. Alex Taylor, Rutilio H. Clark, Josh Smith, Danielle Van Citters, Christine C. Siska, Pauline Smidt, Megan Sprague, Bruce A. Kerwin, Dean Pettit.  \n*Preprint, April 2020.*  \n[[10.1101/2020.04.12.024844](https://doi.org/10.1101/2020.04.12.024844)]\n\n**ProGen: Language Modeling for Protein Generation.**  \nAli Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, Richard Socher.  \n*Preprint, March 2020.*  \n[[10.1101/2020.03.07.982272](https://doi.org/10.1101/2020.03.07.982272)]\n\n**De Novo Protein Design for Novel Folds using Guided Conditional Wasserstein Generative Adversarial Networks (gcWGAN).**  \nMostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen.  \n*Preprint, September 2019.*  \n[[10.1101/769919](https://doi.org/10.1101/769919)]\n\n**Reconstructing continuous distributions of 3D protein structure from cryo-EM images.**  \nEllen D. Zhong, Tristan Bepler, Joseph H. Davis, Bonnie Berger.  \n*Preprint, September 2019.*  \n[[arXiv](https://arxiv.org/abs/1909.05215)]\n\n**Deep generative models for T cell receptor protein sequences.**  \nKristian Davidsen, Branden J. Olson, William S. DeWitt III, Jean Feng, Elias Harkins, Philip Bradley, Frederick A. Matsen IV.  \n*eLife, September 2019.*  \n[[10.7554/eLife.46935.001](https://doi.org/10.7554/eLife.46935.001)]\n\n**Generative Models for Graph-Based Protein Design.**  \nJohn Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola.  
\n*ICLR workshop on Deep Generative Models for Highly Structured Data, May 2019.*  \n[[OpenReview](https://openreview.net/pdf?id=SJgxrLLKOE)]\n\n**How to Hallucinate Functional Proteins.**  \nZak Costello, Hector Garcia Martin  \n*Preprint, March 2019*  \n[[arxiv](https://arxiv.org/abs/1903.00458)]  \n\n**Conditioning by adaptive sampling for robust design.**  \nDavid H. Brookes, Hahnbeom Park, Jennifer Listgarten.  \n*Preprint, January 2019.*  \n[[arxiv](https://arxiv.org/abs/1901.10060)]\n\n**Generative modeling for protein structures.**  \nNamrata Anand, Po-Ssu Huang.  \n*NeurIPS, December 2018.*  \n[[NeurIPS](https://papers.nips.cc/paper/7978-generative-modeling-for-protein-structures.pdf)]\n\n**Design of metalloproteins and novel protein folds using variational autoencoders.**  \nJoe G. Greener, Lewis Moffat, David T Jones.  \n*Scientific Reports, November 2018.*  \n[[10.1038/s41598-018-34533-1](https://doi.org/10.1038/s41598-018-34533-1)]\n\n**Design by adaptive sampling.**  \nDavid H. Brookes, Jennifer Listgarten.  \n*Preprint, October 2018.*  \n[[arxiv](https://arxiv.org/abs/1810.03714)]\n\n**Deep generative models of genetic variation capture the effects of mutations.**  \nAdam J Riesselman, John B Ingraham, Debora S. Marks   \n*Nature Methods, September 2018*  \n[[10.1038/s41592-018-0138-4](https://doi.org/10.1038/s41592-018-0138-4)] \n\n**Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions.**  \nAnvita Gupta, James Zou.  \n*Preprint, April 2018.*  \n[[arxiv](https://arxiv.org/abs/1804.01694)]\n\n**Recurrent Neural Network Model for Constructive Peptide Design.**  \nAlex T. Müller, Jan A. Hiss, and Gisbert Schneider.  \n*Journal of Chemical Information and Modeling, January 2018*  \n[[10.1021/acs.jcim.7b00414](https://doi.org/10.1021/acs.jcim.7b00414)]\n\n**Variational auto-encoding of protein sequences.**  \nSam Sinai, Eric Kelsic, George M. Church, Martin A. 
Nowak  \n*Preprint, December 2017*  \n[[arxiv](https://arxiv.org/abs/1712.03346)]\n\n### Biophysics\n\n**Accurate Conformation Sampling via Protein Structural Diffusion.**  \nJiahao Fan, Ziyao Li, Eric Alcaide, Guolin Ke, Huaqing Huang, E Weinan.  \n*Preprint, May 2024.*  \n[[10.1101/2024.05.20.594916](https://doi.org/10.1101/2024.05.20.594916)] \n\n**ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model.**  \nBo Ni, David L. Kaplan, Markus J. Buehler.  \n*Preprint, October 2023.*  \n[[arxiv](https://arxiv.org/abs/2310.10605)]\n\n**Chemically Transferable Generative Backmapping of Coarse-Grained Proteins.**  \nSoojung Yang, Rafael Gómez-Bombarelli.  \n*Preprint, March 2023.*  \n[[arxiv](https://arxiv.org/abs/2303.01569)]\n\n**Direct generation of protein conformational ensembles via machine learning.**  \nGiacomo Janson, Gilberto Valdes-Garcia, Lim Heo \u0026 Michael Feig.  \n*Nature Communications, February 2023.*  \n[[10.1038/s41467-023-36443-x](https://doi.org/10.1038/s41467-023-36443-x)]\n\n**Machine Learning Coarse-Grained Potentials of Protein Thermodynamics.**  \nMaciej Majewski, Adrià Pérez, Philipp Thölke, Stefan Doerr, Nicholas E. Charron, Toni Giorgino, Brooke E. Husic, Cecilia Clementi, Frank Noé, Gianni De Fabritiis.  \n*Preprint, December 2022.*  \n[[arxiv](https://arxiv.org/abs/2212.07492)]\n\n**Skipping the Replica Exchange Ladder with Normalizing Flows.**  \nMichele Invernizzi, Andreas Krämer, Cecilia Clementi, Frank Noé.  \n*Preprint, October 2022.*  \n[[arxiv](https://arxiv.org/abs/2210.14104)]\n\n**From data to noise to data for mixing physics across temperatures with generative artificial intelligence.**  \nYihang Wang, Lukas Herron, and Pratyush Tiwary.  
\n*PNAS, August 2022.*  \n[[10.1073/pnas.2203656119](https://doi.org/10.1073/pnas.2203656119)]\n\n**Molecular dynamics without molecules: searching the conformational space of proteins with generative neural networks.**  \nGregory Schwing, Luigi L. Palese, Ariel Fernández, Loren Schwiebert, Domenico L. Gatti.  \n*Preprint, June 2022.*  \n[[arxiv](https://arxiv.org/abs/2206.04683)]\n\n### Predicting stability\n\n**The genetic architecture of protein stability.**  \nAndre J. Faure, Aina Martí-Aranda, Cristina Hidalgo-Carcedo, Jörn M. Schmiedel, Ben Lehner.  \n*Preprint, October 2023.*  \n[[10.1101/2023.10.27.564339](https://doi.org/10.1101/2023.10.27.564339)]\n\n**New mega dataset combined with deep neural network makes a progress in predicting impact of mutation on protein stability.**  \nMarina A Pak, Nikita V Dovidchenko, Satyarth Mishra Sharma, Dmitry N Ivankov.  \n*Preprint, January 2023.*  \n[[10.1101/2022.12.31.522396](https://doi.org/10.1101/2022.12.31.522396)]\n\n**PROSTATA: Protein Stability Assessment using Transformers.**  \nDmitriy Umerenkov, Tatiana I. Shashkova, Pavel V. Strashnov, Fedor Nikolaev, Maria Sindeeva, Nikita V. Ivanisenko, Olga L. Kardymon.  \n*Preprint, December 2022.*  \n[[10.1101/2022.12.25.521875](https://doi.org/10.1101/2022.12.25.521875)]\n\n**Rapid protein stability prediction using deep learning representations.**  \nLasse M. Blaabjerg, Maher M. Kassem, Lydia L. Good, Nicolas Jonsson, Matteo Cagiada, Kristoffer E. Johansson, Wouter Boomsma, Amelie Stein, Kresten Lindorff-Larsen.  \n*Preprint, August 2022.*  \n[[10.1101/2022.07.14.500157](https://doi.org/10.1101/2022.07.14.500157)]\n\n**Artificial Neural Network to Predict Structure-based Protein-protein Free Energy of Binding from Rosetta-calculated Properties.**  \nMatheus Ferraz, José Neto, Roberto Lins, Erico Teixeira.  
\n*Preprint, August 2022.*  \n[[10.26434/chemrxiv-2022-zhd87](https://doi.org/10.26434/chemrxiv-2022-zhd87)]\n\n**Construction of a Deep Neural Network Energy Function for Protein Physics.**  \nHuan Yang, Zhaoping Xiong, Francesco Zonta.  \n*J. Chem. Theory Comput., August 2022.*  \n[[10.1021/acs.jctc.2c00069](https://doi.org/10.1021/acs.jctc.2c00069)]\n\n**Towards generalizable prediction of antibody thermostability using machine learning on sequence and structure features.**  \nAmeya Harmalkar, Roshan Rao, Jonas Honer, Wibke Deisting, Jonas Anlahr, Anja Hoenig, Julia Czwikla, Eva Sienz-Widmann, Doris Rau, Austin Rice, Timothy P. Riley, Danqing Li, Hannah B. Catterall, Christine E. Tinberg, Jeffrey J. Gray, Kathy Y. Wei.  \n*Preprint, June 2022.*  \n[[10.1101/2022.06.03.494724](https://doi.org/10.1101/2022.06.03.494724)]\n\n**Learning deep representations of enzyme thermal adaptation.**  \nGang Li, Filip Buric, Jan Zrimec, Sandra Viknander, Jens Nielsen, Aleksej Zelezniak, Martin K. M. Engqvist.  \n*Preprint, March 2022.*  \n[[10.1101/2022.03.14.484272](https://doi.org/10.1101/2022.03.14.484272)]\n\n**Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset.**  \nPeishan Huang, Simon K. S. Chu, Henrique N. Frizzo, Morgan P. Connolly, Ryan W. Caster, and Justin B. Siegel.  \n*ACS Omega, March 2020.*  \n[[10.1021/acsomega.9b04105](https://doi.org/10.1021/acsomega.9b04105)]\n\n**Predicting changes in protein thermostability upon point mutation with deep 3D convolutional neural networks.**  \nBian Li, Yucheng T. Yang, John A. Capra, Mark B. Gerstein.  \n*Preprint, February 2020.*  \n[[10.1101/2020.02.28.959874](https://doi.org/10.1101/2020.02.28.959874)]\n\n**Machine Learning for Prioritization of Thermostabilizing Mutations for G-protein Coupled Receptors.**  \nS. Muk, S. Ghosh, S. Achuthan, X. Chen, X. Yao, M. Sandhu, M. C. Griffor, K. F. Fennell, Y. Che, V. Shanmugasundaram, X. Qiu, C. G. Tate, N. Vaidehi.  
\n*Preprint, July 2019.*  \n[[10.1101/715375](https://doi.org/10.1101/715375)]\n\n**Machine Learning Applied to Predicting Microorganism Growth Temperatures and Enzyme Catalytic Optima.**  \nGang Li, Kersten S. Rabe, Jens Nielsen, Martin K. M. Engqvist.  \n*ACS Synthetic Biology, May 2019.*  \n[[10.1021/acssynbio.9b00099](https://doi.org/10.1021/acssynbio.9b00099)]\n\n**mGPfusion: predicting protein stability changes with Gaussian process kernel learning and data fusion.**  \nEmmi Jokinen, Markus Heinonen, Harri Lähdesmäki.  \n*Bioinformatics, July 2018.*  \n[[10.1093/bioinformatics/bty238](https://doi.org/10.1093/bioinformatics/bty238)]\n\n**Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.**  \nLei Jia, Ramya Yarlagadda, Charles C. Reed.  \n*PLOS One, September 2015.*  \n[[10.1371/journal.pone.0138022](https://doi.org/10.1371/journal.pone.0138022)]\n\n**NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation.**  \nManuel Giollo, Alberto J. M. Martin, Ian Walsh, Carlo Ferrari, Silvio C. E. Tosatto.  \n*BMC Genomics, May 2014.*  \n[[10.1186/1471-2164-15-S4-S7](https://doi.org/10.1186/1471-2164-15-S4-S7)]\n\n**mCSM: predicting the effects of mutations in proteins using graph-based signatures.**  \nDouglas E. V. Pires, David B. Ascher, Tom L. Blundell.  \n*Bioinformatics, February 2014.*  \n[[10.1093/bioinformatics/btt691](https://doi.org/10.1093/bioinformatics/btt691)]\n\n**PROTS-RF: A Robust Model for Predicting Mutation-Induced Protein Stability Changes.**  \nYunqi Li, Jianwen Fang.  \n*PLOS One, October 2012.*  \n[[10.1371/journal.pone.0047247](https://doi.org/10.1371/journal.pone.0047247)]\n\n**Predicting changes in protein thermostability brought about by single- or multi-site mutations.**  \nJian Tian, Ningfeng Wu, Xiaoyu Chu, Yunliu Fan.  
\n*BMC Bioinformatics, July 2010.*  \n[[10.1186/1471-2105-11-370](https://doi.org/10.1186/1471-2105-11-370)]\n\n**Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0.**  \nYves Dehouck, Aline Grosfils, Benjamin Folch, Dimitri Gilis, Philippe Bogaerts, Marianne Rooman.  \n*Bioinformatics, October 2009.*  \n[[10.1093/bioinformatics/btp445](https://doi.org/10.1093/bioinformatics/btp445)]\n\n**Prediction of protein stability changes for single‐site mutations using support vector machines.**  \nJianlin Cheng, Arlo Randall, Pierre Baldi.  \n*Protei","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyangkky%2FMachine-learning-for-proteins","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyangkky%2FMachine-learning-for-proteins","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyangkky%2FMachine-learning-for-proteins/lists"}