{"id":20837737,"url":"https://github.com/astrazeneca/diffabxl","last_synced_at":"2025-09-04T19:38:59.161Z","repository":{"id":259202248,"uuid":"869732635","full_name":"AstraZeneca/DiffAbXL","owner":"AstraZeneca","description":"The official implementation of DiffAbXL benchmarked in the paper \"Benchmarking Generative Models for Antibody Design\".","archived":false,"fork":false,"pushed_at":"2024-10-22T09:59:09.000Z","size":371,"stargazers_count":21,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-23T11:44:39.641Z","etag":null,"topics":["antibody-design","binding-affinity","diffusion-models","generative-ai","graph-neural-networks","in-silico-design","llm-models","log-likelihood"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AstraZeneca.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-08T19:42:52.000Z","updated_at":"2024-10-22T09:59:13.000Z","dependencies_parsed_at":"2024-10-23T13:21:34.751Z","dependency_job_id":null,"html_url":"https://github.com/AstraZeneca/DiffAbXL","commit_stats":null,"previous_names":["astrazeneca/diffabxl"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraZeneca%2FDiffAbXL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraZeneca%2FDiffAbXL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraZeneca%2FDiffAbX
L/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraZeneca%2FDiffAbXL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AstraZeneca","download_url":"https://codeload.github.com/AstraZeneca/DiffAbXL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253144223,"owners_count":21861023,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["antibody-design","binding-affinity","diffusion-models","generative-ai","graph-neural-networks","in-silico-design","llm-models","log-likelihood"],"created_at":"2024-11-18T01:08:25.816Z","updated_at":"2025-09-04T19:38:59.147Z","avatar_url":"https://github.com/AstraZeneca.png","language":"Python","readme":"![Maturity-level-0](https://img.shields.io/badge/Maturity%20Level-ML--0-red)\n\n# DiffAbXL: \n##### Author: Talip Ucar (ucabtuc@gmail.com)\n\nThe implementation of DiffAbXL benchmarked in the paper: [Exploring Log-Likelihood Scores for Ranking Antibody Sequence Designs](https://www.biorxiv.org/content/10.1101/2024.10.07.617023v4.full.pdf).\n\n- Please note that the paper was originally titled \"Benchmarking Generative Models for Antibody Design\" but we decided to change it to better highlight its core contributions.\n- This is a re-implementation of the original work, DiffAb: [[Paper](https://www.biorxiv.org/content/10.1101/2022.07.10.499510v5.abstract) and [Code](https://github.com/luost26/diffab/tree/main?tab=readme-ov-file)]\n\n\n## Table of Contents:\n\n1. [Current Leaderboard](#current-leaderboard)\n2. 
[Benchmarking results from the paper](#benchmarking-results)\n3. [How to Build an Interface for Benchmarking Models](#how-to-build-an-interface-for-benchmarking-models)\n4. [Training](#training)\n5. [Structure of the repo](#structure-of-the-repo)\n6. [Experiment tracking](#experiment-tracking)\n7. [Bechmarking datasets and licenses](#bechmarking-datasets-and-licenses)\n8. [Citing the paper](#citing-the-paper)\n9. [Citing this repo](#citing-this-repo)\n\n\n## Current Leaderboard\n\n\u003ctable border=\"1\"\u003e\n  \u003ctr\u003e\n    \u003cth rowspan=\"2\"\u003eRank\u003c/th\u003e\n    \u003cth rowspan=\"2\"\u003eModels\u003c/th\u003e\n    \u003cth colspan=\"2\"\u003eAbsci HER2\u003c/th\u003e\n    \u003cth colspan=\"2\"\u003eNature\u003c/th\u003e\n    \u003cth rowspan=\"2\"\u003eAZ Target-2\u003c/th\u003e\n    \u003cth rowspan=\"2\"\u003eAve. 𝜌\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003cth\u003eZero Shot\u003c/th\u003e\n    \u003cth\u003eSPR Control\u003c/th\u003e\n    \u003cth\u003eHEL\u003c/th\u003e\n    \u003cth\u003eHER2\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e1\u003c/td\u003e\n    \u003ctd\u003eDiffAbXL-A-DN\u003c/td\u003e\n    \u003ctd\u003e0.43\u003c/td\u003e\n    \u003ctd\u003e0.22\u003c/td\u003e\n    \u003ctd\u003e0.62\u003c/td\u003e\n    \u003ctd\u003e0.37\u003c/td\u003e\n    \u003ctd\u003e0.41\u003c/td\u003e\n    \u003ctd\u003e0.41\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e2\u003c/td\u003e\n    \u003ctd\u003eDiffAbXL-A-SG\u003c/td\u003e\n    \u003ctd\u003e0.46\u003c/td\u003e\n    \u003ctd\u003e0.22\u003c/td\u003e\n    \u003ctd\u003e0.64\u003c/td\u003e\n    \u003ctd\u003e-0.38\u003c/td\u003e\n    \u003ctd\u003e0.43\u003c/td\u003e\n    \u003ctd\u003e0.274\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e3\u003c/td\u003e\n    \u003ctd\u003eDiffAbXL-H3-DN\u003c/td\u003e\n    \u003ctd\u003e0.49\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    
\u003ctd\u003e0.52\u003c/td\u003e\n    \u003ctd\u003e-0.08\u003c/td\u003e\n    \u003ctd\u003e0.37\u003c/td\u003e\n    \u003ctd\u003e0.26\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e4\u003c/td\u003e\n    \u003ctd\u003eIgBlend (struct. only)\u003c/td\u003e\n    \u003ctd\u003e0.40\u003c/td\u003e\n    \u003ctd\u003e0.21\u003c/td\u003e\n    \u003ctd\u003e0.54\u003c/td\u003e\n    \u003ctd\u003e-0.30\u003c/td\u003e\n    \u003ctd\u003e0.31\u003c/td\u003e\n    \u003ctd\u003e0.232\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e5\u003c/td\u003e\n    \u003ctd\u003eAntifold\u003c/td\u003e\n    \u003ctd\u003e0.43\u003c/td\u003e\n    \u003ctd\u003e0.22\u003c/td\u003e\n    \u003ctd\u003e0.4\u003c/td\u003e\n    \u003ctd\u003e-0.47\u003c/td\u003e\n    \u003ctd\u003e0.38\u003c/td\u003e\n    \u003ctd\u003e0.192\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e6\u003c/td\u003e\n    \u003ctd\u003eDiffAbXL-H3-SG\u003c/td\u003e\n    \u003ctd\u003e0.48\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0.4\u003c/td\u003e\n    \u003ctd\u003e-0.41\u003c/td\u003e\n    \u003ctd\u003e0.29\u003c/td\u003e\n    \u003ctd\u003e0.152\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e7\u003c/td\u003e\n    \u003ctd\u003eESM\u003c/td\u003e\n    \u003ctd\u003e0.29\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0.18\u003c/td\u003e\n    \u003ctd\u003e0.27\u003c/td\u003e\n    \u003ctd\u003e0.148\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e8\u003c/td\u003e\n    \u003ctd\u003eDiffAb\u003c/td\u003e\n    \u003ctd\u003e0.34\u003c/td\u003e\n    \u003ctd\u003e0.21\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e-0.14\u003c/td\u003e\n    \u003ctd\u003e0.22\u003c/td\u003e\n    \u003ctd\u003e0.126\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e9\u003c/td\u003e\n    
\u003ctd\u003eAbLang2\u003c/td\u003e\n    \u003ctd\u003e0.3\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e-0.07\u003c/td\u003e\n    \u003ctd\u003e0.36\u003c/td\u003e\n    \u003ctd\u003e0.118\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e10\u003c/td\u003e\n    \u003ctd\u003eIgBlend (seq. only)\u003c/td\u003e\n    \u003ctd\u003e0.27\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e-0.1\u003c/td\u003e\n    \u003ctd\u003e0.36\u003c/td\u003e\n    \u003ctd\u003e0.106\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e11\u003c/td\u003e\n    \u003ctd\u003eAbLang\u003c/td\u003e\n    \u003ctd\u003e0.3\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e-0.13\u003c/td\u003e\n    \u003ctd\u003e0.35\u003c/td\u003e\n    \u003ctd\u003e0.104\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e12\u003c/td\u003e\n    \u003ctd\u003edyMEAN\u003c/td\u003e\n    \u003ctd\u003e0.37\u003c/td\u003e\n    \u003ctd\u003e0.15\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0.104\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e13\u003c/td\u003e\n    \u003ctd\u003eAbX\u003c/td\u003e\n    \u003ctd\u003e0.28\u003c/td\u003e\n    \u003ctd\u003e0.19\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0.094\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e14\u003c/td\u003e\n    \u003ctd\u003eAntiBERTy\u003c/td\u003e\n    \u003ctd\u003e0.26\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e-0.17\u003c/td\u003e\n    \u003ctd\u003e0.35\u003c/td\u003e\n    \u003ctd\u003e0.088\u003c/td\u003e\n  
\u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e15\u003c/td\u003e\n    \u003ctd\u003eMEAN\u003c/td\u003e\n    \u003ctd\u003e0.36\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0.02\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e0.076\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e16\u003c/td\u003e\n    \u003ctd\u003eESM-IF\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e-0.27\u003c/td\u003e\n    \u003ctd\u003e0\u003c/td\u003e\n    \u003ctd\u003e-0.53\u003c/td\u003e\n    \u003ctd\u003e0.42\u003c/td\u003e\n    \u003ctd\u003e-0.076\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n- **Note-1:** Ave. 𝜌 refers to average Spearman correlation across five datasets. The leaderboard above is based on five target datasets, with a score of zero assigned to models that did not demonstrate statistically significant correlation or were not suitable for score computation (e.g., requiring an antigen).\n- **Note-2:** Log-likelihood scores in this work are computed using a naive approach, as outlined in Equation-11 in the paper, to maintain consistency across models. However, it is worth noting that more principled methods exist for calculating these scores, which may vary depending on the model type (e.g., autoregressive vs. masked language models). 
We plan to investigate these alternative approaches in future work.\n\n## Benchmarking Results\n#### 1- Correlation between DiffAbXL's log-likelihood and binding affinity across different targets\n\n![Results-1](./assets/diffabxl_results1.png)\n\n**Figure-1: Results for DiffAbXL:** **a)** DiffAbXL-H3-DN for Absci zero-shot HER2 data, **b)** DiffAbXL-A-SG for AZ Target-2, **c)** DiffAbXL-A-SG for Nature HEL, **d)** DiffAbXL-A-DN for Nature HER2.\n\n#### 2- Comparing Diffusion-based, LLM-based and Graph-based models\n\n![Results-2](./assets/diffabxl_results2.png)\n\n**Table-1:** Summary of the results for Spearman correlation. Abbreviations: DN: De Novo mode, SG: Structure Guidance mode, NA: Epitope or complex structure required, but not available. *, **, *** indicate p-values under 0.05, 0.01 and 1e-4 respectively.\n\n## How to Build an Interface for Benchmarking Models\nTo make it easier for us to benchmark your model, we recommend that you implement an interface as a Python method in a class that we can easily integrate with our evaluation pipeline. The method should accept the following inputs:\n1. **Antibody sequences**: A list of antibody sequences.\n2. **Optional structure information**: If applicable, structure data (e.g., a PDB file) related to the sequences.\n3. **Additional model-specific parameters**: Any other inputs your model requires.\n\nThe method should return a dictionary containing:\n1. **Log-likelihood scores**: For ranking antibody sequences based on their predicted binding affinity.\n2. 
**Other relevant metrics**: Such as RMSD, pAE, or any model-specific outputs you believe are relevant.\n\nHere's a basic template in Python for implementing this interface:\n\n```python\n    def benchmark(self, sequences, structure=None, mask=None, **kwargs):\n        \"\"\"\n        Benchmark the model on provided antibody sequences and structures.\n\n        Parameters:\n        sequences (list of str): List of antibody sequences.\n        structure (optional): Path to a PDB file. Currently, only one PDB file is provided per target dataset.\n                              The PDB file may contain either just the antibody or an antibody-antigen complex,\n                              depending on the dataset.\n        mask (optional): Binary list or array indicating the regions of interest in the sequences for metric calculations.\n        kwargs (optional): Additional parameters required by the model.\n\n        Returns:\n        dict: A dictionary containing log-likelihood scores and other relevant metrics.\n        \"\"\"\n        pass\n```\n\nPlease make sure that your model outputs the log-likelihood scores in a format we can use directly for benchmarking antibody sequence designs. This will help us compare your model's performance across our datasets efficiently.\n\n\n\n## Training\nThere is one configuration file, sabdab.yaml, which can be used to change any of the parameters. Train the model with:\n\n```\npython train.py # For training.\n```\n\n\n## Structure of the repo\n\n\u003cpre\u003e\n- train.py\n\n- src\n    |-model.py\n    \n- config\n    |-sabdab.yaml\n    \n- utils\n    |-load_data.py\n    |-arguments.py\n    |-model_utils.py\n    |-loss_functions.py\n    ...\n    \n- data\n    |-her2\n    ...\n\u003c/pre\u003e\n\n\n\n## Experiment tracking\nWeights \u0026 Biases can be used to track experiments. 
It is turned off by default, but can be turned on by changing the corresponding option in the config file ```./config/sabdab.yaml```\n\n\n## Bechmarking datasets and licenses\nBenchmarking datasets and their corresponding licenses can be found in the ./benchmarking_datasets folder. The original Absci datasets can be found at:\n- Absci IgDesign Datasets: https://github.com/AbSciBio/igdesign/\n- Absci Her2 Datasets: https://github.com/AbSciBio/unlocking-de-novo-antibody-design\n\n## Citing the paper\n\n```\n@article {Ucar2024.10.07.617023,\n\tauthor = {Ucar, Talip and Malherbe, Cedric and Gonzalez Hernandez, Ferran},\n\ttitle = {Exploring Log-Likelihood Scores for Ranking Antibody Sequence Designs},\n\telocation-id = {2024.10.07.617023},\n\tyear = {2024},\n\tdoi = {10.1101/2024.10.07.617023},\n\tpublisher = {Cold Spring Harbor Laboratory},\n\tURL = {https://www.biorxiv.org/content/early/2024/10/24/2024.10.07.617023},\n\teprint = {https://www.biorxiv.org/content/early/2024/10/24/2024.10.07.617023.full.pdf},\n\tjournal = {bioRxiv}\n}\n```\n\n## Citing this repo\nIf you use DiffAbXL in your own studies or work, please cite it as follows:\n\n```\n@Misc{talip_ucar_2024_DiffAbXL,\n\tauthor =   {Talip Ucar},\n\ttitle = {Exploring Log-Likelihood Scores for Ranking Antibody Sequence Designs},\n\tURL = {https://github.com/AstraZeneca/DiffAbXL},\n\tmonth = {October},\n\tyear = {since 2024}\n}\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrazeneca%2Fdiffabxl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrazeneca%2Fdiffabxl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrazeneca%2Fdiffabxl/lists"}