{"id":20690269,"url":"https://github.com/merck/ablef","last_synced_at":"2025-04-22T16:56:01.829Z","repository":{"id":211984571,"uuid":"727401755","full_name":"Merck/AbLEF","owner":"Merck","description":"Antibody Langauge Ensemble Fusion - fuses antibody structural ensemble and language representation for property prediction","archived":false,"fork":false,"pushed_at":"2024-04-23T15:09:12.000Z","size":8960,"stargazers_count":11,"open_issues_count":0,"forks_count":2,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-29T16:51:14.403Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Merck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE/gnu-gpl-v3.0.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-12-04T19:40:17.000Z","updated_at":"2025-02-02T05:02:51.000Z","dependencies_parsed_at":"2024-04-23T16:44:27.321Z","dependency_job_id":null,"html_url":"https://github.com/Merck/AbLEF","commit_stats":null,"previous_names":["merck/ablef"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2FAbLEF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2FAbLEF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2FAbLEF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2FAbLEF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Merck","download_url":"https://codeload.github.com/Merck/AbLEF/tar.gz/refs/heads/mai
n","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250284110,"owners_count":21405288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T23:12:28.152Z","updated_at":"2025-04-22T16:56:01.809Z","avatar_url":"https://github.com/Merck.png","language":"Python","readme":"# [AbLEF: Antibody Language Ensemble Fusion](https://doi.org/10.1093/bioinformatics/btae268)\n\nfuses antibody 3D conformational ensemble and language representation for property prediction\n\ncurrent models include:\n- language -- AbLang, ProtBERT, ProtBERT-BFD\n- 3D conformational ensemble -- LEF (CNN transformer)\n![image info](pics/fig1_rev.png)\n```\n@article{rollins2024,\n        title = {{AbLEF}: {Antibody} {Language} {Ensemble} {Fusion} for {thermodynamically} {empowered} {property} {predictions}},\n        journal = {Bioinformatics},\n        author = {Rollins, Zachary A and Widatalla, Talal and Waight, Andrew and Cheng, Alan C and Metwally, Essam},\n\turl = {https://doi.org/10.1093/bioinformatics/btae268},\n        month = apr,\n        year = {2024}}\n```\n```\n@article{rollins2023,\n        title = {{AbLEF}: {Antibody} {Language} {Ensemble} {Fusion} for {thermodynamically} {empowered} {property} {predictions}},\n        journal = {The NeurIPS Workshop on New Frontiers of AI for Drug Discovery and Development (AI4D3 2023)},\n        author = {Rollins, Zachary A and Widatalla, Talal and Waight, Andrew and Cheng, Alan C and Metwally, Essam},\n\turl = {https://ai4d3.github.io/papers/55.pdf},\n        month = dec,\n        year = 
{2023}}\n```\n## requirements\n- [git lfs](https://git-lfs.com/) for locally stored language models\n    - [protbert](https://huggingface.co/Rostlab/protbert) requires local installation to 'config/'\n    - [protbert-bfd](https://huggingface.co/Rostlab/protbert-bfd) requires local installation to 'config/' \n    - [ablang](https://github.com/oxpig/AbLang) is locally installed with .yaml file\n```\n    conda env create --name ablef --file alef.yaml\n```\n\n## preprocess data\n### 1. ensemble generation\n- Boltzmann imitator for multi-structure ensemble generation saved as pdb files (e.g., [LowModeMD](https://pubs.acs.org/doi/10.1021/ci900508k), [MD](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005659))\n- AbLEF manuscript uses LowModeMD in MOE and requires a license that can be acquired from [CCG](https://www.chemcomp.com/)\n\t- input sequence fasta file with variable fragment (Fv) into MOE\n \t- homology model by running MOE Antibody Modeler Application (default settings)\n  \t- run MOE Stochastic Titration Application (nconf=50, T=300 or 400K, salt_conc=0.1) \n- We also provide an open-source alternative to researchers using ImmuneBuilder and OpenMM\n\t- input heavy (--h) and light (--l) chain sequence to generate mAb with [ImmuneBuilder](https://github.com/oxpig/ImmuneBuilder)\n \t- run [OpenMM simulation engine](https://github.com/openmm/openmm) to generate ensemble with implicit solvent\n```\n    python ./data/ensemble.py --pdb='/pathway/to/input/mAb.pdb' --output='openmm_step_mAb.pdb' --T=300 --conc=0.1 --steps=50000\n```\n### 2. cluster structures from ensemble\n- pdb files from ensemble generation can be clustered using density based spatial clustering on the backbone atom distance matrices\n```\n    python ./cluster/main.py input='/pathway/to/pdbs/' output='/pathway/to/pdbs/results' cpu_threads=28 noh=true method=dbscan eps=1.9 min_samples=1 \n```\n### 3. 
data storage \u0026 processing \n- to utilize multi-structure ensemble fusion (LEF), pdb files in data directories are converted to pairwise distance tensors and saved as numpy arrays\n- fasta files are converted to txt files for the heavy and light chain using IMGT canonical alignment (padded as zeros)\n- gif below depicts an ensemble of pairwise distance tensors used for training AbLEF\n```\n    python ./data/preprocess.py\n```\n\n![Alt text](pics/Ab.gif)\n\n## train and hyperparameter tune\n- training and tuning execution is specified by the configuration files: 'config/setup.json'\n- ensemble length (i.e., L or ens_L) is specified during training and inference: setup['training']['ens_L']\n```\n    python ./src/train_tune.py\n```\n\n### hyperparameter tune\n- setup[\"training\"][\"ray_tune\"] == True\n- specify hyperparameter search space in the '__main__' of ./src/train_tune.py\n- ray cluster must be initialized before hyperparameter tuning execution\n- submit PBS script with specified num_cpus and num_gpus\n- start ray cluster\n```\n    ray start --head --num-cpus=8 --num-gpus=4 --temp-dir=\"/absolute/path/to/temporary/storage/\"\n    python ./src/train_tune.py\n```\n\n### inference and holdout\n- test trained/validated models on holdout by specifying config/setup.json\n- AbLEF models trained on hicrt and tagg are located in models/weights\n- setup['holdout']['model_path']\n- setup['holdout']['holdout_data_path']\n```\n    python ./src/holdout.py\n```\n\n### logging information and model storage\n- train_tune.log files are recorded and saved for every time stamped batch run\n- runs are also recorded on tensorboard\n- ***** = unique file identifier (e.g., time stamp or number)\n```\n    logs/batch_*****/train_tune.log\n    logs/batch_*****/events.out.tfevents.***** (setup[\"training\"][\"ray_tune\"] == False)\n    logs/batch_*****/ray_tune/hp_tune_*****/checkpoint_*****/events.out.tfevents.***** (setup[\"training\"][\"ray_tune\"] == True)\n```\n\n- hyperparameter 
tune runs are implemented by ray tune and models are stored\n- non-hyperparameter tuned models are also stored\n```\n    logs/batch_*****/ray_tune/hp_tune_*****/checkpoint_*****/dict_checkpoint.pkl (setup[\"training\"][\"ray_tune\"] == True)\n    models/weights/batch_*****/ALEF*****.pth (setup[\"training\"][\"ray_tune\"] == False)\n```\n## AbPROP integration\n\n![image info](pics/abprop.png)\n\n```\n@article{widatalla2023,\n\ttitle = {{AbPROP}: {Language} and {Graph} {Deep} {Learning} for {Antibody} {Property} {Prediction}},\n\tjournal = {ICML Workshop on Computational Biology},\n\tauthor = {Widatalla, Talal and Rollins, Zachary A and Chen, Ming-Tang and Waight, Andrew and Cheng, Alan},\n\turl = {https://icml-compbio.github.io/2023/papers/WCBICML2023_paper53.pdf},\n\tmonth = jul,\n\tyear = {2023}}\n```\n\n- we also integrated the [AbPROP codebase](https://github.com/merck/abprop)\n- [AbPROP methods](https://icml-compbio.github.io/2023/papers/WCBICML2023_paper53.pdf) are used as baselines to compare the AbLEF results with graph neural networks + language fusion\n- graph neural networks are currently only single-structure molecular representations\n- to utilize graph neural networks, pdb files are converted and saved as torch geometric Data objects for GVP \u0026 GAT\n```\n    python ./data/preprocess_graphs/graph_structs.py\n```\n\n# License\n    AbLEF fuses antibody language and structural ensemble representations for property prediction.\n    Copyright © 2023 Merck \u0026 Co., Inc., Rahway, NJ, USA and its affiliates. 
All rights reserved.\n\n    This program is free software: you can redistribute it and/or modify\n    it under the terms of the GNU General Public License as published by\n    the Free Software Foundation, either version 3 of the License, or\n    (at your option) any later version.\n\n    This program is distributed in the hope that it will be useful,\n    but WITHOUT ANY WARRANTY; without even the implied warranty of\n    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n    GNU General Public License for more details.\n\n    You should have received a copy of the GNU General Public License\n    along with this program.  If not, see \u003chttp://www.gnu.org/licenses/\u003e.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmerck%2Fablef","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmerck%2Fablef","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmerck%2Fablef/lists"}