{"id":19682807,"url":"https://github.com/ml-jku/hyper-dti","last_synced_at":"2025-04-29T05:30:44.444Z","repository":{"id":41178301,"uuid":"508615426","full_name":"ml-jku/hyper-dti","owner":"ml-jku","description":"HyperPCM: Robust task-conditioned modeling of drug-target interactions","archived":false,"fork":false,"pushed_at":"2024-10-01T17:02:01.000Z","size":79518,"stargazers_count":36,"open_issues_count":0,"forks_count":5,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-05T13:23:34.107Z","etag":null,"topics":["deep-learning","drug-discovery","drug-target-interactions","hypernetworks","proteochemometrics","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ml-jku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-06-29T08:50:58.000Z","updated_at":"2024-10-28T07:05:47.000Z","dependencies_parsed_at":"2024-04-08T12:35:45.068Z","dependency_job_id":null,"html_url":"https://github.com/ml-jku/hyper-dti","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2Fhyper-dti","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2Fhyper-dti/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2Fhyper-dti/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2Fhyper-dti/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ml-jku","download_url":
"https://codeload.github.com/ml-jku/hyper-dti/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251444132,"owners_count":21590426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","drug-discovery","drug-target-interactions","hypernetworks","proteochemometrics","pytorch"],"created_at":"2024-11-11T18:12:15.858Z","updated_at":"2025-04-29T05:30:39.436Z","avatar_url":"https://github.com/ml-jku.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HyperPCM\n\n[![Python 3.9](https://img.shields.io/badge/Python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/)\n[![Pytorch](https://img.shields.io/badge/PyTorch-1.9-red.svg)](https://pytorch.org/get-started/previous-versions/)\n![Licence](https://img.shields.io/github/license/ml-jku/hyper-dti)\n\n**[Dependencies](#dependencies)**\n| **[Installation](#installation)**\n| **[Data](#data)**\n| **[Encoders](#encoders)**\n| **[Usage](#usage)**\n| **[Citation](#citation)**\n\n### Robust Task-Conditioned Modeling of Drug-Target Interactions\n\nEmma Svensson\u003csup\u003e1, 2\u003c/sup\u003e, Pieter-Jan Hoedt\u003csup\u003e1\u003c/sup\u003e, Sepp Hochreiter\u003csup\u003e1, 3\u003c/sup\u003e, Günter Klambauer\u003csup\u003e1\u003c/sup\u003e\n\n\u003csup\u003e1\u003c/sup\u003e ELLIS Unit Linz, Institute for Machine Learning, Johannes Kepler University Linz, 4040 Austria  \n\u003csup\u003e2\u003c/sup\u003e Molecular AI, Discovery Sciences, R\u0026D, AstraZeneca, Gothenburg, 431 83 
Sweden\\\n\u003csup\u003e3\u003c/sup\u003e Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, 1030 Austria\n\nA central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR\nand PCM approaches have recently been improved by machine learning and deep neural networks, which allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of\ntraining data and cannot robustly adapt to new tasks, such as predicting interactions for unseen protein targets at inference time. In this work, we propose to use HyperNetworks (Schmidhuber, 1992; Ha, et al., 2017) to efficiently transfer information between tasks during inference and thus to accurately\npredict drug-target interactions on unseen protein targets. 
Our HyperPCM model demonstrates state-of-the-art performance compared to previous methods on multiple\nwell-known benchmarks, including Davis, DUD-E, and a ChEMBL-derived dataset, particularly excelling in zero-shot inference involving unseen protein targets.\n\nRead the full paper in [Journal of Chemical Information and Modeling](https://pubs.acs.org/doi/10.1021/acs.jcim.3c01417).\nWorkshop versions are also available on OpenReview from [NeurIPS 2022 AI4Science](https://openreview.net/forum?id=dIX34JWnIAL) and [ELLIS ML4Molecules 2022](https://openreview.net/forum?id=MrUwwGKRhOM).\n\n![plot](figures/hyper-dti.png)\n\nOverview of the model architecture, including a) the context module proposed by Schimunek, et al. (2023) that enriches the \nembeddings of protein targets through an associative memory in the form of a Modern Hopfield Network, and b) the weight \ninitialization strategy, PWI, proposed by Chang, et al. (2021). \n\n## Dependencies\n\nThe main requirements are:\n- CUDA \u003e= 11.1\n- PyTorch \u003e= 1.9\n\nAdditional packages: sklearn, [hopfield-layers](https://github.com/ml-jku/hopfield-layers), PyTDC\n\n**Logging** is supported with: wandb\n\n**Data preparation** and drug/target encoding require: rdkit, [bio-embeddings](https://github.com/sacdallago/bio_embeddings), [cddd](https://github.com/jrwnter/cddd.git), [molbert](https://github.com/BenevolentAI/MolBERT)\n\nTabular baseline XGBoost requires: xgboost\n\n## Installation\n\nThe recommended way to install the software is to use `pip/pip3`:\n\n```bash\n$ pip3 install git+https://github.com/ml-jku/hopfield-layers\n$ pip3 install git+https://github.com/ml-jku/hyper-dti\n```\n\nAfter installation, the HyperPCM model can be used by supplying the choice of `drug_encoder` (CDDD or MolBert) and `target_encoder` (SeqVec, UniRep, ProtBert, ProtT5, or ESM1b) as well as the remaining arguments. If the context module is to be used, a memory, i.e. 
a context, should also be provided.\n\n```python\nfrom hyper_dti.models.hyper_pcm import HyperPCM\n\nhyperpcm = HyperPCM(\n    drug_encoder='CDDD', \n    target_encoder='SeqVec', \n    args={\n        'hyper_fcn': ..., # HyperNetwork\n        'hopfield': ...,  # Context Module\n        'main_cls': ...   # QSAR Model\n    }\n)\n```\n\n## Data\nCurrently supported datasets are:\n- **Lenselink**, et al. (2017) benchmark derived from [ChEMBL](https://www.ebi.ac.uk/chembl/). \nPrepared data with the exact folds used for 10-fold cross-validation is available in [data.pickle](hyper_dti/data/Lenselink/processed/data.pickle); use the \nflag ```--data_dir hyper_dti/data``` to directly reproduce experiments on this dataset.\n- **Davis**, et al. (2011) benchmark supplied through [Therapeutics Data Commons](https://tdcommons.ai/multi_pred_tasks/dti/#davis). \nExact folds for 5-fold cross-validation are automatically generated in the supplied data module.\n- **KIBA** benchmark from Tang, et al. (2014) supplied through [Therapeutics Data Commons](https://tdcommons.ai/multi_pred_tasks/dti/#kiba). \nExact folds for 5-fold cross-validation are automatically generated in the supplied data module.\n- **DUD-E**, the Database of Useful Decoys: Enhanced benchmark from Mysinger, et al. (2012), available at https://dude.docking.org/. Prepared data with exact folds for 3-fold cross-validation, as proposed in DrugVQA (Zheng, et al., 2020), is available in [data/DUDE](hyper_dti/data/DUDE/raw/). Use the flag ```--data_dir hyper_dti/data``` to directly reproduce experiments on this dataset.\n\nThe HyperPCM model is specifically developed to work for few- and zero-shot inference as illustrated in the following figure. 
\n![plot](figures/pcm.png)\n\n## Encoders\nCurrently supported encoders for drugs and targets respectively include the following pre-trained open-source models.\nAll drug encoders take the SMILES strings of the molecules as input and all target encoders take the amino-acid sequences \nof the proteins as input.\n\n**Drugs**\n\n- **CDDD**, Continuous and Data-Driven Descriptors proposed by Winter, et al. (2019) available at [github](https://github.com/jrwnter/cddd).\n- **MolBERT**, Molecular representation learning with the BERT language model proposed by Fabian, et al. (2020) available at [github](https://github.com/BenevolentAI/MolBERT).\n\n**Targets**\n\n- **SeqVec** proposed by Heinzinger, et al. (2019) is available through [bio_embeddings](https://github.com/sacdallago/bio_embeddings).\n- **UniRep** proposed by Alley, et al. (2019) is available through [bio_embeddings](https://github.com/sacdallago/bio_embeddings).\n- **ProtBERT** proposed by Elnaggar, et al. (2021) is available through [bio_embeddings](https://github.com/sacdallago/bio_embeddings).\n- **ProtT5** proposed by Elnaggar, et al. (2021) is available through [bio_embeddings](https://github.com/sacdallago/bio_embeddings).\n- **ESM-1b** proposed by Rives, et al. (2021) is available through [bio_embeddings](https://github.com/sacdallago/bio_embeddings).\n\n## Usage\nUse this repository to train and evaluate our HyperPCM model, or the baseline DeepPCM, with\n\n```bash\n$ python main.py --name experiment1 --architecture [model] --dataset [dataset] --split random --drug_encoder CDDD --target_encoder SeqVec\n```\nOptionally, specify `--wandb_username` to log runs in Weights \u0026 Biases and find other flags for hyperparameters and settings in [config.py](https://github.com/ml-jku/hyper-dti/blob/main/settings/config.py).\n\n### Pre-compute embeddings\nEmbeddings for drug compounds and protein targets can either be computed directly during runtime or be prepared in advance. 
\nTo pre-compute them, run the following script for the drug and target encoders of interest, respectively. \n```bash\n$ python precompute_embeddings.py --dataset Lenselink --input_type Drug --encoder_name CDDD\n```\n\n### Reproducibility\nReproduce the full benchmarking of the HyperPCM, DeepPCM, XGBoost, or RandomForest model \nfor any pair of encoders in any of the four settings of the two benchmarks, Lenselink or Davis, using\n```bash\n$ python reproduce_experiments.py --model HyperPCM --dataset Lenselink --split leave-protein-out --drug_encoder CDDD --target_encoder SeqVec\n```\nOptionally, specify `--wandb_username` to log runs in Weights \u0026 Biases.\n\n## Citation\n\nPlease cite our work using the following reference.\n```bibtex\n@article{svensson2024hyperpcm,\n    title={{HyperPCM: Robust Task-Conditioned Modeling of Drug--Target Interactions}},\n    author={Svensson, Emma and Hoedt, Pieter-Jan and Hochreiter, Sepp and Klambauer, G{\\\"u}nter},\n    journal={Journal of Chemical Information and Modeling},\n    volume = {64},\n    number = {7},\n    pages = {2539-2553},\n    year = {2024},\n    doi = {10.1021/acs.jcim.3c01417},\n    publisher={ACS Publications}\n}\n```\n\n\u003ci\u003eAccepted oral,\u003c/i\u003e\n\nSvensson, E., Hoedt, P.-J., Hochreiter, S., Klambauer, G. Task-conditioned modeling of drug-target interactions. In\nELLIS Machine Learning for Molecule Discovery Workshop, 2022.\n\n\u003ci\u003eAccepted posters,\u003c/i\u003e\n\nSvensson, E., Hoedt, P.-J., Hochreiter, S., Klambauer, G. Task-conditioned modeling of drug-target interactions. Poster presented at: Women in Machine Learning (WiML). Thirty-sixth Conference on Neural Information Processing Systems; 2022 Nov 28-Dec 9; New Orleans, LA.\n\nSvensson, E., Hoedt, P.-J., Hochreiter, S., Klambauer, G. Robust task-specific adaption of drug-target interaction models. Poster presented at: Women in Machine Learning (WiML). 
Thirty-ninth International Conference on Machine Learning; 2022 Jun 17-23; Baltimore, MD.\n\n## Funding\n\nThis work has received funding from the European Union’s Horizon 2020\nresearch and innovation programme under the Marie Skłodowska-Curie\nActions, grant agreement “Advanced machine learning for Innovative Drug\nDiscovery (AIDD)” No 956832. [Homepage](https://ai-dd.eu/).\n\n![plot](figures/aidd.png)\n\n## References\n\nSchmidhuber, J. \"Learning to control fast-weight memories: An alternative to dynamic recurrent networks.\" Neural Computation, 1992.\n\nDavis, M. I., et al. \"Comprehensive analysis of kinase inhibitor selectivity.\" Nature Biotechnology 29.11 (2011): 1046-1051.\n\nMysinger, M. M., et al. \"Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking.\" Journal of Medicinal Chemistry 55.14 (2012): 6582-6594.\n\nTang, J., et al. \"Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis.\" Journal of Chemical Information and Modeling 54.3 (2014): 735-743.\n\nLenselink, E. B., et al. \"Beyond the hype: Deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set.\" Journal of Cheminformatics 9.1 (2017): 1-14.\n\nHa, D., et al. \"HyperNetworks.\" ICLR, 2017.\n\nAlley, E. C., et al. \"Unified rational protein engineering with sequence-based deep representation learning.\" Nature Methods 16.12 (2019): 1315-1322.\n\nChang, O., et al. \"Principled weight initialization for hypernetworks.\" International Conference on Learning Representations, 2019.\n\nHeinzinger, M., et al. \"Modeling aspects of the language of life through transfer-learning protein sequences.\" BMC Bioinformatics 20.1 (2019): 1-17.\n\nWinter, R., et al. \"Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations.\" Chemical Science 10.6 (2019): 1692-1701.\n\nFabian, B., et al. 
\"Molecular representation learning with language models and domain-relevant auxiliary tasks.\" Workshop for ML4Molecules (2020).\n\nZheng, S., et al. \"Predicting drug–protein interaction using quasi-visual question answering system.\" Nature Machine Intelligence 2.2 (2020): 134-140.\n\nElnaggar, A., et al. \"ProtTrans: Toward understanding the language of life through self-supervised learning.\" IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2021): 7112–7127.\n\nRives, A., et al. \"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.\" Proceedings of the National Academy of Sciences 118.15 (2021): e2016239118.\n\nKim, P. T., et al. \"Unsupervised Representation Learning for Proteochemometric Modeling.\" International Journal of Molecular Sciences 22.23 (2021): 12882.\n\nSchimunek, J., et al. \"Context-enriched molecule representations improve few-shot drug discovery.\" International Conference on Learning Representations, 2023.\n\n## Keywords\nHyperNetworks, zero-shot, Modern Hopfield Networks, deep learning, drug-target interaction prediction, proteo-chemometrics, drug discovery\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fml-jku%2Fhyper-dti","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fml-jku%2Fhyper-dti","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fml-jku%2Fhyper-dti/lists"}