{"id":13752231,"url":"https://github.com/tencent-ailab/DrugOOD","last_synced_at":"2025-05-09T18:33:27.001Z","repository":{"id":37968350,"uuid":"469560781","full_name":"tencent-ailab/DrugOOD","owner":"tencent-ailab","description":"OOD Dataset Curator and Benchmark for AI-aided Drug Discovery","archived":false,"fork":false,"pushed_at":"2022-10-23T19:07:12.000Z","size":503,"stargazers_count":158,"open_issues_count":6,"forks_count":24,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-13T06:37:45.306Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tencent-ailab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-14T02:57:53.000Z","updated_at":"2025-04-11T09:22:48.000Z","dependencies_parsed_at":"2023-01-19T14:46:04.851Z","dependency_job_id":null,"html_url":"https://github.com/tencent-ailab/DrugOOD","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tencent-ailab%2FDrugOOD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tencent-ailab%2FDrugOOD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tencent-ailab%2FDrugOOD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tencent-ailab%2FDrugOOD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tencent-ailab","download_url"
:"https://codeload.github.com/tencent-ailab/DrugOOD/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253303268,"owners_count":21886918,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:02.007Z","updated_at":"2025-05-09T18:33:22.449Z","avatar_url":"https://github.com/tencent-ailab.png","language":"Python","funding_links":[],"categories":["药物发现、药物设计","Ranked by starred repositories"],"sub_categories":["网络服务_其他"],"readme":"# :fire:DrugOOD:fire::  OOD Dataset Curator and Benchmark for AI Aided Drug Discovery\n\n\nThis is the official implementation of the DrugOOD project. Project page: \u003chttps://drugood.github.io/\u003e  \n\n\n## Environment Installation\n\nYou can install the conda environment using the provided `drugood.yaml` file: \n\n```shell\ngit clone https://github.com/tencent-ailab/DrugOOD.git\ncd DrugOOD\nconda env create --name drugood --file=drugood.yaml\nconda activate drugood\n```   \nThen you can open the demo at `demo/demo.ipynb`, which gives a quick walkthrough of how to use DrugOOD.\n\n\n## Demo\n\nFor a quick walkthrough of dataset curation and OOD benchmarking with DrugOOD, see `demo/demo.ipynb`.   \n\n## Dataset Curator\n\nFirst, you need to generate the required DrugOOD dataset with our code. The dataset curator currently focuses on generating datasets from ChEMBL. 
It supports the following two tasks:\n\n- Ligand Based Affinity Prediction (LBAP).\n- Structure Based Affinity Prediction (SBAP).\n\nFor OOD domain annotations, it supports the following 5 choices.\n\n- Assay.\n- Scaffold.\n- Size.\n- Protein. (only for SBAP task)\n- Protein Family. (only for SBAP task)\n\nFor noise annotations, it supports the following three noise levels. Datasets with different\nnoise levels are produced by filters with different levels of strictness.\n\n- Core.\n- Refined.\n- General.\n\nBecause different measurement types (e.g. IC50, EC50, Ki, Potency) cannot be conveniently converted into one another, you need to specify the measurement type when generating a dataset.\n\n### How to Run and Reproduce the 96 Datasets?\n\nFirst, specify the path to the ChEMBL database and the directory in which to save the data in the configuration\nfile: `configs/_base_/curators/lbap_defaults.py` for the LBAP task or `configs/_base_/curators/sbap_defaults.py` for the SBAP task.   \n`source_root=\"YOUR_PATH/chembl_29_sqlite/chembl_29.db\"` is the path to the \nChEMBL 29 SQLite file, and `target_root=\"data/\"` specifies the folder in which to save the generated data.   \n\nNote that you can download the original ChEMBL 29 database in SQLite format from `http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_29/chembl_29_sqlite.tar.gz`.\n\n\nThe built-in configuration files are located in:    \n`configs/curators/`. Here we provide 96 config files to __reproduce__ the 96 datasets in our paper. \nYou can also customize your own datasets by changing the config files.  \n\nRun `tools/curate.py` to generate a dataset. 
Here are some examples:\n\nGenerate a dataset for the LBAP task, with `assay` as the domain, `core` as the noise\nlevel, and `IC50` as the measurement type:\n\n```shell\npython tools/curate.py --cfg configs/curators/lbap_core_ic50_assay.py\n```\n\nGenerate a dataset for the SBAP task, with `protein` as the domain, `refined` as the noise level, and `EC50` as\nthe measurement type:\n\n```shell\npython tools/curate.py --cfg configs/curators/sbap_refined_ec50_protein.py\n```\n\n## Benchmarking SOTA OOD Algorithms\n\nCurrently we support 6 different baseline algorithms:\n\n- ERM\n- IRM\n- GroupDro\n- Coral\n- MixUp\n- DANN\n\nMeanwhile, we support various GNN backbones:\n\n- GIN\n- GCN\n- Weave\n- SchNet\n- GAT\n- MGCN\n- NF\n- ATi-FPGNN\n- GTransformer\n\nAnd different backbones for protein sequence modeling:\n\n- Bert\n- ProteinBert\n\n### How to Run?\n\nFirst, run the following command to install the package.\n\n```shell\npython setup.py develop\n```\n\nRun the LBAP task with the ERM algorithm:\n\n```shell\npython tools/train.py configs/algorithms/erm/lbap_core_ec50_assay_erm.py\n```\n\nIf you would like to run ERM on other datasets, change the corresponding options inside the above\nconfig file. For example,  `ann_file = 'data/lbap_core_ec50_assay.json'`   specifies the input data.  
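
The config and data file names in this README appear to follow a `task_noise_measurement_domain` pattern (for example `lbap_core_ec50_assay`). The sketch below is only an illustration of that pattern, not part of DrugOOD: the helper names, the lowercase option spellings, and the `protein_family` token are assumptions, so check the actual files under `configs/curators/` before relying on them.

```python
from itertools import product

# Option values taken from the lists in this README. The lowercase spellings
# and the 'protein_family' token are assumptions, not confirmed file names.
NOISE_LEVELS = ('core', 'refined', 'general')
MEASUREMENTS = ('ic50', 'ec50', 'ki', 'potency')
DOMAINS = {
    'lbap': ('assay', 'scaffold', 'size'),
    'sbap': ('assay', 'scaffold', 'size', 'protein', 'protein_family'),
}


def dataset_name(task, noise, measurement, domain):
    # Validate each option, then join them into a name such as
    # 'lbap_core_ec50_assay' (hypothetical helper, not a DrugOOD function).
    if task not in DOMAINS:
        raise ValueError(f'unknown task: {task}')
    if noise not in NOISE_LEVELS:
        raise ValueError(f'unknown noise level: {noise}')
    if measurement not in MEASUREMENTS:
        raise ValueError(f'unknown measurement type: {measurement}')
    if domain not in DOMAINS[task]:
        raise ValueError(f'domain {domain} is not available for task {task}')
    return '_'.join((task, noise, measurement, domain))


def ann_file_path(name, target_root='data/'):
    # Path to use for ann_file, assuming datasets are saved as JSON files
    # directly under target_root.
    return target_root.rstrip('/') + '/' + name + '.json'


def all_dataset_names():
    # Enumerate every valid task / noise / measurement / domain combination.
    return [
        dataset_name(task, noise, measurement, domain)
        for task, domains in DOMAINS.items()
        for noise, measurement, domain in product(NOISE_LEVELS, MEASUREMENTS, domains)
    ]


if __name__ == '__main__':
    name = dataset_name('lbap', 'core', 'ec50', 'assay')
    print(ann_file_path(name))       # data/lbap_core_ec50_assay.json
    print(len(all_dataset_names()))  # 96
```

Under these assumptions the enumeration gives 36 LBAP and 60 SBAP combinations, which is consistent with the 96 datasets mentioned above.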
\n\nSimilarly, run the SBAP task with the ERM algorithm: \n\n```shell\npython tools/train.py configs/algorithms/erm/sbap_core_ec50_assay_erm.py\n``` \n\n\n## Reference\n\n:smile:If you find this repo useful, please consider citing our paper:\n\n```\n@ARTICLE{2022arXiv220109637J,\n    author = {{Ji}, Yuanfeng and {Zhang}, Lu and {Wu}, Jiaxiang and {Wu}, Bingzhe and {Huang}, Long-Kai and {Xu}, Tingyang and {Rong}, Yu and {Li}, Lanqing and {Ren}, Jie and {Xue}, Ding and {Lai}, Houtim and {Xu}, Shaoyong and {Feng}, Jing and {Liu}, Wei and {Luo}, Ping and {Zhou}, Shuigeng and {Huang}, Junzhou and {Zhao}, Peilin and {Bian}, Yatao},\n    title = \"{DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations}\",\n    journal = {arXiv e-prints},\n    keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods},\n    year = 2022,\n    month = jan,\n    eid = {arXiv:2201.09637},\n    pages = {arXiv:2201.09637},\n    archivePrefix = {arXiv},\n    eprint = {2201.09637},\n    primaryClass = {cs.LG}\n}\n```     \n\n## Disclaimer \nThis is not an officially supported Tencent product.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencent-ailab%2FDrugOOD","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencent-ailab%2FDrugOOD","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencent-ailab%2FDrugOOD/lists"}