{"id":44355177,"url":"https://github.com/chembl/chembl_multitask_model","last_synced_at":"2026-02-11T16:05:10.074Z","repository":{"id":46301425,"uuid":"346730798","full_name":"chembl/chembl_multitask_model","owner":"chembl","description":"Target prediction multitask neural network, with examples running it in Python, C++, Julia and JS","archived":false,"fork":false,"pushed_at":"2025-09-01T13:56:05.000Z","size":93246,"stargazers_count":18,"open_issues_count":1,"forks_count":10,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-01T15:24:51.154Z","etag":null,"topics":["cheminformatics","chemistry","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chembl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-03-11T14:32:20.000Z","updated_at":"2025-09-01T13:56:09.000Z","dependencies_parsed_at":"2025-02-26T15:36:11.832Z","dependency_job_id":null,"html_url":"https://github.com/chembl/chembl_multitask_model","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chembl/chembl_multitask_model","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chembl%2Fchembl_multitask_model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chembl%2Fchembl_multitask_model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chemb
l%2Fchembl_multitask_model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chembl%2Fchembl_multitask_model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chembl","download_url":"https://codeload.github.com/chembl/chembl_multitask_model/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chembl%2Fchembl_multitask_model/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29337022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T16:00:30.228Z","status":"ssl_error","status_checked_at":"2026-02-11T16:00:25.398Z","response_time":97,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cheminformatics","chemistry","machine-learning"],"created_at":"2026-02-11T16:05:09.370Z","updated_at":"2026-02-11T16:05:10.069Z","avatar_url":"https://github.com/chembl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ChEMBL Multitask Neural Network model\n\nSmall and fast target prediction model trained on a panel of targets using ChEMBL data. The model can be used in off-target prediction scenarios with large collections of compounds. 
\n\n- Based on the blog post: http://chembl.blogspot.com/2019/05/multi-task-neural-network-on-chembl.html\n- Model available in KNIME thanks to Greg Landrum: https://www.knime.com/blog/interactive-bioactivity-prediction-with-multitask-neural-networks\n\nThe model is exported to the ONNX format so it can be used in any programming language able to generate fingerprints with RDKit.\n\n# Try the model online!\n\nUsing both RDKit JavaScript MinimalLib and ONNX.js. Hosted on GitHub Pages: https://chembl.github.io/chembl_multitask_model\n\n# Data Extraction\n\n```bash\npython extract_format_dataset.py --chembl_version 36 --output_dir ./chembl_36/\n```\n\nActivities in ChEMBL that meet the following requirements are extracted:\n\n- activities.standard_units = 'nM'\n- activities.standard_type IN ('EC50', 'IC50', 'Ki', 'Kd', 'XC50', 'AC50', 'Potency')\n- activities.data_validity_comment IS NULL\n- activities.standard_relation IN ('=', '\u003c')\n- activities.potential_duplicate = 0 AND assays.confidence_score \u003e= 8\n- target_dictionary.target_type = 'SINGLE PROTEIN'\n\nKeeping targets\n\n- with at least 100 active and 100 inactive compounds\n- mentioned in at least 2 publications\n\nUsing [IDG protein family activity thresholds](https://druggablegenome.net/IDGProteinFamilies)\n\n- Kinases: \u003c= 30nM\n- GPCRs: \u003c= 100nM\n- Nuclear Receptors: \u003c= 100nM\n- Ion Channels: \u003c= 10μM\n- Non-IDG Family Targets: \u003c= 1μM\n\nWhen multiple measurements for a compound-target pair are found, the one with the lowest concentration is selected. 
This intentionally biases the model toward sensitivity.\n\n\n# Model training\n\n```bash\npython train_chembl_multitask.py --chembl_version 36 --data_file ./chembl_36/mt_data_36_all.h5 --output_dir ./chembl_36/\n```\n\n# Extract Kinase data and train a Kinase specific model\n\n```bash\npython extract_format_dataset.py --chembl_version 36 --protein_family kinase --output_dir ./kinase/ \u0026\u0026 python train_chembl_multitask.py --chembl_version 36 --data_file ./kinase/mt_data_36_kinase.h5 --output_dir ./kinase/\n```\n\n# Example to predict in Python using the ONNX Runtime\n\n```Python\nimport onnxruntime\nimport numpy as np\nfrom rdkit import Chem\nfrom rdkit.Chem import rdMolDescriptors\n\nFP_SIZE = 1024\nRADIUS = 2\n\ndef calc_morgan_fp(smiles):\n    mol = Chem.MolFromSmiles(smiles)\n    fp = rdMolDescriptors.GetMorganFingerprintAsBitVect(\n        mol, RADIUS, nBits=FP_SIZE)\n    a = np.zeros((0,), dtype=np.float32)\n    Chem.DataStructs.ConvertToNumpyArray(fp, a)\n    return a\n\ndef format_preds(preds, targets):\n    preds = np.concatenate(preds).ravel()\n    np_preds = [(tar, pre) for tar, pre in zip(targets, preds)]\n    dt = [('chembl_id','|U20'), ('pred', '\u003cf4')]\n    np_preds = np.array(np_preds, dtype=dt)\n    np_preds[::-1].sort(order='pred')\n    return np_preds\n\n# load the model\nort_session = onnxruntime.InferenceSession(\"trained_models/chembl_34_model/chembl_34_multitask.onnx\", providers=['CPUExecutionProvider'])\n\n# calculate the FPs\nsmiles = 'CN(C)CCc1c[nH]c2ccc(C[C@H]3COC(=O)N3)cc12'\ndescs = calc_morgan_fp(smiles)\n\n# run the prediction\nort_inputs = {ort_session.get_inputs()[0].name: descs}\npreds = ort_session.run(None, ort_inputs)\n\n# example of how the output of the model can be formatted\npreds = format_preds(preds, [o.name for o in ort_session.get_outputs()])\n```\n\n# In Julia using [RDKitMinimalLib.jl](https://github.com/eloyfelix/RDKitMinimalLib.jl) and [ONNX.jl](https://github.com/FluxML/ONNX.jl)\n\n```julia\nimport 
RDKitMinimalLib: get_mol, get_morgan_fp\nimport Umlaut: play!\nimport ONNX\nimport JSON\n\npath = \"chembl_31_multitask.onnx\"\ntargets = JSON.parsefile(\"targets_31.json\")\n\n# dummy input\ndummy = rand(Float32, 1024, 1)\n# load the model\nmt_chembl = ONNX.load(path, dummy)\n\n# load molecule and calc morgan fingerprint\nmol = get_mol(\"CC(=O)Oc1ccccc1C(=O)O\")\nfp_details = Dict{String, Any}(\"nBits\" =\u003e 1024, \"radius\" =\u003e 2)\nmfp = get_morgan_fp(mol, fp_details)\n\n# convert the bitstring to a 1024×1 Matrix{Float32}\nmfp = map(x-\u003eparse(Float32,string(x)),collect(mfp))\nmfp = reshape(mfp, (length(mfp), 1))\n\n# test a molecule\npred = play!(mt_chembl, mfp)\npred = collect(Iterators.flatten(pred))\n\nres = tuple.(targets, pred)\nres = sort(res, by=res-\u003eres[2], rev=true)\n```\n\n# C++ REST microservice\n\nhttps://github.com/eloyfelix/pistache_predictor\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchembl%2Fchembl_multitask_model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchembl%2Fchembl_multitask_model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchembl%2Fchembl_multitask_model/lists"}