{"id":21677291,"url":"https://github.com/iit-demokritos/drug_id_mapping","last_synced_at":"2025-10-12T05:48:51.830Z","repository":{"id":83882628,"uuid":"408409580","full_name":"iit-Demokritos/drug_id_mapping","owner":"iit-Demokritos","description":"Cross matching drugs from different databases ( Drugbank, STITCH, ChEMBL PubChem, UMLS, TTD, KEGG, ZINC ) and saving in a structured TSV file.","archived":false,"fork":false,"pushed_at":"2024-01-22T18:18:22.000Z","size":5145,"stargazers_count":23,"open_issues_count":0,"forks_count":5,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-19T12:10:02.708Z","etag":null,"topics":["database-mapping","drug-identification"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iit-Demokritos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-09-20T11:05:48.000Z","updated_at":"2025-04-16T16:40:22.000Z","dependencies_parsed_at":"2024-01-22T19:39:28.799Z","dependency_job_id":"f54a68c6-744a-475f-ad71-a70ed58350ea","html_url":"https://github.com/iit-Demokritos/drug_id_mapping","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/iit-Demokritos/drug_id_mapping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iit-Demokritos%2Fdrug_id_mapping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iit-Demokritos%2Fdrug_id_mapping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iit-Demo
kritos%2Fdrug_id_mapping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iit-Demokritos%2Fdrug_id_mapping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iit-Demokritos","download_url":"https://codeload.github.com/iit-Demokritos/drug_id_mapping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iit-Demokritos%2Fdrug_id_mapping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279010329,"owners_count":26084737,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database-mapping","drug-identification"],"created_at":"2024-11-25T14:19:03.747Z","updated_at":"2025-10-12T05:48:51.803Z","avatar_url":"https://github.com/iit-Demokritos.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Mapping drug ids from Drugbank, STITCH, UMLS, KEGG, PubChem, ChEMBL and other databases:\n\nThis project includes various transformation tools that create and enrich a [TSV file](https://github.com/iit-Demokritos/drug_id_mapping/blob/main/drug-mappings.tsv), which lists thousands of known drugs and all the available ids that could be found in drug databases.\n\nIn particular, we start by retrieving the drug information included in the 
latest Drugbank [1] (VERSION 5.1.8, RELEASED ON 2021-01-03) as well as in the latest Therapeutic Target Database [2] (VERSION 7.1.01, RELEASED ON 2019.07.14) into a file.\nWe then enrich the drug fields by querying the following sources:\n- the web services API of the ChEMBL Database [3][4]\n- the PUG REST API of the PubChem Database [5]\n- the drugs file on the FTP server of the KEGG Database [6][7][8]\n- the UMLS Metathesaurus vocabulary Database [9], using the MetamorphoSys tool\n- the mapping files of the STITCH Database\n\n\n## Licence \u0026 Required Citation\nFor any use of the drug-mappings.tsv file in your work, **a citation to the following paper is expected:**\n\n*Aisopos, F., Paliouras, G. Comparing methods for drug–gene interaction prediction on the biomedical literature knowledge graph: performance versus explainability. BMC Bioinformatics 24, 272 (2023), [DOI](https://doi.org/10.1186/s12859-023-05373-2).*\n\ndrug_id_mapping - NCSR Demokritos module Copyright 2021 Fotis Aisopos\nThe Java code and TSV file are provided **only for academic/research use and are licensed under the Apache License, Version 2.0 (the \"License\")**; you may not use this file except in compliance with the License. You may obtain a copy of the License at: https://www.apache.org/licenses/LICENSE-2.0 .\n\n## Mapping TSV file data format\n\nThe resulting file ([drug-mappings.tsv](https://github.com/iit-Demokritos/drug_id_mapping/blob/main/drug-mappings.tsv)) contains one tab-separated entry per drug, with the ids that could be found and cross-checked across the aforementioned databases.\nFor ids not found in any of the above sources, the string 'null' is added. 
Multiple CUIs for a single drug are separated by a comma (,).\nAn example of the format of the TSV data file is as follows:\n\n```sh\ndrugbankId\tname\tttd_id\tpubchem_cid\tcas_num\tchembl_id\tzinc_id\tchebi_id\tkegg_cid\tkegg_id\tbindingDB_id\tUMLS_cuis\tstitch_id\nDB01149\tNefazodone\tDAP000042\t4449\t83366-66-9\tCHEMBL623\tZINC000000538065\t7494\tC07256\tD08257\t50069447\tC0068485\tCID000004449\nDB01157\tTrimetrexate\tDAP000635\t5583\t52128-35-5\tCHEMBL119\tZINC000000598852\t9737\tC11154\tD06238\t18268\tC0085176\tCID100005582\nDB01248\tDocetaxel\tDAP000590\t148124\t114977-28-5\tCHEMBL92\tZINC000085537053\t4672\tC11231\tD02165\t36351\tC0246415,C0771375\tCID100003143\nDB02579\tAcrylic Acid\tD0E3MA\t6581\t79-10-7\tCHEMBL1213529\tZINC000000895281\t18308\tC00511\tnull\tnull\tnull\tnull\n...\n```\n\n## Java Project File Structure \u0026 Running\n\nThe code is organised in the base package gr.demokritos.tranformations, whose classes serve different purposes, e.g.:\n- CreateDrugMappings class: the main class, which can be used to call all other classes of interest\n- xx_DrugbankMapper class: maps the ids of database xx to Drugbank\n- xxIdTransformer classes: transform the ids of database xx to Drugbank and retrieve the respective UMLS_cuis\n- OpenXML class: parses the Drugbank XML file and retrieves all information of interest\n- MetathesaurusAPIticketService class: creates a new TGT and API key, in order to query the UMLS REST API (as an alternative to using the MetamorphoSys tool)\n\nTo run the Java project, we need access to the following sources:\n- Drugbank (to download the latest XML file)\n- TTD (to download the drugs' information file in raw format)\n- Entrez Programming Utilities (E-utilities) API (query PUG for PubChem ids and obtain a token to query for a TGT and an API key)\n- KEGG (to download the KEGG drug file)\n- UniChem API (to query for ChEMBL ids)\n- DrugBank-Sider_mapping files (to query for STITCH ids)\nand 
also include needed jar libraries in the CLASSPATH.\n\n## References\n[1]: Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M., Stothard, P., ... \u0026 Woolsey, J. (2006). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic acids research, 34(suppl_1), D668-D672.\n\n[2]: Y. X. Wang, S. Zhang, F. C. Li, Y. Zhou, Y. Zhang, R. Y. Zhang, J. Zhu, Y. X. Ren, Y. Tan, C. Qin, Y. H. Li, X. X. Li, Y. Z. Chen* and F. Zhu*. Therapeutic Target Database 2020: enriched resource for facilitating research and early development of targeted therapeutics. Nucleic Acids Research. 48(D1): D1031-D1041 (2020). PubMed ID: 31691823\n\n[3]: Mendez, D., Gaulton, A., Bento, A. P., Chambers, J., De Veij, M., Félix, E., ... \u0026 Leach, A. R. (2019). ChEMBL: towards direct deposition of bioassay data. Nucleic acids research, 47(D1), D930-D940.\n\n[4]: Davies, M., Nowotka, M., Papadatos, G., Dedman, N., Gaulton, A., Atkinson, F., ... \u0026 Overington, J. P. (2015). ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic acids research, 43(W1), W612-W620.\n\n[5]: Kim, S., Thiessen, P. A., Cheng, T., Yu, B., \u0026 Bolton, E. E. (2018). An update on PUG-REST: RESTful interface for programmatic access to PubChem. Nucleic acids research, 46(W1), W563-W570.\n\n[6]: Kanehisa, M., \u0026 Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1), 27-30.\n\n[7]: Kanehisa, M. (2019). Toward understanding the origin and evolution of cellular organisms. Protein Science, 28(11), 1947-1951.\n\n[8]: Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M., \u0026 Tanabe, M. (2021). KEGG: integrating viruses and cellular organisms. Nucleic acids research, 49(D1), D545-D551.\n\n[9]: Bodenreider, O. (2004). The unified medical language system (UMLS): integrating biomedical terminology. 
Nucleic acids research, 32(suppl_1), D267-D270.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiit-demokritos%2Fdrug_id_mapping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiit-demokritos%2Fdrug_id_mapping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiit-demokritos%2Fdrug_id_mapping/lists"}