{"id":33122539,"url":"https://github.com/gnn4dr/DRKG","last_synced_at":"2025-11-19T23:00:57.639Z","repository":{"id":38404918,"uuid":"254278386","full_name":"gnn4dr/DRKG","owner":"gnn4dr","description":"A knowledge graph and a set of tools for drug repurposing","archived":false,"fork":false,"pushed_at":"2022-04-19T16:44:16.000Z","size":20069,"stargazers_count":565,"open_issues_count":20,"forks_count":153,"subscribers_count":26,"default_branch":"master","last_synced_at":"2024-07-07T14:34:35.750Z","etag":null,"topics":["dgl","dgl-ke","drug-repurposing","graph-neural-networks","knowledge-graph","knowledge-graph-embeddings"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gnn4dr.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-09T05:21:18.000Z","updated_at":"2024-07-03T12:23:37.000Z","dependencies_parsed_at":"2022-07-12T17:28:52.212Z","dependency_job_id":null,"html_url":"https://github.com/gnn4dr/DRKG","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gnn4dr/DRKG","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnn4dr%2FDRKG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnn4dr%2FDRKG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnn4dr%2FDRKG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnn4dr%2FDRKG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gnn4dr","download_url":"https://codeload.github.com/gnn4dr/DRKG/tar.gz/refs/
heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnn4dr%2FDRKG/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285342137,"owners_count":27155385,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-19T02:00:05.673Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dgl","dgl-ke","drug-repurposing","graph-neural-networks","knowledge-graph","knowledge-graph-embeddings"],"created_at":"2025-11-15T05:00:42.186Z","updated_at":"2025-11-19T23:00:57.632Z","avatar_url":"https://github.com/gnn4dr.png","language":"Jupyter Notebook","funding_links":[],"categories":["Databases"],"sub_categories":["Interaction"],"readme":"# Drug Repurposing Knowledge Graph (DRKG)\nDrug Repurposing Knowledge Graph (DRKG) is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects and symptoms. DRKG includes information from six existing databases including DrugBank, Hetionet, GNBR, String, IntAct and DGIdb, and data collected from recent publications particularly related to Covid19. It includes 97,238 entities belonging to 13 entity-types; and 5,874,261 triplets belonging to 107 edge-types. 
Each of these 107 edge-types represents a type of interaction between one of the 17 entity-type pairs (multiple types of interactions are possible between the same entity-pair), as depicted in the figure below. The repository also includes a set of notebooks showing how to explore and analyze DRKG using statistical methods or machine learning methods such as knowledge graph embedding.\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"connectivity.png\" alt=\"DRKG schema\" width=\"600\"\u003e\n  \u003cbr\u003e\n  \u003cb\u003eFigure\u003c/b\u003e: Interactions in the DRKG. The number next to an edge indicates the number of relation-types for that entity-pair in DRKG.\n\u003c/p\u003e\n\n## Statistics of DRKG\nThe type-wise distribution of the entities in DRKG and their original data-source(s) is shown in the following table.\n\n| Entity type         | DrugBank | GNBR  | Hetionet | STRING | IntAct | DGIdb | Bibliography | Total Entities |\n|:--------------------|---------:|------:|---------:|-------:|-------:|------:|-------------:|---------------:|\n| Anatomy             | \\-       | \\-    | 400      | \\-     | \\-     | \\-    | \\-           | 400            |\n| Atc                 | 4,048     | \\-    | \\-       | \\-     | \\-     | \\-    | \\-           | 4,048           |\n| Biological Process  | \\-       | \\-    | 11,381    | \\-     | \\-     | \\-    | \\-           | 11,381          |\n| Cellular Component  | \\-       | \\-    | 1,391     | \\-     | \\-     | \\-    | \\-           | 1,391           |\n| Compound            | 9,708     | 11,961 | 1,538     | \\-     | 153    | 6,348  | 6,250         | 24,313          |\n| Disease             | 1,182     | 4,746  | 257      | \\-     | \\-     | \\-    | 33           | 5,103           |\n| Gene                | 4,973     | 27,111 | 19,145    | 18,316  | 16,321  | 2,551  | 3,181         | 39,220          |\n| Molecular Function  | \\-       | \\-    | 2,884     | \\-     | \\-     | \\-    | \\-      
     | 2,884           |\n| Pathway             | \\-       | \\-    | 1,822     | \\-     | \\-     | \\-    | \\-           | 1,822           |\n| Pharmacologic Class | \\-       | \\-    | 345      | \\-     | \\-     | \\-    | \\-           | 345            |\n| Side Effect         | \\-       | \\-    | 5,701     | \\-     | \\-     | \\-    | \\-           | 5,701           |\n| Symptom             | \\-       | \\-    | 415      | \\-     | \\-     | \\-    | \\-           | 415            |\n| Tax                 | \\-       | 215   | \\-       | \\-     | \\-     | \\-    | \\-           | 215            |\n| Total               | 19,911   | 44,033 | 45,279    | 18,316  | 16,474  | 8,899  | 9,464         | 97,238          |\n\n\nThe following table shows the number of triplets between different entity-type pairs, for DRKG as a whole and for each data source.\n\n| Entity\\-type pair                     | DrugBank | GNBR   | Hetionet | STRING  | IntAct | DGIdb | Bibliography | Total interactions |\n|:--------------------------------------|-----------:|-------:|---------:|--------:|-------:|------:|-------------:|-------------------:|\n| \\(Gene, Gene\\)                    | \\-         | 66,722  | 474,526   | 1,496,708 | 254,346 | \\-    | 58,629        | 2,350,931            |\n| \\(Compound, Gene\\)                | 24,801      | 80,803  | 51,429    | \\-      | 1,805   | 26,290 | 25,666        | 210,794             |\n| \\(Disease, Gene\\)                 | \\-         | 95,399  | 27,977    | \\-      | \\-     | \\-    | 461          | 123,837             |\n| \\(Atc, Compound\\)                 | 15,750      | \\-     | \\-       | \\-      | \\-     | \\-    | \\-           | 15,750              |\n| \\(Compound, Compound\\)            | 1,379,271    | \\-     | 6,486     | \\-      | \\-     | \\-    | \\-           | 1,385,757            |\n| \\(Compound, Disease\\)             | 4,968        | 77,782  | 1,145     | \\-      | \\-     | \\-    | \\-      
     | 83,895              |\n| \\(Gene, Tax\\)                     | \\-         | 14,663  | \\-       | \\-      | \\-     | \\-    | \\-           | 14,663              |\n| \\(Biological Process, Gene\\)      | \\-         | \\-     | 559,504   | \\-      | \\-     | \\-    | \\-           | 559,504             |\n| \\(Disease, Symptom\\)              | \\-         | \\-     | 3,357     | \\-      | \\-     | \\-    | \\-           | 3,357               |\n| \\(Anatomy, Disease\\)              | \\-         | \\-     | 3,602     | \\-      | \\-     | \\-    | \\-           | 3,602               |\n| \\(Disease, Disease\\)              | \\-         | \\-     | 543      | \\-      | \\-     | \\-    | \\-           | 543                |\n| \\(Anatomy, Gene\\)                 | \\-         | \\-     | 726,495   | \\-      | \\-     | \\-    | \\-           | 726,495             |\n| \\(Gene, Molecular Function\\)      | \\-         | \\-     | 97,222    | \\-      | \\-     | \\-    | \\-           | 97,222              |\n| \\(Compound, Pharmacologic Class\\) | \\-         | \\-     | 1,029     | \\-      | \\-     | \\-    | \\-           | 1,029               |\n| \\(Cellular Component, Gene\\)      | \\-         | \\-     | 73,566    | \\-      | \\-     | \\-    | \\-           | 73,566              |\n| \\(Gene, Pathway\\)                 | \\-         | \\-     | 84,372    | \\-      | \\-     | \\-    | \\-           | 84,372              |\n| \\(Compound, Side Effect\\)         | \\-         | \\-     | 138,944   | \\-      | \\-     | \\-    | \\-           | 138,944             |\n| Total                                 | 1,424,790    | 335,369 | 2,250,197  | 1,496,708 | 256,151 | 26,290 | 84,756        | 5,874,261            |\n\n\n## Download DRKG\nTo analyze DRKG, you can download it directly with the following command:\n```\nwget https://dgl-data.s3-us-west-2.amazonaws.com/dataset/DRKG/drkg.tar.gz\n```\nIf you use our notebooks provided in this 
repository, you don't need to download the file manually; the notebooks download it automatically.\n\nWhen you untar `drkg.tar.gz`, you will see the following files:\n\n```\n./drkg.tsv\n./entity2src.tsv\n./relation_glossary.tsv\n./embed\n./embed/DRKG_TransE_l2_relation.npy\n./embed/relations.tsv\n./embed/entities.tsv\n./embed/Readme.md\n./embed/DRKG_TransE_l2_entity.npy\n./embed/mol_contextpred.npy\n./embed/mol_masking.npy\n./embed/mol_infomax.npy\n./embed/mol_edgepred.npy\n```\n\n### DRKG dataset\nThe whole dataset contains four parts:\n - drkg.tsv, a TSV file containing the original DRKG in the format of (h, r, t) triplets.\n - embed, a folder containing the pretrained knowledge graph embeddings, trained using the entire drkg.tsv as the training set, and pretrained GNN-based molecule embeddings computed from [molecule SMILES](./drugbank_info/drugbank_smiles.txt).\n - entity2src.tsv, a file mapping entities in DRKG to their original sources.\n - relation_glossary.tsv, a file containing the glossary of the relations in DRKG, and other associated information with sources (if available).\n\n### Pretrained DRKG embedding\nThe DRKG embedding is trained using the TransE\\_l2 model with a dimension size of 400. There are four files:\n\n - DRKG\\_TransE\\_l2\\_entity.npy, NumPy binary data, storing the entity embedding\n - DRKG\\_TransE\\_l2\\_relation.npy, NumPy binary data, storing the relation embedding\n - entities.tsv, mapping from entity\\_name to entity\\_id\n - relations.tsv, mapping from relation\\_name to relation\\_id\n\nTo use the pretrained embedding, one can use np.load to load the entity embeddings and relation embeddings separately:\n\n```\nimport numpy as np\n\n# Load the pretrained entity and relation embedding matrices\nentity_emb = np.load('./embed/DRKG_TransE_l2_entity.npy')\nrel_emb = np.load('./embed/DRKG_TransE_l2_relation.npy')\n```\n\n### Pretrained Molecule Embedding\n\nWe also provide molecule embeddings for most small-molecule drugs in DrugBank using pre-trained GNNs. 
In particular, \n[Strategies for Pre-training Graph Neural Networks](https://arxiv.org/abs/1905.12265) develops multiple approaches for \npre-training GNN-based molecular representations, combining supervised molecular property prediction with \nself-supervised learning approaches. We employ their method to compute four variants of molecule embeddings \nusing [DGL-LifeSci](https://github.com/awslabs/dgl-lifesci/tree/master/examples/molecule_embeddings).\n\n- `mol_contextpred.npy`: From a model pre-trained to predict surrounding graph structures of molecular subgraphs\n- `mol_infomax.npy`: From a model pre-trained to maximize the mutual information between local node representations \nand a global graph representation\n- `mol_edgepred.npy`: From a model pre-trained to encourage nearby nodes to have similar representations and to enforce \nthat disparate nodes have distinct representations\n- `mol_masking.npy`: From a model pre-trained to predict randomly masked node and edge attributes\n\n## Tools to analyze DRKG\nWe analyze DRKG with some deep learning frameworks, including [DGL](https://github.com/dmlc/dgl) (a framework for graph neural networks) and [DGL-KE](https://github.com/awslabs/dgl-ke) (a library for computing knowledge graph embeddings). Please follow the instructions below to install the deep learning frameworks.\n\n### Install PyTorch\nCurrently, all notebooks use PyTorch as the deep learning backend. To install other versions of PyTorch, please go to [Install PyTorch](https://pytorch.org/).\n```\nsudo pip3 install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html\n```\n\n### Install DGL\nPlease install [DGL](https://www.dgl.ai/) (a framework for graph neural networks) with the following command. 
It installs DGL with CUDA support.\n```\nsudo pip3 install dgl-cu101\n```\nTo install other versions of DGL, please go to [Install DGL](https://docs.dgl.ai/en/latest/install/index.html).\n\n### Install DGL-KE\nIf you want to train the model with the notebooks (e.g., using Train_embeddings.ipynb or Edge_score_analysis.ipynb) at [Knowledge Graph Embedding Based Analysis of DRKG](#knowledge-graph-embedding-based-analysis-of-drkg), you need to install both DGL and the [DGL-KE](https://github.com/awslabs/dgl-ke) package.\nDGL-KE works with DGL \u003e= 0.4.3 (either CPU or GPU).\n```\nsudo pip3 install dglke\n```\n\n## Notebooks for analyzing DRKG\nWe provide a set of notebooks to analyze DRKG. Some of the notebooks use the tools installed in the previous section.\n\n### Basic Graph Analysis of DRKG\nTo evaluate the structural similarity between a pair of relation types, we compute their Jaccard similarity coefficient and the overlap between the two edge types via the overlap coefficient. This analysis is given in:\n - [Jaccard_scores_among_all_edge_types_in_DRKG.ipynb](raw_graph_analysis/Jaccard_scores_among_all_edge_types_in_DRKG.ipynb)\n\n### Knowledge Graph Embedding Based Analysis of DRKG\nWe analyze the extracted DRKG by learning a TransE KGE model that utilizes the ![$\\ell_2$](https://render.githubusercontent.com/render/math?math=%24%5Cell_2%24) distance. As DRKG combines information from different data sources, we want to verify that meaningful entity and relation embeddings can be generated using knowledge graph embedding technology.\n\nWe split the edge triplets into training, validation, and test sets in a 90%/5%/5% ratio and train the KGE model as shown in the following notebook:\n- [Train_embeddings.ipynb](embedding_analysis/Train_embeddings.ipynb)\n\nFinally, we obtain the entity and relation embeddings for the DRKG. 
We can perform various embedding-based analyses, as provided in the following notebooks:\n - [Relation_similarity_analysis.ipynb](embedding_analysis/Relation_similarity_analysis.ipynb), analyzing the similarity of the generated relation embeddings.\n - [Entity_similarity_analysis.ipynb](embedding_analysis/Entity_similarity_analysis.ipynb), analyzing the similarity of the generated entity embeddings.\n - [Edge_score_analysis.ipynb](embedding_analysis/Edge_score_analysis.ipynb), evaluating whether the learned KGE model can predict the edges of DRKG.\n - [Edge_similarity_based_on_link_recommendation_results.ipynb](embedding_analysis/Edge_similarity_based_on_link_recommendation_results.ipynb), evaluating how similar the predicted links are among different relation types.\n\n### Drug Repurposing Using Pretrained Model for COVID-19\nWe present an example of using the pretrained DRKG model for drug repurposing for COVID-19. In the example, we directly use the pretrained model provided at [DRKG dataset](#drkg-dataset) and propose 100 drugs for COVID-19. The following notebook provides the details:\n\n - [COVID-19_drug_repurposing.ipynb](drug_repurpose/COVID-19_drug_repurposing.ipynb)\n\n### DRKG with DGL\nWe provide a notebook with an example of using DRKG with the Deep Graph Library (DGL).\n\nThe following notebook provides an example of building a heterograph from DRKG in DGL, along with some example queries on the DGL heterograph:\n - [loading_drkg_in_dgl.ipynb](drkg_with_dgl/loading_drkg_in_dgl.ipynb)\n\n## Additional Information for DrugBank\n\nSome additional information about compounds from DrugBank is included in [drugbank_info](/drugbank_info), including the \ntype and weight of drugs, and the SMILES of small-molecule drugs.\n\n## License\nThis project is licensed under the Apache-2.0 License. However, the DRKG integrates data from many resources and users should consider the licensing of each source (see this [table](https://github.com/shuix007/COVID-19-KG/blob/master/licenses/Readme.md)). 
We apply a license attribute on a per node and per edge basis for sources with defined licenses. \n\n## Cite\n\nPlease cite our dataset if you use this code and data in your work.\n\n```\n@misc{drkg2020,\n  author = {Ioannidis, Vassilis N. and Song, Xiang and Manchanda, Saurav and Li, Mufei and Pan, Xiaoqin\n            and Zheng, Da and Ning, Xia and Zeng, Xiangxiang and Karypis, George},\n  title = {DRKG - Drug Repurposing Knowledge Graph for Covid-19},\n  howpublished = \"\\url{https://github.com/gnn4dr/DRKG/}\",\n  year = {2020}\n}\n```\nA preprint describing this work will be available soon.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnn4dr%2FDRKG","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgnn4dr%2FDRKG","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnn4dr%2FDRKG/lists"}