https://github.com/ckoenigs/PharMeBINet

Build PharMeBINet from different sources.
# PharMeBINet: The heterogeneous pharmacological medical biochemical network
Heterogeneous biomedical pharmacological databases are important for multiple fields in bioinformatics.
The Hetionet database already covers many different entities and was therefore used as the basis for this project. 40 different pharmacological, medical, and biological databases such as CTD, DrugBank, and ClinVar are parsed and integrated into Neo4j. Afterward, the information is merged into Hetionet. Different mapping methods are used, such as external identification systems or name mapping.
The resulting open-source Neo4j database PharMeBINet has 5,819,147 different nodes with 80 labels and 23,796,799 relationships with 277 edge types. It is a heterogeneous database containing interconnected information on ADRs, diseases, drugs, genes, gene variations, proteins, and more. Relationships between these entities represent, for example, drug-drug interactions or drug-causes-ADR relations. It has much potential for developing further data analyses including machine learning applications. A web application for accessing the database is free to use for everyone and available at https://pharmebi.net. Additionally, the database is deposited on Zenodo at https://doi.org/10.5281/zenodo.5816976.

First, Hetionet (http://het.io), which serves as the starting point, was updated to Neo4j version 5.3.0.
Afterward, the different data sources are parsed and integrated into Hetionet.
In the next step, the different data sources are mapped and merged into the Hetionet structure.
The final PharMeBINet database is then generated by removing the data source specific sub-graphs, leaving only the merged structure.
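
The mapping step can be illustrated with a small, self-contained sketch. It is not the project's actual code: the record layout (`id`, `name`, `xrefs`), the `map_source_to_base` helper, and the toy entries are assumptions chosen for illustration. It tries an external-identifier match first and falls back to a case-insensitive name match, mirroring the two mapping methods mentioned above.

```python
# Illustrative sketch only: record layout and helper names are hypothetical,
# not taken from the PharMeBINet code base.

def map_source_to_base(source_nodes, base_nodes):
    """Map source nodes onto base nodes, first by external ID, then by name."""
    # Index base nodes by every external identifier they carry.
    by_xref = {}
    for node in base_nodes:
        for xref in node.get("xrefs", []):
            by_xref.setdefault(xref, node["id"])
    # Index base nodes by lower-cased name for the fallback.
    by_name = {node["name"].lower(): node["id"] for node in base_nodes}

    mapping, unmapped = {}, []
    for node in source_nodes:
        # 1) Try an external-identifier match.
        hit = next((by_xref[x] for x in node.get("xrefs", []) if x in by_xref), None)
        method = "xref"
        # 2) Fall back to a case-insensitive name match.
        if hit is None:
            hit = by_name.get(node["name"].lower())
            method = "name"
        if hit is None:
            unmapped.append(node["id"])
        else:
            mapping[node["id"]] = (hit, method)
    return mapping, unmapped


# Toy data, made up for the example (identifiers are not real accessions).
base = [
    {"id": "B1", "name": "Aspirin", "xrefs": ["X:1"]},
    {"id": "B2", "name": "Ibuprofen", "xrefs": []},
]
source = [
    {"id": "S1", "name": "acetylsalicylic acid", "xrefs": ["X:1"]},
    {"id": "S2", "name": "IBUPROFEN", "xrefs": []},
    {"id": "S3", "name": "Unknown compound", "xrefs": []},
]

mapping, unmapped = map_source_to_base(source, base)
print(mapping)
print(unmapped)
```

Recording the method alongside each match keeps identifier-based and name-based mappings distinguishable afterwards, which may help when reviewing the less reliable name matches.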

Included Data Sources:

| Data source | Version | License | URL |
|-----------------|:-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| Adverse Drug Reaction Classification System (ADReCS) | 2023-03-16 | [CC BY-SA 4.0 DEED](https://creativecommons.org/licenses/by-sa/4.0/deed.en) | [Link](http://www.bio-add.org/ADReCS/) |
| Adverse Event Open Learning through Universal Standardization (AEOLUS) | 2017-04-08 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/) | [Link](http://datadryad.org/resource/doi:10.5061/dryad.8q0s4) |
| ATC | 2024-01-10 | Use of all or parts of the material requires reference to the WHO Collaborating Centre for Drug Statistics Methodology. Copying and distribution for commercial purposes is not allowed. Changing or manipulating the material is not allowed. | [Link](https://www.whocc.no/atc_ddd_index/) |
| BindingDB | 2024-01-29 | [CC BY-SA 3.0 US Deed](https://creativecommons.org/licenses/by-sa/3.0/us/deed.en) | [Link](https://www.bindingdb.org/rwd/bind/index.jsp) |
| BioGRID | 4.4.230 (2024-01-25) | MIT License | [Link](https://thebiogrid.org/) |
| ClinVar | 2024-01-19 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/clinvar/) |
| Comparative Toxicogenomics Database (CTD) | 2024-01-31 | © 2002–2012 MDI Biological Laboratory. All rights reserved. © 2012-2024 NC State University. All rights reserved. | [Link](http://ctdbase.org) |
| dbSNP | 2022-11-16 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/snp/) |
| DDinter | 2020-09-04 | [CC BY-NC-SA 4.0 DEED](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) | [Link](http://ddinter.scbdd.com/) |
| Disease Ontology (DO) | 2024-01-31 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/) | [Link](https://disease-ontology.org) |
| DisGeNET | 2020-06 | [CC BY-NC-SA 4.0 DEED](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) | [Link](https://www.disgenet.org/) |
| DrugBank | 5.1.11 (2024-01-03) | [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) | [Link](https://go.drugbank.com) |
| DrugCentral | 2023-11-01 | [CC BY-SA 4.0 LEGAL CODE](https://creativecommons.org/licenses/by-sa/4.0/legalcode) | [Link](https://drugcentral.org/) |
| Entrez Gene | 2024-02-04 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/gene) |
| Experimental Factor Ontology (EFO) | 3.62.0 (2024-01-15) | Apache-2 | [Link](https://www.ebi.ac.uk/efo/) |
| GenCC | 2024-02-02 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/deed.de) | [Link](https://thegencc.org/) |
| Gene Ontology (GO) | 2024-01-17 | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](http://geneontology.org) |
| Hetionet | 1.0 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/) | [Link](https://het.io) |
| HUGO Gene Nomenclature Committee (HGNC) | 2024-02-02 | [CC0](https://creativecommons.org/public-domain/cc0/) | [Link](https://www.genenames.org/) |
| Human Integrated Protein–Protein Interaction rEference (HIPPIE) | 2022-04-29 | Free to use for academic purposes | [Link](http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/) |
| Human Metabolome Database (HMDB) | 2021-11-02 | [CC BY-NC-SA 4.0 DEED](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) | [Link](https://hmdb.ca/) |
| Human Phenotype Ontology (HPO) | 2024-01-16 | This service/product uses the Human Phenotype Ontology (version information). Find out more at http://www.human-phenotype-ontology.org We request that the HPO logo be included as well. | [Link](https://hpo.jax.org) |
| IID | 2021-05 | Free to use for academic purposes | [Link](http://iid.ophid.utoronto.ca) |
| MED-RT | 2024-01-03 | UMLS license, available at https://uts.nlm.nih.gov/license. | [Link](https://evs.nci.nih.gov/ftp1/MED-RT/) |
| miRBase | 22.1 (2018-05-23) | CC0 with attribution | [Link](https://mirbase.org/) |
| MONDO | 2024-01-03 | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](https://github.com/monarch-initiative/mondo) |
| NDF-RT | 2018-02-05 | UMLS license, available at https://uts.nlm.nih.gov/license.html | [Link](https://evs.nci.nih.gov/ftp1/NDF-RT/) |
| OMIM | 2024-02-05 | https://www.omim.org/help/agreement | [Link](https://www.omim.org) |
| Pathway Commons | 12 (2019-10-20) | License of the different sources | [Link](https://www.pathwaycommons.org) |
| PharmGKB | 2024-02-05 | [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) | [Link](https://www.pharmgkb.org) |
| Reactome | 2023-11-30 | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](https://reactome.org) |
| RefSeq | 2023-10-11 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/refseq/) |
| RNAdisease | 2022-07-03 | Provide data for non-commercial use, distribution, or reproduction in any medium, only if you properly cite the original work. | [Link](http://www.rnadisease.org/) |
| RNAinter | 2021-10-12 | Provide data for non-commercial use, distribution, or reproduction in any medium, only if you properly cite the original work. | [Link](http://www.rnainter.org/) |
| Side Effect Resource (SIDER) | 4.1 | [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) | [Link](http://sideeffects.embl.de) |
| Small Molecule Pathway Database (SMPDB) | 2018-09-14 | SMPDB is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors and explicit acknowledgment of the source material (SMPDB) and the original publication (see below). We ask that users who download significant portions of the database cite the SMPDB paper in any resulting publications. | [Link](https://www.smpdb.ca/) |
| Therapeutic target database (TTD) | 2024-01-10| no license | [Link](https://db.idrblab.net/ttd/) |
| Uberon | 2024-01-18 | [Attribution 3.0 Unported (CC BY 3.0)](https://creativecommons.org/licenses/by/3.0/) | [Link](http://obophenotype.github.io/uberon/) |
| UniProt | 2024-1 (2024-01-24) | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](https://www.uniprot.org) |
| WikiPathway | 2024-01-11 | [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/) | [Link](https://www.wikipathways.org) |

Data Sources used for mapping:

| Data source | Version |License | URL |
|-----------------|-------------|------------|--------|
| FDA UNII | 2023-01-27 |Except as otherwise noted, data is provided as Public Domain. | [Link](https://precision.fda.gov/uniisearch) |
| IUPHAR | 2023-11-29 |[CC BY-SA 4.0 Deed](https://creativecommons.org/licenses/by-sa/4.0/) | [Link](https://www.guidetopharmacology.org/) |
| PubChem | 2024-01 |Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein. However, some submitters of the original data may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted. NCBI is not in a position to assess the validity of such claims and, therefore, cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in the molecular databases. | [Link](https://pubchem.ncbi.nlm.nih.gov/) |
| RxNorm | 2023-11 |UMLS license, available at https://uts.nlm.nih.gov/license.html | [Link](https://www.nlm.nih.gov/research/umls/rxnorm/index.html) |
| STITCH | 2020-02-07 | STITCH is available for licensing - both for commercial and for academic institutions. [CC BY 4.0 Deed](https://creativecommons.org/licenses/by/4.0/) for the 3 used files | [Link](http://stitch.embl.de/) |
| UMLS | 2023-11 |UMLS license, available at https://uts.nlm.nih.gov/license.html | [Link](https://www.nlm.nih.gov/research/umls/index.html) |

The shell script script_to_execute_all.sh runs the integration into Neo4j as well as the mapping and merging into Hetionet.

```bash
# Example invocations with machine-specific paths for different Neo4j versions.
# Output (stdout and stderr) goes to output.txt and the job runs in the background.
./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-4.2.5/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &

./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-4.2.13/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &

./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-5.3.0/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &

./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-5.15.0/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &

# General form:
./script_to_execute_all.sh {path to neo4j bin} {absolute path to project}
```
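
For repeated runs, the invocation can be wrapped in a small launcher. The sketch below is a convenience built on assumptions, not part of the repository: `build_command` and `launch` are hypothetical helpers, and the paths in the usage line are placeholders. It only assembles the two positional arguments shown above and redirects stdout and stderr to a log file, as the `> output.txt 2>&1 &` form does.

```python
import subprocess


def build_command(neo4j_bin, project_path, script="./script_to_execute_all.sh"):
    """Return the argument list: script, Neo4j bin directory, project path."""
    # The original examples pass the project path with a trailing slash; keep that.
    project = str(project_path)
    if not project.endswith("/"):
        project += "/"
    return [script, str(neo4j_bin), project]


def launch(neo4j_bin, project_path, log_file="output.txt"):
    """Start the pipeline in the background, logging stdout and stderr."""
    log = open(log_file, "w")
    # stderr=STDOUT mirrors the shell's `2>&1`; Popen returns immediately, like `&`.
    return subprocess.Popen(
        build_command(neo4j_bin, project_path),
        stdout=log,
        stderr=subprocess.STDOUT,
    )


# Placeholder paths; replace them with your own before calling launch().
cmd = build_command("/opt/neo4j/bin", "/data/pharmebinet")
print(cmd)
```

Because the script is referenced by a relative path, `launch` would have to be called from the directory that contains script_to_execute_all.sh.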

## Required manual steps

The pipeline is executed on Ubuntu.

1. Download Neo4j version 5.
2. Add restart_neo4j.sh to the bin directory of the Neo4j installation. In the script, change the path so that it points to the conf directory of that Neo4j installation.
3. Download a version of BioDWH2 (https://github.com/BioDWH2/BioDWH2), put it into import_into_Neo4j, and set the BioDWH2 version in integration_shell.sh to the downloaded one.
4. Download a version of Neo4j-GraphML-Importer (https://github.com/BioDWH2/Neo4j-GraphML-Importer), put it into this directory, and set the Neo4j-GraphML-Importer version in script_to_execute_all.sh to the downloaded one.
5. In script_to_execute_all.sh, change the password to your Neo4j password and set path_to_other_place_of_data to a location with plenty of free storage space.
6. Download RxNorm from https://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html and import the data into a MySQL database with the script.
7. Download the UMLS full release from https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html and import it with the script as well.
8. Check import_into_Neo4j/readme for the databases that need manual steps. The detailed description is in the readme of each database, which also states whether the source is updated automatically or requires manual steps.
9. Check mapping_and_merging_into_hetionet/readme for the databases that need manual changes for the mapping, such as the use of external resources.
10. In create_connection_to_databases.py, change the MySQL user name, MySQL password, MySQL database names, Neo4j user, Neo4j address, and Neo4j password.
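
Some of these prerequisites can be checked before starting a long run. The following is a minimal sketch under stated assumptions, not a script shipped with the project: `missing_prerequisites` is a hypothetical helper, and it assumes that import_into_Neo4j and mapping_and_merging_into_hetionet sit directly under the project path. It covers only the file-system items named above and does not check passwords or database contents.

```python
import os
import tempfile


def missing_prerequisites(neo4j_bin, project_path):
    """Return descriptions of prerequisites that were not found on disk."""
    # The paths follow the manual steps; the exact layout is an assumption.
    checks = {
        "restart_neo4j.sh in the Neo4j bin directory":
            os.path.isfile(os.path.join(neo4j_bin, "restart_neo4j.sh")),
        "import_into_Neo4j directory in the project":
            os.path.isdir(os.path.join(project_path, "import_into_Neo4j")),
        "mapping_and_merging_into_hetionet directory in the project":
            os.path.isdir(os.path.join(project_path, "mapping_and_merging_into_hetionet")),
    }
    return [name for name, ok in checks.items() if not ok]


# Demonstration on a throwaway directory tree instead of a real installation.
with tempfile.TemporaryDirectory() as tmp:
    bin_dir = os.path.join(tmp, "bin")
    project = os.path.join(tmp, "project")
    os.makedirs(bin_dir)
    os.makedirs(os.path.join(project, "import_into_Neo4j"))
    open(os.path.join(bin_dir, "restart_neo4j.sh"), "w").close()
    missing = missing_prerequisites(bin_dir, project)
    print(missing)
```

In the demonstration, only the mapping directory is absent, so it is the single item reported as missing.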

## Citing this work
If you find this resource useful, please do remember to cite:

```bib
@article{konigs2022heterogeneous,
title={The heterogeneous pharmacological medical biochemical network PharMeBINet},
author={K{\"o}nigs, Cassandra and Friedrichs, Marcel and Dietrich, Theresa},
journal={Scientific Data},
volume={9},
number={1},
pages={1--14},
year={2022},
publisher={Nature Publishing Group}
}
```

Alternatively, using plain text, you can use:

Königs C, Friedrichs M, Dietrich T. [The heterogeneous pharmacological medical biochemical network PharMeBINet](https://www.nature.com/articles/s41597-022-01510-3). Scientific Data. 2022;9(1): 393.