https://github.com/ckoenigs/PharMeBINet

Build PharMeBINet from different sources.

# PharMeBINet: The heterogeneous pharmacological medical biochemical network
Heterogeneous biomedical and pharmacological databases are important for many fields in bioinformatics.
The Hetionet database already covers many different entities and was therefore used as the basis for this project. 40 different pharmacological, medical, and biological databases, such as CTD, DrugBank, and ClinVar, are parsed and integrated into Neo4j. Afterward, the information is merged into Hetionet using different mapping methods, such as external identification systems or name mapping.
The resulting open-source Neo4j database PharMeBINet has 5,819,147 distinct nodes with 80 labels and 23,796,799 relationships with 277 edge types. It is a heterogeneous database containing interconnected information on ADRs, diseases, drugs, genes, gene variations, proteins, and more. Relationships between these entities represent, for example, drug-drug interactions or drug-causes-ADR relations. It offers considerable potential for further data analyses, including machine learning applications. A web application for accessing the database is freely available at https://pharmebi.net. Additionally, the database is deposited on Zenodo at https://doi.org/10.5281/zenodo.5816976.
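
As a quick sanity check of these counts, the label and relationship-type distribution of a loaded instance can be listed with `cypher-shell`. The following is a minimal sketch, assuming a local Neo4j instance that already contains PharMeBINet. The address, the environment variable names, and the `run_query` helper are illustrative assumptions and not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch: summarize node labels and relationship types of a running instance.
# Assumptions (not from this README): Neo4j listens on bolt://localhost:7687
# and credentials are supplied via NEO4J_USER / NEO4J_PASSWORD.

NEO4J_URI="${NEO4J_URI:-bolt://localhost:7687}"
NEO4J_USER="${NEO4J_USER:-neo4j}"

# Count nodes per label; a node with several labels is counted once per label.
LABEL_QUERY='MATCH (n) UNWIND labels(n) AS label RETURN label, count(*) AS nodes ORDER BY nodes DESC;'

# Count relationships per relationship type.
REL_QUERY='MATCH ()-[r]->() RETURN type(r) AS rel_type, count(*) AS rels ORDER BY rels DESC;'

run_query() {
  # $1: Cypher statement passed to cypher-shell as a positional argument
  cypher-shell -a "$NEO4J_URI" -u "$NEO4J_USER" -p "$NEO4J_PASSWORD" --format plain "$1"
}

# Usage (requires a running database):
#   run_query "$LABEL_QUERY"
#   run_query "$REL_QUERY"
```

The number of rows each query returns can then be compared with the label and edge-type counts stated above for the matching release. On a graph of this size, both queries scan the full store and may take a while.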

First, Hetionet (http://het.io), the starting point, was updated to Neo4j 5.3.0.
Next, the different data sources are parsed and integrated into the Hetionet database.
The data sources are then mapped and merged into the Hetionet structure.
Finally, the PharMeBINet database is generated by removing the data-source-specific subgraphs, leaving only the merged structure.

Included Data Sources:

| Data source | Version | License | URL |
|-----------------|:-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| Adverse Drug Reaction Classification System (ADReCS) | 2023-03-16 | [CC BY-SA 4.0 DEED](https://creativecommons.org/licenses/by-sa/4.0/deed.en) | [Link](http://www.bio-add.org/ADReCS/) |
| Adverse Event Open Learning through Universal Standardization (AEOLUS) | 2017-04-08 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/) | [Link](http://datadryad.org/resource/doi:10.5061/dryad.8q0s4) |
| ATC | 2024-01-10 | Use of all or parts of the material requires reference to the WHO Collaborating Centre for Drug Statistics Methodology. Copying and distribution for commercial purposes is not allowed. Changing or manipulating the material is not allowed. | [Link](https://www.whocc.no/atc_ddd_index/) |
| BindingDB | 2024-01-29 | [CC BY-SA 3.0 US Deed](https://creativecommons.org/licenses/by-sa/3.0/us/deed.en) | [Link](https://www.bindingdb.org/rwd/bind/index.jsp) |
| BioGRID | 4.4.230 (2024-01-25) | MIT License | [Link](https://thebiogrid.org/) |
| ClinVar | 2024-01-19 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/clinvar/) |
| Comparative Toxicogenomics Database (CTD) | 2024-01-31 | © 2002–2012 MDI Biological Laboratory. All rights reserved. © 2012-2024 NC State University. All rights reserved. | [Link](http://ctdbase.org) |
| dbSNP | 2022-11-16 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/snp/) |
| DDinter | 2020-09-04 | [CC BY-NC-SA 4.0 DEED](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) | [Link](http://ddinter.scbdd.com/) |
| Disease Ontology (DO) | 2024-01-31 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/) | [Link](https://disease-ontology.org) |
| DisGeNET | 2020-06 | [CC BY-NC-SA 4.0 DEED](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) | [Link](https://www.disgenet.org/) |
| DrugBank | 5.1.11 (2024-01-03) | [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) | [Link](https://go.drugbank.com) |
| DrugCentral | 2023-11-01 | [CC BY-SA 4.0 LEGAL CODE](https://creativecommons.org/licenses/by-sa/4.0/legalcode) | [Link](https://drugcentral.org/) |
| Entrez Gene | 2024-02-04 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/gene) |
| Experimental Factor Ontology (EFO) | 3.62.0 (2024-01-15) | Apache-2 | [Link](https://www.ebi.ac.uk/efo/) |
| GenCC | 2024-02-02 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/deed.de) | [Link](https://thegencc.org/) |
| Gene Ontology (GO) | 2024-01-17 | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](http://geneontology.org) |
| Hetionet | 1.0 | [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/) | [Link](https://het.io) |
| HUGO Gene Nomenclature Committee (HGNC) | 2024-02-02 | [CC0](https://creativecommons.org/public-domain/cc0/) | [Link](https://www.genenames.org/) |
| Human Integrated Protein–Protein Interaction rEference (HIPPIE) | 2022-04-29 | Free to use for academic purposes | [Link](http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/) |
| Human Metabolome Database (HMDB) | 2021-11-02 | [CC BY-NC-SA 4.0 DEED](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) | [Link](https://hmdb.ca/) |
| Human Phenotype Ontology (HPO) | 2024-01-16 | This service/product uses the Human Phenotype Ontology (version information). Find out more at http://www.human-phenotype-ontology.org We request that the HPO logo be included as well. | [Link](https://hpo.jax.org) |
| IID | 2021-05 | Free to use for academic purposes | [Link](http://iid.ophid.utoronto.ca) |
| MED-RT | 2024-01-03 | UMLS license, available at https://uts.nlm.nih.gov/license. | [Link](https://evs.nci.nih.gov/ftp1/MED-RT/) |
| miRBase | 22.1 (2018-05-23) | CC0 with attribution | [Link](https://mirbase.org/) |
| MONDO | 2024-01-03 | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](https://github.com/monarch-initiative/mondo) |
| NDF-RT | 2018-02-05 | UMLS license, available at https://uts.nlm.nih.gov/license.html | [Link](https://evs.nci.nih.gov/ftp1/NDF-RT/) |
| OMIM | 2024-02-05 | https://www.omim.org/help/agreement | [Link](https://www.omim.org) |
| Pathway Commons | 12 (2019-10-20) | License of the different sources | [Link](https://www.pathwaycommons.org) |
| PharmGKB | 2024-02-05 | [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) | [Link](https://www.pharmgkb.org) |
| Reactome | 2023-11-30 | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](https://reactome.org) |
| RefSeq | 2023-10-11 | https://www.ncbi.nlm.nih.gov/home/about/policies/ | [Link](https://www.ncbi.nlm.nih.gov/refseq/) |
| RNAdisease | 2022-07-03 | Provide data for non-commercial use, distribution, or reproduction in any medium, only if you properly cite the original work. | [Link](http://www.rnadisease.org/) |
| RNAinter | 2021-10-12 | Provide data for non-commercial use, distribution, or reproduction in any medium, only if you properly cite the original work. | [Link](http://www.rnainter.org/) |
| Side Effect Resource (SIDER) | 4.1 | [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) | [Link](http://sideeffects.embl.de) |
| Small Molecule Pathway Database (SMPDB) | 2018-09-14 | SMPDB is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors and explicit acknowledgment of the source material (SMPDB) and the original publication (see below). We ask that users who download significant portions of the database cite the SMPDB paper in any resulting publications. | [Link](https://www.smpdb.ca/) |
| Therapeutic Target Database (TTD) | 2024-01-10 | no license | [Link](https://db.idrblab.net/ttd/) |
| Uberon | 2024-01-18 | [Attribution 3.0 Unported (CC BY 3.0)](https://creativecommons.org/licenses/by/3.0/) | [Link](http://obophenotype.github.io/uberon/) |
| UniProt | 2024-1 (2024-01-24) | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | [Link](https://www.uniprot.org) |
| WikiPathways | 2024-01-11 | [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/) | [Link](https://www.wikipathways.org) |

Data Sources used for mapping:

| Data source | Version |License | URL |
|-----------------|-------------|------------|--------|
| FDA UNII | 2023-01-27 |Except as otherwise noted, data is provided as Public Domain. | [Link](https://precision.fda.gov/uniisearch) |
| IUPHAR | 2023-11-29 |[CC BY-SA 4.0 Deed](https://creativecommons.org/licenses/by-sa/4.0/) | [Link](https://www.guidetopharmacology.org/) |
| PubChem | 2024-01 |Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein. However, some submitters of the original data may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted. NCBI is not in a position to assess the validity of such claims and, therefore, cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in the molecular databases. | [Link](https://pubchem.ncbi.nlm.nih.gov/) |
| RxNorm | 2023-11 |UMLS license, available at https://uts.nlm.nih.gov/license.html | [Link](https://www.nlm.nih.gov/research/umls/rxnorm/index.html) |
| STITCH | 2020-02-07 | STITCH is available for licensing - both for commercial and for academic institutions. [CC BY 4.0 Deed](https://creativecommons.org/licenses/by/4.0/) for the 3 used files | [Link](http://stitch.embl.de/) |
| UMLS | 2023-11 |UMLS license, available at https://uts.nlm.nih.gov/license.html | [Link](https://www.nlm.nih.gov/research/umls/index.html) |

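The shell script `script_to_execute_all.sh` integrates the data sources into Neo4j and then maps and merges them into Hetionet. In the calls below, it receives the Neo4j `bin` directory and a project directory as arguments: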

```bash
# The four calls differ only in the Neo4j version; each one overwrites output.txt.
# Arguments as used here: Neo4j bin directory, then a project directory (paths from the author's machine).

# Neo4j 4.2.5
./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-4.2.5/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &

# Neo4j 4.2.13
./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-4.2.13/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &

# Neo4j 5.3.0
./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-5.3.0/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &

# Neo4j 5.15.0
./script_to_execute_all.sh /mnt/aba90170-e6a0-4d07-929e-1200a6bfc6e1/databases/neo4j/neo4j-community-5.15.0/bin /home/cassandra/Documents/Project/master_database_change/ > output.txt 2>&1 &
```
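
Because the calls above hard-code machine-specific paths, it can help to set both paths in one place and check them before starting the long-running build. The following is a minimal sketch, assuming it is run from the directory that contains `script_to_execute_all.sh`. The variable names `NEO4J_BIN` and `PROJECT_DIR`, the placeholder paths, and the `run_pipeline` helper are hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch only: wraps the call shown above so that the two paths are set in one place.
# NEO4J_BIN and PROJECT_DIR are hypothetical names; adjust both placeholder defaults.
NEO4J_BIN="${NEO4J_BIN:-/path/to/neo4j-community-5.15.0/bin}"
PROJECT_DIR="${PROJECT_DIR:-/path/to/project/}"

run_pipeline() {
  # $1: Neo4j bin directory, $2: project directory
  if [ ! -d "$1" ]; then
    echo "Neo4j bin directory not found: $1" >&2
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "Project directory not found: $2" >&2
    return 1
  fi
  if [ ! -x ./script_to_execute_all.sh ]; then
    echo "Run this from the directory containing script_to_execute_all.sh" >&2
    return 1
  fi
  # Same call as above: run in the background, stdout and stderr go to output.txt.
  ./script_to_execute_all.sh "$1" "$2" > output.txt 2>&1 &
  echo "Pipeline started in background (PID $!); follow progress with: tail -f output.txt"
}

# Usage:
#   run_pipeline "$NEO4J_BIN" "$PROJECT_DIR"
```

The checks run before the redirect, so a mistyped path fails immediately and does not overwrite an existing `output.txt`.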

## Citing this work
If you find this resource useful, please cite:

```bib
@article{konigs2022heterogeneous,
title={The heterogeneous pharmacological medical biochemical network PharMeBINet},
author={K{\"o}nigs, Cassandra and Friedrichs, Marcel and Dietrich, Theresa},
journal={Scientific Data},
volume={9},
number={1},
pages={1--14},
year={2022},
publisher={Nature Publishing Group}
}
```

Alternatively, as plain text:

Königs C, Friedrichs M, Dietrich T. [The heterogeneous pharmacological medical biochemical network PharMeBINet](https://www.nature.com/articles/s41597-022-01510-3). Scientific Data. 2022;9(1):393.