https://github.com/ayushnoori/metapaths
Similarity search in heterogeneous knowledge graphs using meta paths.
knowledge-graph metapath network r
- Host: GitHub
- URL: https://github.com/ayushnoori/metapaths
- Owner: ayushnoori
- License: mpl-2.0
- Created: 2021-07-01T15:57:51.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-03-11T18:14:10.000Z (over 3 years ago)
- Last Synced: 2024-12-22T15:40:01.538Z (over 1 year ago)
- Topics: knowledge-graph, metapath, network, r
- Language: R
- Homepage: https://www.ayushnoori.com/metapaths
- Size: 19.3 MB
- Stars: 24
- Watchers: 4
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# `metapaths`: Meta-Path-Based Network Similarity Search in KGs 
[DOI](https://zenodo.org/badge/latestdoi/382086164)
Large-scale heterogeneous biomedical knowledge graphs (KGs) use graph structures to represent and study multi-typed relational information in biological systems. Network relationships in a KG can be quantified by similarity search methods; however, such methods must account for the diversity of node types contained within that KG. To distinguish between node types, we leverage meta paths, a general graph-theoretic approach for flexible similarity search in large networks. A meta path is a sequence of node types that specifies a walk from an origin node to a destination node; meta paths are widely used in biomedical network analysis.
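To make the definition concrete, the sketch below enumerates every walk in a toy edge list that follows a given meta path. It uses base R only. The toy graph, the column names, and the helper `find_paths()` are illustrative assumptions and are not part of the `metapaths` API.

```r
# Toy heterogeneous KG as an edge list (all node names and types are made up).
edges <- data.frame(
  from = c("drugA", "drugA", "geneX", "geneY", "drugB"),
  to   = c("geneX", "geneY", "pathP", "pathP", "geneY"),
  stringsAsFactors = FALSE
)
node_type <- c(drugA = "drug", drugB = "drug",
               geneX = "gene", geneY = "gene", pathP = "pathway")

# Hypothetical helper: enumerate walks from `origin` whose node types match
# `meta_path` position by position. Edges are treated as undirected.
find_paths <- function(origin, meta_path, edges, node_type) {
  if (node_type[[origin]] != meta_path[1]) return(list())
  paths <- list(origin)
  for (step_type in meta_path[-1]) {
    extended <- list()
    for (p in paths) {
      last <- p[length(p)]
      # Neighbors of the last node, in either edge direction.
      nbrs <- c(edges$to[edges$from == last], edges$from[edges$to == last])
      # Keep only neighbors of the node type required at this step.
      nbrs <- nbrs[node_type[nbrs] == step_type]
      for (n in nbrs) extended[[length(extended) + 1]] <- c(p, n)
    }
    paths <- extended
  }
  paths
}

paths <- find_paths("drugA", c("drug", "gene", "pathway"), edges, node_type)
```

Here `paths` holds one character vector per matching walk, each listing the nodes visited in order.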
To support meta paths in R, we present `metapaths`, the first R software package to perform meta path-based similarity search in heterogeneous KGs. The `metapaths` package offers various in-built similarity metrics for node pair comparison by querying KGs represented as either edge or adjacency lists, as well as auxiliary aggregation methods to measure set-level relationships. This framework facilitates the scalable and flexible modeling of network similarities in KGs with applications across biomedical KG learning.
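As an illustration of how a meta path-based similarity metric can be computed, the sketch below derives path counts from a type-specific adjacency matrix and normalizes them in the style of PathSim (Sun et al., 2011). It is a base R sketch on made-up data, not the package's implementation; the metrics that `metapaths` actually provides, and their function names, should be taken from the package documentation.

```r
# Toy drug-by-gene adjacency matrix (rows: drugs, columns: genes); made-up data.
A_drug_gene <- matrix(
  c(1, 1, 0,
    0, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("drugA", "drugB", "drugC"), c("geneX", "geneY", "geneZ"))
)

# Path counts along the symmetric meta path drug -> gene -> drug:
# entry [i, j] is the number of genes shared by drug i and drug j.
path_counts <- A_drug_gene %*% t(A_drug_gene)

# PathSim-style normalization: 2 * paths(x, y) / (paths(x, x) + paths(y, y)).
pathsim <- function(counts) {
  self <- diag(counts)
  2 * counts / outer(self, self, "+")
}

sim <- pathsim(path_counts)
```

The normalization divides by the self-path counts so that highly connected nodes do not dominate the ranking, which is the motivation usually given for PathSim over raw path counts.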
## Installation
`metapaths` is designed for the [R](https://www.r-project.org/) programming language and statistical computing environment. To install the latest version of this package, please run the following line in your R console:
```{r}
# install.packages("devtools")  # run first if devtools is not yet installed
devtools::install_github("ayushnoori/metapaths")
```
## Custom Similarity Metrics
In addition to the in-built similarity metrics, users may also define their own custom metrics. To define a custom similarity metric, please complete the following steps:
1. Add a new function to [`similarity-metrics.R`](https://github.com/ayushnoori/metapaths/blob/master/R/similarity-metrics.R) whose name follows the `get_*()` nomenclature (a `get_` prefix followed by the metric name).
2. Edit the `get_similarity_function()` function to add your metric to the list of allowed similarity metrics.
3. Submit a pull request for approval.
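The first two steps can be pictured with the self-contained sketch below. Everything in it is hypothetical: `get_jaccard_sketch()`, its arguments, and the stand-in dispatcher `lookup_similarity_sketch()` only illustrate the `get_` naming convention and a name-based lookup pattern. The signature actually expected by `get_similarity_function()` should be taken from `similarity-metrics.R`.

```r
# Hypothetical custom metric: Jaccard overlap of two nodes' neighbor sets.
# The argument names are assumptions, not the package's required signature.
get_jaccard_sketch <- function(neighbors_x, neighbors_y) {
  union_size <- length(union(neighbors_x, neighbors_y))
  if (union_size == 0) return(0)
  length(intersect(neighbors_x, neighbors_y)) / union_size
}

# Stand-in for a dispatcher that maps a metric name to its function and
# rejects names that are not on the allowed list.
lookup_similarity_sketch <- function(metric) {
  allowed <- list(jaccard = get_jaccard_sketch)
  if (!metric %in% names(allowed)) {
    stop("Unknown similarity metric: ", metric)
  }
  allowed[[metric]]
}

metric_fn <- lookup_similarity_sketch("jaccard")
score <- metric_fn(c("geneX", "geneY"), c("geneY", "geneZ"))
```

Registering the metric in a single lookup keeps the list of allowed names in one place, so an unsupported name fails early with a clear error.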
## Custom Aggregation Methods
Akin to custom similarity metrics, users may also define custom aggregation methods for set-level comparison. To define a custom aggregation method, please complete the following steps:
1. Add a new function to [`aggregation-methods.R`](https://github.com/ayushnoori/metapaths/blob/master/R/aggregation-methods.R) whose name follows the `get_*()` nomenclature (a `get_` prefix followed by the method name).
2. Edit the `get_aggregation_function()` function to add your method to the list of allowed aggregation methods.
3. Submit a pull request for approval.
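As a rough picture of what a set-level aggregation method does, the sketch below collapses a matrix of pairwise similarity scores between two node sets into a single score. The function name `get_mean_of_row_max_sketch()`, its argument, and the scores are all hypothetical; the interface actually expected by `get_aggregation_function()` should be taken from `aggregation-methods.R`.

```r
# Made-up pairwise similarity scores between two node sets
# (rows: members of set 1, columns: members of set 2).
pairwise <- matrix(
  c(0.5, 0.0,
    1.0, 0.25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("drugA", "drugB"), c("pathP", "pathQ"))
)

# Hypothetical aggregation method following the `get_` naming convention:
# the average of each set-1 member's best match in set 2.
get_mean_of_row_max_sketch <- function(scores) {
  mean(apply(scores, 1, max))
}

set_score <- get_mean_of_row_max_sketch(pairwise)
```

Other reductions, such as the overall mean or the overall maximum of `pairwise`, would fit the same pattern.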
## Evaluation on a Biomedical KG
We evaluated the `metapaths` package on `ogbl-biokg`, an open-source biomedical KG available from the [Open Graph Benchmark](https://ogb.stanford.edu). The evaluation recovered meaningful drug- and disease-associated relationships, as quantified by high similarity scores. For example, the meta path traversal function identified three paths that follow the specified meta path and connect donepezil, a drug used to treat Alzheimer’s disease (AD), with the "regulation of amyloid fibril formation" pathway, which is implicated in AD.

Additional usage examples are available in the [`ogbl-biokg` vignette](https://www.ayushnoori.com/metapaths/articles/biokg-demo.html).
## Documentation
* The `metapaths` R package is freely available under MPL 2.0 [via GitHub](https://github.com/ayushnoori/metapaths).
* Package documentation and usage examples are available [here](https://www.ayushnoori.com/metapaths).
* For more information, please visit the `metapaths` [project website](https://zitniklab.hms.harvard.edu/projects/metapaths/).
## Citation
If you find `metapaths` useful, please cite our forthcoming paper:
```
@article{noori2022metapaths,
title={metapaths: similarity search in heterogeneous knowledge graphs via meta paths},
author={Noori, Ayush and Tan, Amelia L.M. and Li, Michelle M. and Zitnik, Marinka},
journal={arXiv: 2209.0000},
volume={},
number={},
pages={},
year={2022},
publisher={}
}
```
## Contact
Should any questions arise, please [open a GitHub issue](https://github.com/ayushnoori/metapaths/issues/new) or contact [anoori@college.harvard.edu](mailto:anoori@college.harvard.edu).