https://github.com/benedekrozemberczki/tigerlily

TigerLily: Finding drug interactions in silico with the Graph.

biology ddi deep-learning drug-drug-interaction embedding gradient-boosting graph graph-database graph-embedding graph-machine-learning heterogeneous-graph knowledge-graph machine-learning network-science node node-embedding pharmaceuticals tigergraph unsupervised-learning


Host: GitHub
URL: https://github.com/benedekrozemberczki/tigerlily
Owner: benedekrozemberczki
License: apache-2.0
Created: 2022-02-28T21:48:19.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2022-12-17T16:11:50.000Z (over 3 years ago)
Last Synced: 2025-03-25T11:21:35.020Z (over 1 year ago)
Topics: biology, ddi, deep-learning, drug-drug-interaction, embedding, gradient-boosting, graph, graph-database, graph-embedding, graph-machine-learning, heterogeneous-graph, knowledge-graph, machine-learning, network-science, node, node-embedding, pharmaceuticals, tigergraph, unsupervised-learning
Language: Jupyter Notebook
Homepage:
Size: 14.3 MB
Stars: 99
Watchers: 2
Forks: 9
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE


[pypi-image]: https://badge.fury.io/py/tigerlily.svg

[pypi-url]: https://pypi.python.org/pypi/tigerlily

[size-image]: https://img.shields.io/github/repo-size/benedekrozemberczki/tigerlily.svg

[size-url]: https://github.com/benedekrozemberczki/tigerlily/archive/main.zip

[build-image]: https://github.com/benedekrozemberczki/tigerlily/workflows/CI/badge.svg

[build-url]: https://github.com/benedekrozemberczki/tigerlily/actions?query=workflow%3ACI

[docs-image]: https://readthedocs.org/projects/tigerlily/badge/?version=latest

[docs-url]: https://tigerlily.readthedocs.io/en/latest/?badge=latest

[coverage-image]: https://codecov.io/gh/benedekrozemberczki/tigerlily/branch/main/graph/badge.svg?token=30XLVBUIEH

[coverage-url]: https://codecov.io/github/benedekrozemberczki/tigerlily?branch=main

[![PyPI Version][pypi-image]][pypi-url]

[![Docs Status][docs-image]][docs-url]

[![Code Coverage][coverage-image]][coverage-url]

[![Build Status][build-image]][build-url]

[![Arxiv](https://img.shields.io/badge/ArXiv-2204.08206-orange.svg)](https://arxiv.org/abs/2204.08206)



  



----------------------------------------------------------------------

### **Drug Interaction Prediction with Tigerlily**

**[Documentation](https://tigerlily.readthedocs.io)** | **[Example Notebook](https://github.com/benedekrozemberczki/tigerlily/blob/main/example_notebook.ipynb)** | **[YouTube Video](https://www.youtube.com/watch?v=fEWcor96tt8)** | **[Project Report](http://arxiv.org/abs/2204.08206)**

**Tigerlily** is a [TigerGraph](https://www.tigergraph.com/)-based system designed to solve the [drug interaction prediction task](https://arxiv.org/abs/2111.02916). In this machine learning task, we want to predict whether two drugs have an adverse interaction. The framework tackles this **[highly relevant real-world problem](https://www.newscientist.com/article/2143486-side-effects-kill-thousands-but-our-data-on-them-is-flawed/)** with graph mining techniques in four steps:

- **(a)** Using [pyTigerGraph](https://github.com/pyTigerGraph/pyTigerGraph), we create a heterogeneous biological graph of drugs and proteins.

- **(b)** We calculate the [personalized PageRank](https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/Centrality/pagerank/personalized/multi_source/tg_pagerank_pers.gsql) scores of drug nodes in the [TigerGraph Cloud](https://tgcloud.io/).

- **(c)** We embed the nodes using [sparse non-negative matrix factorization](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html) of the personalized PageRank matrix.

- **(d)** Using the node embeddings we train a [gradient boosting](https://lightgbm.readthedocs.io/en/latest/) based drug interaction predictor.

--------------------------------------------------------------------------------

### (A) **Creating and populating a Graph**



  



As a first step, we import the basic **TigerLily** tools and load the example dataset, which integrates the DrugBankDDI and BioSNAP datasets. We create a ``PersonalizedPageRankMachine`` and connect to the host that holds the ``Graph``. The machine is configured with the appropriate user credentials and connection details; the secret is obtained in **TigerGraph Graph Studio**. We then install the default Personalized PageRank query and upload the edges of the example dataset used in our demonstrations. This graph has **drug** and **protein** nodes connected by **drug-protein** and **protein-protein** interactions. Our goal is to predict the **drug-drug** interactions.

```python
from tigerlily.dataset import ExampleDataset
from tigerlily.embedding import EmbeddingMachine
from tigerlily.operator import hadamard_operator
from tigerlily.pagerank import PersonalizedPageRankMachine

# Load the edges of the example graph and the drug pair targets.
dataset = ExampleDataset()
edges = dataset.read_edges()
target = dataset.read_target()

# Replace the placeholder values with your own TigerGraph credentials.
machine = PersonalizedPageRankMachine(host="host_name",
                                      graphname="graph_name",
                                      username="username_value",
                                      secret="secret_value",
                                      password="password_value")

machine.connect()
machine.install_query()
machine.upload_graph(new_graph=True, edges=edges)
```
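To make the shape of such a heterogeneous graph concrete, here is a minimal sketch in plain Python. The edge tuple layout `(type_1, node_1, type_2, node_2)`, the node names, and the `build_adjacency` helper are assumptions made for illustration; they are not taken from the TigerLily API or from the example dataset.

```python
from collections import defaultdict

# Hypothetical toy edges: (source type, source node, target type, target node).
toy_edges = [
    ("drug", "drug_a", "protein", "protein_1"),
    ("drug", "drug_b", "protein", "protein_1"),
    ("drug", "drug_c", "protein", "protein_2"),
    ("protein", "protein_1", "protein", "protein_2"),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency map and a node-to-type lookup."""
    adjacency = defaultdict(set)
    node_types = {}
    for type_1, node_1, type_2, node_2 in edge_list:
        node_types[node_1] = type_1
        node_types[node_2] = type_2
        adjacency[node_1].add(node_2)
        adjacency[node_2].add(node_1)
    return adjacency, node_types


adjacency, node_types = build_adjacency(toy_edges)

# Drug nodes are the ones we later want embeddings for.
drugs = sorted(node for node, kind in node_types.items() if kind == "drug")
print(drugs)
print(sorted(adjacency["protein_1"]))
```

Note that there are no drug-drug edges in the toy graph: those are the interactions the pipeline is meant to predict.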

### (B) **Computing the Approximate Personalized PageRank vectors**



  



We only need to describe the neighbourhood of the drug nodes in the biological graph. For each drug, we therefore retrieve only its top-k closest neighbours, ranked by Personalized PageRank score. The drug embeddings are then learned from these scores.

```python
drug_node_ids = machine.connection.getVertices("drug")
pagerank_scores = machine.get_personalized_pagerank(drug_node_ids)
```
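For readers unfamiliar with Personalized PageRank, the idea can be sketched locally with a simple power iteration. This is an illustrative stand-in, not the GSQL query that TigerLily installs: the toy graph, the restart probability `alpha`, the iteration count, and the helper names are all assumptions.

```python
# Hypothetical undirected toy graph with drug and protein nodes.
toy_graph = {
    "drug_a": ["protein_1"],
    "drug_b": ["protein_1"],
    "drug_c": ["protein_2"],
    "protein_1": ["drug_a", "drug_b", "protein_2"],
    "protein_2": ["drug_c", "protein_1"],
}


def personalized_pagerank(graph, source, alpha=0.15, iterations=50):
    """Score nodes by a random walk that restarts at `source` with probability alpha."""
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        # The total score mass is 1, so the restart contributes alpha to the source.
        updated[source] += alpha
        for node, neighbours in graph.items():
            if not neighbours:
                # A node without neighbours sends its mass back to the source.
                updated[source] += (1.0 - alpha) * scores[node]
                continue
            share = (1.0 - alpha) * scores[node] / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        scores = updated
    return scores


def top_k(scores, k):
    """Keep the k highest scoring nodes."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


ppr_scores = personalized_pagerank(toy_graph, "drug_a")
closest = top_k(ppr_scores, k=3)
print(closest)
```

Keeping only the top-k entries per drug is what makes the resulting score matrix sparse: each row holds a handful of non-zero scores instead of one value per node.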

### (C) **Learning the Drug Embeddings and Drug Pair Feature Generation**



  



We create an embedding machine that learns drug node representations. The embedding machine instance takes a random seed, a dimensions hyperparameter (the number of factors), and a maximum iteration count for the factorization. It learns an embedding from the Personalized PageRank scores; we then combine the embeddings of the two drugs in each pair into drug pair features with the operator function.

```python
embedding_machine = EmbeddingMachine(seed=42,
                                     dimensions=32,
                                     max_iter=100)

embedding = embedding_machine.fit(pagerank_scores)
drug_pair_features = embedding_machine.create_features(target, hadamard_operator)
```
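The two ideas in this step, factorizing a non-negative score matrix and combining two embeddings element-wise, can be sketched without any dependencies. The sketch below uses dense multiplicative updates rather than the sparse scikit-learn factorization the project links to, and the toy matrix and all function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error between v and the product w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, dimensions, max_iter=100, seed=42, eps=1e-9):
    """Factorize v into non-negative w and h with multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() for _ in range(dimensions)] for _ in range(n_rows)]
    h = [[rng.random() for _ in range(n_cols)] for _ in range(dimensions)]
    for _ in range(max_iter):
        # Update h: h * (w^T v) / (w^T w h)
        wt = transpose(w)
        numerator = matmul(wt, v)
        denominator = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numerator[k][j] / (denominator[k][j] + eps)
              for j in range(n_cols)] for k in range(dimensions)]
        # Update w: w * (v h^T) / (w h h^T)
        ht = transpose(h)
        numerator = matmul(v, ht)
        denominator = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numerator[i][k] / (denominator[i][k] + eps)
              for k in range(dimensions)] for i in range(n_rows)]
    return w, h


def hadamard(left, right):
    """Element-wise product of two embedding vectors."""
    return [a * b for a, b in zip(left, right)]


# Hypothetical score matrix: one row per drug, one column per graph node.
toy_scores = [
    [0.6, 0.3, 0.1, 0.0],
    [0.5, 0.4, 0.1, 0.0],
    [0.0, 0.1, 0.3, 0.6],
]

drug_embedding, _ = nmf(toy_scores, dimensions=2, max_iter=200)
pair_features = hadamard(drug_embedding[0], drug_embedding[1])
print(pair_features)
```

The Hadamard product is symmetric, so the features of the pair (A, B) equal those of (B, A), which suits an undirected interaction label.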

### (D) **Predicting Drug Interactions and Inference**



  



We import a gradient boosting-based classifier, an evaluation metric for binary classification, and a function to create train-test splits. We split the drug pairs into train and test portions, using 80% of the pairs for training. We then train a gradient boosted tree model, score it on the test portion, and print the resulting AUROC.

```python
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hold out 20% of the drug pairs for evaluation.
X_train, X_test, y_train, y_test = train_test_split(drug_pair_features,
                                                    target,
                                                    train_size=0.8,
                                                    random_state=42)

model = LGBMClassifier(learning_rate=0.01,
                       n_estimators=100)

model.fit(X_train, y_train["label"])

# Column 1 of predict_proba holds the probability of the positive class.
predicted_label = model.predict_proba(X_test)
auroc_score_value = roc_auc_score(y_test["label"], predicted_label[:, 1])

print(f'AUROC score: {auroc_score_value:.4f}')
```
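As a reminder of what the AUROC measures, here is a small dependency-free sketch: it is the fraction of (interacting, non-interacting) pairs in which the interacting pair receives the higher score, with ties counted as half. The `auroc` helper and the toy labels and scores are assumptions for illustration; for binary labels the result should agree with `roc_auc_score`.

```python
def auroc(labels, scores):
    """Probability that a random positive example outscores a random negative one."""
    positives = [score for label, score in zip(labels, scores) if label == 1]
    negatives = [score for label, score in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example.")
    wins = 0.0
    for positive in positives:
        for negative in negatives:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for four drug pairs.
toy_labels = [1, 0, 1, 0]
toy_probabilities = [0.9, 0.4, 0.35, 0.1]

toy_auroc = auroc(toy_labels, toy_probabilities)
print(f"AUROC score: {toy_auroc:.4f}")  # prints "AUROC score: 0.7500"
```

Three of the four positive-negative comparisons are ranked correctly here, which gives 0.75. A score of 0.5 corresponds to random ranking and 1.0 to a perfect one.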

Head over to the [documentation](https://tigerlily.readthedocs.io) to find out more about installation and a full API reference.

For a quick start, check out the [example notebook](https://github.com/benedekrozemberczki/tigerlily/blob/main/example_notebook.ipynb). If you notice anything unexpected, please open an [issue](https://github.com/benedekrozemberczki/tigerlily/issues).

--------------------------------------------------------------------------------

**Citing**

If you find *Tigerlily* useful in your research, please consider adding the following citation:

```bibtex
@misc{tigerlily2022,
  author = {Benedek Rozemberczki},
  title = {TigerLily: Finding drug interactions in silico with the Graph},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/benedekrozemberczki/tigerlily}},
}
```

--------------------------------------------------------------------------------

**Installation**

To install tigerlily, simply run:

```sh
pip install tigerlily
```

**Running tests**

To run the tests, use:

```sh
tox -e py
```

--------------------------------------------------------------------------------

**License**

- [Apache 2.0 License](https://github.com/benedekrozemberczki/tigerlily/blob/main/LICENSE)

--------------------------------------------------------------------------------

**Credit**

The **TigerLily** logo and the high level machine learning workflow image are based on:

- [Kubos Origami Font](https://www.fontspace.com/kubos-origami-font-f29538)

- [Noun Project Icons](https://thenounproject.com/)

Benedek Rozemberczki has a yearly subscription to the Noun Project that allows the customization and commercial use of the icons.
