{"id":13678462,"url":"https://github.com/krishnanlab/node2vecplus_benchmarks","last_synced_at":"2026-01-16T10:14:51.704Z","repository":{"id":39998129,"uuid":"402636279","full_name":"krishnanlab/node2vecplus_benchmarks","owner":"krishnanlab","description":null,"archived":false,"fork":false,"pushed_at":"2023-01-26T15:35:03.000Z","size":78140,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-06-17T19:01:13.946Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/krishnanlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-03T03:34:27.000Z","updated_at":"2024-04-15T09:55:58.000Z","dependencies_parsed_at":"2023-02-14T19:00:46.790Z","dependency_job_id":null,"html_url":"https://github.com/krishnanlab/node2vecplus_benchmarks","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fnode2vecplus_benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fnode2vecplus_benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fnode2vecplus_benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fnode2vecplus_benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/krishnanlab","download_url":"https://codeload.github.com/krishnanlab/node2vecplus_benchmarks/tar.gz/ref
s/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":213678304,"owners_count":15622491,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T13:00:53.841Z","updated_at":"2026-01-16T10:14:51.693Z","avatar_url":"https://github.com/krishnanlab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Node2vec+ Benchmarks [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7573612.svg)](https://doi.org/10.5281/zenodo.7573612)\n\nThis repository contains data and scripts for reproducing the evaluation results presented in \n[*Accurately modeling biased random walks on weighted networks using node2vec+*](https://www.biorxiv.org/content/early/2022/08/15/2022.08.14.503926).\nNode2vec+ is implemented as an extension to [PecanPy](https://github.com/krishnanlab/PecanPy), \na fast and memory-efficient implementation of [node2vec](https://snap.stanford.edu/node2vec/). \n\n## Overview\n\nFollow the steps below to run the full evaluation provided in this repository. \nFor more details, check out the sections below. 
\n\n* [Set up conda environment](#setting-up-environment)\n* [Set up gene interaction network data](#download)\n* [Evaluate](#evaluation)\n\n***PROCEED WITH CAUTION: the full evaluation consumes a significant amount of storage space and computational resources (via [SLURM](https://slurm.schedmd.com/overview.html))***\n\n```bash\n# Set up conda environment\nsource config.sh setup\n\n# Download and set up gene interaction network data\nsource config.sh download_ppis\n\n# Submit all evaluation jobs\nsh submit_all.sh\n```\n\nAfter all evaluation jobs have finished successfully, open the Jupyter notebooks in [`plot/`](plot) and generate the evaluation plots.\n\n## Setting up environment\n\nWe provide a simple script to set up the [conda](https://conda.io/projects/conda/en/latest/index.html) environment `node2vecplus-bench`:\n\n```bash\nsource config.sh setup\n```\n\nTo remove the environment, simply run\n\n```bash\nsource config.sh cleanup\n```\n\n### Set up manually\n\nAlternatively, you can set up the environment manually instead of using the `config.sh` script.\nAll required dependencies are listed in [`requirements.txt`](requirements.txt).\n\n* **Step1.** Set up the `node2vecplus-bench` conda environment with Python 3.8\n\n    ```bash\n    conda create -n node2vecplus-bench python=3.8 \u0026\u0026 conda activate node2vecplus-bench\n    ```\n\n* **Step2.** Set up [PyTorch](https://pytorch.org) related packages with CUDA 10.2 (check out the PyTorch website for other CUDA/CPU installation options)\n\n    ```bash\n    conda install pytorch=1.9 torchvision cudatoolkit=10.2 -c pytorch -y\n    pip install torch-geometric==2.0.0 torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.9.0+cu102.html\n    ```\n\n* **Step3.** Install the rest of the dependencies for reproducing the experiments\n\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n## Data\n\n* Hierarchical cluster graphs\n* Standard benchmarking datasets\n    * `BlogCatalog`\n    * 
`Wikipedia`\n* Human gene interaction networks (*need to download, see below*)\n    * `STRING`\n    * `HumanBase*`\n    * `GTExCoExp*`\n\nThe hierarchical cluster graphs are constructed by taking the RBF of point clouds generated in Euclidean space, \nand hence each graph naturally exhibits a hierarchical community structure (more info in the supplementary materials of the paper). \nEach network is associated with two tasks: cluster classification and level classification.\n\nThe BlogCatalog and Wikipedia networks, along with the associated node labels, are obtained from [SNAP-node2vec](https://snap.stanford.edu/node2vec/). \nThe networks are processed by removing isolated nodes and converting them to edge list TSV files.\n\n### Gene interaction networks [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7007164.svg)](https://doi.org/10.5281/zenodo.7007164)\n\n```bash\nsource config.sh download_ppis\n```\n\n#### Download\n\nUnder the root directory of the repository, download the gene interaction networks from Zenodo\n\n```bash\ncurl -o node2vecplus_bench_ppis.tar.gz https://zenodo.org/record/7007164/files/node2vecplus_bench_ppis.tar.gz\n```\n\n(Recommended) Although Zenodo provides a nice feature for versioning datasets with DOIs, downloading from it can be slow.\nThus, we provide an alternative download option from Dropbox.\nThe file should be in sync with the latest dataset version on Zenodo.\n\n```bash\ncurl -L -o node2vecplus_bench_ppis.tar.gz https://www.dropbox.com/s/aettebq5lbgu1cu/node2vecplus_bench_ppis-v1.0.0.tar.gz?dl=1\n```\n\n#### Extract\n\nAfter the tarball is downloaded, extract it and place the contents under `data/networks` by running\n\n```bash\ntar -xzvf node2vecplus_bench_ppis.tar.gz --transform 's/node2vecplus_bench_ppis/ppi/' --directory data/networks\n```\n\n## Evaluation\n\nThis repository contains the following scripts for reproducing the evaluation results\n\n* [`eval_hcluster.py`](script/eval_hcluster.py) - evaluate performance of node2vec(+) using 
hierarchical cluster graphs\n* [`eval_realworld_networks.py`](script/eval_realworld_networks.py) - evaluate performance of node2vec(+) using commonly benchmarked real-world datasets BlogCatalog and Wikipedia\n* [`eval_gene_classification_n2v.py`](script/eval_gene_classification_n2v.py) - evaluate performance of node2vec(+) for gene classification tasks using gene interaction networks\n* [`eval_gene_classification_gnn.py`](script/eval_gene_classification_gnn.py) - evaluate performance of GNNs for gene classification tasks using gene interaction networks\n\nEach of the above scripts can be run from the command line, e.g.\n\n```bash\ncd script\n\n# example of evaluating K3L2 hierarchical cluster graph using node2vec with q=10\npython eval_hcluster.py --network K3L2 --q 10 --nooutput\n\n# same as above but using node2vec+\npython eval_hcluster.py --network K3L2 --q 10 --nooutput --extend\n\n# check other command-line keyword options\npython eval_hcluster.py --help\n```\n\nIf `--nooutput` is not specified, then the evaluation results are saved to [`result/`](result) as `.csv`.\n\n### Submitting evaluation jobs\n\nAlternatively, one can submit evaluation jobs using\n\n```bash\ncd slurm\n\n# submit all evaluations on hierarchical cluster graphs\nsbatch eval_hcluster_all.sb\n\n# submit all evaluations for BlogCatalog and Wikipedia\nsbatch eval_realworld_networks.sb\n\n# submit all evaluations for gene classifications using node2vec+\nsbatch eval_gene_classification_n2vplus.sb\n\n# submit all evaluations for gene classifications using node2vec\nsbatch eval_gene_classification_n2v.sb\n\n# submit all evaluations for gene classifications using GNNs\nsbatch eval_gene_classification_gnn.sb\n\n# submit all evaluations for tissue-specific gene classifications using node2vec+\nsbatch eval_tissue_gene_classification_n2vplus.sb\n\n# submit all evaluations for tissue-specific gene classifications using node2vec\nsbatch eval_tissue_gene_classification_n2v.sb\n```\n\nOr submit 
all evaluations above by simply running\n\n```bash\nsh submit_all.sh\n```\n\nNote: depending on your preference, you can modify the node requirements in [`submit_all.sh`](submit_all.sh) for individual job scripts.\n\n#### Tuning GNNs\n\nFirst, tune the architecture of the GNN (hidden dimension, number of layers, residual connection)\n\n```bash\ncd gnn_tuning\nsh tune_gnn_architecture.sb\n```\n\nThen, fix the best architecture and tune the rest of the training parameters (learning rate, dropout rate, weight decay)\n\n```bash\ncd gnn_tuning\nsh tune_gnn_params.sb\n```\n\nTo aggregate the GNN tuning results, use [`aggregate_tuning_results.py`](gnn_tuning/aggregate_tuning_results.py):\n\n```bash\npython gnn_tuning/aggregate_tuning_results.py\n```\n\nFinally, use the [GNN tuning notebook](plot/tune_gnn.ipynb) to analyze the results and find the optimal GNN configurations.\n\n## Dev notes\n\nExample test commands\n\n```bash\npython eval_gene_classification_n2v.py --gene_universe HBGTX --network HumanBaseTop-global --p 1 --q 1 --nooutput --test\n```\n\n### Setting up gene interaction network (from scratch)\n\n* [STRING](https://doi.org/10.5281/zenodo.3352323)\n* [HumanBase](script/get_humanbase/README.md)\n* [GTExCoExp](script/get_gtexcoexp/README.md)\n\n### Generating labeled data for gene classification\n\nInstall additional dev dependencies\n\n```bash\npip install -r requirements-dev.txt\n```\n\nOnce the network data are set up and placed under ``data/networks/ppi``, run\n\n```bash\nprocess_labels.py\n```\n\n### Update gene interaction network data on Zenodo\n\n1. Make a new dataset version on Zenodo and upload the corresponding file\n1. Upload the file to Dropbox for the alternative download option\n1. Update README (Zenodo DOI, Zenodo link, Dropbox link)\n1. 
Update ``config.sh`` Dropbox link\n\n## Cite us\n\nIf you find this work useful, please consider citing our paper:\n\n```bibtex\n@article {liu2022node2vecplus,\n\ttitle = {Accurately modeling biased random walks on weighted networks using node2vec+},\n\tauthor = {Liu, Renming and Hirn, Matthew and Krishnan, Arjun},\n\tyear = {2022},\n\tdoi = {10.1101/2022.08.14.503926},\n\tpublisher = {Cold Spring Harbor Laboratory},\n\tjournal = {bioRxiv},\n\tURL = {https://www.biorxiv.org/content/early/2022/08/15/2022.08.14.503926},\n\teprint = {https://www.biorxiv.org/content/early/2022/08/15/2022.08.14.503926.full.pdf},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrishnanlab%2Fnode2vecplus_benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrishnanlab%2Fnode2vecplus_benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrishnanlab%2Fnode2vecplus_benchmarks/lists"}