{"id":22899628,"url":"https://github.com/menicgiulia/NetMedPy","last_synced_at":"2025-08-12T04:30:41.219Z","repository":{"id":242582363,"uuid":"800140672","full_name":"menicgiulia/NetMedPy","owner":"menicgiulia","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-19T00:05:36.000Z","size":13294,"stargazers_count":13,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-21T06:46:18.265Z","etag":null,"topics":["data-science","information-extraction","network-medicine","network-science","null-models","systems-biology","systems-pharmacology"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/menicgiulia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-13T19:23:18.000Z","updated_at":"2025-06-19T09:26:55.000Z","dependencies_parsed_at":"2024-07-31T05:14:28.193Z","dependency_job_id":"9f5871a5-679a-4751-8350-7134aa639c5e","html_url":"https://github.com/menicgiulia/NetMedPy","commit_stats":null,"previous_names":["menicgiulia/netmedpy"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/menicgiulia/NetMedPy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/menicgiulia%2FNetMedPy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/menicgiulia%2FNetMedPy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/menicgiulia%2FNetMedPy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/menicgiulia%2FNetMedPy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/menicgiulia","download_url":"https://codeload.github.com/menicgiulia/NetMedPy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/menicgiulia%2FNetMedPy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270002819,"owners_count":24510713,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-12T02:00:09.011Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","information-extraction","network-medicine","network-science","null-models","systems-biology","systems-pharmacology"],"created_at":"2024-12-14T01:01:17.276Z","updated_at":"2025-08-12T04:30:41.196Z","avatar_url":"https://github.com/menicgiulia.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# NetMedPy: A Python package for Network Medicine \n[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](LICENSE)\n\n#### Authors: Andrés Aldana, Michael Sebek, Gordana Ispirova, Rodrigo Dorantes-Gilardi, Giulia Menichetti (giulia.menichetti@channing.harvard.edu)\n\n## Introduction\n\nNetwork medicine is a post-genomic discipline that harnesses network science principles to analyze the intricate interactions within biological systems, viewing diseases 
as localized disruptions in networks of genes, proteins, and other molecular entities \u003csup id=\"a1\"\u003e[1](#f1)\u003c/sup\u003e.\n\nThe structure of the biological network plays an essential role in the system’s ability to efficiently propagate signals and withstand random failures. Consequently, most analyses in Network Medicine focus on quantifying the efficiency of the communication between different regions of the interactome or protein-protein interaction network.\n\nNetMedPy evaluates network localization (statistical analysis of the largest connected component/subgraph or LCC) \u003csup id=\"a2\"\u003e[2](#f2)\u003c/sup\u003e, calculates proximity \u003csup id=\"a3\"\u003e[3](#f3)\u003c/sup\u003e and separation \u003csup id=\"a2\"\u003e[2](#f2)\u003c/sup\u003e between biological entities, and conducts screenings involving a large number of diseases and drug targets. NetMedPy extends the traditional Network Medicine analyses by providing four default network metrics (shortest paths, random walk, biased random walk, communicability) and four null models (perfect degree match, degree logarithmic binning, strength logarithmic binning, uniform). The user is allowed to introduce custom metrics and null models.\n\n\nThe pipeline workflow is depicted in the figure below.\n![Overview Pipeline](https://raw.githubusercontent.com/menicgiulia/NetMedPy/main/images/OverviewPipeline.png)\n\nThis Python implementation uses precomputed distance matrices to optimize calculations. With precalculated distances between every node pair, the code can rapidly compute proximity and separation.\n\n## Getting Started      \n\n### Setting up a work environment\n\nNetMedPy has specific requirements for compatibility and ease of use.\n\n### Python Version\n\nNetMedPy requires Python 3.8 or newer, but it is not compatible with Python 3.12 due to incompatibility with Ray. 
Ensure your Python version is between 3.8 and 3.11.9 inclusive.\n\n### Required Packages\nThe following Python packages are required to run NetMedPy:\n\n- Python (\u003e= 3.8, \u003c= 3.11.9)\n- numpy\n- pandas\n- ray\n- networkx\n- scipy\n- matplotlib\n- seaborn\n  \n### Installation steps\n\nUsers can install NetMedPy and its dependencies using PIP (recommended). Alternatively, the source code can be downloaded, allowing for manual installation of the required dependencies if more customization is needed.\n    \n#### A) Recommended\n\nWhile not essential, we recommend installing NetMedPy in a dedicated conda environment to ensure all dependencies are properly isolated.\n\n- Ensure you have Conda installed.\n\n- Download the environment.yml and navigate to the directory of your local/remote machine where the file is located.\n\n- Create a new conda environment with the `environment.yml` file:\n\n  ```bash\n  conda env create -f environment.yml\n  ```\n\n- Activate your new conda environment:\n\n  ```bash\n  conda activate netmedpy_environment\n  ```\n  \nNetMedPy is ready for use.\n\n#### B) Installing with PIP \n\nAlternatively, you can install the package with PIP (in an existing conda environment, or no conda environment).\n\n- Ensure the following dependencies are installed before proceeding:\n\n  ```bash\n  pip install networkx seaborn matplotlib numpy pandas ray scipy\n  ```\n\n- Install the package:\n\n  ```bash\n  pip install netmedpy\n  ```\n\n\n#### C) From source code\n\nIf none of the previous options worked, the package can be installed directly from the source code.\n\n#### C.1) Download Source Code\n\n1. Ensure you have Python \u003e= 3.8, \u003c= 3.11.9 installed.\n  \n2. Clone the git project to your local or remote machine. This project contains large files. **MAKE SURE YOU HAVE `git lfs` INSTALLED AND CONFIGURED BEFORE CLONING THE PROJECT\n.**\n\n  ```bash\n  git clone https://github.com/menicgiulia/NetMedPy.git\n  ```\n3. 
Navigate to the project directory:\n\n  ```bash\n  cd NetMedPy\n  ```\n\n   \n\n#### C.2) Install necessary dependencies\n##### Option A: Dedicated Conda Environment\n\nWorking with Conda is recommended, but it is not essential. If you choose to work with Conda, these are the steps you need to take:\n\n1. Ensure you have Conda installed.\n\n2. Create a new conda environment with the `environment.yml` file:\n  \n  ```bash\n  conda env create -f environment.yml\n  ```\n\n3. Activate your new conda environment:\n\n  ```bash\n  conda activate netmedpy_environment\n  ```\n  \n##### Option B: working without Conda\n\n1. Ensure the following dependencies are installed before proceeding:\n\n  ```bash\n  pip install networkx seaborn matplotlib numpy pandas ray scipy\n  ```\n\n#### C.3) Configure Python Path \n\n1. Set up your PYTHONPATH (replace `/user_path_to/NetMedPy-main/netmedpy` with the actual path to the package on your local/remote machine):\n\n    _On Linux/Mac_:\n\n    ```bash\n    export PYTHONPATH=\"/user_path_to/NetMedPy-main/netmedpy\":$PYTHONPATH\n    ```\n      \n    _On Windows shell_:\n\n    ```bash\n    set PYTHONPATH=\"C:\\\\user_path_to\\\\NetMedPy-main\\\\netmedpy\";%PYTHONPATH%\n    ```\n      \n    _On Powershell_:\n\n    ```bash\n    $env:PYTHONPATH = \"C:\\\\user_path_to\\\\NetMedPy-main\\\\netmedpy;\" + $env:PYTHONPATH\n    ```\n    \n\n### Verifying the installation\n  \n1. Download the directory `examples` in this repository.\n   \n2. Navigate to the directory `examples` on your local machine:\n\n  ```bash\n  cd /user_path_to/examples\n  ```\n      \n3. 
Run the `Basic_example.py` script using a supported Python version (3.8 to 3.11):\n\n  ```bash\n  python Basic_example.py\n  ```\n\nDetails about each function (what it is used for, what the input parameters are, the possible values of the input parameters, what the output is) from the pipeline are available in `doc/build/html/NetMedPy.html` and in the `netmedpy/NetMedPy.py` script in the comments before each function. \n\n## Examples\n\nTo run the examples, make sure to clone the git project to your local machine. This project contains large files. **MAKE SURE YOU HAVE `git lfs` INSTALLED AND CONFIGURED BEFORE CLONING THE PROJECT.**\n\n  ```bash\n  git clone https://github.com/menicgiulia/NetMedPy.git\n  ```\n\nThen open the examples directory:\n  ```bash\n  cd path_to_git_project/examples\n  ```\n\n### Use Case 1: Exploring Vitamin D’s Impact on Autoimmune, Cardiovascular, and Cancer Diseases\n\nThis example evaluates the role of Vitamin D in the modulation of autoimmune diseases, cardiovascular diseases, and cancer from a network medicine perspective and reproduces the results presented in the paper \"NetMedPy: A Python package for Large-Scale Network Medicine Screening\".\n\nThe scripts for this example are located in the `examples/VitaminD` directory. There are two files that you can use for testing:\n\n- A Python script: `VitD_pipeline.py`\n- A Jupyter notebook: `VitD_pipeline.ipynb`\n\nConsult these files for specifications on required packages before running.\n\n### Instructions on testing the Vitamin D example\n\n1. Download the `examples` directory:\nIf you haven't already done so, download this directory from the repository to your local machine. \n\n2. Prepare the Data:\nThe necessary data to run this example are located in the subdirectory `VitaminD/data`. The output files will be stored in the subdirectory `VitaminD/output`.\n\n3. Navigate to the `VitaminD` directory:\n \n  ```bash\n  cd /user_path_to/examples/VitaminD\n  ```\n     \n4. 
Run the Example:\n\nYou can run this example through a Python script or a Jupyter Notebook.\n\n##### Option A: using the Python Script\n\n  ```bash\n  python VitD_pipeline.py\n  ```\n      \n##### Option B: using the Jupyter Notebook\n\n- Make sure you have the `jupyter` package installed.\n\n  ```bash\n  pip install jupyter\n  ```\n  \n- Start the Jupyter Kernel\n\n    a) If you are working on a local machine:\n  \n    ```bash\n    jupyter notebook --browser=\"browser_of_choice\"\n    ```\n    \n  Replace `browser_of_choice` with your preferred browser (e.g., chrome, firefox). The browser window should pop up automatically. If it doesn't, copy and paste the link provided in the terminal into your browser. The link should look something like this:\n\n    \n   * http://localhost:8889/tree?token=5d4ebdddaf6cb1be76fd95c4dde891f24fd941da909129e6\n    \n       \n    b) If you are working on a remote machine:\n  \n    ```bash\n    jupyter notebook --no-browser\n    ```\n    \n  Then copy and paste the link provided in the terminal in your local browser of choice. It should look something like this:\n\n    \n   * http://localhost:8888/?token=9feac8ff1d5ba3a86cf8c4309f4988e7db95f42d28fd7772\n    \n    \n- Navigate to the `VitD_pipeline.ipynb` in the Jupyter Notebook interface and start executing the cells.\n\n\n### Extract and evaluate disease modules\n\n  - From a dictionary of diseases `disease_genes` the function lcc_significance will calculate the statistical significance of the size of the Largest Connected Component (LCC) of a subgraph induced by the node set `genes` in the network `ppi`. This function generates a null model distribution for the LCC size by resampling nodes from the network while preserving their degrees (`null_model=\"log_binning\"`). 
The statistical significance of the observed LCC size is then determined by comparing it against this null model distribution.\n  \n  - The parameter `null_model` can be `degree_match`, `log_binning`, `uniform`, or `custom` (defined by the user).\n  \n```python\nimport pickle\nimport pandas as pd\nimport netmedpy\n\n#`ppi` is the PPI network, loaded as shown in the next example\n\n#Load disease genes dictionary from the pickle file in `examples/VitaminD/data/disease_genes.pkl`\nwith open(\"examples/VitaminD/data/disease_genes.pkl\",\"rb\") as file:\n  disease_genes = pickle.load(file)\n\nlcc_size = pd.DataFrame(columns = [\"disease\",\"size\",\"zscore\",\"pval\"])\n\nfor d,genes in disease_genes.items():\n    data = netmedpy.lcc_significance(ppi, genes,\n                                     null_model=\"log_binning\",n_iter=10000)\n    new_line = [d,data[\"lcc_size\"],data[\"z_score\"],data[\"p_val\"]]\n    lcc_size.loc[len(lcc_size.index)] = new_line\n\n#Keep only diseases with an LCC larger than 10 and statistically significant\n#Filtering the disease sets to the LCC is optional and not mandatory for the subsequent analyses\nsignificant = lcc_size.query(\"size \u003e 10 and zscore \u003e 2 and pval\u003c0.05\")\ndisease_names = significant.disease\n```\n### Evaluate Average Minimum Shortest Path Length (AMSPL) between Vitamin D and Inflammation and between Vitamin D and Factor IX Deficiency disease\n\n  - The function `proximity` calculates the proximity between two sets of nodes in a given graph based on the approach described by Guney et al., 2016. The method computes either the average minimum shortest path length (AMSPL) or its symmetrical version (SASPL) between two sets of nodes.\n\n   In this example, the function calculates the proximity between the Vitamin D targets stored in `examples/VitaminD/data/vitd_targets.pkl` and the disease genes from the `examples/VitaminD/data/disease_genes.pkl` file for the two diseases: `Inflammation` and `Factor IX Deficiency`. 
The null model of choice, in this case, is `log_binning`.\n\n   - The function returns a dictionary containing various statistics related to proximity, including:\n       - 'd_mu': The average distance in the randomized samples.\n       - 'd_sigma': The standard deviation of distances in the randomized samples.\n       - 'z_score': The z-score of the actual distance in relation to the randomized samples.\n       - 'p_value_single_tail': One-tail P-value associated with the proximity z-score.\n       - 'p_value_double_tail': Two-tail P-value associated with the proximity z-score.\n       - 'p_val': P-value associated with the z-score.\n       - 'raw_amspl': The raw average minimum shortest path length between the two sets of interest.\n       - 'dist': A list containing distances from each randomization iteration.\n\n       \n```python\n#Load PPI network\nwith open(\"examples/VitaminD/data/ppi_network.pkl\",\"rb\") as file:\n  ppi = pickle.load(file)\n\n#Load drug targets\nwith open(\"examples/VitaminD/data/vitd_targets.pkl\",\"rb\") as file:\n  targets = pickle.load(file)\n\n#Load disease genes\nwith open(\"examples/VitaminD/data/disease_genes.pkl\",\"rb\") as file:\n  disease_genes = pickle.load(file)\n\n#`sp_distance` is a distance matrix precomputed with `netmedpy.all_pair_distances` (see the next section)\ninflammation = netmedpy.proximity(ppi, targets,\n                                  disease_genes[\"Inflammation\"], sp_distance,\n                                  null_model=\"log_binning\",n_iter=10000,\n                                  symmetric=False)\n\nfactorix = netmedpy.proximity(ppi, targets,\n                                  disease_genes[\"Factor IX Deficiency\"], sp_distance,\n                                  null_model=\"log_binning\",n_iter=10000,\n                                  symmetric=False)\n\n#`plot_histograms` is a plotting helper that is not part of netmedpy\nplot_histograms(inflammation, factorix)\n```\n\n### Evaluate Average Minimum Shortest Path Length (AMSPL) under different distances \n\n- The function `all_pair_distances` calculates distances between every pair of nodes in a graph according to the specified method 
and returns a DistanceMatrix object. This function supports multiple distance calculation methods, including shortest path, various types of random walks, and user-defined methods.\n  \n- The function `screening` screens for relationships between sets of source and target nodes within a given network, evaluating proximity or separation. This function facilitates drug repurposing and other network medicine applications by allowing the assessment of network-based relationships.\n\n- In this example, the `all_pair_distances` function calculates the distance between every pair of nodes in the protein-protein interaction network stored in the file `examples/VitaminD/data/ppi_network.pkl`, using three different calculation methods: `random_walk`, `biased_random_walk`, and `communicability`.\n\n- For each resulting distance matrix, the AMSPL is calculated using the `screening` function with `score=\"proximity\"`.\n\n```python\n#Load PPI network\nwith open(\"examples/VitaminD/data/ppi_network.pkl\",\"rb\") as file:\n  ppi = pickle.load(file)\n\n#Load drug targets\nwith open(\"examples/VitaminD/data/vitd_targets.pkl\",\"rb\") as file:\n  targets = pickle.load(file)\n\n#Load disease genes\nwith open(\"examples/VitaminD/data/disease_genes.pkl\",\"rb\") as file:\n  disease_genes = pickle.load(file)\n\n#NOTE: `vit_d` and `dgenes` are assumed to be dictionaries mapping names to node sets, built from the data loaded above\n\n#Shortest Paths (assumes `screen_data` holds the result of an earlier screening with shortest-path distances)\namspl = {\"Shortest Path\":screen_data[\"raw_amspl\"]}\n\n#Random Walks\nsp_distance = netmedpy.all_pair_distances(ppi,distance=\"random_walk\")\nscreen_data = netmedpy.screening(vit_d, dgenes, ppi,\n                                 sp_distance,score=\"proximity\",\n                                 properties=[\"raw_amspl\"],\n                                 null_model=\"log_binning\",\n                                 n_iter=10,n_procs=20)\n\namspl[\"Random Walks\"] = screen_data[\"raw_amspl\"]\n\n#Biased Random Walks\nsp_distance = netmedpy.all_pair_distances(ppi,distance=\"biased_random_walk\")\nscreen_data = 
netmedpy.screening(vit_d, dgenes, ppi,\n                                 sp_distance,score=\"proximity\",\n                                 properties=[\"raw_amspl\"],\n                                 null_model=\"log_binning\",\n                                 n_iter=10,n_procs=20)\n\namspl[\"Biased Random Walks\"] = screen_data[\"raw_amspl\"]\n\n\n#Communicability\nsp_distance = netmedpy.all_pair_distances(ppi,distance=\"communicability\")\nscreen_data = netmedpy.screening(vit_d, dgenes, ppi,\n                                 sp_distance,score=\"proximity\",\n                                 properties=[\"raw_amspl\"],\n                                 null_model=\"log_binning\",\n                                 n_iter=10,n_procs=20)\n\namspl[\"Communicability\"] = screen_data[\"raw_amspl\"]\n```\n\n### Use Case 2: Robustness analysis\n\nThis notebook evaluates the robustness of network-based proximity calculations between Vitamin D targets and disease-associated genes.\n\nThe objective is to assess how perturbations in the input data (different **Protein-Protein Interaction (PPI) networks**, disease-associated genes, and Vitamin D targets) impact the results obtained in the previous case study.\n\n\n#### Data Preparation\n\nBefore running the PPI robustness analysis, you need to download and preprocess both the PPI networks and the Vitamin D target list. This is done in three steps:\n\n**1. Download and preprocess PPI networks**\n\nFor this step, we use the `examples/VitaminD/supplementary/sup_code/data_integration/BioNets.ipynb` notebook and the `BioNetTools.py` module, which provides helper functions to fetch raw PPI files from the BioGRID and STRING databases (MITAB, gz, or zip), extract them, convert them to a pandas DataFrame, and write out ready-to-use network CSVs. Consult the `BioNets.ipynb` notebook for package requirements.\n    \nAll processed networks will end up in `examples/VitaminD/supplementary/sup_data/alternative_ppi/`.\n\n**2. 
Vitamin D targets**\n\nUse the same Vitamin D targets as in the main Vitamin D pipeline, shown in the previous example.\nThese targets are generated by the `examples/VitaminD/supplementary/sup_code/data_integration/Vit_D_Targets.ipynb` Notebook, and stored in `examples/VitaminD/data/input/drug_targets/vitd_targets_cpie.pkl`.\n\n**3. Run robustness analysis**\n\nFinally, run the Notebook `examples/VitaminD/supplementary/sup_code/robustness/PPI_Robustness.ipynb`. Consult this notebook for specifications on required packages. It will:\n\n- Load your main PPI from `examples/VitaminD/data/input/ppi/ppi_network.pkl`.\n\n- Load each alternative PPI from `examples/VitaminD/supplementary/sup_data/alternative_ppi/` (e.g. ppi_biogrid.csv).\n\n- Load the Vitamin D target list from `examples/VitaminD/data/input/drug_targets/vitd_targets_cpie.pkl`.\n  \n- Load the disease-associated genes from `examples/VitaminD/data/input/disease_genes/disease_genes_merge.pkl`.\n\n- Compute and plot the network‐robustness metrics for each scenario.\n\n\n### Use Case 3: Introduction to Network Medicine and Data Generation. \n\nThis example introduces the core concepts of network medicine through a guided analysis of Vitamin D's relationship to several diseases using protein-protein interaction networks. The Jupyter notebook (`examples/NetworkMedicineIntro/Intro_Network_Medicine.ipynb`) provides a step-by-step workflow demonstrating how to build and analyze biological networks to uncover drug-disease relationships. Consult this notebook for specifications on required packages.\n\n### Notebook workflow - Steps in `Intro_Network_Medicine.ipynb`:\n\n**1. 
Download and filter STRING PPI data** \n\nThe notebook first defines the URL for STRING v12 and downloads the protein-protein interaction data:\n\n```python\n# Define the URL for the STRING PPI dataset\nstring_url = \"https://stringdb-downloads.org/download/protein.physical.links.v12.0/9606.protein.physical.links.v12.0.txt.gz\"\n\n# Define paths for temporary files\nstring_gz_path = './tmp_string/string.gz'\n\n# Download and extract STRING data\nprint(\"Downloading STRING dataset...\")\ntools.download_file(string_url, string_gz_path)\ntools.ungz_file(string_gz_path, \"./tmp_string/string_data\")\n```\n\nIt then processes the data by removing prefixes and converting Ensembl IDs to HGNC symbols:\n\n```python\nprint(\"Processing protein names...\")\nstring_df[\"protein1\"] = string_df[\"protein1\"].str.replace(\"9606.\", \"\", regex=False)\nstring_df[\"protein2\"] = string_df[\"protein2\"].str.replace(\"9606.\", \"\", regex=False)\n\n# Convert Ensembl IDs to HGNC symbols\nens_to_hgnc = tools.ensembl_to_hgnc(string_df)\nstring_df[\"HGNC1\"] = string_df[\"protein1\"].map(ens_to_hgnc)\nstring_df[\"HGNC2\"] = string_df[\"protein2\"].map(ens_to_hgnc)\n```\n\nFinally, it filters the network, extracts the largest connected component, and saves it:\n\n```python\nfiltered_df = string_df.query(\"weight \u003e 300\")\nG_string = nx.from_pandas_edgelist(filtered_df, 'HGNC1', 'HGNC2', create_using=nx.Graph())\n\nG_string = netmedpy.extract_lcc(G_string.nodes, G_string)\n\n# Save to CSV\ndf_edges = nx.to_pandas_edgelist(G_string)\ndf_edges.to_csv(\"output/string_ppi_filtered.csv\", index=False)\n```\n\n**2. 
Extract Vitamin D targets**\n   \nThe notebook extracts compound-protein databases - PubChem, Chembl, STITCH, CTD, DTC, BDB, DrugBank, OTP, DrugCentral from a pre-packaged zip file for their posterior integration with  [CPIExtract](https://github.com/menicgiulia/CPIExtract):\n\n```python\n# Define database directory path\ndata_path = \"./output/cpie_Databases\"\n\nif os.path.exists(data_path):\n    shutil.rmtree(data_path)\n\ntools.unzip_file(\"../VitaminD/supplementary/sup_data/cpie_databases/Databases.zip\", data_path)\n```\n\nIt then loads multiple databases into memory and searches for Vitamin D (Cholecalciferol) targets:\n\n```python\n# Store all databases in a dictionary\ndbs = {\n    'chembl': chembl_data,\n    'bdb': BDB_data,\n    'stitch': sttch_data,\n    'ctd': CTD_data,\n    'dtc': DTC_data,\n    'db': DB_data,\n    'dc': DC_data\n}\n\n# Cholecalciferol (PubChem CID: 5280795)\ncomp_id = 5280795\n\n# Initialize Comp2Prot\nC2P = Comp2Prot('local', dbs=dbs)\n\n# Search for interactions\ncomp_dat, status = C2P.comp_interactions(input_id=comp_id)\n\n# Extract HGNC symbols\nvd_targets = {\"Vitamin D\": list(comp_dat.hgnc_symbol)} \n\n# Save extracted targets\nwith open('./output/vd_targets.json', 'w') as f:\n    json.dump(vd_targets, f)\n```\n\n**3. Extract and filter disease gene associations**\n\nDisease-gene associations are loaded from DisGeNet files and filtered by confidence score:\n\n```python\n# Directory containing the disease genes\ndis_gene_path = \"input_data/disease_genes\"\n\ndisease_file_names = {\n    \"Huntington\":\"DGN_Huntington.csv\",\n    \"Inflammation\": \"DGN_inflammation.csv\",\n    \"Rickets\": \"DGN_Rickets.csv\",\n    \"Vit. 
D deficiency\": \"DGN_VDdeff.csv\"\n}\n\ndisease_genes = {}\n\n# Load files and filter for strong associations\nfor name,file_name in disease_file_names.items():\n    path = dis_gene_path + \"/\" + file_name\n\n    df = pd.read_csv(path)\n    df = df.query(\"Score_gda \u003e 0.1\")\n\n    disease_genes[name] =  list(df.Gene)\n\n# Save file\nwith open('./output/disease_genes.json', 'w') as f:\n    json.dump(disease_genes, f)\n```\n\n**4.  Verify network coverage**\n   \nThe notebook checks which disease genes and drug targets are found in the PPI network:\n\n```python\n# Load PPI network\nppi = pd.read_csv(\"output/string_ppi_filtered.csv\")\nppi = nx.from_pandas_edgelist(ppi, 'source', 'target', create_using=nx.Graph())\n\n# Keep only associations existing in the PPI\nnodes = set(ppi.nodes)\nfor name, genes in disease_genes.items():\n    disease_genes[name] = set(genes) \u0026 nodes\n    print(f\"{name}: {len(disease_genes[name])} associations in PPI\")\n\nfor name, targets in dtargets.items():\n    dtargets[name] = set(targets) \u0026 nodes\n    print(f\"{name}: {len(dtargets[name])} targets in PPI\")\n```\n\n**5. Compute random walk distances**\n    \nThe notebook calculates biased random walk distances between all nodes:\n\n\n```python\n# Calculate Random Walk based distance between all pair of genes\ndmat = netmedpy.all_pair_distances(\n    ppi,\n    distance='biased_random_walk',\n    reset = 0.3\n)\n\n# Save distances for further use\nnetmedpy.save_distances(dmat,\"output/ppi_distances_BRW.pkl\")\n```\n\n**6. 
Calculate proximity with log-binning null model**\n\nThe notebook computes proximity z-scores using the log-binning null model:\n\n```python\n# Calculate proximity between Vitamin D targets and Diseases\nproximity_lb = netmedpy.screening(\n    dtargets, \n    disease_genes, \n    ppi,\n    dmat,\n    score=\"proximity\",\n    properties=[\"z_score\"],\n    null_model=\"log_binning\",\n    n_iter=10000,n_procs=10\n)\n\nzscore_lb = proximity_lb['z_score'].T\nzscore_lb = zscore_lb.sort_values(by='Vitamin D')\nzscore_lb\n```\n\n**7. Repeat analysis with degree-matched null model**\n\nThe same analysis is performed using the degree-match null model for comparison:\n\n```python\nproximity_dm = netmedpy.screening(\n    dtargets, \n    disease_genes, \n    ppi,\n    dmat,\n    score=\"proximity\",\n    properties=[\"z_score\"],\n    null_model=\"degree_match\",\n    n_iter=10000,n_procs=10\n)\n\nzscore_dm = proximity_dm['z_score'].T\nzscore_dm = zscore_dm.sort_values(by='Vitamin D')\nzscore_dm\n```\n\n\n**8. 
Compare results from both null models**\n\nFinally, the notebook combines results from both methods for comparison:\n\n```python\nzscore_lb.columns = [\"Log Binning\"]\nzscore_dm.columns = [\"Degree Match\"]\n\nzscore = pd.merge(zscore_lb,zscore_dm, left_index=True, right_index=True)\n\nzscore\n```\n\nThis produces a table showing z-scores from both null models, with Vitamin D deficiency having the strongest connection to Vitamin D targets.\n\n### Data sources\n\n- STRING v12: Human protein-protein interactions downloaded directly from `stringdb-downloads.org`\n- Compound-target databases: Collection of databases accessed from the VitaminD supplementary data folder\n- DisGeNet: Disease-gene associations provided as CSV files in the `input_data` folder\n\n### Expected outputs\n\nAfter running the notebook, the following files will be created:\n\n\n```bash\noutput/\n├── string_ppi_filtered.csv    # Filtered STRING PPI network\n├── vd_targets.json           # Vitamin D protein targets\n├── disease_genes.json        # Disease gene sets\n├── ppi_distances_BRW.pkl     # Biased random walk distance matrix\n└── cpie_Databases/           # Extracted compound-protein interaction databases\n```\n\n## Package Structure\nRoot folder organization (__init__.py files removed for simplicity):\n```plaintext\n│   .gitattributes                                 \n│   .gitignore                                    \n│   LICENSE.txt                                     // License information for the package\n│   README.md                                       // Package documentation\n│   environment.yml                                 // yml file to create conda environment\n│   setup.py                                        // Package installation script\n│\n├───doc                                             // Documentation directory\n│   └───source                                      // Source files for documentation\n│       │   DistanceMatrix.rst                      // 
Documentation for DistanceMatrix module\n│       │   NetMedPy.rst                            // Documentation for NetMedPy module\n│       │   conf.py                                 // Sphinx configuration file for documentation\n│       │   index.rst                               // Main index file for documentation\n│       │   Makefile                                // Make file for building documentation\n│       │   make.bat                                // Batch script for building documentation on Windows\n│\n├───examples                                        // directory with working examples using the NetMedPy pipeline\n│   │   Basic_example.py                            // python script for running a basic example to test the pipeline\n│   │   Cronometer.py                               // Performance timing utility\n│   │   VitD_pipeline.ipynb                         // Jupyter notebook with Vitamin D example using the NetMedPy pipeline\n│   │   VitD_pipeline.py                            // python script with Vitamin D example using the NetMedPy pipeline\n│   │   1_4_netsize_edges.png                       // Figure showing network size and edges relationships\n│   │   1_7_prox_vd.png                             // Figure related to proximity and Vitamin D\n│   │   1_8_correlation.png                         // Correlation analysis figure\n│   │   2_2_deviation.png                           // Deviation analysis figure\n│   │   2_3_rank_correlation_distr...               
// Rank correlation distribution figure\n│   │\n│   ├───NetworkMedicineIntro                        // Introduction to Network Medicine examples\n│   │   │   Intro_Network_Medicine.ipynb            // Jupyter notebook with intro to network medicine\n│   │   │   tools.py                                // Helper tools for the analysis\n│   │   │\n│   │   └───input_data/disease_genes                // Disease gene data for examples\n│   │           DGN_Huntington.csv                  // Huntington disease gene data\n│   │           DGN_Rickets.csv                     // Rickets disease gene data\n│   │           DGN_VDdeff.csv                      // Vitamin D deficiency gene data\n│   │           DGN_inflammation.csv                // Inflammation gene data\n│   │\n│   └───VitaminD                                    // directory with Vitamin D example using the NetMedPy pipeline\n│       ├───data                                    // directory with data files necessary for the Vitamin D example\n│       │   └───input                               // Input data directory\n│       │       ├───disease_genes                   // Disease gene data directory\n│       │       │       disease_genes_merge.pkl     // Merged disease genes data\n│       │       │\n│       │       ├───drug_targets                    // Drug target data directory\n│       │       │       vitd_targets_cpie.pkl       // Vitamin D targets data\n│       │       │\n│       │       └───ppi                             // Protein-protein interaction data\n│       │               ppi_network.pkl             // PPI network data\n│       │               Alias.csv                   // Alias mapping file\n│       │\n│       ├───guney                                   // Implementation of Guney's network algorithms\n│       │       distances.py                        // Distance calculation functions\n│       │       network.py                          // Network manipulation functions\n│       │\n│       
├───output                                  // directory where the output files from the Vitamin D example are saved\n│       │       amspl.pkl                           // Analysis output file\n│       │       d1_d2.pkl                           // Disease pairs data\n│       │       inf_fix.pkl                         // Inflammation-related output\n│       │       inf_hun.pkl                         // Huntington-related output\n│       │       lcc_size.pkl                        // Largest connected component size data\n│       │       performance_size.csv                // Performance metrics\n│       │       screen.pkl                          // Screening results\n│       │\n│       └───supplementary                           // Supplementary materials\n│           └───sup_code                            // Supplementary code\n│               └───data_integration                // Data integration scripts\n│\n├───images                                          // directory with figures from the paper\n│       OverviewPipeline.png                        // pipeline flowchart figure from the paper\n│\n└───netmedpy                                        // directory with the Python modules that implement the NetMedPy pipeline\n        DistanceMatrix.py                           // Module for distance matrix calculations\n        NetMedPy.py                                 // Core NetMedPy functionality\n```\n\n## Further information\n\nEach pipeline function's purpose, input parameters (with their accepted values), and output are documented in `doc/build/html/NetMedPy.html` and in the comments preceding each function in `netmedpy/NetMedPy.py`.\n\n## License\n\nThis project is licensed under the terms of the MIT license.\n\n## References\n\n\u003cb id=\"f1\"\u003e1\u003c/b\u003e Barabási, A. L., Gulbahce, N., \u0026 Loscalzo, J. (2011). 
Network medicine: a network-based approach to human disease. Nature Reviews Genetics, 12(1), 56-68. [DOI 10.1038/nrg2918](https://doi.org/10.1038/nrg2918) [↩](#a1)\n\n\u003cb id=\"f2\"\u003e2\u003c/b\u003e Menche, J., et al. (2015). Uncovering disease-disease relationships through the incomplete interactome. Science, 347(6224), 1257601. [DOI 10.1126/science.1257601](https://doi.org/10.1126/science.1257601) [↩](#a2)\n\n\u003cb id=\"f3\"\u003e3\u003c/b\u003e Guney, E., et al. (2016). Network-based in silico drug efficacy screening. Nature Communications, 7, 10331. [DOI 10.1038/ncomms10331](https://doi.org/10.1038/ncomms10331) [↩](#a3)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmenicgiulia%2FNetMedPy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmenicgiulia%2FNetMedPy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmenicgiulia%2FNetMedPy/lists"}