{"id":20074734,"url":"https://github.com/greenelab/phenoplier","last_synced_at":"2025-07-07T01:37:59.073Z","repository":{"id":40290369,"uuid":"273271013","full_name":"greenelab/phenoplier","owner":"greenelab","description":"PhenoPLIER","archived":false,"fork":false,"pushed_at":"2024-04-08T20:01:38.000Z","size":218439,"stargazers_count":11,"open_issues_count":6,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-08T22:47:20.547Z","etag":null,"topics":["complex-traits","gene-modules","software","tool","twas","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greenelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-06-18T15:13:58.000Z","updated_at":"2024-03-31T11:37:13.000Z","dependencies_parsed_at":"2023-01-30T11:46:26.542Z","dependency_job_id":"dc74eb70-8f5c-481b-a625-0678c984ae3f","html_url":"https://github.com/greenelab/phenoplier","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fphenoplier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fphenoplier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fphenoplier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fphenoplier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greenelab","download_url":"h
ttps://codeload.github.com/greenelab/phenoplier/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224470642,"owners_count":17316704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["complex-traits","gene-modules","software","tool","twas","unsupervised-learning"],"created_at":"2024-11-13T14:54:04.490Z","updated_at":"2024-11-13T14:54:05.135Z","avatar_url":"https://github.com/greenelab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PhenoPLIER (source code)\n\n[![HTML Manuscript](https://img.shields.io/badge/manuscript-HTML-blue.svg)](https://greenelab.github.io/phenoplier_manuscript/)\n[![PDF Manuscript](https://img.shields.io/badge/manuscript-PDF-blue.svg)](https://greenelab.github.io/phenoplier_manuscript/manuscript.pdf)\n\n## Contents\n\n * [Overview](#overview)\n * [Quick demo](#quick-demo)\n * [Code and data](#code-and-data)\n * [Setup](#setup)\n * [Running the code](#running-the-code)\n\n## Overview\n\n![](images/phenoplier_overview.png)\n\nPhenoPLIER is a flexible computational framework that combines gene-trait and gene-drug associations with gene modules expressed in specific contexts (see Figure above).\nThe approach uses a latent representation (with latent variables or LVs representing gene modules) derived from a large gene expression compendium to integrate TWAS with drug-induced transcriptional responses for a joint analysis.\nThe approach consists in three main components:\n 1) an LV-based regression model to compute an association between an LV 
and a trait,\n 2) a clustering framework to learn groups of traits with shared transcriptomic properties, and\n 3) an LV-based drug repurposing approach that links diseases to potential treatments.\n\nFor more details, check out our article in [Nature Communications](https://doi.org/10.1038/s41467-023-41057-4) or our [Manubot web version](https://greenelab.github.io/phenoplier_manuscript/).\nTo cite PhenoPLIER, see [10.1038/s41467-023-41057-4](https://doi.org/10.1038/s41467-023-41057-4):\n\n\u003e **Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms**\u003cbr\u003e\nPividori, M., Lu, S., Li, B. *et al.*\u003cbr\u003e\n*Nat Commun* **14**, 5562 (2023) \u003chttps://doi.org/gspsxr\u003e\u003cbr\u003e\nDOI: [10.1038/s41467-023-41057-4](https://doi.org/10.1038/s41467-023-41057-4)\n\n**Interested in using PhenoPLIER? Any questions?** Check out our [Discussions section](https://github.com/greenelab/phenoplier/discussions) and start a discussion by asking a question or sharing your thoughts. We are happy to help!\n\n## Quick demo\n\nYou can follow the instructions in the [`demo/`](nbs/99_demo) folder to run a small demo on real data to quickly see what you can do with PhenoPLIER. 
\nDepending on your Internet connection, downloading the necessary data for the demo should take no more than 5-10 minutes.\nRunning the demo code should take between 2 and 5 minutes.\n\n## Code and data\n\nThis repository contains both the *code* (mostly Jupyter notebooks) and the *data* generated by this project.\nIf you only want to access the code, then you can avoid downloading everything by using this command:\n```bash\nGIT_LFS_SKIP_SMUDGE=1 git clone git@github.com:greenelab/phenoplier.git\n```\n\nIf you want to download everything, you can run `git clone` without the `GIT_LFS_SKIP_SMUDGE=1` part.\n\nYou can access individual *data* files by going to the [`data/`](data/) folder, selecting the ones you are interested in, and downloading them.\nYou will find all the data matrices mentioned in the manuscript, as well as files to see which genes belong to each latent variable (LV, or gene module) and their weights, or which pathways are associated with each LV, among other files.\nIf you use any of these files, please carefully follow the [instructions for citations](data/) since this project uses data generated by others.\n\n## Setup\n\n### Software requirements\n\nTo prepare the environment to run the PhenoPLIER code, follow the steps in [environment](environment/).\nThis will create a conda environment and download the necessary data.\nDepending on your Internet speed, this shouldn't take more than 24 hours.\nWe tested the code on Ubuntu 20.04+.\n\n**We strongly recommend** using our Docker image (see below), which will greatly simplify running the code and make sure you use the same environment for the analyses.\n\n### Hardware requirements\n\nMost of the code was run with an Intel Core i5 (4 cores) and 64 GB of RAM (32 GB should also be enough).\nThe null simulations and gene-gene correlation matrices for the GLS model were computed using the [LPC cluster](https://www.med.upenn.edu/dart/computing.html) at the University of Pennsylvania.\nThe setup will download 
~130 GB of input data and required software.\nIf you run all the analyses, they will generate ~1100 GB of result files, which includes the null simulations for the GLS model (if you skip these, the results would take ~50 GB).\nTherefore, you would need at least ~1200 GB of free disk space if you plan to run all the steps.\nRunning all the steps would take around a week under this hardware configuration (without considering the cluster jobs, which would depend on the resources available).\n\n## Running the code\n\nYou have two options to run the code: 1) create a local conda environment on your computer, or 2) use our Docker image (where you don't need to create a conda environment).\nUsing Docker should be much easier, and it is the recommended way.\nBelow, we first show how to run the code using the command line (terminal) and your browser.\nThen we show how to do the same but with Docker.\n\n\n### From the command-line\n\nFirst, activate your conda environment and export your settings to environment variables so non-Python scripts can access them:\n```bash\nconda activate phenoplier\n\n# before running the code below, make sure your environment variables\n# PHENOPLIER_ROOT_DIR and PHENOPLIER_MANUSCRIPT_DIR point to the right location \neval `python libs/conf.py`\n```\n\nThe code to preprocess data and generate results is in the `nbs/` folder. All\nnotebooks are organized by directories, such as `01_preprocessing`, with file\nnames that indicate the order in which they should be run (if they share a prefix,\nthey can be run in parallel). 
For example, to run\nall notebooks for the preprocessing step, you can use this command (requires\n[GNU Parallel](https://www.gnu.org/software/parallel/)):\n\n```bash\nparallel -k --lb --halt 2 -j1 'bash nbs/run_nbs.sh {}' ::: nbs/01_preprocessing/*.ipynb\n```\n\n### From your browser\n\nAlternatively, you can start your JupyterLab server by running:\n\n```bash\nbash scripts/run_nbs_server.sh\n```\n\nThen, go to [`http://localhost:8892`](http://localhost:8892), browse the `nbs` folder, and run the\nnotebooks in the specified order.\n\n### Using Docker\n\nYou can also run all the steps above using a Docker image instead of a local conda environment.\nThis means that you **do not need to create a conda environment** or activate it before using Docker.\n\nFirst, pull the latest Docker image:\n\n```bash\ndocker pull miltondp/phenoplier\n```\n\nThe image only contains the conda environment with the code in this repository, so after pulling the image you need to download the data as well.\nFirst, create a directory on your machine where data will be downloaded/saved:\n\n```bash\n# specify a directory on your computer where data will be stored\nexport DATA_FOLDER=\"/tmp/phenoplier_data\"\nmkdir -p ${DATA_FOLDER}\n```\n\nThen run the script to download the data:\n\n```bash\ndocker run --rm \\\n  -v \"${DATA_FOLDER}:/opt/data\" \\\n  --user \"$(id -u):$(id -g)\" \\\n  miltondp/phenoplier \\\n  /bin/bash -c \"python environment/scripts/setup_data.py\"\n```\n\nThe `-v` parameter allows you to mount a local directory into the container; in this case, it specifies a local directory (`${DATA_FOLDER}` pointing to `/tmp/phenoplier_data`) where the data will be downloaded and results saved.\nIf you want to generate the figures and tables for the manuscript, you need to clone the [PhenoPLIER manuscript repo](https://github.com/greenelab/phenoplier_manuscript) and pass it with `-v [PATH_TO_MANUSCRIPT_REPO]:/opt/manuscript`.\nOn the other hand, if you want to pass environment variables 
like the number of CPU cores to use, you need to use the `-e` parameter, such as `-e PHENOPLIER_N_JOBS=2` for 2 CPU cores.\n\nYou can run notebooks from the command line. For example:\n\n```bash\ndocker run --rm \\\n  -v \"${DATA_FOLDER}:/opt/data\" \\\n  --user \"$(id -u):$(id -g)\" \\\n  miltondp/phenoplier \\\n  /bin/bash -c \"parallel -k --lb --halt 2 -j1 'bash nbs/run_nbs.sh {}' ::: nbs/01_preprocessing/*.ipynb\"\n```\n\nwill run all the notebooks to preprocess the input data.\nAll resulting files will be saved in the folder specified in `${DATA_FOLDER}`.\n\nYou can also start a JupyterLab server with:\n\n```bash\ndocker run --rm \\\n  -p 8888:8892 \\\n  -v \"${DATA_FOLDER}:/opt/data\" \\\n  --user \"$(id -u):$(id -g)\" \\\n  miltondp/phenoplier\n```\n\nand access the interface by going to [`http://localhost:8888`](http://localhost:8888).\n\nYou might also want to modify the code for your needs.\nIn that case, you need to clone this repository and mount the directory into the container using `-v \"${PATH_TO_THIS_REPO}:/opt/code\"`.\nThe [`demo/`](nbs/99_demo) folder has instructions to do this as well.\nMounting the code directory will also allow you to see the output of the notebooks after running them.\nOtherwise, although you'll be able to access the resulting files under `${DATA_FOLDER}`, you won't see other outputs that are saved inside the notebooks.\n\nUse the [Discussions section](https://github.com/greenelab/phenoplier/discussions) if you have any questions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fphenoplier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreenelab%2Fphenoplier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fphenoplier/lists"}