{"id":20074779,"url":"https://github.com/greenelab/brd-net","last_synced_at":"2026-05-09T21:44:44.484Z","repository":{"id":79359153,"uuid":"193949995","full_name":"greenelab/brd-net","owner":"greenelab","description":"Transfer learning for uncovering the biology underlying rare disease","archived":false,"fork":false,"pushed_at":"2019-09-13T19:52:59.000Z","size":1855,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-13T00:42:13.072Z","etag":null,"topics":["analysis","gene-expression","methodology","rare-disease","tool"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greenelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-26T17:31:00.000Z","updated_at":"2024-03-12T12:44:49.000Z","dependencies_parsed_at":"2023-06-07T23:15:19.079Z","dependency_job_id":null,"html_url":"https://github.com/greenelab/brd-net","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fbrd-net","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fbrd-net/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fbrd-net/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fbrd-net/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/g
reenelab","download_url":"https://codeload.github.com/greenelab/brd-net/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241506478,"owners_count":19973611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","gene-expression","methodology","rare-disease","tool"],"created_at":"2024-11-13T14:54:19.404Z","updated_at":"2026-05-09T21:44:44.450Z","avatar_url":"https://github.com/greenelab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# brd-net\nTransfer learning for uncovering the biology underlying rare disease\n\n[![Build Status](https://travis-ci.com/greenelab/brd-net.svg?branch=master)](https://travis-ci.org/ben-heil/brd-net)\n\nThe goal of this project is to \n1. train a model to differentiate between the gene expression of healthy individuals and individuals with a disease\n2. apply the model to rare diseases and find how gene expression differs in the disease state\n\n## Method Overview\nThe current plan is to use [PLIER](https://github.com/wgmao/PLIER) as a dimensionality reduction method to help the training of a model that\npredicts whether gene expression is healthy or unhealthy. 
\nPLIER is well suited for this problem because it uses prior biological knowledge to constrain the embedding.\nMore specifically, we can specify biological pathways or sets of genes, and PLIER constrains the latent variables it selects to be close to the binary set of genes in the pathway.\nA classifier will then be run on the embedded data to make predictions about diseases, hopefully classifying their gene expression as unhealthy.\nFinally, some form of model interpretation will determine which of the latent variables used by PLIER most strongly indicate to the classifier that something is wrong.\n\nSome patients never receive a diagnosis explaining what their rare disease is, and a method like brd-net would help guide doctors and researchers to find the right diagnosis.\n\n## Data Generation\nThe [Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra) is a large repository of various types of biological sequence data.\nWhile it is possible to filter based on the type of sequence or species, there is no database (that we know of) that contains labels denoting whether samples\ncorrespond to healthy or unhealthy gene expression. \nAs a result, we had to construct such a dataset by hand.\nThis hand labeling was done using the script [find\\_studies.py](brdnet/find_studies.py), which queries [MetaSRA](http://metasra.biostat.wisc.edu/) to retrieve\ngene expression from all tissue samples in the SRA that have associated metadata.\n[find\\_studies.py](brdnet/find_studies.py) then provides an interface to write string-matching rules to sort samples into \nthe categories 'healthy', 'disease', and 'unknown' based on their titles. 
\nThe expression data for the set of assigned samples from [find\\_studies.py](brdnet/find_studies.py) can be downloaded by \nrunning the notebook [download\\_categorized\\_data.ipynb](brdnet/download_categorized_data.ipynb).\n\n\n## Installation\nMost of the dependencies (for both R and Python) are included in the file [environment.yml](environment.yml).\nAfter [installing Anaconda](https://docs.anaconda.com/anaconda/install/), the dependencies can be installed and loaded with the following commands:\n\n```sh\nconda env create --file environment.yml\nconda activate brdnet\n```\n\nThe one package that must be installed manually is PLIER, due to a compatibility issue between Conda's tools for\nbuilding packages from GitHub and the Bioconductor package repository.\nFortunately, it can be installed easily by activating the brdnet conda environment, starting R, and running the commands below.\n\n```R\nlibrary('devtools')\ninstall_github('wgmao/PLIER')\n```\n\n\nOnce everything is installed, if you want to create a Jupyter Notebook kernel from the environment, you can do so with\n\n```sh\nconda activate brdnet\npython -m ipykernel install --user --name brdnet --display-name \"brdnet\"\n```\n## Run Order \nBecause some scripts depend on the output of others, running them in order is important when starting from scratch.\nThe recommended running order is as follows:\n\n1. Run [find\\_studies.py](brdnet/find_studies.py) to label samples from studies which contain adult gene expression that is clearly healthy or unhealthy\n2. Run [download\\_categorized\\_data.ipynb](brdnet/download_categorized_data.ipynb) to download the expression data for the samples output by find\\_studies.py\n3. If you want to filter your results based on ontology terms, run [subset\\_studies.py](brdnet/subset_studies.py).\n4. 
Run [model\\_evaluation\\_pipeline.sh](brdnet/model_evaluation_pipeline.sh), which runs PLIER with different k values, then calls [evaluate\\_models.py](brdnet/evaluate_models.py) on the results\n\n### Note\nThe environment file explicitly references the channel for each dependency.\nWhile this makes it easier to follow, older versions of Anaconda may not support this format.\nDevelopment was done with conda version 4.7.5, so any version newer than that should work fine.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fbrd-net","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreenelab%2Fbrd-net","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fbrd-net/lists"}