https://github.com/greenelab/greedy-geneset-selection

Source code associated with "Leveraging global gene expression patterns to predict expression of unmeasured genes"
https://github.com/greenelab/greedy-geneset-selection

cancer gene-expression methodology

Last synced: 4 months ago
JSON representation

Source code associated with "Leveraging global gene expression patterns to predict expression of unmeasured genes"

Host: GitHub
URL: https://github.com/greenelab/greedy-geneset-selection
Owner: greenelab
License: bsd-2-clause
Created: 2015-10-14T19:41:28.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2015-12-09T18:16:53.000Z (almost 10 years ago)
Last Synced: 2025-01-13T00:42:10.885Z (9 months ago)
Topics: cancer, gene-expression, methodology
Language: R
Homepage: http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-2250-5
Size: 296 KB
Stars: 4
Watchers: 6
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

README

#Greedy Geneset Selection

#### By [James Rudd](http://www.dartmouth.edu/~doherty/personnel.html), [Rene A. Zelaya](http://www.greenelab.com/lab-members), and [Casey Greene](http://www.greenelab.com/)

[![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.35086.svg)](http://dx.doi.org/10.5281/zenodo.35086)

## Description

This repository houses methods used to identify 'tag' genes in expression data such that a reduced number of genes is directly assayed but the expression values for a maximized number of genes can be predicted. We created a greedy approach to identify genes to directly measure which maximize the number of genes whose expression we can predict.

The greedy geneset selection (GGS) algorithm requires three arguments:

1. A binary gene by gene symmetrical matrix in which 1 at position i,j indicates a strong correlation between gene_i and gene_j

2. Number of genes to directly measure

3. Fold redundancy. Redundancy is the number of directly measured genes that have to be correlated with an unmeasured gene in order for the unmeasured gene to be predictable.

GGS can also be used in the context of existing candidate gene
lists. Candidate genes are automatically added to the directly
measured geneset.

For an explanation of the command line arguments use
python GGS.py -h

Also included in this repository is the workflow used for analyses and figures for the GGS manuscript (excluding those using non-public data from Konecny et al.) The bash script GGS.sh runs analyses sequentially and produces plots. Seperate and confirmatory analyses were performed using TCGA RNAseq data and R scripts for these are located in the BrCa subdirectory.

##Dependencies

GGS has been tested using Python 2.7 on Ubuntu 14.04

Used python libraries:

- docopt 0.6.1
- numpy 1.7.1
- itertools
- collections
- copy

Analyses for the publication were performed using R version 3.1.2

R libraries used:

- ggplot2 1.0.0
- reshape 0.8.5
- survcomp 1.16.0
- survival 2.37
- randomForest 4.6
- outliers 0.14
- reshape2 1.4.1
- boot 1.3
- doMC 1.3.3
- ff 2.2
- curatedOvarianData 1.3.4
- igraph 0.7.1
- RTCGAToolbox 1.99.4
- doppelgangR (https://github.com/lwaldron/doppelgangR last tested using commit sha: f0e02bca770efab4671a8290cd2580f0ec5cc674)

For convenience, scripts are included in the GGS.sh bash file to help install doppelgangR but are commented out. It is recommended that you install the latest version from the github repository but the included scripts worked as of 6/18/15

[![Analytics](https://ga-beacon.appspot.com/UA-20583020-7/greenelab/greedy-geneset-selection/README?pixel)](https://github.com/igrigorik/ga-beacon)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/greenelab/greedy-geneset-selection

Awesome Lists containing this project

README