https://github.com/edawson/firehose
A collection of scripts for munging genomic and epigenetic data from the Broad Firehose.
- Host: GitHub
- URL: https://github.com/edawson/firehose
- Owner: edawson
- License: gpl-2.0
- Created: 2014-07-31T17:35:52.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-09-07T03:56:40.000Z (about 10 years ago)
- Last Synced: 2023-04-04T14:14:09.982Z (over 1 year ago)
- Language: Python
- Size: 215 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Firehose_Preprocessing
======================

A collection of scripts for munging genomic and epigenetic data from
the Broad Firehose.

## Description and Motivation
The Broad Institute of MIT and Harvard provides near-monthly releases of
pre-processed data from The Cancer Genome Atlas (TCGA). Firehose remains the
most convenient way to programmatically download data from TCGA, but
additional preprocessing is often needed to make analysis in R or Python
convenient. This package provides a set of scripts for processing, cleaning,
and transforming data from the Broad Firehose site to make analyses easier
to perform.

## Dependencies
To run any scripts in this package you'll need to install a few dependencies. First,
make sure you're running a recent version of Python (preferably from the 2.7 series).
Next, you'll need to install a few commonly used Python libraries for data analysis:

1. NumPy
2. SciPy
3. pandas
4. Matplotlib

On Ubuntu, these may in turn have some other dependencies, such as:

1. python-dev
2. build-essential

The easiest way to install all of these is with a combination of APT and pip:
**NB: Always be careful and responsible using sudo**

```
sudo apt-get install python-dev build-essential python-pip
sudo pip install numpy
sudo pip install scipy
sudo pip install pandas
sudo pip install matplotlib
```
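
Once the installs finish, it can be handy to confirm that each library is actually importable before running any of the scripts. The snippet below is a minimal sketch of such a check and is not part of this package; the `check_modules` helper and the `REQUIRED_MODULES` list are illustrative names, and the list simply mirrors the libraries named above using their usual import names.

```python
import importlib

# Usual import names for the libraries listed above (an assumption of this
# sketch; adjust the list if your setup differs).
REQUIRED_MODULES = ["numpy", "scipy", "pandas", "matplotlib"]


def check_modules(names):
    """Map each module name to its version string, "unknown" if it imports
    but exposes no __version__, or None if the import fails."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is not installed (or not visible to this interpreter).
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in sorted(check_modules(REQUIRED_MODULES).items()):
        status = "missing" if version is None else "OK (version %s)" % version
        print("%-12s %s" % (name, status))
```

Any library reported as missing can be installed with the matching `pip install` command above.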
## Installation
Now you're ready to install the Firehose package:

```
git clone https://github.com/edawson/Firehose.git
```

Right away you can run scripts out of the Firehose directory, but if you'd like
to link them into /usr/bin/ and run them from anywhere, you can do this:

```
sudo ln -s "
```

That's it! Currently there are no options for fine-tuning processing because these scripts try
to stay as minimalistic as possible. As time goes on usage may be subject to change, but
the above usage will **always** perform the standard transformations.

Running the FirehosePreprocess script will perform the batch preprocessing of the entire set of files
(miRNA, mRNA, CNV, methylation, and MAF). Its usage is a little ugly but will be improved upon in the
future. You'll need to modify the variables directly at the top of the script; I promise to make this better
as time permits.

## Contacting the Author
If you use these scripts please make sure to credit where they came from:

>Eric T. Dawson
>github.com/edawson/Firehose
>erictdawson.com

If you would like to contact the author for comments, criticism, or praise, he can be reached
through his website above.

## Acknowledgements
These scripts were developed during the author's stint as a student intern at The Ontario
Institute for Cancer Research. They are a (nearly) direct port of scripts originally produced
by Dr. Guanming Wu.