https://github.com/mschubert/tcgabiolinks-downloader
GNU Make-driven workflow to download TCGA data via the TCGAbiolinks package
- Host: GitHub
- URL: https://github.com/mschubert/tcgabiolinks-downloader
- Owner: mschubert
- Created: 2018-02-07T11:06:22.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-01-22T17:04:05.000Z (over 3 years ago)
- Last Synced: 2025-04-14T17:38:10.213Z (6 months ago)
- Topics: cancer, reproducible-research, rstats, tcga-data
- Language: R
- Size: 18.6 KB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
TCGAbiolinks-downloader
=======================

This workflow uses the [TCGAbiolinks
package](http://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html)
to download data from the NCI's [Genomic Data Commons](https://docs.gdc.cancer.gov/).

All files are stored as `.RData` in their respective analysis
directories.

Requirements
------------

The following software is required to run this workflow:

* A recent version of R
* The [TCGAbiolinks package](http://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html) from [Bioconductor](http://bioconductor.org/)
* GNU make

Optionally, the following R packages for post-processing:

* [edgeR](http://bioconductor.org/packages/release/bioc/html/edgeR.html) - for log2 cpm transformation of RNA-seq reads
* [DESeq2](http://bioconductor.org/packages/release/bioc/html/DESeq2.html) - for variance stabilizing transformation of RNA-seq reads

Downloading the data
--------------------

There are three options to download and save TCGA data:

```sh
# Download everything
make # add the -j<n> flag to run n data sets in parallel

# Selection by cohort
#  - see projects.txt for valid cohorts
make TCGA-LUAD # eg. 'TCGA-LUAD' for lung adenocarcinoma

# Selection by data type
#  - valid types are: snv_mutect2, rna_seq_raw, cnv_segments, mirna_seq, clinical
make clinical # eg. 'clinical' for downloading clinical data
```

Data will be stored as `RData` files (containing a `data.frame` or
`SummarizedExperiment` object) for each cohort in the respective data type
directories.

Additional documentation
------------------------

The data processing steps underlying the downloaded data are fully
documented on the [GDC webpage](https://docs.gdc.cancer.gov/Data/Introduction/).
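
Checking the requirements
-------------------------

Before running `make`, it can help to confirm that the tools listed under Requirements are available. The following is a minimal sketch and not part of the repository: the function names `have_tool`, `have_r_package` and `check_requirements` are made up for illustration. It assumes that `Rscript` is how R is invoked on your system, and it only checks that some `make` is on the `PATH`, not that it is GNU make.

```sh
#!/bin/sh
# Sketch: check the prerequisites listed under "Requirements".
# All function names here are illustrative assumptions.

# Succeeds if the named command is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if Rscript is available and the named R package is installed.
have_r_package() {
    have_tool Rscript &&
        Rscript -e "quit(status = if (requireNamespace('$1', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1
}

# Prints one status line per item; returns non-zero if a required item is missing.
check_requirements() {
    missing=0
    for tool in make Rscript; do
        if have_tool "$tool"; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    if have_r_package TCGAbiolinks; then
        echo "ok:      TCGAbiolinks"
    else
        echo "missing: TCGAbiolinks"
        missing=1
    fi
    # Optional post-processing packages do not affect the return status.
    for pkg in edgeR DESeq2; do
        if have_r_package "$pkg"; then
            echo "ok:      $pkg (optional)"
        else
            echo "absent:  $pkg (optional)"
        fi
    done
    return "$missing"
}

check_requirements || echo "Install the missing items before running make."
```

The optional packages are reported but never counted as missing, since they are only needed for post-processing.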