https://github.com/phydev/tcga-supertreat
Download gene expression data from TCGA for a subset of patients selected in the SuPerTreat project.
- Host: GitHub
- URL: https://github.com/phydev/tcga-supertreat
- Owner: phydev
- License: gpl-3.0
- Created: 2023-02-06T23:06:35.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-08T10:28:43.000Z (over 3 years ago)
- Last Synced: 2025-01-22T10:11:29.426Z (over 1 year ago)
- Language: Python
- Size: 195 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# TCGA data scraping
Download gene expression data from TCGA for a subset of patients selected in the SuPerTreat project.
## How does it work?
1 - Open the [GDC Data Portal repository](https://portal.gdc.cancer.gov/repository), filter for the data of interest, and download the list of entries by clicking the `JSON` button (see figure below). This file will be your `manifest.json`.

2 - Provide a list of case ids such as `data/supertreat_cases.csv`.
3 - Run `src/main.py`.
The script links each file to its case ids and filters the manifest entries using the provided case ids. It then downloads all selected files from GDC and stores them in `data/`.
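
The filtering and download steps can be sketched roughly as follows. This is a minimal illustration, not the code in `src/main.py`: it assumes each manifest entry carries `file_id`, `file_name`, and a `cases` list with `case_id` fields, that the CSV has a `case_id` column, and that the files are open access and retrievable from the public GDC data endpoint. The function names are hypothetical.

```python
import csv
import json
import urllib.request
from pathlib import Path

# Public GDC endpoint for downloading a file by its UUID.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def load_case_ids(csv_path, column="case_id"):
    """Read case ids from a CSV file (the column name is an assumption)."""
    with open(csv_path, newline="") as handle:
        return {row[column].strip() for row in csv.DictReader(handle)}


def filter_manifest(entries, case_ids):
    """Keep manifest entries linked to at least one of the given case ids."""
    selected = []
    for entry in entries:
        # Entries without a "cases" list are treated as unlinked and skipped.
        entry_cases = {case["case_id"] for case in entry.get("cases", [])}
        if entry_cases & case_ids:
            selected.append(entry)
    return selected


def download_file(entry, out_dir="data"):
    """Download one open-access file from GDC into out_dir."""
    target = Path(out_dir) / entry["file_name"]
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(f"{GDC_DATA_ENDPOINT}/{entry['file_id']}", target)
    return target


def main(manifest_path="manifest.json", cases_path="data/supertreat_cases.csv"):
    """Filter the manifest by case id and download every selected file."""
    with open(manifest_path) as handle:
        entries = json.load(handle)
    for entry in filter_manifest(entries, load_case_ids(cases_path)):
        download_file(entry)
```

Calling `main()` would run the whole pipeline. Controlled-access files need an authentication token, which this sketch does not handle.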