https://github.com/baderlab/pnc_pathwayanalysis
Pathway analysis pipeline for PNC data (GWAS + GSEA)
- Host: GitHub
- URL: https://github.com/baderlab/pnc_pathwayanalysis
- Owner: BaderLab
- Created: 2018-11-15T02:21:43.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-11-15T02:28:28.000Z (about 6 years ago)
- Last Synced: 2024-11-05T10:26:06.780Z (about 2 months ago)
- Language: Python
- Size: 36 MB
- Stars: 1
- Watchers: 13
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
* Instructions on how to run the Pathway Analysis pipeline for PNC data
* Nov 9, 2018
* By Shirley Hui

A) PERMUTE PHENOTYPES
1) Permute phenotypes
File: ROOT_DIR/permutePhenos.R
- This file permutes phenotype status in the phenoFile (see code).
- Subjects with status -9 are not included in the permutation.
Run:
> Rscript permutePhenos.R
* Note: Set variables inside script before running
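As a rough illustration of what this step does, the Python sketch below shuffles phenotype status among subjects while leaving subjects with status -9 in place. It is not the code in permutePhenos.R: the function name, the in-memory list format, and the use of one seed per permutation are assumptions made for the example.

```python
import random

MISSING = -9  # status code for subjects left out of the permutation


def permute_statuses(statuses, seed):
    """Shuffle phenotype statuses among non-missing subjects.

    Subjects whose status is MISSING keep that status and their position.
    """
    rng = random.Random(seed)  # fixed seed so a permutation can be regenerated
    observed = [s for s in statuses if s != MISSING]
    rng.shuffle(observed)
    shuffled = iter(observed)
    return [s if s == MISSING else next(shuffled) for s in statuses]


if __name__ == "__main__":
    original = [1, 2, -9, 2, 1, 1, -9, 2]  # toy statuses, not real data
    for perm in range(1, 4):
        print(perm, permute_statuses(original, seed=perm))
```

Seeding each permutation separately is one way to get the same reproducibility that writing the permutations to files provides.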
Output:
* Each permutation is written out to a file with the same format as phenoFile
* Note: permutations are written to files to ensure reproducibility

B) PERFORM GWAS (genome-wide associations)
1) Make gwas batch and submit scripts
File: ROOT_DIR/make_plink_assoc_bin_jobfiles.sh
- This script makes a txt file that contains the calls to run GWAS (logistic regression + PC covariates) to compute SNP-phenotype associations using plink.
Run:
> ./make_plink_assoc_bin_jobfiles.sh 0
Example: ./make_plink_assoc_bin_jobfiles.sh volt_svt 0 100 1-5
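The job file produced by this step holds one GWAS call per permutation. The Python sketch below shows that layout only; the real script is a shell script whose exact plink arguments are not reproduced here. The file paths, the `--bfile` prefix, the covariate file name, and the reading of `0 100` as first and last permutation index are all assumptions for illustration.

```python
def plink_job_lines(pheno, first_perm, last_perm, covars,
                    bfile="data/pnc", pheno_dir="phenos", out_dir="gwas"):
    """Build one plink command per permutation; all paths are hypothetical."""
    template = (
        "plink --bfile {bfile} "
        "--pheno {pheno_dir}/{pheno}_perm{perm}.txt "
        "--logistic --covar {bfile}.cov --covar-number {covars} "
        "--out {out_dir}/{pheno}_perm{perm}"
    )
    return [
        template.format(bfile=bfile, pheno_dir=pheno_dir, pheno=pheno,
                        perm=perm, covars=covars, out_dir=out_dir)
        for perm in range(first_perm, last_perm + 1)
    ]


if __name__ == "__main__":
    # Mirrors the example arguments: volt_svt 0 100 1-5
    lines = plink_job_lines("volt_svt", 0, 100, "1-5")
    print(lines[0])
    print(len(lines), "commands, one per line")
```

Keeping each call on a single line is what lets a tool such as GNU parallel run the lines as independent jobs.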
Output:
* a file called ROOT_DIR/jobfiles/plink_assoc__0--covar-.txt that contains the call to run one permutation of gwas (one call per line)
* a file called ROOT_DIR/run_gwas_bin__0--covar-.sh that calls the txt file above via parallel
2) Compute gene-phenotype associations
- Run the ROOT_DIR/run_gwas_bin__0--covar-.sh script from the previous step
- The script runs GWAS (runGWAS.R) 0 to times for the phenotype of interest using logistic regression and 0-n PC covariates supplied by the user
Output:
* p-values for each SNP are written to a file for each permutation (0 to )

C) PERFORM PATHWAY GSEA
The following assumes the pipeline is running on a high
1) Make gsea batch and submit scripts
File: ROOT_DIR/code/src/MakeJobs.java
- This program makes a set of .sh files, each containing 8 jobs that run in parallel when the file is called.
- If num perm = 100, 13 .sh files will be made. If num perm = 1000, 125 .sh files will be made. Note: the code does not really support a number of permutations other than 100 or 1000.
- If you make any code changes, recompile by running ./compile.sh at the prompt.
Run:
> module load java
> java -cp ../out MakeJobs bin
Example: java -cp ../out MakeJobs volt_svt 100 bin 1-5
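The file counts quoted for 100 and 1000 permutations follow from packing 8 jobs into each .sh file and rounding up. A small Python sketch of that arithmetic is given here; the function names and the 1-based permutation numbering are assumptions for the example, and MakeJobs.java itself is not reproduced.

```python
JOBS_PER_FILE = 8  # parallel jobs per .sh file, as described in this step


def num_job_files(num_perm, jobs_per_file=JOBS_PER_FILE):
    """Number of .sh files needed, rounding up (integer-only ceiling)."""
    return -(-num_perm // jobs_per_file)


def batches(num_perm, jobs_per_file=JOBS_PER_FILE):
    """Split permutation indices 1..num_perm into consecutive groups."""
    perms = list(range(1, num_perm + 1))
    return [perms[i:i + jobs_per_file]
            for i in range(0, len(perms), jobs_per_file)]


if __name__ == "__main__":
    print(num_job_files(100))    # 13
    print(num_job_files(1000))   # 125
    print(batches(100)[-1])      # last file holds the 4 leftover jobs
```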
Output:
* a bunch of job files located in ROOT_DIR/job_scripts/gsea/run-gsea---bin-covars.sh
* a master submit script located in ROOT_DIR/submit-gsea--bin-covars.sh
2) Run gsea
- Run the submit script from the previous step
- You will get a bunch of warnings (which is OK) about how the submitted job time is the minimum of 15 mins.
- This will create several directories (one for each permutation). In each directory there will be two output files (*gsea.txt - gsea output, *genes.txt - detailed output)
3) Move gsea output files into one directory
File: ROOT_DIR/moveFiles.sh
- This will move all output files into ROOT_DIR/gsea/bin/gsea-_covars where:
* gsea.txt files are moved to the obs directory for perm0, or to the perm directory for all other permutations
* genes.txt files are moved to the genes directory for all perms
Run:
> moveFiles.sh bin
Example: moveFiles.sh volt_svt bin 1-5 100
- Delete empty directories and run txt and log files:
> rm -r Gsea-_bin-covars_*
> rm run_gsea_--bin-covars*
4) Normalize ES scores
File: ROOT_DIR/mygseaCalcNES.py
- This script will normalize the enrichment scores from step 2 above.
Run:
> module load python/2.7.14-anaconda5.1.0
> python mygseaCalcNES.py bin
Example: python mygseaCalcNES.py volt_svt bin 1-5 100
Output:
- This will create a bunch of nes*.txt files in the ROOT_DIR/nes/ directory
5) Compute FDR score
File: ROOT_DIR/computeFDR.R
- This script will compute FDR scores using NES from step 4)
Run:
> module load r/3.4.3-anaconda5.1.0
> Rscript computeFDR.R bin
Example: Rscript computeFDR.R volt_svt bin 1-5 8
Output:
- This will create a file fdr--bin-covars-.txt in the ROOT_DIR/fdr directory
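Steps 4 and 5 can be pictured with the Python sketch below. It uses the common GSEA convention of dividing an enrichment score by the mean magnitude of the same-signed permutation scores, followed by a simplified empirical FDR for positive scores that compares the null and observed NES distributions. Whether mygseaCalcNES.py and computeFDR.R do exactly this is an assumption; the function names and toy numbers are invented for illustration.

```python
def normalize_es(es, null_es):
    """Scale an enrichment score by the mean magnitude of same-signed null scores."""
    same_sign = [x for x in null_es if (x >= 0) == (es >= 0)]
    if not same_sign:
        return 0.0  # no same-signed null scores to normalize against
    mean_abs = sum(abs(x) for x in same_sign) / float(len(same_sign))
    return es / mean_abs if mean_abs > 0 else 0.0


def empirical_fdr(nes, observed_nes, null_nes):
    """Simplified FDR for a positive NES.

    Fraction of null NES >= nes, divided by the fraction of observed
    NES >= nes, capped at 1.
    """
    null_frac = sum(1 for x in null_nes if x >= nes) / float(len(null_nes))
    obs_frac = sum(1 for x in observed_nes if x >= nes) / float(len(observed_nes))
    if obs_frac == 0:
        return 1.0
    return min(1.0, null_frac / obs_frac)


if __name__ == "__main__":
    # Toy enrichment scores: one observed ES and a few permuted ES per pathway.
    observed = {"pathwayA": 0.6, "pathwayB": 0.3}
    permuted = {"pathwayA": [0.2, 0.3, 0.25, -0.1],
                "pathwayB": [0.3, 0.35, 0.25, 0.3]}

    obs_nes = {p: normalize_es(es, permuted[p]) for p, es in observed.items()}
    null_nes = [normalize_es(x, permuted[p]) for p in permuted for x in permuted[p]]

    for pathway, nes in sorted(obs_nes.items()):
        fdr = empirical_fdr(nes, list(obs_nes.values()), null_nes)
        print(pathway, round(nes, 3), round(fdr, 3))
```

Normalizing against each pathway's own permutation scores is what makes scores comparable across pathways of different sizes before the FDR is computed.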