Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/baderlab/pnc_pathwayanalysis

Pathway analysis pipeline for PNC data (GWAS + GSEA)
https://github.com/baderlab/pnc_pathwayanalysis

Last synced: about 5 hours ago
JSON representation

Pathway analysis pipeline for PNC data (GWAS + GSEA)

Awesome Lists containing this project

README

        

* Instructions on how to run the Pathway Analysis pipeline for PNC data
* Nov 9, 2018
* By Shirley Hui

A) PERMUTE PHENOTYPES

1) Permute phenotypes

File: ROOT_DIR/permutePhenos.R
- This file permutes phenotype status in the phenoFile (see code).
- Subjects with status -9 are not included in the permutation

Run:
> Rscript perumtePhenos.R

* Note: Set variables inside script before running

Output:
* Each permutation is written out to a file with the same format as phenoFile
* Note: pemutations are written to files to ensure reproducibility

B) PERFORM GWAS (genome-wide associations)

1) Make gwas batch and submit scripts

File: ROOT_DIR/make_plink_assoc_bin_jobfiles.sh
- This script make a txt file that contains the call to run gwas (logistic regression + PC covariates) to compute snp-phenotype associations using plink.

Run:
> ./make_plink_assoc_bin_jobfiles.sh 0

Example: ./make_plink_assoc_bin_jobfiles.sh volt_svt 0 100 1-5

Output:
* a file called ROOT_DIR/jobfiles/plink_assoc__0--covar-.txt that contains the call to run one permutation of gwas (one call per line)
* a file called ROOT_DIR/run_gwas_bin__0--covar-.sh that calls the txt file above via parallel

2) Compute gene-phenotype associations
- Run ROOT_DIR/run_gwas_bin__0--covar-.sh script from previous
- The script runs GWAS (runGWAS.R) 0 to times for pheno of interest using logistic regression and 0-n PC covariates supplied by the user

Output:
* pvalues for each SNP outputted to a file for each permuation (0 to )

C) PERFORM PATHWAY GSEA

The following assumes the pipeline is running on a high

1) Make gsea batch and submit scripts

File: ROOT_DIR/code/src/MakeJobs.java
- This script will make a bunch of .sh files that contain the 8 parallel jobs that will run when it is called.
- If num perm = 100, 13 .sh files will be made. If num perm= 1000, 125 .sh files will be made. Note code doesn't really support num perms not equal to 100 or 1000.
- If you make any code changes, to compile script, run the ./compile.sh at the prompt

Run:
> module load java
> java -cp ../out MakeJobs bin

Example: java -cp ../out MakeJobs volt_svt 100 bin 1-5

Output:
* a bunch of job files located in ROOT_DIR/job_scripts/gsea/run-gsea---bin-covars.sh
* a master submit script located in ROOT_DIR/submit-gsea--bin-covars.sh

2) Run gsea
- Run submit script from previous
- You will get a bunch of warnings (which is ok) about how the submitted job time is the minimum of 15 mins.
- This will create several directories (one for each permutation). In each directory there will be two output files (*gsea.txt - gsea output, *genes.txt - detailed output)

3) Move gsea output files into one directory

File: ROOT_DIR/moveFiles.sh
- This will move all output files into ROOT_DIR/gsea/bin/gsea-_covars where:
* gsea.txt files get moved to obs directory,for perm0 or perm directory, for all other permutations
* genes.txt files get moved to genes directory for all perms

Run:
> moveFiles.sh bin

Example: moveFiles.sh volt_svt bin 1-5 100

- Delete empty directories and run txt and log files:
> rm -r Gsea-_bin-covars_*
> rm run_gsea_--bin-covars*

4) Normalize ES scores

File: ROOT_DIR/mygseaCalcNES.py
- This script will normalize the enrichment scores from step 2 above.

Run:
> module load python/2.7.14-anaconda5.1.0
> python mygseaCalcNES.py bin

Example: python mygseaCalcNES.py volt_svt bin 1-5 100

Output:
- This will create a bunch of nes*.txt in the ROOT_DIR/nes/ directory

5) Compute FDR score

File: ROOT_DIR/compareFDR.R
- This script will compute FDR scores using NES from step 4)

Run:
> module load r/3.4.3-anaconda5.1.0
> Rscript computeFDR.R bin

Example: Rscript computeFDR.R volt_svt bin 1-5 8

Output: - This will create a file fdr--bin-covars-.txt in the ROOT_DIR/fdr directory