https://github.com/immunogenomics/goshifter
GoShifter
https://github.com/immunogenomics/goshifter
Last synced: 10 months ago
JSON representation
GoShifter
- Host: GitHub
- URL: https://github.com/immunogenomics/goshifter
- Owner: immunogenomics
- License: other
- Created: 2016-02-17T20:46:59.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2018-01-04T19:57:55.000Z (over 8 years ago)
- Last Synced: 2025-06-05T21:09:51.606Z (about 1 year ago)
- Language: Python
- Homepage: http://www.broadinstitute.org/mpg/goshifter/
- Size: 3.14 MB
- Stars: 11
- Watchers: 7
- Forks: 7
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# GoShifter
GoShifter is a method to determine enrichment of annotations in GWAS significant loci. The manual can be found in the GoShifter_manual.pdf file. These files were taken from http://www.broadinstitute.org/mpg/goshifter/, and may have small updates and bugfixes.
**update June 26th, 2017:** You can input LD files precalculated with ProxyFinder (see below).
# Manual
## Requirements
GoShifter is written for **Python 2.7**. It uses the following modules:
* bisect
* subprocess
* chromtree (provided)
* bx-python
* numpy
Numpy and bx-python can be installed using pip.
## Usage
GoShifter is divided into two scripts:
1. **goshifter.py**, which tests for enrichment of a provided set of SNPs with one genomic annotation of interest
2. **goshifter.strat.py**, which tests for the significance of an overlap of a provided set of SNPs with annotation A stratifying on secondary, possibly colocalizing annotation B.
## **goshifter.py**
### Input files
**snpmap file** - with mappings for the tested set of SNPs, tab delimited, with columns SNP, Chrom, BP. Chromosome in the format ‘chrN’. Must include header. Example: ```cat GoShifter/test_data/bc.snpmappings.hg19.txt```
SNP Chrom BP
rs10069690 chr5 1279790
rs1045485 chr2 202149589
rs3757318 chr6 151914113
rs10941679 chr5 44706498
rs13281615 chr8 128355618
rs2823093 chr21 16520832
rs17530068 chr6 82193109
rs2380205 chr10 5886734
rs3803662 chr16 52586341
**annotation** – mappings of the annotation which will be tested for enrichment with the SNP set. Must be in BED format (gzipped), includes Chrom, Start, End columns (no header required). Chromosome in the format ‘chrN’. Example: ```zcat GoShifter/test_data/UCSF-UBC.Breast_vHMEC.bed.gz```
chrY 128031 128231
chrY 142761 142961
chrY 231491 231691
chrY 231983 232183
chrY 233430 233630
chrY 285237 285437
chrY 296657 296857
chrY 1318260 1318460
chrY 1459641 1459841
**ld files**
When testing for enrichment GoShifter uses LD information for the set of provided SNPs. There are now two options to input LD information:
* precomputed LD files provided by us
* LD files calculated using ProxyFinder
**Precomputed LD files provided by us**
To obtain this information please download precomputed pairwise SNP LD information from (these files are large!): [Location of LD files](https://www.broadinstitute.org/~slowikow/tgp/pairwise_ld/). These path to these files can be specified using the ```--ld``` command line switch (see below). In these files, each pair of variants should be on a single line, values should be tab-separated, and the files should be indexed with [Tabix](http://www.htslib.org/doc/tabix.html). Also, this program requires Tabix to be in the $PATH. These LD files are headerless.
Format of LD files (chrA\tposA\trsIdA\tposB\trsIdB\tRsquared\tDPrime), e.g.:
chr7 143099133 rs10808026 143103481 rs56402156 0.970556 0.992515
**ProxyFinder LD files**
Another option to specify LD is to generate an LD proxy file using a program such as [ProxyFinder](https://github.com/immunogenomics/harmjan/releases). The path to this file can be specified using the ```--proxies``` command line switch (see below). The output of the ProxyFinder program is slightly different than the above LD files, but has as a benefit that it only includes LD information for the SNPs that are of your interest (so that GoShifter doesn't have to do the same tabix queries over and over again). Also, this allows you to generate LD files for other populations easily. To generate these files, please refer to the [ProxyFinder readme.md](https://github.com/immunogenomics/harmjan/tree/master/ProxyFinder).
Format of proxy LD files (chrA\tposA\trsIdA\tchrB\tposB\trsIdB\tDistance\tRsquared\tDPrime), e.g.:
ChromA PosA RsIdA ChromB PosB RsIdB Distance RSquared Dprime
Chr2 202149589 rs1045485 Chr2 202149589 rs1045485 0 0.9999999999999981 0.999999999999999
Chr2 202149589 rs1045485 Chr2 202150914 rs35550815 1325 0.9999999999999981 0.999999999999999
### Command line
Goshifter can be run using a single command line:
```./goshifter.py --snpmap FILE --annotation FILE --permute INT --ld DIR --out FILE [--rsquared NUM --window NUM --min-shift NUM --max-shift NUM --ld-extend NUM --nold]```
**Options:**
Usage:
./goshifter.py --snpmap FILE --annotation FILE --permute INT [--ld DIR --proxies FILE] --out FILE [--rsquared NUM --window NUM --min-shift NUM --max-shift NUM --ld-extend NUM --no-ld]
Options:
-h, --help Print this message and exit.
-v, --version Print the version and exit.
-s, --snpmap FILE File with SNP mappings, tab delimited, must include header: SNP, CHR, BP. Chromosomes in format chrN.
-a, --annotation FILE File with annotations, bed format. No header.
-p, --permute INT Number of permutations.
-i, --proxies FILE File with proxy information for snps in --snpmap; takes precedence over --ld
-l, --ld DIR Directory with LD files
-r, --rsquared NUM Include LD SNPs at rsquared >= NUM [default: 0.8]
-w, --window NUM Window size to find LD SNPs [default: 5e5]
-n, --min-shift NUM Minimum shift [default: False]
-x, --max-shift NUM Maximum shift [default: False]
-e, --ld-extend NUM Fixed value by which to extend LD boundries [default: False]
-n, --no-ld Do not include SNPs in LD [default: False]
-o, --out FILE Write output file.
### Example usage:
```./goshifter.py --snpmap test_data/bc.snpmappings.hg19.txt --annotation test_data /UCSF-UBC.Breast_vHMEC.bed.gz --permute 1000 --ld 1kG-beaglerelease3/pairwise_ld/ --out test_data/bc.H3K4me1_vHMEC3```
If you make use of LD files precalculated by ProxyFinder, replace ```--ld 1kG-beaglerelease3/pairwise_ld/``` with ```--proxies /path/to/proxyfinderoutput.txt```
GoShifter will output a message on the screen (and print the P-value corresponding to the significance of an overlap). The following output files will be created:
***.enrich** – output file with observed and permuted overlap values
nperm nSnpOverlap allSnps enrichment
0 29 68 0.42647
1 17 68 0.25
2 26 68 0.38235
3 25 68 0.36765
4 23 68 0.33824
5 17 68 0.25
6 21 68 0.30882
7 25 68 0.36765
8 17 68 0.25
nperm = 0 is the observed overlap
nSnpOverlap – number of loci where at least one SNP overlaps an annotation
allSnps – total number of tested loci
enrichment – nSnpOverlap/allSnps
Note, P-value is the number of times the “enrichment” is greater or equal to the observed overlap divided by total number of permutations.
***.locusscore** – the likelihood of a locus to overlap an annotation under the null. The smaller the value the more likely a locus overlaps an annotation not by chance. Loci not overlapping any annotation are denoted as “N/A”.
locus overlap score
rs11780156 1 0.609
rs4808801 1 0.996
rs10069690 N/A N/A
rs12493607 1 0.888
rs1045485 N/A N/A
rs204247 1 0.678
rs3757318 N/A N/A
rs2943559 N/A N/A
rs11552449 N/A N/A
***.snpscore** – defines which LD SNPs are overlapping an annotation in the observed data
locus ld_snp overlap
rs11780156 rs11780156 1
rs11780156 rs1016578 0
rs11780156 rs12542202 0
rs11780156 rs11776569 0
rs11780156 rs7836152 0
rs11780156 rs10956414 0
rs11780156 rs67397162 0
rs11780156 rs11778142 1
rs11780156 rs11997192 0
## **goshifter.strat.py**
### Input files:
snpmap – see above
annotation-a – mappings of the primary annotation which will be tested for enrichment with the SNP set. See above for format details for the annotation input file.
annotation-b – mappings of the secondary annotation. Assessment of enrichment for annotation-a will be tested stratifying on this annotation-b. See above for format details for the annotation input file.
### Usage:
```./goshifter.strat.py --snpmap FILE --annotation-a FILE --annotation-b FILE --permute INT --ld DIR --out FILE [--rsquared NUM --window NUM --min-shift NUM --max-shift NUM --ld-extend NUM --no-ld]```
If you make use of LD files precalculated by ProxyFinder, replace ```--ld 1kG-beaglerelease3/pairwise_ld/``` with ```--proxies /path/to/proxyfinderoutput.txt```
### Options
Same options as goshifter.py, with the exception:
-a, --annotation-a FILE File with primary annotations, bed format. Gzipped. No header.
-b, --annotation-b FILE File with annotation to stratify on, bed format. Gzipped. No header.
## Example usage
```./goshifter.strat.py --snpmap test_data/bc.snpmappings.hg19.txt --annotation-a test_data/UCSFUBC. Breast_vHMEC.bed.gz --annotation-b test_data/UCSFUBC.Breast_Myoepithelial_Cells.bed.gz --permute 1000 --ld 1kG-beaglerelease3/pairwise_ld/ --out test_data/bc.H3K4me1_vHMEC_strat_Myoepithelial_Cells```
This will output a message on the screen (and print the P-value corresponding to the significance of an overlap) and write results to *.enrich (see above for explanation of the format).