macs-accessories
================

The package Model-based Analysis of ChIP-Seq (MACS) is widely used software for peak calling on ChIP-seq data. Its output includes peaks in BED and XLS formats and the 'tag pileup' in wiggle format. macs-accessories provides additional analysis of these files, including visualizations and calculation of normalized difference (NormDiff) scores from Zheng et al. (2010).
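
For orientation, the MACS output files are plain text and can be inspected with base R. A minimal sketch, assuming typical MACS file names (the names below are placeholders for your own run, not files shipped with this package):

```{r read-macs-sketch, eval=FALSE}
# Placeholder file names; substitute the output of your own MACS run.
# The XLS output is tab-separated text with '#' comment lines.
peaks <- read.table('sample_peaks.bed', sep = '\t',
                    col.names = c('chr', 'start', 'end', 'name', 'score'))
xls   <- read.table('sample_peaks.xls', header = TRUE, sep = '\t',
                    comment.char = '#')
head(peaks)
```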

```{r setup, dev='png',cache=FALSE,echo=FALSE}
library(knitr)
debug <- FALSE
opts_knit$set(upload.fun = imgur_upload) # upload all images to imgur.com
source('readplot.R') # project helper scripts
source('knitr.R')
```

We created a 'wiggle class', an S3 R object, for loading MACS output files and performing additional data analysis. The class automatically loads peak files (BED, XLS, CSV) and wiggle files (MACS output format); a hypothetical sketch of the pattern follows the chunk below.

```{r setup2, cache=TRUE}
# load MACS output for the two yeast strains, S96 and HS959
wig1 <- WiggleClass('S96')
wig2 <- WiggleClass('HS959')
wig1$loadWiggles()
wig2$loadWiggles()
```
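
To give a feel for the `wig$loadWiggles()` interface, here is a simplified, hypothetical sketch of how such an object can be built around an environment. It is not the package's actual implementation, and the file-naming scheme is invented:

```{r wiggleclass-sketch, eval=FALSE}
# Hypothetical sketch only: an environment holds the strain name, the data,
# and the functions operating on it, so $loadWiggles() can fill in $peaks
# and $wiggle in place.
WiggleClassSketch <- function(strain) {
  self <- new.env()
  self$strain <- strain
  self$loadWiggles <- function() {
    # invented file-naming scheme; the real package defines its own
    self$peaks  <- read.table(paste0(strain, '_peaks.bed'), sep = '\t')
    self$wiggle <- read.table(paste0(strain, '.wig'), skip = 1)
    invisible(self)
  }
  class(self) <- 'WiggleClassSketch'
  self
}
```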

The normalized difference (NormDiff) score measures, under a simple random model, how much the ChIP-seq signal at each position exceeds the scaled input control. For $A$, $B$ representing the ChIP-seq and input control data respectively, with signal $f$, background $g$, and scaling factor $c$,

$$A \sim \mathrm{Poisson}(f+g), \qquad B \sim \mathrm{Poisson}(cg)$$

Then the NormDiff score $Z$ is defined as

$$Z(x_i)=\frac{A(x_i)-B(x_i)/c}{\hat\sigma}$$

We estimate the scaling factor $c$ and the standard deviation $\hat\sigma$ from the data.
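
As a worked example, the formula translates directly into R. This is a sketch with toy counts and simple plug-in estimators; the package's `estimateScalingFactor()` and `estimateVarianceAll()` may use different estimators:

```{r normdiff-sketch, eval=FALSE}
# Toy counts, not real data
A <- c(12, 30, 7, 55, 21)   # ChIP-seq counts at positions x_i
B <- c(10, 14, 9, 20, 15)   # input control counts at the same positions

c_hat     <- sum(B) / sum(A)               # simple depth-ratio estimate of c (assumption)
sigma_hat <- sqrt(mean(A + B / c_hat^2))   # plug-in estimate of sd(A - B/c) (assumption)

Z <- (A - B / c_hat) / sigma_hat           # NormDiff scores
Z
```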

We can look at the average NormDiff scores of the peaks and see how they compare with the same locations in the other experiment, simply by comparing the wig1 and wig2 NormDiff scores for each peak.

```{r d2,dev='png',cache=TRUE}
# estimate the scaling factor c and sigma-hat for each strain
wig1$estimateScalingFactor()
wig1$estimateVarianceAll()
wig2$estimateScalingFactor()
wig2$estimateVarianceAll()
# NormDiff scores at S96 peaks, in S96 and in the syntenic HS959 regions
wz1 <- wig1$Z(wig1$peaks)
wz2 <- wig2$Z(wig1$peaks)
# NormDiff scores at HS959 peaks, in HS959 and in the syntenic S96 regions
wz4 <- wig2$Z(wig2$peaks)
wz3 <- wig1$Z(wig2$peaks)
r1 <- plotMaxAvgZscore('Max Avg S96 peak NormDiff score vs HS959 synteny w=100',
                       wig1, wig2, wz1, wz2, '#bb0000', '#001199')
r2 <- plotMaxAvgZscore('Max Avg HS959 peak NormDiff score vs S96 synteny w=100',
                       wig2, wig1, wz4, wz3, '#bb0000', '#009900')
```
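
If the per-peak average NormDiff scores are pulled out as plain numeric vectors, the same comparison can be made with base graphics. A minimal sketch using simulated placeholder values (the real values would come from `wz1`/`wz2` above):

```{r compare-sketch, eval=FALSE}
# Placeholder vectors standing in for per-peak average NormDiff scores
z_s96   <- rnorm(200, mean = 3)
z_hs959 <- z_s96 + rnorm(200)

plot(z_s96, z_hs959,
     xlab = 'S96 peak NormDiff', ylab = 'HS959 synteny NormDiff',
     main = 'Per-peak NormDiff: S96 vs HS959', col = '#bb0000')
abline(0, 1, col = '#001199')   # y = x reference line
cor(z_s96, z_hs959)             # overall agreement between strains
```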

We can calculate NormDiff scores for the whole genome using `Zall()`:
```{r zallcalc, cache=TRUE}
wza1=wig1$Zall()
wza2=wig2$Zall()
```

We can examine the distribution of NormDiff scores with a kernel density plot and a normal QQ plot:

```{r zall, dev='png',cache=TRUE}
scores <- as.numeric(wza1[[1]][, 4])   # genome-wide S96 NormDiff scores (4th column)
d <- density(scores, adjust = 1.4)     # kernel density estimate
plot(d, main = 'Kernel density of NormDiff scores')
polygon(d, col = "#BB2222CC", border = "#222244")
# compare the scores against a normal distribution
qqnorm(scores)
qqline(scores, col = 2)
```

By computing p-values for all the NormDiff scores, we can see which positions are significant according to a p-value $P(\bar x \leq X)$.

```{r color, dev='png', cache=TRUE}
plotZscoreColor('HS959 pvalue for S96 peaks', wig1, wig2, wz1, wz2)
```
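
Assuming background NormDiff scores are approximately standard normal (which the QQ plot above is checking), p-values can be obtained from the normal distribution. This is a sketch, not necessarily how the package computes them internally:

```{r pvalue-sketch, eval=FALSE}
z <- rnorm(1000)                         # placeholder NormDiff scores
p_lower <- pnorm(z)                      # P(X <= z), as in the formula above
p_upper <- pnorm(z, lower.tail = FALSE)  # upper tail, for testing enrichment
summary(p_upper)
```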

We want to use hypothesis testing to find NormDiff scores that are highly different from the background. When comparing ChIP-seq experiments from different strains of yeast, we used a likelihood ratio test with a significance level of 1%.

```{r cutoff, dev='png',cache=TRUE}
ret <- plotZscoreCutoff('S96 peaks vs HS959 synteny (Hypothesis testing)',
                        wig1, wig2, wz1, wz2, 0.05)
```
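
One generic way to apply such a cutoff, shown here as a hedged sketch with placeholder scores (`plotZscoreCutoff()` may implement its own test internally), is to threshold upper-tail p-values, optionally after a multiple-testing correction:

```{r cutoff-sketch, eval=FALSE}
z     <- rnorm(200, mean = 1)                # placeholder per-peak NormDiff scores
p     <- pnorm(z, lower.tail = FALSE)        # upper-tail p-values
alpha <- 0.01                                # the 1% level quoted above
sig   <- p.adjust(p, method = 'BH') < alpha  # Benjamini-Hochberg correction (optional)
table(sig)
```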