https://github.com/cmdcolin/macs-accessories
visualizations for macs output
- Host: GitHub
- URL: https://github.com/cmdcolin/macs-accessories
- Owner: cmdcolin
- Created: 2012-05-18T05:17:35.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2013-07-23T16:29:11.000Z (over 11 years ago)
- Last Synced: 2024-04-15T02:14:35.856Z (7 months ago)
- Language: JavaScript
- Size: 30.5 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
macs-accessories
================

Model-based Analysis of ChIP-Seq (MACS) is widely used software for finding peaks in ChIP-seq data. Its output includes peaks in BED and XLS formats and the 'tag pileup' in wiggle format. macs-accessories performs additional analysis on these files, including visualizations and the calculation of normalized difference (NormDiff) scores from Zheng et al. (2010).

```{r setup, dev='png',cache=FALSE,echo=FALSE}
debug=FALSE
opts_knit$set(upload.fun = imgur_upload) # upload all images to imgur.com
source('readplot.R')
source('knitr.R')
```

We created a 'wiggle class', an S3 R object, for loading MACS output files and running further analysis. The class automatically loads peak files (BED, XLS, CSV) and wiggle files (MACS output format).

```{r setup2, cache=TRUE}
wig1=WiggleClass('S96')
wig2=WiggleClass('HS959')
wig1$loadWiggles()
wig2$loadWiggles()
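```

The normalized difference (NormDiff) score measures the ChIP-seq signal minus the scaled input control, in units of its estimated standard deviation, under a simple random model. Let $A$ and $B$ be the ChIP-seq and input control counts, respectively:

$$A\sim \mathrm{Poisson}(f+g)$$

$$B\sim \mathrm{Poisson}(cg)$$

Here $f$ is the binding signal, $g$ is the background, and $c$ is a scaling factor between the two samples. The NormDiff score $Z$ is then defined as
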
$$Z(x_i)=\frac{A(x_i)-B(x_i)/c}{\hat\sigma}$$

We use the data to estimate the scaling factor $c$ and the standard deviation $\hat\sigma$.

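As a rough illustration of the definition above, the following sketch computes NormDiff scores on simulated bin counts. It is not the code behind `estimateScalingFactor()` or `estimateVarianceAll()`. The function `normdiff` and the variables `chip` and `input` are hypothetical names, and both estimators are assumptions: $c$ is taken as the median input/ChIP ratio, and $\hat\sigma^2$ as the genome-wide mean of $A + B/c^2$, which is the plug-in variance of $A - B/c$ for independent Poisson counts.

```r
# Sketch of a NormDiff calculation on simulated bin counts.
# normdiff, chip and input are hypothetical names, not part of macs-accessories.
normdiff <- function(chip, input) {
  # Assumption: estimate c as the median input/ChIP ratio
  # over bins that have reads in both tracks.
  ok <- chip > 0 & input > 0
  c_hat <- median(input[ok] / chip[ok])
  # Assumption: Var(A - B/c) = Var(A) + Var(B)/c^2 for independent Poisson
  # counts, estimated by A + B/c^2 and averaged over all bins.
  sigma_hat <- sqrt(mean(chip + input / c_hat^2))
  z <- (chip - input / c_hat) / sigma_hat
  list(c = c_hat, sigma = sigma_hat, z = z)
}

set.seed(1)
n <- 5000
g <- 20                                # background rate per bin
f <- c(rep(0, n - 50), rep(60, 50))    # binding signal in the last 50 bins only
chip  <- rpois(n, f + g)               # A ~ Poisson(f + g)
input <- rpois(n, 2 * g)               # B ~ Poisson(c * g) with c = 2
nd <- normdiff(chip, input)

# Background bins should score near zero, enriched bins well above it.
summary(nd$z[f == 0])
summary(nd$z[f > 0])
```

A single genome-wide $\hat\sigma$ keeps the sketch short. A local, window-based variance estimate would be a natural alternative.
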
We can then look at the average NormDiff score of each peak and compare it with the score at the same location in the other experiment, simply by comparing the `wig1` and `wig2` Z scores for the peak.
```{r d2,dev='png',cache=TRUE}
wig1$estimateScalingFactor()
wig1$estimateVarianceAll()
wig2$estimateScalingFactor()
wig2$estimateVarianceAll()
wz1=wig1$Z(wig1$peaks)
wz2=wig2$Z(wig1$peaks)
wz4=wig2$Z(wig2$peaks)
wz3=wig1$Z(wig2$peaks)
r1=plotMaxAvgZscore('Max Avg S96 peak NormDiff score vs HS959 synteny w=100', wig1, wig2, wz1, wz2,'#bb0000', '#001199')
r2=plotMaxAvgZscore('Max Avg HS959 peak NormDiff score vs S96 synteny w=100', wig2, wig1, wz4, wz3, '#bb0000','#009900')
```

We can calculate NormDiff scores for the whole genome using:

```{r zallcalc, cache=TRUE}
wza1=wig1$Zall()
wza2=wig2$Zall()
```

We can then inspect the distribution of the NormDiff scores:

```{r zall, dev='png',cache=TRUE}
datasort=as.numeric(wza1[[1]][,4])
d <- density(datasort,adjust=1.4) # returns the density data
plot(d, main='Kernel density of NormDiff scores') # plots the results
polygon(d, col="#BB2222CC", border="#222244")
clone=datasort
qqnorm(clone)
qqline(clone,col=2)
```

By converting the Z scores to p-values, we can see which ones are significant according to the p-value $P(\bar x \leq X)$.

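One way to obtain such p-values is sketched here. It assumes that NormDiff scores are approximately standard normal where there is no binding, and it uses the upper tail, so that large positive scores give small p-values. The helper `z_to_pvalue` and the example scores are hypothetical. The tail convention used inside `plotZscoreColor` may differ.

```r
# Hypothetical helper: one-sided p-values for NormDiff scores,
# assuming Z is approximately N(0, 1) where there is no binding.
z_to_pvalue <- function(z) {
  pnorm(z, lower.tail = FALSE)   # P(Z >= z) under the standard normal
}

z <- c(-1, 0, 1.645, 3)          # made-up scores for illustration
p <- z_to_pvalue(z)
print(data.frame(z = z, p = p))
```
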
```{r color, dev='png', cache=TRUE}
plotZscoreColor('HS959 pvalue for S96 peaks', wig1,wig2,wz1,wz2)
```

We want to use hypothesis testing to find NormDiff scores that differ strongly from the background. When comparing ChIP-seq experiments from different strains of yeast, we used a likelihood ratio test at a significance level of 1%.

```{r cutoff, dev='png',cache=TRUE}
ret=plotZscoreCutoff('S96 peaks vs HS959 synteny (Hypothesis testing)',wig1,wig2,wz1,wz2,0.05)
```
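
To make the idea of a cutoff concrete, the sketch below flags scores that exceed a one-sided normal threshold. This is a simpler stand-in for the likelihood ratio test mentioned above, not a reimplementation of `plotZscoreCutoff`. It assumes that the final argument `0.05` in the call above is a significance level, and all score vectors are made up for illustration.

```r
# Hypothetical sketch: flag scores above a one-sided normal cutoff.
alpha <- 0.05                      # assumed to match the value passed above
z_cutoff <- qnorm(1 - alpha)       # upper-tail critical value of N(0, 1)

# Made-up NormDiff scores at the same five peak locations in two experiments.
z_s96   <- c(4.2, 3.1, 2.5, 5.0, 1.9)
z_hs959 <- c(3.8, 0.4, 2.0, -0.2, 1.7)

significant_s96   <- z_s96 > z_cutoff
significant_hs959 <- z_hs959 > z_cutoff

# Peaks that pass the cutoff in the first experiment but not in the second.
strain_specific <- which(significant_s96 & !significant_hs959)
print(strain_specific)
```
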