https://github.com/poisonalien/chiptk
optimized protocol for processing 50-bp SE ChIP-seq
- Host: GitHub
- URL: https://github.com/poisonalien/chiptk
- Owner: PoisonAlien
- License: mit
- Created: 2017-09-27T02:38:09.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-09-27T03:16:14.000Z (over 8 years ago)
- Last Synced: 2025-04-01T00:41:20.383Z (about 1 year ago)
- Topics: chip-seq, pipeline-runner, super-enhancers
- Language: Shell
- Size: 6.84 KB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# chiptk
Optimized protocols for processing 50-bp single-end (SE) ChIP-seq data.
## Introduction
chiptk is a set of optimized protocols for ChIP-seq read alignment, peak calling with MACS2, and fast super-enhancer identification via bwtool.
### align
Uses `bowtie` for alignment. A few parameters are hard-coded to values found to work best for 50 bp SE reads. Duplicates are removed (not just marked) with Picard MarkDuplicates.
```
---------------------------------------------------------------------------------------------------------------------------------------------------
usage: chiptk align [options]
wrapper around bowtie and picard MarkDuplicates. Bowtie alignment parameters are optimized for 50 bp single end reads.
bowtie -> picard
positional arguments:
picard path to picard jar file
output_fn Basename for output file. Usually sample name.
bowtie_idx Bowtie index file for reference genome. Required.
fq Fastq file (gz compressed). Required.
optional arguments:
-D Output directory to store results. Optional. Default ./bams
-t threads to use. Default 4.
-k report up to <int> good alignments per read (default: 2)
-n max mismatches in seed (can be 0-3, default: -n 2)
-m suppress all alignments if > <int> exist (default: 2)
Example: chiptk align picard.jar foo hg19 foo.fq.gz
---------------------------------------------------------------------------------------------------------------------------------------------------
```
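For single-end reads, duplicate removal comes down to keeping one read per chromosome, strand and 5' position. The Python sketch below illustrates only that idea; it is not Picard's implementation. The `Read` record and the function names are hypothetical, soft-clipping and read groups are ignored, and duplicates are resolved by summed base quality on the assumption that this echoes Picard's default `SUM_OF_BASE_QUALITIES` scoring.

```python
from collections import namedtuple

# Hypothetical minimal record for an aligned single-end read (not a pysam/BAM API).
# pos and end are the leftmost and rightmost aligned bases; quals are Phred scores.
Read = namedtuple("Read", ["name", "chrom", "pos", "end", "strand", "quals"])


def five_prime(read):
    """5' coordinate of a single-end read: leftmost base on '+', rightmost on '-'."""
    return read.pos if read.strand == "+" else read.end


def remove_duplicates(reads):
    """Keep one read per (chrom, strand, 5' position) and drop the rest.

    Among duplicates, the read with the highest summed base quality wins;
    the earliest read wins ties. Kept reads are returned in input order.
    """
    best = {}  # key -> (score, index of best read so far)
    for idx, read in enumerate(reads):
        key = (read.chrom, read.strand, five_prime(read))
        score = sum(read.quals)
        if key not in best or score > best[key][0]:
            best[key] = (score, idx)
    keep = sorted(idx for _, idx in best.values())
    return [reads[i] for i in keep]


reads = [
    Read("r1", "chr1", 100, 149, "+", [30] * 50),
    Read("r2", "chr1", 100, 149, "+", [35] * 50),  # duplicate of r1, higher quality
    Read("r3", "chr1", 100, 149, "-", [30] * 50),  # minus strand: 5' end is 149
    Read("r4", "chr1", 120, 169, "+", [30] * 50),
    Read("r5", "chr1", 100, 149, "-", [20] * 50),  # duplicate of r3, lower quality
]
print([r.name for r in remove_duplicates(reads)])  # ['r2', 'r3', 'r4']
```

Keying on the 5' end rather than the whole alignment span is what makes the rule work for single-end data: without a mate, the 5' position and strand are the only reliable fingerprint of a PCR copy.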
### macspeaks
Wrapper around `macs2 callpeak`. It also converts the resulting bedGraphs to bigWig after subtracting the input signal. Reads are extended using a hard-coded fragment size of 200 bp.
```
---------------------------------------------------------------------------------------------------------------------------------------------------
usage: chiptk macspeaks [options]
positional arguments:
chromSizes path to chromosome sizes. Can be obtained using UCSC fetchChromSizes.sh script.
chip.bam ChIP bam - Required.
input.bam Input bam - Required.
optional arguments:
-D Output directory to store results. Optional. Default ./macs_op
-o Basename for output file. Usually sample name. Default parses from chip.bam
-f Format of Input file, AUTO, BED or ELAND or ELANDMULTI or ELANDEXPORT or SAM or BAM or BOWTIE or BAMPE or BEDPE
-g Effective genome size. Default hs. (can be mm, ce, dm)
-q Minimum FDR (q-value) cutoff for peak detection. Default 0.01
-b call broad peaks. Default false.
Example: chiptk macspeaks hg19.chrom.sizes KOCebpe.bam KOCebpeInput.bam
---------------------------------------------------------------------------------------------------------------------------------------------------
```
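The two operations described above, extending each read to a 200 bp fragment and subtracting the input signal, can be sketched in plain Python on a toy chromosome. The README does not say how chiptk performs them internally; MACS2 options such as `--nomodel --extsize 200` and `macs2 bdgcmp -m subtract` are one plausible route, but that is an assumption. The function names are hypothetical, and scaling the input linearly to the ChIP read count is a simplification: MACS2 builds its control track from local background estimates rather than from a raw pileup.

```python
FRAGMENT_SIZE = 200  # chiptk's hard-coded fragment length for read extension


def extend_read(start, end, strand, chrom_len, frag=FRAGMENT_SIZE):
    """Extend a read given as 0-based half-open [start, end) to `frag` bp,
    measured from its 5' end, and clip to the chromosome."""
    if strand == "+":
        return start, min(chrom_len, start + frag)
    return max(0, end - frag), end


def pileup(reads, chrom_len, frag=FRAGMENT_SIZE):
    """Per-base count of extended fragments, built with a difference array."""
    diff = [0] * (chrom_len + 1)
    for start, end, strand in reads:
        s, e = extend_read(start, end, strand, chrom_len, frag)
        diff[s] += 1
        diff[e] -= 1
    cov, running = [], 0
    for i in range(chrom_len):
        running += diff[i]
        cov.append(running)
    return cov


def subtract_input(chip_reads, input_reads, chrom_len, frag=FRAGMENT_SIZE):
    """ChIP pileup minus input pileup, with the input scaled to the ChIP
    read count (a simplifying assumption). Values may be negative."""
    chip_cov = pileup(chip_reads, chrom_len, frag)
    ctrl_cov = pileup(input_reads, chrom_len, frag)
    scale = len(chip_reads) / len(input_reads) if input_reads else 0.0
    return [c - scale * n for c, n in zip(chip_cov, ctrl_cov)]


chip = [(100, 150, "+"), (250, 300, "-")]  # both extend to [100, 300)
inp = [(100, 150, "+"), (700, 750, "+")]   # extend to [100, 300) and [700, 900)
signal = subtract_input(chip, inp, chrom_len=1000)
print(signal[150], signal[50], signal[800])  # 1.0 0.0 -1.0
```

Note that a minus-strand read is extended leftwards from its 5' end, so both toy ChIP reads cover the same fragment even though they start 150 bp apart.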
### SE
Identifies super-enhancers using bigWig files instead of BAM files (usually from an H3K27ac or H3K4me1 pulldown).
ROSE, which extracts signal from BAM files, is embarrassingly slow. Extracting the signal from bigWig files with bwtool instead brings the run time down to minutes.
```
---------------------------------------------------------------------------------------------------------------------------------------------------
Usage: chiptk SE [options]
positional arguments:
rose path to 'ROSE_callSuper.R' Rscript. This comes as part of the ROSE software
peaks Input enhancer peaks.
bwc BigWig file for control (input).
bwt BigWig file for treatment (ChIP).
optional arguments:
-m Distance to merge closely spaced peaks in bps. Default 12000.
-D Output directory to store results. Optional. Default ./SE
-o Basename for output file. Usually sample name. Default parses from bwt
Example: chiptk SE ROSE_callSuper.R H3K27Ac_peaks.narrowPeak H3K27Ac_control.bw H3K27Ac_treat.bw
---------------------------------------------------------------------------------------------------------------------------------------------------
```
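Two steps surround the bigWig signal extraction: closely spaced enhancer peaks are stitched together (the `-m` option, default 12000 bp), and the stitched regions are ranked by signal so that ROSE's `ROSE_callSuper.R` can separate super-enhancers at the "hockey stick" bend of the ranked curve. The sketch below assumes that "closely spaced" means a gap of at most `-m` bp between consecutive peaks, and re-implements the geometric cutoff idea described for ROSE (the point where a line with the slope of the curve's overall diagonal touches the ranked signal) in a simple O(n^2) form. It is not ROSE's or chiptk's code, and the function names are hypothetical.

```python
STITCH_DISTANCE = 12000  # chiptk SE default for -m


def stitch(peaks, distance=STITCH_DISTANCE):
    """Merge (chrom, start, end) peaks whose gap to the previous stitched
    region on the same chromosome is at most `distance` bp."""
    stitched = []
    for chrom, start, end in sorted(peaks):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= distance:
            prev_chrom, prev_start, prev_end = stitched[-1]
            stitched[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            stitched.append((chrom, start, end))
    return stitched


def hockey_stick_cutoff(signals):
    """Signal value where a line with the slope of the ranked curve's diagonal
    touches the curve, i.e. has the fewest ranked points on or below it."""
    if not signals:
        raise ValueError("need at least one signal value")
    # Regions with more control than ChIP signal are treated as zero.
    ys = sorted(max(0.0, s) for s in signals)
    n = len(ys)
    slope = (ys[-1] - ys[0]) / n
    best_i, best_below = 0, n + 1
    for i, yi in enumerate(ys):
        # Count points j with ys[j] <= yi + slope * (j - i); point i itself counts.
        below = sum(1 for j, yj in enumerate(ys) if yj - yi <= slope * (j - i))
        if below < best_below:
            best_i, best_below = i, below
    return ys[best_i]


peaks = [("chr1", 1000, 2000), ("chr1", 10000, 11000),
         ("chr1", 30000, 31000), ("chr2", 500, 800)]
print(stitch(peaks))
# [('chr1', 1000, 11000), ('chr1', 30000, 31000), ('chr2', 500, 800)]

signals = [20, 0, 3, 1, 40, 2, 6, 1, 4, 2]  # toy per-region ChIP-minus-input signal
cutoff = hockey_stick_cutoff(signals)
print(cutoff)  # 6
print([s for s in signals if s > cutoff])  # [20, 40]
```

On a convex ranked curve, a line of fixed slope has the fewest points beneath it exactly where it is tangent, which is why minimizing that count locates the bend.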
## Summarize HOMER annotations
`homerAnnoStats.R` is a small R script that summarizes peak annotations generated with HOMER's `annotatePeaks.pl` and also generates a pie chart of the peak distribution.
```r
Rscript homerAnnoStats.R
```
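The R script itself is not shown here. To illustrate the kind of summary it describes, the Python sketch below counts peaks per annotation category from `annotatePeaks.pl` tab-delimited output. It assumes the category is the text of the `Annotation` column before any trailing parenthesised detail (such as a transcript ID); the function names and the toy rows are hypothetical. The resulting percentages are what a pie chart of the peak distribution would display.

```python
import csv
import io
import re
from collections import Counter


def annotation_category(annotation):
    """Drop HOMER's trailing parenthesised detail, keeping the category name."""
    if not annotation:
        return "NA"
    return re.sub(r"\s*\(.*\)\s*$", "", annotation).strip()


def summarize(annotate_peaks_tsv):
    """Return {category: (peak count, percent of peaks)}, most frequent first."""
    reader = csv.DictReader(io.StringIO(annotate_peaks_tsv), delimiter="\t")
    counts = Counter(annotation_category(row.get("Annotation")) for row in reader)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.most_common()}


# Toy input with made-up peaks; real annotatePeaks.pl output has more columns.
toy = (
    "PeakID\tChr\tStart\tEnd\tStrand\tAnnotation\n"
    "p1\tchr1\t100\t300\t+\tintron (NM_000001, intron 2 of 9)\n"
    "p2\tchr1\t900\t1100\t+\tIntergenic\n"
    "p3\tchr2\t50\t250\t+\tintron (NM_000002, intron 1 of 3)\n"
    "p4\tchr3\t10\t200\t-\tpromoter-TSS (NM_000003)\n"
)
for category, (count, pct) in summarize(toy).items():
    print(f"{category}\t{count}\t{pct:.1f}%")
```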