# ATAC-seq Integrative Analysis Pipeline
Pipeline for the QC metrics construction, data analysis and visualization of ATAC-seq data.
Current version: `AIAP_v1.1`
Last update: `2019.12.9`

Advisor: Bo Zhang
Contributors: Cheng Lyu and Shaopeng Liu

For any questions, please contact [email protected]



## Documentation:
1. Pipeline documentation: analysis details and QC metrics information
Please **[ click here ](https://github.com/Zhang-lab/ATAC-seq_QC_analysis/blob/master/documents/Documentation.md)**
2. Potential transcription factor binding region prediction algorithm:
Please **[ click here ](https://github.com/Zhang-lab/ATAC-seq_QC_analysis/blob/master/documents/ifr_documentation.md)**
3. Update logfile: pipeline change record
Please **[ click here](https://github.com/Zhang-lab/ATAC-seq_QC_analysis/blob/master/documents/update_log.md)**



## Usage:
### Test data:
One paired-end mm10 dataset with 0.25M reads is provided for testing. It can be downloaded with:
```
wget https://regmedsrv1.wustl.edu/Public_SPACE/resources/pipeline/atac-seq/test_mm10_data/mm10_1.fastq.gz
wget https://regmedsrv1.wustl.edu/Public_SPACE/resources/pipeline/atac-seq/test_mm10_data/mm10_2.fastq.gz
```
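
After downloading, it can be worth confirming that both files are intact and hold the same number of reads. The following is a minimal sketch, assuming `gzip` and `awk` are available; the helper name `count_fastq_reads` is hypothetical and not part of the pipeline:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AIAP): count reads in a gzipped FASTQ file.
# Assumes the standard 4-lines-per-record FASTQ layout.
count_fastq_reads() {
    local fastq_gz="$1"
    # gzip -t checks that the archive is not truncated or corrupted
    gzip -t "$fastq_gz" || return 1
    gzip -dc "$fastq_gz" | awk 'END { print NR / 4 }'
}

# Compare the two mates only if the test files are in the current directory
if [ -f mm10_1.fastq.gz ] && [ -f mm10_2.fastq.gz ]; then
    r1=$(count_fastq_reads mm10_1.fastq.gz)
    r2=$(count_fastq_reads mm10_2.fastq.gz)
    echo "read 1: $r1 reads, read 2: $r2 reads"
    [ "$r1" = "$r2" ] || echo "WARNING: mate files differ in read count" >&2
fi
```

For paired-end data the two counts should match; a mismatch usually points to an incomplete download.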


### General IAP version:
Step1. Download the Singularity image and reference files. You only need to download them **ONCE** and can then reuse them. If the pipeline is updated you may need to download a new image, but the reference files usually do **NOT** change:

1. Download the singularity image:
```
wget https://regmedsrv1.wustl.edu/Public_SPACE/resources/pipeline/atac-seq/ATAC_IAP_v1.1.simg
```
To use a previous version, **[ click here ](https://regmedsrv1.wustl.edu/Public_SPACE/resources/pipeline/atac-seq/)** to find the older images.

2. Download the reference files for your genome:
```
wget https://regmedsrv1.wustl.edu/Public_SPACE/resources/pipeline/atac-seq/ref_file/atac_mm10_ref.tar.gz
```
You can also find more genome builds: **[ click here ](https://regmedsrv1.wustl.edu/Public_SPACE/resources/pipeline/atac-seq/ref_file/)**. Currently we have: mm9/10, hg19/38, danRer10/11, rn6 and dm6.

3. Decompress the reference files and put them in your own folder:
```
tar -xzf atac_mm10_ref.tar.gz
```
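
If you prefer to unpack the reference into a dedicated folder rather than the current directory, the sketch below shows one way to do it. The helper name `extract_ref` and the destination path are hypothetical, and the folder layout inside the archive is not described here, so it is listed first rather than assumed:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AIAP): unpack a reference archive into a
# chosen parent folder, creating the folder if needed.
extract_ref() {
    local archive="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # -C makes tar extract into $dest instead of the current directory
    tar -xzf "$archive" -C "$dest"
}

# Example usage (paths are placeholders):
#   tar -tzf atac_mm10_ref.tar.gz | head      # inspect the top-level entries first
#   extract_ref atac_mm10_ref.tar.gz /home/src
```

The parent folder you extract into is the one to bind to `/atac_seq/Resource/Genome` in the next step.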

Step2. Process data with the Singularity image:
#### Please run the command in the directory that contains your data. For example, if your data is in /home/example, run `cd /home/example` first. The image and reference files can be stored anywhere.
```bash
singularity run -B ./:/process -B <path_to_parent_folder_of_ref_files>:/atac_seq/Resource/Genome <path_to_image> -r <SE/PE> -g <genome> -o <read_file1> -p <read_file2>
```
This may look a little confusing at first, but it becomes straightforward once you are familiar with Singularity :)
For example, if
a) the image is downloaded to /home/image/ATAC_IAP_v1.1.simg
b) the reference files are in /home/src/mm10
c) and your data files read1.fastq.gz and read2.fastq.gz are in the folder /home/data

Then you need to:
1. `cd /home/data`
2. `singularity run -B ./:/process -B /home/src:/atac_seq/Resource/Genome /home/image/ATAC_IAP_v1.1.simg -r PE -g mm10 -o read1.fastq.gz -p read2.fastq.gz`
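
When processing several samples, it can help to assemble the command from variables and print it before running anything. The sketch below rebuilds the example above as a dry run. The function name `build_aiap_cmd` is hypothetical, the paths are the example locations from the text, and it assumes paths contain no whitespace:

```bash
#!/usr/bin/env bash
# Sketch: assemble the AIAP command line from variables (dry run only).
# Paths are the example locations used above; adjust them to your setup.
image=/home/image/ATAC_IAP_v1.1.simg   # downloaded Singularity image
ref_parent=/home/src                   # parent folder of the mm10 reference
data_dir=/home/data                    # folder holding the FASTQ files

build_aiap_cmd() {
    # Arguments: read type (SE or PE), genome, read file 1, [read file 2]
    local read_type="$1" genome="$2" read1="$3" read2="${4:-}"
    local cmd="singularity run -B ./:/process -B ${ref_parent}:/atac_seq/Resource/Genome ${image} -r ${read_type} -g ${genome} -o ${read1}"
    # -p is only needed for paired-end data
    if [ "$read_type" = "PE" ]; then
        cmd="${cmd} -p ${read2}"
    fi
    echo "$cmd"
}

cmd=$(build_aiap_cmd PE mm10 read1.fastq.gz read2.fastq.gz)
echo "$cmd"
# To actually run it from the data folder:
#   (cd "$data_dir" && eval "$cmd")
```

Printing the command first makes it easy to check the bind paths before committing to a long run.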

### TaRGET II version:
1. **[ click here ](https://regmedsrv1.wustl.edu/Public_SPACE/resources/pipeline/TaRGET/)** to find the TaRGET image and download it to your server
2. Then run the command below in the directory that contains your data:
`singularity run -B ./:/process <path_to_image> -r <SE/PE> -g <genome> -o <read_file1> -p <read_file2>`

Soft links to files are supported, but you need to use the **full path** of the file and mount the original location, for example:
```
ln -s `pwd`/myfile* /scratch/test
cd /scratch/test
singularity run -B ./:/process -B /scratch/test:/scratch/test <path_to_image> -r <SE/PE> -g <genome> -o <read_file1> -p <read_file2>
```

**Explanation**:
The command has this form: `singularity run <singularity_options> <path_to_image> <pipeline_parameters>`

**Soft link introduction**:
Soft links are convenient when you have a lot of data. To use them, you only need to add one more bind option for Singularity, `-B <original_folder>:<original_folder>`, so that the link targets are visible inside the container.
For example, to soft link data from /scratch and run in my own folder /home/example:
1. `ln -s /scratch/mydata.fastq.gz /home/example`; **Please make sure you use the absolute path**
2. `cd /home/example`
3. `singularity run -B ./:/process -B /home/src:/atac_seq/Resource/Genome -B /scratch:/scratch /home/image/ATAC_IAP_v1.00.simg -r PE -g mm10 -o read1.fastq.gz -p read2.fastq.gz`
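
With many linked files it is easy to forget which original folders need binding. The sketch below prints one `-B` option for each distinct folder that the links point into. The helper name `bind_opts_for_links` is hypothetical, and it assumes a `readlink` that supports `-f` (GNU coreutils) and paths without whitespace:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AIAP): print one "-B <dir>:<dir>" option
# for each distinct directory that the symlinked input files really live in.
bind_opts_for_links() {
    for f in "$@"; do
        # Only soft links need an extra bind; regular files in the current
        # directory are already covered by "-B ./:/process"
        if [ -L "$f" ]; then
            target_dir=$(dirname "$(readlink -f "$f")")
            echo "-B ${target_dir}:${target_dir}"
        fi
    done | sort -u
}

# Print the extra bind options needed for the FASTQ links in this folder
bind_opts_for_links ./*.fastq.gz
```

This binds the exact folder holding each link target, whereas the example above binds the parent `/scratch`; either works as long as the target path resolves identically inside the container.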

### Parameters:
`-h`: help information
`-r`: SE for single-end, PE for paired-end
`-g`: genome reference; one simg is designed for ONLY one species due to the file size. For now the supported genomes are: mm9, mm10, hg19, hg38, danRer10, danRer11, rn6 and dm6
`-o`: reads file 1 or the SE reads file, must end with .fastq, .fastq.gz or .sra (for both SE and PE)
`-p`: reads file 2 if the input is PE data, must end with .fastq or .fastq.gz
`-c`: (optional) specify the minimum read length cutoff for methylQA filtering, default 38
`-t`: (optional) specify the number of threads to use, default 24
`-i`: (optional) insertion free region (IFR) finding parameters used by the Wellington algorithm (Piper et al., 2013), see documentation for more details.
      If you do NOT want to run the IFR finding step, simply omit the -i option. To run IFR finding with the default parameters, specify -i 0. The defaults are:
      min_lfp=5
      max_lfp=15
      step_lfp=2
      min_lsh=50
      max_lsh=200
      step_lsh=20
      method=BH
      p_cutoff=0.05
      If you want to specify your own parameters, please make sure they are in the same order and separated by commas
      Example: -i 5,15,2,50,200,20,BH,0.05
      You can check the pipe log file for the parameters used by IFR code
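
Because a malformed `-i` string is easy to produce, a quick check before launching a long run may save time. The sketch below only verifies the shape of the string; the helper name `check_ifr_params` is hypothetical, and treating the first six fields as non-negative integers is an assumption based on the default values listed above, not a documented rule:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AIAP): sanity-check a custom -i string.
# Field order follows the list above:
# min_lfp,max_lfp,step_lfp,min_lsh,max_lsh,step_lsh,method,p_cutoff
check_ifr_params() {
    local params="$1" n
    # "0" selects the default parameters
    if [ "$params" = "0" ]; then
        return 0
    fi
    # Expect exactly 8 comma-separated fields
    n=$(echo "$params" | awk -F',' '{ print NF }')
    if [ "$n" -ne 8 ]; then
        echo "expected 8 comma-separated values, got $n" >&2
        return 1
    fi
    # Assumption: the six length/step fields are non-negative integers
    echo "$params" | awk -F',' '{ for (i = 1; i <= 6; i++) if ($i !~ /^[0-9]+$/) exit 1 }'
}

check_ifr_params "5,15,2,50,200,20,BH,0.05" && echo "parameters look well-formed"
```

The check says nothing about whether the values are sensible for your data; the pipeline log remains the place to confirm which parameters the IFR code actually used.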