Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lulab/Ribowave
Ribo-seq data analysis tool
https://github.com/lulab/Ribowave
Last synced: 3 months ago
JSON representation
Ribo-seq data analysis tool
- Host: GitHub
- URL: https://github.com/lulab/Ribowave
- Owner: lulab
- License: gpl-3.0
- Created: 2017-04-13T06:34:05.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-09-10T12:45:16.000Z (about 5 years ago)
- Last Synced: 2024-06-30T15:44:43.702Z (4 months ago)
- Language: Shell
- Homepage: https://lulab.github.io/Ribowave
- Size: 146 KB
- Stars: 9
- Watchers: 3
- Forks: 8
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-riboseq - Code
README
# RiboWave
RiboWave is a funtional Ribo-seq analysis tool to identify translated ORF based on Ribo-seq data.
The RiboWave workflow consists of:
- Pre-processing :
- Create the annotation file for the subsequent analysis. [`create_annotation.sh`]
- Determine the P-site position of Ribo-seq. [`P-site_determination.sh`]
- Generate P-site tracks from Ribo-seq.[`create_track_Ribo.sh`]- Main function :
- Denoise [`Ribowave`]
- Predict translated ORFs [`Ribowave`]
- Estimate reads density for each given ORF [`Ribowave`]
- Estimate frameshift potential for each given ORF [`Ribowave`]## Requirements
### software
* R
* bedtools v2.25.0### R packages
* reshape
* ggplot2
* rhdf5
* methods
* wmtsa
* parallel## Before running
It is **recommanded** to make a new directory and move the `Ribo-seq bam file` into that directory;## Pre-processing
### 0. Create annotation
This step scans for and annotates all putative ORFs as well as define annotated ORF according to the annotation file
```
./create_annotation.sh -h
```A helper message is shown:
```
----------------------------------------------------------------------------------------------------
RiboWave : version 1.0
This step is set for the purpose of genome annotation.
Several of the output is necessary for the following steps.
----------------------------------------------------------------------------------------------------Usage:
create_annotation.sh [-h] -G annotation.gtf -f fasta -o annotation_dir -s scripts_dirOptions:
-G (annotation.gtf )
-f (genome fasta )
-o (Output annotation directory )
-s (Script directory )
-h (Help )
----------------------------------------------------------------------------------------------------
```Run `create_annotation.sh` on example:
```
scripts/create_annotation.sh -G annotation_fly/dmel-all-r6.18.gtf -f annotation_fly/dmel-all-chromosome-r6.18.fasta -o annotation_fly -s scripts;
```#### Input files:
- : the annotation gtf should contain **start_codon** and **stop_codon** information,eg: `dmel-all-r6.18.gtf`
- : genome fasta ,eg: `dmel-all-chromosome-r6.18.fasta`- : the directory for all the annotation output
- : the directory of all the scripts in the package
#### Output files:
**`annotation`** directory, including :
* start_codon.bed : the bed file annotating start codon
* final.ORFs : all identified ORFs, eg: `FBtr0300105_0_31_546` where `FBtr0300105` refers to the transcript, `0` refers to the reading frame relative to the start of transcript, `31` refers to the start site, `546` refers to the stop codon.
### 1. P-site determination
This step determines the P-site position for each Ribo-seq reads length by overlapping with the annotated start codons from previous step
```
./P-site_determination.sh -h
```A helper message is shown:
```
----------------------------------------------------------------------------------------------------
RiboWave : version 1.0
This step is for determining the P-site position for each read length.
----------------------------------------------------------------------------------------------------Usage:
P-site_determination.sh [-h] -i Ribo.bam -S start_codon.bed -o out_dir -s scripts_dir [-n study name]Options:
-i (Ribo-seq bam )
-S (start codon annotation file )
-o (Output annotation directory )
-s (Script directory )
-n (study name, defult:test )
-h (Help )
----------------------------------------------------------------------------------------------------
```Run `P-site_determination.sh` on example :
```
scripts/P-site_determination.sh -i GSE52799/SRR1039770.sort.bam -S annotation_fly/start_codon.bed -o GSE52799 -n SRR1039770 -s scripts;
```#### Input files:
- : **secondary alignment removed** to ensure one genomic position per aligned read and sorted- **`annotation`** :
- : annotated start site `start_codon.bed`. It is generated in the `create_annotation.sh` step.- : the directory of the output result, eg: `GSE52799`
- : the name of all the output file, default: test. eg: `SRR1039770`
- : the directory of all the scripts in the package
#### Output files:
**`P-site`** directory, including :* _name_.psite1nt.txt : the Ribo-seq reads length and its corresponding P-sites position(= offset + 1). It may look this this :
```
30 13
```
* _name_.psite.pdf : the **PDF** displaying the histogram of aggregated reads### 2. Generating P-site track
This step creats the P-site track for transcripts of interests using determined P-sites position from previous step.
```
./create_track_Ribo.sh -h
```A helper message is shown:
```
----------------------------------------------------------------------------------------------------
RiboWave : version 1.0
This step is the last step for data processing where P-site track is generated for each transcript.
----------------------------------------------------------------------------------------------------Usage:
create_track_Ribo.sh [-h] -i Ribo.bam -G exons.gtf -g genome_size -P psite.position -o out_dir -s scripts_dir [-n study name]Options:
-i (Ribo-seq bam )
-G (exons gtf file )
-g (Genome size annotation file )
-P (P-site position (offset+1) )
-o (Output annotation directory )
-s (Script directory )
-n (study name, defult:test )
-h (Help )
----------------------------------------------------------------------------------------------------
```Run `create_track_Ribo.sh` on example:
###### look at transcripts from chromosome X :```
scripts/create_track_Ribo.sh -i GSE52799/SRR1039770.sort.bam -G annotation_fly/X.exons.gtf -g annotation_fly/genome -P GSE52799/P-site/SRR1039770.psite1nt.txt -o GSE52799 -n SRR1039770 -s scripts;
```#### Input files:
-
- : a gtf file for only the exons from transcripts of intersect, eg: `X.exons.gtf`
- : the file including all the chromosomes and its genome size. **Noted: `genome` can be obtained by using `samtools faidx` function with the input of fasta file.** `genome` may look like this:
```
2L 23513712
2R 25286936
3L 28110227
3R 32079331
```- **`P-site`** :
- : the file listing the P-site position for each read length. This file can be found in the output of previous step, eg: _name_.psite1nt.txt- : the directory of the output result, eg: `GSE52799`
- : the name of all the output file, default: test. eg: `SRR1039770`
- : the directory of all the scripts in the package
#### Output files:
**`bedgraph/name`** directory, including :
* final.psite : P-site track at transcriptome wide. It may look like this :
```
FBtr0070533 0,0,0,0,0,0,0,0,0,0,0,0,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,0,0,0,0,1,0,2,1,0,0,0,0,0,0,4,8,0,0,3,0,5,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
FBtr0073886 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,25,0,0,0,0,0,0,0,0,0,0,0,0,0
FBtr0070604 0,0,0,0,0,0,0,0,0,0,0,0,59,6,0,1,0,0,2,6,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
FBtr0070603 0,0,0,0,0,0,0,0,0,0,0,0,75,2,7,10,7,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
```
## Main function### 3. RiboWave
This step can achieve multiple functions :
- denoising [`denoise`]
- providing predicted p.value for each given ORF to identify its translation status [`pvalue`,`-P`]
- providing reads density (P-site/PF P-site) for each given ORF [`density`,`-D`]
- providing translation efficiency (TE) estimation for each given ORF [`TE`,`-T`]
- providing frameshift potential (CRF score) for each given ORF [`CRF`,`-F`]
```
./Ribowave -h
```A helper message will be shown:
```
----------------------------------------------------------------------------------------------------
RiboWave : version 1.0
This step is main function of RiboWave.
Functions are provided including : predicting translated ORF, estimating reads density, estimating translation efficiency and predicting frameshift events.
----------------------------------------------------------------------------------------------------Usage:
Ribowave [Options] -a P-site track -b ORF_list -o output_dir -s scripts_dirOptions:
-P (providing P value for each ORF )
-D (providing reads abundance for each ORF )
-F (predicting frameshift potential for each ORF )
-T (estimating TE for each ORF. requires total Ribo-seq sequence depth; requires RNA-seq FPKM)
-a (psite track )
-b (ORF list )
-o (Output directory )
-s (Script directory )
-n (The name of the study, default: test )
-p (The number of threads, default: 1 )
-h (Help )
----------------------------------------------------------------------------------------------------
```It might take hours to perform the analysis if the input is large. It is **recommended** to specify the number of CPU cores through the `-p` option.
Run `Ribowave` on example:
##### Denoise the P-site track
```
mkdir -p Ribowave;
scripts/Ribowave -a GSE52799/bedgraph/SRR1039770/final.psite -b annotation_fly/final.ORFs -o GSE52799/Ribowave -n SRR1039770 -s scripts -p 8;
```##### Identifying translated ORF
```
mkdir -p Ribowave;
scripts/Ribowave -P -a GSE52799/bedgraph/SRR1039770/final.psite -b annotation_fly/final.ORFs -o GSE52799/Ribowave -n SRR1039770 -s scripts -p 8;
```##### Estimating abundance
```
mkdir -p Ribowave;
scripts/Ribowave -D -a GSE52799/bedgraph/SRR1039770/final.psite -b annotation_fly/final.ORFs -o GSE52799/Ribowave -n SRR1039770 -s scripts -p 8;
```##### Estimating TE
IMPORTANT :
when estimating TE, user should input the **sequenced depth of Ribo-seq** and the **FPKM value from paired RNA-seq**```
mkdir -p Ribowave;
scripts/Ribowave -T 9012445 GSE52799/mRNA/SRR1039761.RPKM -a GSE52799/bedgraph/SRR1039770/final.psite -b annotation_fly/final.ORFs -o GSE52799/Ribowave -n SRR1039770 -s scripts -p 8;
```##### Calculating frameshift potential
###### on annotated ORFs```
mkdir -p Ribowave;
awk -F '\t' '$3=="anno"' annotation_fly/final.ORFs > annotation_fly/aORF.ORFs;
scripts/Ribowave -F -a GSE52799/bedgraph/SRR1039770/final.psite -b annotation_fly/aORF.ORFs -o GSE52799/Ribowave -n SRR1039770 -s scripts -p 8;
```##### Multiple functions
```
mkdir -p Ribowave;
scripts/Ribowave -PD -T 9012445 GSE52799/mRNA/SRR1039761.RPKM -a GSE52799/bedgraph/SRR1039770/final.psite -b annotation_fly/final.ORFs -o GSE52799/Ribowave -n SRR1039770 -s scripts -p 8;
```#### Input files:
- **`bedgraph/name`**:
- : output from the previous step, containing the P-site track of transcripts of interest, eg: `final.psite`- : ORFs of interest ,eg : `final.ORFs`. It is generated in the step of `create_annotation.sh`
- : the sequenced depth of Ribo-seq to calculate FPKM , eg: `9012445`
- : FPKM table. It may look like this :
```
FBtr0100871 22262
FBtr0070604 18682
FBtr0100231 14746.5
FBtr0100874 14024.5
FBtr0100864 11475.6
```#### Output files:
* _name_.PF_psite : the denoised signal track(PF P-sites signal track) at transcriptome wide. It looks similar as the input `final psite`.* _name_.feats1 : the features of ORFs including chi-square P-value information. It may look like this :
Column | Explanation
------------ | -------------
Column1-Column4 | Basic information about the ORF
Column5 | Reads coverage within the ORF
Column6 | P-value predicted by RiboWave
Column7 | Values estimating the relative abundance of PF P-sites outside of the studied ORF
Column8 | Reads intensity at the current start codon**`result`** directory, including :
* _name_.95%.mx : RiboWave makes the prediction on the translation initiation sites and gives the final translated product output (**p.value < 0.05**) . It may look like this :
```
FBtr0070007_2_93_1028
FBtr0070008_1_128_943
FBtr0070025_2_135_1094
```* _name_.density : reads density ( **PF P-site** ) of given ORFs. It may look like this :
Column | Explanation
------------ | -------------
Column1-Column4 | Basic information about the ORF
Column5 | number of PF P-sites in transcript
Column6 | number of PF P-sites in given ORF
Column7 | Density of PF P-sites in given ORF* _name_.TE : TE of given ORFs. It may look like this :
Column | Explanation
------------ | -------------
Column1 | transcript
Column2 | ORF
Column3 | TE* _name_.CRF.final : ORFs that might experience reading frame translocation. It may look like this :
Column | Explanation
------------ | -------------
Column1 | ORF
Column2 | Start of frameshift
Column3 | Stop of frameshift
Column4 | PF P-sites' reading frames **after** the change point ,eg: `2_2,0_1` where `2_2` refers to continuous two PF P-sites of frame 2 followed by continuous one PF P-sites of frame 0.
Column5 | Relative position of PF P-sites **after** the shift ,eg : `1413,1440;1789` where `1413,1440` corresponds to the exact position of `2_2` within the transcript. Discontinuity in the reading frame is separated by `;`
Column6 | CRF score describing the potential of frameshift## Example file
An example file is packed and found in [here](http://lulab.life.tsinghua.edu.cn/RiboWave/RiboWave_v1.0.tar.gz).Enclosed in the RiboWave_v1.0.tar.gz, `run_Ribowave_dmel.sh` combines all the steps together into a pipeline.
## For any questions, please contact:
Zhiyu Xu [[email protected]]