# DEbPeak - Analyze and integrate multi-omics to unravel the regulation of gene expression.

![License](https://img.shields.io/badge/license-GPL--3.0-blue.svg)
[![CODE\_SIZE](https://img.shields.io/github/languages/code-size/showteeth/DEbPeak.svg)](https://github.com/showteeth/DEbPeak)

## Introduction
`DEbPeak` aims to **explore**, **visualize** and **interpret** multi-omics data and to **unravel the regulation of gene expression** by combining RNA-seq with peak-related data (e.g., ChIP-seq, ATAC-seq, m6A-seq). It contains **eleven functional modules**:

* **Parse GEO**: Extract study information, raw count matrix and metadata from the GEO database.
* **Quality Control (QC)**: QC on count matrix and samples.
  - QC on count matrix: proportion of genes detected in different samples under different CPM thresholds, and saturation of the number of genes detected.
  - QC on samples: Euclidean distance and Pearson correlation coefficient of samples across different conditions, sample similarity on selected principal components (to check batch information and conduct batch correction), and outlier detection with robust PCA.
* **Principal Component Analysis (PCA)**: this module is divided into three submodules: basic info, loading related and 3D visualization.
  - Basic info: scree plot (helps to select the useful PCs), biplot (sample similarity together with the genes that have larger loadings) and PC pairs plot (sample similarity under different PC combinations).
  - Loading related: visualize genes with larger positive and negative loadings on selected PCs, and conduct GO enrichment analysis on those genes.
  - 3D visualization: visualize samples on three selected PCs.
* **Differential Analysis and Visualization**: this module includes seven visualization methods (Volcano Plot, Scatter Plot, MA Plot, Rank Plot, Gene/Peak Plot, Heatmap, and Pie Plot for peak-related data).
* **Functional Enrichment Analysis (FEA)**: GO enrichment analysis, KEGG enrichment analysis and Gene Set Enrichment Analysis (GSEA).
  - GO (Biological Process, Molecular Function, Cellular Component) and KEGG on differentially expressed genes or accessible/binding peaks.
  - GSEA on all genes (note: GSEA is not available for peak-related data).
* **Predict transcription factors (PredictTFs)**: Identify transcription factors from differentially expressed genes. `DEbPeak` provides three methods ([BART](https://academic.oup.com/bioinformatics/article/34/16/2867/4956015?login=false), [ChEA3](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602523) and [TFEA.ChIP](https://academic.oup.com/bioinformatics/article/35/24/5339/5538988)).
* **Motif analysis**:
  - *de novo* motif discovery
  - motif enrichment
* **Integrate RNA-seq with peak-related data**:
  - Get consensus peaks: for multiple peak files, get consensus peaks; for a single peak file, use it directly (used in consensus integration mode).
  - Peak profile plots: heatmap of peak binding to TSS regions, average profile of ChIP peaks binding to TSS regions, and profile of ChIP peaks binding to different regions (used in consensus integration mode).
  - Peak annotation (used in consensus integration mode).
  - Integrate RNA-seq with peak-related data (consensus mode): find direct targets, both up-regulated and down-regulated.
  - Integrate RNA-seq with peak-related data (differential mode): integrate RNA-seq and peak-related data based on differential analysis.
  - Integration summary: includes Venn diagram and quadrant diagram (differential mode).
  - GO enrichment on integrated results.
  - Find motif on integrated results: due to the nature of ATAC-seq, we usually need to find motifs on integrated results to obtain potential regulatory factors.
* **Integrate RNA-seq with RNA-seq**:
  - Integration summary: includes Venn diagram and quadrant diagram.
  - GO enrichment on integrated results.
* **Integrate peak-related data with peak-related data**:
  - Integration summary: includes Venn diagram and quadrant diagram (differential mode).
  - GO enrichment on integrated results.
* **Utils**: useful functions, including creating enrichment plots for selected enrichment terms, gene ID conversion and count normalization (DESeq2's median of ratios, TMM, CPM, TPM, RPKM).
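
The "QC on count matrix" item above can be illustrated with a few lines of base R. This is only a sketch of the underlying idea (proportion of genes detected per sample at several CPM thresholds), not the package's own implementation; the toy counts and the chosen thresholds are made up for illustration.

```r
# Toy count matrix: 5 genes x 3 samples (made-up numbers, for illustration only)
counts <- matrix(
  c(100, 0, 50, 5, 845,
    200, 10, 0, 0, 790,
    0, 0, 300, 20, 680),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("s1", "s2", "s3"))
)

# CPM: scale each sample (column) so that its counts sum to one million
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Proportion of genes whose CPM exceeds each threshold, per sample
# (thresholds are arbitrary example values)
thresholds <- c(0, 1, 10000)
detected <- sapply(thresholds, function(th) colMeans(cpm > th))
colnames(detected) <- paste0("CPM>", thresholds)

# Rows are samples, columns are thresholds
print(detected)
```

A sample whose detected proportion drops much faster than the others as the threshold rises may deserve a closer look before differential analysis.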

To make the tool easier to use, we have also developed a **web server** for `DEbPeak`: users upload files on the web page and set parameters to get the desired results. *The web server uses built-in `DESeq2` for differential analysis*, whereas the standalone R package accepts user-supplied results from `DESeq2` or `edgeR`, which makes it **more flexible**.

All generated plots are **publication-ready**, and most of them are based on `ggplot2`, so users can easily modify them as needed. We also provide **various color palettes**, including **discrete** and **continuous** palettes, **color-blind-friendly** palettes, and palettes for **multiple categorical variables**.


## Citation

If you use [DEbPeak](https://showteeth.github.io/DEbPeak/) in published research, please cite:

* Hou J\#, **Song Y**\#, Xiao C\#, Sun Y, Shen J, Ma X, Zhou Q, Chiu SC, Xu Y, Huang Y, Chen YG, Zhu X\*, Wang J\*, Xiong JW\*. Cloche/Npas4l is a pro-regenerative platelet factor during zebrafish heart regeneration. **Dev Cell**. 2025 Jun 24:S1534-5807(25)00370-3. doi: 10.1016/j.devcel.2025.06.015. Epub ahead of print. PMID: 40602409.


## Framework


Figure: DEbPeak framework.


## Application scenarios for multi-omics integration


Figure: DEbPeak application scenarios.


## Installation
### R package
You can install the package from the GitHub repository:

``` r
# install.packages("devtools") #In case you have not installed it.

# install prerequisites for enrichplot and ChIPseeker
devtools::install_version("ggfun", version = "0.0.6", repos = "https://cran.r-project.org")
devtools::install_version("aplot", version = "0.1.6", repos = "https://cran.r-project.org")
devtools::install_version("scatterpie", version = "0.1.7", repos = "https://cran.r-project.org")

# For mac, you may need to install xquartz: brew install --cask xquartz

# install DEbPeak
devtools::install_github("showteeth/DEbPeak")
```

In general, it is **recommended** to install from the [GitHub repository](https://github.com/showteeth/DEbPeak), which is updated more promptly.

For other installation issues, please refer to the [Installation](https://github.com/showteeth/scfetch/blob/main/INSTALL.md#general-solution) guide.
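
After installation, a quick check along the following lines can confirm which packages are available. This is a minimal sketch using only base R; the list of packages to check is an illustrative selection taken from the function list below, not an official dependency list.

```r
# Packages to check: DEbPeak plus a few key packages named in the function list
# (this selection is illustrative, not the complete set of dependencies)
pkgs <- c("DEbPeak", "DESeq2", "clusterProfiler", "ChIPseeker")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report anything that still needs to be installed
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```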

Install additional tools:
```bash
# install MSPC --- consensus peak
wget --quiet https://github.com/Genometric/MSPC/releases/latest/download/linux-x64.zip -O MSPC_linux_x64.zip && unzip -q MSPC_linux_x64.zip -d mspc && cd mspc && chmod +x mspc

# install meme --- motif analysis
## install from source
# after `cd meme-5.5.5`, `pwd` is /opt/meme-5.5.5, so binaries are installed to /opt/meme-5.5.5/meme/bin
cd /opt && wget --quiet https://meme-suite.org/meme/meme-software/5.5.5/meme-5.5.5.tar.gz -O meme-5.5.5.tar.gz && tar -zxf meme-5.5.5.tar.gz && cd meme-5.5.5 && ./configure --prefix=`pwd`/meme --enable-build-libxml2 --enable-build-libxslt && make && make install
## install from conda: conda install -c bioconda meme

# install homer --- motif enrichment
## install from source
mkdir homer && cd homer && wget --quiet http://homer.ucsd.edu/homer/configureHomer.pl -O configureHomer.pl && chmod +x configureHomer.pl && perl configureHomer.pl -install
## install from conda: conda install -c bioconda homer
## Downloading Homer Packages: http://homer.ucsd.edu/homer/introduction/install.html

# install deeptools and bart
pip install deeptools numpy pandas scipy tables scikit-learn matplotlib
wget --quiet https://virginia.box.com/shared/static/031noe820hk888qzcxvw1cazol1gdhi0.gz -O bart_v2.0.tar.gz && tar -zxf bart_v2.0.tar.gz
## Download the resources and setup the configuration file
## https://zanglab.github.io/bart/index.htm#install
```


### Docker
We also provide a [Docker image](https://hub.docker.com/repository/docker/soyabean/debpeak):

```bash
# pull the image
docker pull soyabean/debpeak:1.2

# run the image
docker run --rm -p 8888:8787 -e PASSWORD=passwd -e ROOT=TRUE -it soyabean/debpeak:1.2
```

**Notes**:

* After running the commands above, open a browser and go to `http://localhost:8888/`. The user name is `rstudio` and the password is `passwd` (set by `-e PASSWORD=passwd`).
* If port `8888` is in use, change the host port (the first number in `-p 8888:8787`), for example `-p 8889:8787`, and open `http://localhost:8889/` instead.
* The MEME Suite path: `/opt/meme-5.5.5/meme/bin`.
* The HOMER path: `/opt/homer/bin`.
* The `configureHomer.pl` path: `/opt/homer`.
* The `bart` path: `/opt/bart_v2.0/bin`.
* You still need to **download the resources and set up the configuration file for [bart](https://zanglab.github.io/bart/index.htm#install)** and **download species packages for [homer](http://homer.ucsd.edu/homer/introduction/install.html)**.
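
Inside the container's RStudio session, the tool directories listed above are not necessarily on `PATH`. The following base R sketch prepends them for the current session and checks whether a few executables can be found. The executable names (`streme`, `tomtom`, `findMotifsGenome.pl`, `bart2`) are assumptions about what these directories contain. Whether a given `DEbPeak` function reads `PATH` or expects an explicit tool path argument is not covered here, so check the vignettes.

```r
# Tool directories as listed in the Docker notes above
tool_dirs <- c("/opt/meme-5.5.5/meme/bin", "/opt/homer/bin", "/opt/bart_v2.0/bin")

# Prepend any directory that is not already on PATH (current R session only)
current <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
new_path <- c(setdiff(tool_dirs, current), current)
Sys.setenv(PATH = paste(new_path, collapse = .Platform$path.sep))

# Sys.which() returns "" for executables that cannot be found
# (executable names are assumptions, see the note above)
found <- Sys.which(c("streme", "tomtom", "findMotifsGenome.pl", "bart2"))
print(found)
```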


## Usage
### Vignette

Detailed usage is available [here](https://showteeth.github.io/DEbPeak/). We divide these vignettes into five categories:

* For parsing **GEO**:
  * [Parse GEO](https://showteeth.github.io/DEbPeak/articles/ParseGEO.html)

* For analyzing **RNA-seq**:
  * [Quality Control](https://showteeth.github.io/DEbPeak/articles/QualityControl.html)
  * [Principal Component Analysis (RNA-seq)](https://showteeth.github.io/DEbPeak/articles/PrincipalComponentAnalysis.html)
  * [Differential Expression Analysis (RNA-seq)](https://showteeth.github.io/DEbPeak/articles/DifferentialExpressionAnalysis.html)
  * [Functional Enrichment Analysis (RNA-seq)](https://showteeth.github.io/DEbPeak/articles/FunctionalEnrichmentAnalysis.html)
  * [Predict Transcription Factors (RNA-seq)](https://showteeth.github.io/DEbPeak/articles/IdentifyRegulator.html)
  * [Utils](https://showteeth.github.io/DEbPeak/articles/Utils.html)

* For analyzing **peak-related data**:
  * [Quality Control](https://showteeth.github.io/DEbPeak/articles/QualityControl.html)
  * [Principal Component Analysis (Peak-related)](https://showteeth.github.io/DEbPeak/articles/PrincipalComponentAnalysisPeak.html)
  * [Differential Analysis (Peak-related)](https://showteeth.github.io/DEbPeak/articles/DifferentialExpressionAnalysisPeak.html)
  * [Functional Enrichment Analysis (Peak-related)](https://showteeth.github.io/DEbPeak/articles/FunctionalEnrichmentAnalysisPeak.html)
  * [Motif Analysis (Peak-related)](https://showteeth.github.io/DEbPeak/articles/MotifEnrichment.html)

* For **integrating RNA-seq (differential expression analysis) with peak-related data (consensus peak)**:
  * [Integrate RNA-seq with ChIP-seq (consensus peak)](https://showteeth.github.io/DEbPeak/articles/IntegrateChIP.html)
  * [Integrate RNA-seq with ATAC-seq (consensus peak)](https://showteeth.github.io/DEbPeak/articles/IntegrateATAC.html)
  * [Integrate RNA-seq, ChIP-seq and ATAC-seq](https://showteeth.github.io/DEbPeak/articles/IntegrateChIPATAC.html)

* For **integrating RNA-seq (differential expression analysis) with peak-related data (differential accessible/binding analysis)**:
  * [Integrate RNA-seq with ATAC-seq (differential analysis)](https://showteeth.github.io/DEbPeak/articles/IntegrateATACDE.html)
  * [Integrate RNA-seq with ChIP-seq (differential analysis)](https://showteeth.github.io/DEbPeak/articles/IntegrateChIPDE.html)

### Function list


| Type | Function | Description | Key packages |
| --- | --- | --- | --- |
| Parse GEO | `ParseGEO` | Extract study information, raw count matrix and metadata from GEO database | GEOquery |
| Quality Control | `CountQC` | Quality control on count matrix (gene detection sensitivity and sequencing depth saturation) | NOISeq |
| | `SampleRelation` | Quality control on samples (sample clustering based on Euclidean distance and Pearson correlation coefficient) | stats |
| | `OutlierDetection` | Detect outliers with robust PCA | rrcov |
| | `QCPCA` | PCA-related functions used in quality control (batch detection and correction, outlier detection) | stats, sva, rrcov |
| Principal Component Analysis | `PCA` | Conduct principal component analysis | stats |
| | `PCABasic` | Generate basic PCA plots, including scree plot, biplot and pairs plot | PCAtools |
| | `ExportPCGenes` | Export genes of selected PCs | tidyverse |
| | `LoadingPlot` | PCA loading plot, including bar plot and heatmap | ggplot2, ComplexHeatmap |
| | `LoadingGO` | GO enrichment on PC's loading genes | clusterProfiler |
| | `PCA3D` | Create 3D PCA plot | plot3D |
| Differential Analysis | `ExtractDA` | Extract differential analysis results | tidyverse |
| | `VolcanoPlot` | Volcano plot for differential analysis results | ggplot2 |
| | `ScatterPlot` | Scatter plot for differential analysis results | ggplot2 |
| | `MAPlot` | MA plot for differential analysis results | ggplot2 |
| | `RankPlot` | Rank plot for differential analysis results | ggplot2 |
| | `GenePlot` | Gene expression or peak accessibility/binding plot | ggplot2 |
| | `DEHeatmap` | Heatmap for differential analysis results | ComplexHeatmap |
| | `DiffPeakPie` | Summarize genomic regions of differential peaks with a pie plot | ggpie |
| | `ConductDESeq2` | Conduct differential analysis with DESeq2 | NOISeq, stats, sva, rrcov, PCAtools, DESeq2, ggplot2, ComplexHeatmap, clusterProfiler, plot3D, tidyverse |
| Functional Enrichment Analysis | `ConductFE` | Conduct functional enrichment analysis (GO and KEGG) | clusterProfiler |
| | `ConductGSEA` | Conduct gene set enrichment analysis (GSEA) | clusterProfiler |
| | `VisGSEA` | Visualize GSEA results | enrichplot |
| Predict Transcription Factors | `InferRegulator` | Predict TFs from RNA-seq data with ChEA3, BART2 and TFEA.ChIP | ChEA3, BART2, TFEA.ChIP |
| | `VizRegulator` | Visualize the identified TFs | ggplot2 |
| Motif Analysis | `MotifEnrich` | Motif enrichment for differentially accessible/binding peaks | HOMER |
| | `MotifDiscovery` | *de novo* motif discovery with STREME | MEME |
| | `MotifCompare` | Map motifs against a motif database with Tomtom | MEME |
| Peak-related Analysis | `PeakMatrix` | Prepare count matrix and sample metadata for peak-related data | DiffBind, ChIPseeker |
| | `GetConsensusPeak` | Get consensus peaks from replicates | MSPC |
| | `PeakProfile` | Visualize peak accessibility/binding profile | ChIPseeker |
| | `AnnoPeak` | Assign peaks to genomic binding regions and nearby genes | ChIPseeker |
| | `PeakAnnoPie` | Visualize peak annotation results with a pie plot | ggpie |
| Integrate RNA-seq with Peak-related Data | `DEbPeak` | Integrate differential expression results and peak annotation/differential analysis results | tidyverse |
| | `DEbPeakFE` | GO enrichment on integrated results | clusterProfiler |
| | `DEbCA` | Integrate differential expression results and peak annotation results (two kinds of peak-related data) | tidyverse |
| | `ProcessEnhancer` | Get genes near differential peaks | IRanges |
| | `InteVenn` | Create a Venn diagram for integrated results (supports DEbPeak, DEbDE, PeakbPeak) | ggvenn |
| | `InteDiffQuad` | Create a quadrant diagram for differential expression analysis of RNA-seq and peak-related data | ggplot2 |
| | `NetViz` | Visualize enhancer-gene network results | igraph, ggnetwork |
| | `FindMotif` | Find motifs on integrated results | HOMER |
| Integrate RNA-seq with RNA-seq | `DEbDE` | Integrate two differential expression results | tidyverse |
| | `DEbDEFE` | GO enrichment on the integration results of two differential expression analyses | clusterProfiler |
| Integrate Peak-related Data with Peak-related Data | `PeakbPeak` | Integrate two peak annotation/differential analysis results | tidyverse |
| | `PeakbPeakFE` | GO enrichment on the integration results of two peak annotation/differential analyses | clusterProfiler |
| Utils | `EnrichPlot` | Create a bar or dot plot for selected functional enrichment analysis results (GO and KEGG) | ggplot2 |
| | `IDConversion` | Gene ID conversion between ENSEMBL, ENTREZID and SYMBOL | clusterProfiler |
| | `GetGeneLength` | Get gene length from GTF | GenomicFeatures, GenomicRanges |
| | `NormalizedCount` | Perform count normalization (DESeq2's median of ratios, TMM, CPM, RPKM, TPM) | DESeq2, edgeR, tidyverse |
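
To make the simpler normalization methods named for `NormalizedCount` concrete, here is a base R sketch of the standard CPM, RPKM and TPM definitions. It does not call `DEbPeak` and is not the package's implementation; the counts and gene lengths are made up, and DESeq2's median of ratios and TMM are not reproduced here.

```r
# Toy data: 3 genes x 2 samples, with gene lengths in base pairs (made-up values)
counts <- matrix(c(10, 20, 70, 30, 30, 40), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length_bp <- c(g1 = 1000, g2 = 2000, g3 = 7000)

# CPM: counts per million reads in each sample (corrects for sequencing depth)
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# RPKM: CPM divided by gene length in kilobases
# (the length vector is recycled down each column, i.e. applied per gene)
rpkm <- cpm / (gene_length_bp / 1e3)

# TPM: divide by gene length first, then scale each sample to one million
rate <- counts / (gene_length_bp / 1e3)
tpm <- sweep(rate, 2, colSums(rate), "/") * 1e6

print(round(tpm))
```

TPM columns always sum to one million, which makes values comparable across samples as proportions. RPKM columns generally do not share a common total.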


## Notice

* The **KEGG API** has changed. To perform KEGG enrichment, update `clusterProfiler` to version `4.7.1` or later.
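
The installed version can be checked from R before running KEGG enrichment. This is a minimal sketch using `utils::packageVersion()`, which compares version numbers component by component (so `4.10.0` is correctly treated as newer than `4.7.1`, unlike a plain string comparison).

```r
# Minimum clusterProfiler version stated in the notice above
min_version <- "4.7.1"

if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  current <- utils::packageVersion("clusterProfiler")
  if (current < min_version) {
    message("clusterProfiler ", as.character(current),
            " is older than ", min_version, "; consider updating.")
  } else {
    message("clusterProfiler ", as.character(current),
            " meets the minimum version ", min_version, ".")
  }
} else {
  message("clusterProfiler is not installed.")
}
```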


## Contact
For any questions, feature requests or bug reports, please email songyb0519@gmail.com.


## Code of Conduct
Please note that the DEbPeak project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.