https://github.com/jlmaier12/proactive

Detect elevations and gaps in read coverage on metagenome contigs or assembled genomes

bacteriophage metagenomics mobile-genetic-elements pattern-matching prophages read-mapping sequencing-coverage structural-variants transposable-elements


---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# ProActive

**`ProActive` automatically detects regions of gapped and elevated read coverage
using a 2D pattern-matching algorithm. `ProActive` detects, characterizes and
visualizes read coverage patterns in both genomes and metagenomes. Optionally,
users may provide gene annotations associated with their genome or metagenome
in the form of a .gff file. In this case, `ProActive` will generate an additional
output table containing the gene annotations found within the detected regions of
gapped and elevated read coverage. Additionally, users can search for gene
annotations of interest in the output read coverage plots.**

Visualizing read coverage data is important because gaps and elevations in coverage can
indicate a variety of biological and non-biological scenarios, for example:

* Elevations and gaps in read coverage may be caused by some types of structural
variants. Deletions can cause gaps while duplications can cause elevations in read coverage [1].
* Highly active and/or abundant mobile genetic elements, such as transposable
elements [2] and prophages [3], can create elevations in read coverage
at their respective integration sites.
* Genetic regions with high mutation rates and/or high variability within the population
can generate gaps in read coverage [4].
* Poor quality sequencing reads and chimeric reference sequences may cause gaps
and elevations in read coverage.

**Since the cause of gaps and elevations in read coverage can be ambiguous,
ProActive is best used as a screening method to identify genetic regions for further
investigation with other tools!**

**References:**

1. Tattini L., D'Aurizio R., Magi A. (2015). Detection of Genomic Structural
Variants from Next-Generation Sequencing Data. Frontiers in Bioengineering and Biotechnology,
3, 92. https://doi.org/10.3389/fbioe.2015.00092
2. Kleiner M., Bushnell B., Sanderson K.E., et al. (2020). Transductomics: sequencing-based
detection and analysis of transduced DNA in pure cultures and microbial communities.
Microbiome, 8, 158. https://doi.org/10.1186/s40168-020-00935-5
3. Kieft K., Anantharaman K. (2022). Deciphering Active Prophages from Metagenomes. mSystems, 7, e00084-22.
https://doi.org/10.1128/msystems.00084-22
4. Fogarty E., Moore R. (2019). Visualizing contig coverages to better understand
microbial population structure. https://merenlab.org/2019/11/25/visualizing-coverages/

### Input files

#### Pileup file:
ProActive detects read coverage patterns using a pattern-matching algorithm that
operates on pileup files. A pileup file is a file format where each row
summarizes the 'pileup' of reads at a specific genomic location. Pileup files
can be used to generate a rolling mean of read coverages and associated base
pair positions, which reduces data size while
preserving read coverage patterns. **ProActive requires that input pileup files**
**be generated using a 100 bp window/bin size.**
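
To make the idea of a 100 bp binned coverage profile concrete, here is a minimal base R sketch that averages per-base read depth over consecutive 100 bp windows. The helper name `binCoverage` and the simulated depth vector are hypothetical, and this is not how BBMap computes its output; it only illustrates why binning shrinks the data while keeping gaps and elevations visible.

```r
# Hypothetical helper: average per-base depth over fixed, non-overlapping windows.
binCoverage <- function(depth, binSize = 100) {
  # Assign every base to a window: bases 1-100 -> bin 1, 101-200 -> bin 2, ...
  bin <- ceiling(seq_along(depth) / binSize)
  data.frame(
    position = as.vector(tapply(seq_along(depth), bin, max)), # last base in each window
    coverage = as.vector(tapply(depth, bin, mean))            # mean depth in each window
  )
}

# Simulated 1,000 bp contig at 20x depth with an elevation at 301-500
# and a gap (no reads) at 701-800.
depth <- rep(20, 1000)
depth[301:500] <- 40
depth[701:800] <- 0

binned <- binCoverage(depth)
print(binned)
```

The 1,000 per-base values collapse to 10 rows, and the elevated and gapped windows remain clearly distinguishable from the 20x background.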

Pileup files can be generated by mapping sequencing reads to a
metagenome or genome fasta. **Read mapping should be performed using a high**
**minimum identity (0.97 or higher) and random mapping of ambiguous reads.** The
pileup files needed for ProActive are generated from the .bam files produced
during read mapping. Some read mappers, like
[BBMap](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbmap-guide/),
can generate pileup files directly from the
[`bbmap.sh`](https://github.com/BioInfoTools/BBMap/blob/master/sh/bbmap.sh)
command by using the `bincov` output with the `covbinsize=100`
parameter. **Otherwise, BBMap's**
**[`pileup.sh`](https://github.com/BioInfoTools/BBMap/blob/master/sh/pileup.sh)**
**can convert .bam files produced by any read mapper to pileup files**
**compatible with ProActive using the `bincov` output with `binsize=100`.**
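
The two routes described above can be sketched from within R by assembling the BBMap command-line arguments. This is only an illustration under stated assumptions: all file names are hypothetical placeholders, the commands are printed rather than executed by default, and the BBMap guide should be consulted for the authoritative parameter list.

```r
# Hypothetical file names; replace with your own reference, reads and outputs.
reference <- "assembly.fasta"
reads     <- "reads.fastq.gz"

# Route 1: map with bbmap.sh and write the binned pileup in the same command.
bbmapArgs <- c(
  paste0("ref=", reference),
  paste0("in=", reads),
  "minid=0.97",               # high minimum identity
  "ambiguous=random",         # place ambiguously mapped reads at random
  "bincov=sample_pileup.txt", # binned coverage output
  "covbinsize=100"            # 100 bp windows, as ProActive requires
)

# Route 2: convert an existing .bam file from any read mapper with pileup.sh.
pileupArgs <- c("in=mapped.bam", "bincov=sample_pileup.txt", "binsize=100")

# Print the commands for inspection.
cat("bbmap.sh", bbmapArgs, "\n")
cat("pileup.sh", pileupArgs, "\n")

# Execute only when explicitly requested and BBMap is found on the PATH.
runBBMap <- FALSE
if (runBBMap && nzchar(Sys.which("bbmap.sh"))) {
  system2("bbmap.sh", bbmapArgs)
}
```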

**NOTE:** For detailed information on input file format, please see the vignette. Users may also use
the 'sampleMetagenomePileup' and 'sampleGenomePileup' files that come pre-loaded with
ProActive as a reference.

#### gffTSV:
ProActive optionally accepts a .gff file as input. The .gff file must be
associated with the same metagenome or genome used to create your pileup file.
The .gff file should be a TSV and should follow the same general format described [here](https://en.wikipedia.org/wiki/General_feature_format#:~:text=In%20bioinformatics%2C%20the%20general%20feature,DNA%2C%20RNA%20and%20protein%20sequences.).
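
As a rough sketch of loading a .gff file as a TSV in base R, the example below parses a tiny in-memory GFF3 snippet. The contig, features and column names are illustrative assumptions (the names follow the general GFF3 column order); whether ProActive expects particular column names is not stated here, so check the vignette and the pre-loaded sample gffTSV objects.

```r
# A tiny GFF3 example held in memory (hypothetical contig and features).
gffText <- paste(
  "##gff-version 3",
  "contig_1\tProkka\tCDS\t120\t980\t.\t+\t0\tID=gene_1;product=ABC transporter",
  "contig_1\tProkka\tCDS\t1100\t1700\t.\t-\t0\tID=gene_2;product=chemotaxis protein",
  sep = "\n"
)

# GFF files are tab-separated, have no column header row, and use '#' for comments.
gffTSV <- read.delim(
  text = gffText, # for a real file, use file = "path/to/annotations.gff" instead
  header = FALSE,
  comment.char = "#",
  quote = "",
  stringsAsFactors = FALSE,
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

str(gffTSV)
```

Setting `quote = ""` prevents quotation marks inside product descriptions from being read as field delimiters, which is a common cause of dropped rows when reading annotation files.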

## Installation

Install ProActive from CRAN with:
``` r
install.packages("ProActive")
library(ProActive)
```

Install the development version of ProActive from [GitHub](https://github.com/) with:
``` r
if (!require("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

devtools::install_github("jlmaier12/ProActive")
library(ProActive)
```

## Quick start

```{r example}
library(ProActive)

## Metagenome mode

# Detect gaps and elevations in read coverage
MetagenomeProActive <- ProActiveDetect(
  pileup = sampleMetagenomePileup,
  mode = "metagenome",
  gffTSV = sampleMetagenomegffTSV
)

# Plot the detected read coverage patterns
MetagenomePlots <- plotProActiveResults(
  pileup = sampleMetagenomePileup,
  ProActiveResults = MetagenomeProActive
)

# Search the results for gene annotations of interest
MetagenomeGeneMatches <- geneAnnotationSearch(
  ProActiveResults = MetagenomeProActive,
  pileup = sampleMetagenomePileup,
  gffTSV = sampleMetagenomegffTSV,
  geneOrProduct = "product",
  keyWords = c("transport", "chemotaxis")
)

## Genome mode

GenomeProActive <- ProActiveDetect(
  pileup = sampleGenomePileup,
  mode = "genome",
  gffTSV = sampleGenomegffTSV
)

GenomePlots <- plotProActiveResults(
  pileup = sampleGenomePileup,
  ProActiveResults = GenomeProActive
)

GenomeGeneMatches <- geneAnnotationSearch(
  ProActiveResults = GenomeProActive,
  pileup = sampleGenomePileup,
  gffTSV = sampleGenomegffTSV,
  geneOrProduct = "product",
  keyWords = c("ribosomal"),
  inGapOrElev = TRUE,
  bpRange = 5000
)
```