https://github.com/SegataLab/viromeqc
ViromeQC is a computational tool to benchmark and quantify non-viral contamination in VLP-enriched viromes. ViromeQC provides an enrichment score for each virome. The score is calculated with respect to the expected prokaryotic marker abundances in reference metagenomes.
- Host: GitHub
- URL: https://github.com/SegataLab/viromeqc
- Owner: SegataLab
- License: mit
- Created: 2020-02-13T20:47:03.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-11-15T02:44:59.000Z (over 2 years ago)
- Last Synced: 2024-03-19T04:50:33.195Z (about 1 year ago)
- Topics: virome
- Language: Python
- Homepage:
- Size: 677 KB
- Stars: 14
- Watchers: 2
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: license.txt
Awesome Lists containing this project
- awesome-virome - ViromeQC - Virome quality control and contamination assessment. [source] [Python] (Other Tools / Quality Control)
README
# ViromeQC #
## Description ##
* Provides an enrichment score for VLP viromes with respect to metagenomes
* Useful benchmark for the quality of enrichment of a virome
* Tested on Linux Ubuntu Server 16.04 LTS and on Linux Mint 19

**Requires:**
* [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) >= v. 2.3.4
* [Samtools](http://samtools.sourceforge.net/) >= 1.3.1
* [Biopython](https://github.com/biopython/biopython) >= 1.69
* [Pysam](http://pysam.readthedocs.io/en/latest/) >= 0.14
* [Diamond](http://github.com/bbuchfink/diamond) (tested on v.0.9.9 and 0.9.29)
* Python3 (tested on 3.6)
* [pandas](https://pandas.pydata.org) >= 0.20

**Update:** _ViromeQC_ now works with newer versions of diamond (e.g. v0.9.29).
Thanks to Ryan Cook ([@RyanCookAMR](https://twitter.com/RyanCookAMR)) for the new diamond db.

## Usage ##
### Step 1: clone or download the repository ###
`git clone --recurse-submodules https://github.com/SegataLab/viromeqc.git`
or download the repository from the **[releases](https://github.com/SegataLab/viromeqc/releases)** page
### Step 2: install the database: ###
This step downloads the database file. It only needs to be done the first time you run ViromeQC, and may take a few minutes depending on your internet connection.
`viromeQC.py --install`
Alternatively, you can download the database files from [Zenodo](https://zenodo.org/record/4020594#.X1jxgGMzZDM). Once the files are downloaded, create a folder named `index/` in the ViromeQC installation folder and unzip all the files into it.
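The manual route can also be scripted. The following is a minimal sketch, assuming the Zenodo archives were saved as `.zip` files in a local folder; the function name `unpack_database` and both paths are hypothetical and not part of ViromeQC.

```python
import zipfile
from pathlib import Path


def unpack_database(download_dir, install_dir):
    """Extract every .zip archive in download_dir into <install_dir>/index/.

    Both arguments are hypothetical example paths; ViromeQC only expects
    the unpacked files to end up in the index/ folder of its installation.
    """
    index_dir = Path(install_dir) / "index"
    index_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(index_dir)
            extracted.extend(zf.namelist())
    return extracted
```

Calling, for example, `unpack_database("downloads", "viromeqc")` would create `viromeqc/index/` if needed and return the names of the extracted files.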
### Step 3: Run on your sample ###
`viromeQC.py -i <input_file(s)> -o <output_file>`
*Please Note:*
You can pass more than one file as input (e.g. for multiple runs or paired-end reads). However, you can process only one sample at a time with this command. If you want to parallelize the execution, this can be easily done with [Parallel](https://www.gnu.org/software/parallel/) or equivalent tools.

You can try the test example (`test/test.sh`), which analyzes 10,000 reads from the sample `SRR829034`. This should take approximately 1 or 2 minutes.
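As an alternative to GNU Parallel, the one-sample-per-command pattern can be driven from Python. This is a rough sketch, not part of ViromeQC: it assumes `viromeQC.py` is on the system path, uses only flags listed in the help text, and the helper names (`build_command`, `run_samples`) and file names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(sample_name, fastq_files, report_path, threads=4):
    """Assemble one viromeQC.py call for a single sample."""
    return [
        "viromeQC.py",
        "-i", *fastq_files,          # several files are allowed per sample
        "-o", report_path,
        "--sample_name", sample_name,
        "--bowtie2_threads", str(threads),
        "--diamond_threads", str(threads),
    ]


def run_samples(samples, max_workers=2, runner=subprocess.run):
    """Run one viromeQC.py process per sample, max_workers at a time.

    samples maps a sample label to (list of FASTQ files, report path).
    """
    commands = [
        build_command(name, files, report)
        for name, (files, report) in samples.items()
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each thread only waits for an external process to finish.
        return list(pool.map(lambda cmd: runner(cmd, check=True), commands))
```

For paired-end data, a call might look like `run_samples({"sampleA": (["A_1.fq.gz", "A_2.fq.gz"], "A_report.txt")})`. Keep `max_workers` times the per-tool thread count within the number of available cores.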
Parameters:
```
usage: viromeQC.py -i <input_file(s)> -o <output_file>

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT [INPUT ...]], --input [INPUT [INPUT ...]]
                        Raw Reads in FASTQ format. Supports multiple inputs
                        (plain, gz o bz2) (default: None)
  -o OUTPUT, --output OUTPUT
                        output file (default: None)
  --minlen MINLEN       Minimum Read Length allowed (default: 75)
  --minqual MINQUAL     Minimum Read Average Phred quality (default: 20)
  --bowtie2_threads BOWTIE2_THREADS
                        Number of Threads to use with Bowtie2 (default: 4)
  --diamond_threads DIAMOND_THREADS
                        Number of Threads to use with Diamond (default: 4)
  -w {human,environmental}, --enrichment_preset {human,environmental}
                        Calculate the enrichment basing on human or
                        environmental metagenomes. Defualt: human-microbiome
                        (default: human)
  --bowtie2_path BOWTIE2_PATH
                        Full path to the bowtie2 command to use, deafult
                        assumes that bowtie2 is present in the system path
                        (default: bowtie2)
  --diamond_path DIAMOND_PATH
                        Full path to the diamond command to use, deafult
                        assumes that diamond is present in the system path
                        (default: diamond)
  --version             Prints version informations (default: False)
  --install             Downloads database files (default: False)
  --sample_name SAMPLE_NAME
                        Optional label for the sample to be included in the
                        output file (default: None)
  --tempdir TEMPDIR     Temporary Directory override (default is the system
                        temp. directory) (default: None)
```

### Pipeline structure ###
ViromeQC starts from FASTQ files (compressed files are supported), and will:
1. Eliminate short and low-quality reads
   - *adjust the `minqual` and `minlen` parameters if you want to change the thresholds*
2. Map the reads against a curated collection of rRNAs and single-copy bacterial markers
3. Filter the reads to remove short and divergent alignments
4. Compute the enrichment value of the sample, compared to the median observed in human metagenomes
   - use `-w environmental` for environmental reads
   - reference medians for un-enriched metagenomes are taken from `medians.csv`; you can provide your own data to ViromeQC by changing this file accordingly
5. Produce a report file with the alignment rates and the final enrichment score (which is the minimum enrichment observed across SSU-rRNA, LSU-rRNA and single-copy markers)

### Output ###
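Steps 1, 4 and 5 of the pipeline can be illustrated with a small sketch. It is not ViromeQC's own code. It assumes Phred+33 quality encoding, and it assumes that the enrichment for each marker class is the reference median alignment rate divided by the sample's alignment rate. All function names, dictionary keys and numbers below are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of a FASTQ quality line (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def passes_filter(sequence, quality_string, minlen=75, minqual=20):
    """Step 1: keep reads that are long enough and of sufficient mean quality."""
    return len(sequence) >= minlen and mean_phred(quality_string) >= minqual


def enrichment_score(alignment_rates, reference_medians):
    """Steps 4-5: per-class enrichment, then the minimum across classes.

    Assumption: enrichment = reference median rate / sample rate.
    Returns (score, per_class); score is None if no class can be scored.
    """
    per_class = {
        marker: reference_medians[marker] / rate
        for marker, rate in alignment_rates.items()
        if rate > 0  # a class with no aligned reads cannot be scored this way
    }
    if not per_class:
        return None, per_class
    return min(per_class.values()), per_class


# Made-up alignment rates (%) and reference medians, for illustration only.
rates = {"SSU_rRNA": 0.25, "LSU_rRNA": 0.5, "bacterial_markers": 0.125}
medians = {"SSU_rRNA": 2.0, "LSU_rRNA": 1.0, "bacterial_markers": 4.0}
score, per_class = enrichment_score(rates, medians)
```

Taking the minimum makes the score conservative: a sample counts as well enriched only if all three marker classes are depleted relative to the reference metagenomes.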
Output is given as a TSV file with the following structure:
| Sample | Reads | Reads_HQ | SSU rRNA alignment (%) | LSU rRNA alignment (%) | Bacterial_Markers alignment (%) | total enrichment score |
|---|---|---|---|---|---|---|
| your_sample.fq | 40000 | 39479 | 0.00759898 | 0.0227969 | 0.01266496 | 5.795329 |

- An enrichment score of 5.8 means that the virome is 5.8 times more enriched than a comparable metagenome
- High scores (e.g. 10-50) reflect high VLP enrichment

## Citation ##
If you find this tool useful, please cite:
*Zolfo, M., Pinto, F., Asnicar, F., Manghi, P., Tett, A., Segata, N.* **[Detecting contamination in viromes using ViromeQC](https://www.nature.com/articles/s41587-019-0334-5)**, *Nature Biotechnology* 37, 1408–1412 (2019)