https://github.com/rki-mf1/cievad
A tool suite for a simple, streamlined and rapid evaluation of variant callsets
benchmarking bioinformatics genomics indels nextflow ngs oxford-nanopore snps variant-calling
- Host: GitHub
- URL: https://github.com/rki-mf1/cievad
- Owner: rki-mf1
- License: gpl-3.0
- Created: 2023-05-09T14:35:43.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-16T14:35:27.000Z (over 1 year ago)
- Last Synced: 2024-09-16T16:42:37.485Z (over 1 year ago)
- Topics: benchmarking, bioinformatics, genomics, indels, nextflow, ngs, oxford-nanopore, snps, variant-calling
- Language: Nextflow
- Homepage: https://github.com/rki-mf1/cievad
- Size: 46.1 MB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# CIEVaD
Continuous Integration and Evaluation for Variant Detection. This repository provides a tool suite for simple, streamlined and rapid creation and evaluation of genomic variant callsets. It is primarily designed for continuous integration of variant detection software and for plain containment checks between sets of variants. The tool suite uses the _conda_ package management system and the _nextflow_ workflow language.
## Contents:
1. [System requirements](#system-requirements)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Help](#help)
5. [Citation](#citation)
## System requirements:
This tool suite was developed for Linux, which is the only officially supported operating system.
Having any derivative of the conda package management system installed is the only strict system requirement.
A recent version (≥20.04.0) of nextflow is required to execute the workflows, but it can easily be installed via conda.
For instructions on installing nextflow via conda, see [Installation](#installation).
🖥️ See list of tested setups:
| Requirement | Tested with |
| --- | --- |
| 64-bit Linux operating system | Ubuntu 20.04.5 LTS |
| [Conda](https://docs.conda.io/en/latest/) | vers. 23.5.0, 24.1.2 |
| [Nextflow](https://nextflow.io/) | vers. 20.04.0, 23.10.1 |
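
The minimum version requirement above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `version_tuple` and `is_supported` are hypothetical and not part of CIEVaD, and it assumes a plain dotted version string without suffixes such as `-edge`.

```python
# Hypothetical helpers, not part of CIEVaD.
# Assumption: the version is a plain dotted string such as "23.10.1".

def version_tuple(version):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


# Minimum nextflow version stated in the system requirements.
MINIMUM_NEXTFLOW = version_tuple("20.04.0")


def is_supported(version):
    """Return True if the given nextflow version meets the minimum."""
    return version_tuple(version) >= MINIMUM_NEXTFLOW


if __name__ == "__main__":
    for candidate in ("20.04.0", "23.10.1"):
        print(candidate, is_supported(candidate))
```

Comparing integer tuples rather than raw strings avoids lexicographic pitfalls, e.g. `"9.1.0"` sorting after `"20.04.0"` as text.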
## Installation:
1. Download the repository:
```
git clone https://github.com/rki-mf1/cievad.git
```
2. [Optional] Install nextflow if it is not yet on your system. As good practice, use a new conda environment:
```
conda deactivate
conda create -n cievad -c bioconda nextflow
conda activate cievad
```
## Usage:
This tool suite provides multiple functional features to generate synthetic sequencing data, generate sets of ground truth variants (truthsets) and evaluate sets of predicted variants (callsets).
There are two main workflows, `hap.nf` and `eval.nf`.
Both workflows are executed via the nextflow command line interface (CLI).
⚠️ Run commands from the root directory:
Run the commands from a terminal at the top folder (root directory) of this repository.
Otherwise, relative paths within the workflows might be invalid.
### Generating haplotype data
The minimal command to generate haplotype data is
```
nextflow run hap.nf -profile local,conda
```
This generates the following data within the `/results/` directory:
- a haplotype (FASTA), which is a copy of the provided reference sequence that differs by a set of synthetic genomic variants
- the variant set (VCF) of synthetic genomic variants in the haplotype
- a set of reads (FASTQ) representing a sequencing experiment from the haplotype
### Evaluating variant calls
The minimal command to evaluate the agreement between a truthset (generated data) and a callset is
```
nextflow run eval.nf -profile local,conda --callsets_dir <path/to/callsets_dir>
```
where `--callsets_dir` specifies a folder containing the callset VCF files.
Currently, a callset within this folder has to follow the naming convention `callset_<X>.vcf[.gz]`, where _\<X\>_ is the integer of the corresponding truthset.
Alternatively, one can provide a sample sheet of comma-separated values (CSV file) with the columns "index", "truthset" and "callset", where "index" is an integer from 1 to n (number of samples) and "callset"/"truthset" are paths to the pairwise matching VCF files.
Callsets can optionally be _gzip_ compressed.
The command for the sample sheet input is
```
nextflow run eval.nf -profile local,conda --sample_sheet <path/to/sample_sheet.csv>
```
This generates the following data within the `/results/` directory:
- a report (CSV, JSON) about the agreement between the synthetic variant set and a given corresponding callset
- a report (CSV) with statistics across all tested individuals
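
The two input conventions described above (callset file names carrying an integer, and a sample sheet with the columns "index", "truthset" and "callset") can be prepared with a small helper script. The following is a minimal sketch, not part of CIEVaD: the function names and all file paths are hypothetical, and it assumes that the sample sheet starts with a header row naming the three columns in that order.

```python
import csv
import io
import re

# Naming convention from the text: "callset_", an integer, then ".vcf" or ".vcf.gz".
CALLSET_NAME = re.compile(r"^callset_(\d+)\.vcf(\.gz)?$")


def callset_index(file_name):
    """Return the integer in a callset file name, or None if the name does not match."""
    match = CALLSET_NAME.match(file_name)
    return int(match.group(1)) if match else None


def sample_sheet_text(pairs):
    """Render (truthset, callset) path pairs as CSV text with an index from 1 to n.

    Assumption: a header row with the column names is expected.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "truthset", "callset"])
    for index, (truthset, callset) in enumerate(pairs, start=1):
        writer.writerow([index, truthset, callset])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    pairs = [
        ("truth/sample_a.vcf", "calls/sample_a.vcf.gz"),
        ("truth/sample_b.vcf", "calls/sample_b.vcf"),
    ]
    print(sample_sheet_text(pairs), end="")
    print(callset_index("callset_2.vcf.gz"))
```

Writing the output of `sample_sheet_text` to a file gives a sheet that can be passed via `--sample_sheet`, while `callset_index` helps spot files in a callset folder that do not follow the naming convention.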
### Tuning the workflow parameters
CIEVaD exposes the vast majority of the internal software tools' parameters for fine-tuning.
The parameters to adjust the workflows are listed on their respective help pages.
To inspect a help page, append `--help` after the script name, e.g. `nextflow run hap.nf --help` for the hap.nf workflow.
Parameters can be adjusted via the CLI or directly within the _nextflow.config_ file.
Note that parameters provided via the CLI override those set in the config file.
More information about tuning crucial parameters, e.g. [read quality](https://github.com/rki-mf1/cievad/wiki/Parameterization-of-the-workflow) and [genome coverage](https://github.com/rki-mf1/cievad/wiki/FAQ---Troubleshooting), can be found in the Wiki.
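
The precedence between the two parameter sources can be pictured as a dictionary merge. The sketch below is only a toy model of that rule, not how nextflow resolves parameters internally. The function `resolve_params` and the parameter names `example_depth` and `example_seed` are hypothetical.

```python
def resolve_params(config_params, cli_params):
    """Merge two parameter sources; values given on the CLI win over config values."""
    resolved = dict(config_params)  # start from the config file defaults
    resolved.update(cli_params)     # CLI values replace entries with the same name
    return resolved


if __name__ == "__main__":
    # Hypothetical parameter names and values, for illustration only.
    from_config = {"example_depth": 50, "example_seed": 1}
    from_cli = {"example_depth": 100}
    print(resolve_params(from_config, from_cli))
```

A parameter that appears only in the config file keeps its value, so a CLI call needs to name only the parameters that should differ from the defaults.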
## Help:
Visit the project [wiki](https://github.com/rki-mf1/cievad/wiki) for more detailed information on parameters, help and FAQs.
Please file issues, bug reports and questions in the [issues](https://github.com/rki-mf1/cievad/issues) section.
## Citation:
We have a [manuscript](https://www.mdpi.com/1999-4915/16/9/1444) available for CIEVaD.
If you use CIEVaD, please cite:
```
@article{krannich2024cievad,
title={CIEVaD: A Lightweight Workflow Collection for the Rapid and On-Demand Deployment of End-to-End Testing for Genomic Variant Detection},
author={Krannich, Thomas and Ternovoj, Dmitrii and Paraskevopoulou, Sofia and Fuchs, Stephan},
journal={Viruses},
volume={16},
number={9},
pages={1444},
year={2024},
doi={10.3390/v16091444}
}
```