https://github.com/bibymaths/nf-illumina2lineage
A Nextflow pipeline for SARS-CoV-2 genome assembly and analysis from Illumina reads—includes QC, mapping, variant calling, consensus generation, lineage annotation, and phylogenetics.
- Host: GitHub
- URL: https://github.com/bibymaths/nf-illumina2lineage
- Owner: bibymaths
- License: bsd-3-clause
- Created: 2023-09-26T22:28:39.000Z
- Default Branch: main
- Last Pushed: 2025-05-16T22:15:02.000Z
- Last Synced: 2025-06-08T13:06:23.189Z
- Topics: genome-assembly, illumina, sars-cov-2, variantcalling
- Language: Shell
- Homepage: https://bibymaths.github.io/nf-illumina2lineage/
- Size: 9.04 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# nf-illumina2lineage

[Nextflow](https://www.nextflow.io/)
[DOI: 10.5281/zenodo.15376065](https://doi.org/10.5281/zenodo.15376065)

A reproducible and modular **Nextflow pipeline** for **SARS-CoV-2 genome assembly and lineage analysis** from Illumina paired-end sequencing data.
## Overview
This pipeline automates:
- Read quality control
- Reference-based mapping
- Primer clipping
- Variant calling
- Consensus generation
- Lineage assignment
- Phylogenetic analysis

It is built from established best-practice tools (listed under Dependencies) and was developed as part of the *SARS-2 Bioinformatics & Data Science* course run by Freie Universität Berlin and the Robert Koch Institute.
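The steps listed above can be sketched as a per-sample command chain built from tools in the Dependencies list. This is an illustrative dry run, not the pipeline's actual process code: the file names, the primer BEDPE file `primers.bedpe`, the helper functions `run`, `sample_chain` and `cohort_chain`, and all tool options are assumptions, and the IQ-TREE 2 binary is assumed to be called `iqtree2`.

```bash
#!/usr/bin/env bash
# Illustrative per-sample chain for the steps above (dry run by default).
# ASSUMPTIONS: file names, primers.bedpe, tool options and the helper
# functions run/sample_chain/cohort_chain are hypothetical; this is not
# the pipeline's actual process code.
set -euo pipefail

# Print the command when DRY_RUN=1 (default); otherwise execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    bash -o pipefail -c "$*"
  fi
}

# Usage: sample_chain SAMPLE R1 R2 REF_FASTA PRIMER_BEDPE
sample_chain() {
  local s=$1 r1=$2 r2=$3 ref=$4 primers=$5
  # Read QC and adapter/quality trimming
  run "fastqc $r1 $r2"
  run "fastp -i $r1 -I $r2 -o ${s}_R1.trim.fastq.gz -O ${s}_R2.trim.fastq.gz --json ${s}.fastp.json --html ${s}.fastp.html"
  # Reference-based mapping (short-read preset), then sort and index
  run "minimap2 -ax sr $ref ${s}_R1.trim.fastq.gz ${s}_R2.trim.fastq.gz | samtools sort -o ${s}.bam"
  run "samtools index ${s}.bam"
  # Primer clipping; bamclipper is assumed to write <prefix>.primerclipped.bam
  run "bamclipper.sh -b ${s}.bam -p $primers -n 4"
  # Variant calling on a haploid genome (-p 1); no filtering shown
  run "freebayes -f $ref -p 1 ${s}.primerclipped.bam > ${s}.vcf"
  # Consensus: bcftools consensus needs a bgzipped, indexed VCF
  run "bcftools view -Oz -o ${s}.vcf.gz ${s}.vcf"
  run "bcftools index ${s}.vcf.gz"
  run "bcftools consensus -f $ref ${s}.vcf.gz > ${s}.consensus.fa"
  # Lineage assignment
  run "pangolin ${s}.consensus.fa --outfile ${s}.lineage.csv"
}

# Cohort-level steps: multiple sequence alignment and ML tree
cohort_chain() {
  run "cat *.consensus.fa > all_consensus.fa"
  run "mafft --auto all_consensus.fa > all_consensus.aln.fa"
  # IQ-TREE writes the tree to all_consensus.aln.fa.treefile
  run "iqtree2 -s all_consensus.aln.fa"
}
```

With the default `DRY_RUN=1`, `sample_chain S1 S1_R1.fastq.gz S1_R2.fastq.gz ref.fa primers.bedpe` only prints the commands; `DRY_RUN=0` executes them. A production workflow would also filter variants and mask low-coverage positions before building the consensus, which this sketch leaves out.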
## Quickstart
```bash
git clone https://github.com/bibymaths/nf-illumina2lineage.git
cd nf-illumina2lineage
```
> 💡 See [docs/quickstart.md](docs/quickstart.md) for full details.
## Inputs
* Illumina paired-end `.fastq.gz` files
* SARS-CoV-2 reference genome (downloaded automatically)
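Because every sample needs both mates, a quick pre-flight check can catch a missing R2 file before a run. This sketch assumes Illumina-style names in which `_R1`/`_R2` marks the mate (e.g. `A_S1_L001_R1_001.fastq.gz`); the function `check_pairs` is hypothetical, and the file pattern the pipeline itself expects may differ.

```bash
#!/usr/bin/env bash
# Pre-flight check that every R1 FASTQ has a matching R2 mate.
# ASSUMPTION: mates differ only by the first "_R1"/"_R2" in the file name,
# e.g. A_S1_L001_R1_001.fastq.gz / A_S1_L001_R2_001.fastq.gz.

# Usage: check_pairs DIR
# Prints the number of unpaired R1 files; returns non-zero if any exist.
check_pairs() {
  local dir=${1:-.} r1 base r2 missing=0
  shopt -s nullglob            # no matches -> loop body never runs
  for r1 in "$dir"/*_R1*.fastq.gz; do
    base=${r1##*/}             # file name without the directory part
    r2="$dir/${base/_R1/_R2}"  # swap the first _R1 for _R2
    if [[ ! -f "$r2" ]]; then
      echo "missing mate for: $r1" >&2
      missing=$((missing + 1))
    fi
  done
  shopt -u nullglob
  echo "$missing"
  (( missing == 0 ))
}
```

Substituting only within the file name (not the full path) avoids false matches when a directory name itself contains `_R1`.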
## Outputs
* QC reports: FastQC, fastp, MultiQC
* BAM & VCF files
* Consensus FASTA sequences
* Pangolin lineage annotations
* Phylogenetic tree (IQ-TREE `.treefile`, Newick format)

For the full output structure, see [docs/outputs.md](docs/outputs.md).
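Pangolin reports its lineage calls as a CSV with a `lineage` column. The summary below counts samples per lineage; it is a sketch assuming a single report file and sequence names without commas, and both the function `count_lineages` and the report's location in the output tree are assumptions.

```bash
#!/usr/bin/env bash
# Count samples per lineage in a Pangolin lineage report (CSV).
# ASSUMPTIONS: one report file; sequence names (taxon column) contain no
# commas, so splitting on "," is safe up to the lineage column.

# Usage: count_lineages lineage_report.csv
# Output: "<count><TAB><lineage>", most frequent lineage first.
count_lineages() {
  awk -F',' '
    FNR == 1 {                      # header: locate the "lineage" column
      for (i = 1; i <= NF; i++) if ($i == "lineage") col = i
      next
    }
    col { n[$col]++ }
    END {
      if (!col) { print "no lineage column found" | "cat 1>&2"; exit 1 }
      for (l in n) print n[l] "\t" l
    }
  ' "$1" | sort -k1,1nr -k2,2
}
```

Looking the column up by header name, rather than hard-coding its position, keeps the sketch working if the report gains or reorders columns.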
## Dependencies
Managed via `mamba` or `Docker`:
* QC: `fastqc`, `fastp`, `multiqc`
* Mapping: `minimap2`, `samtools`, `bamclipper`
* Variant Calling: `freebayes`, `vcftools`, `bcftools`
* Consensus: `vcfR`, `bcftools`, `president`
* Lineage & MSA: `pangolin`, `mafft`, `iqtree`
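When running outside the managed `mamba`/Docker environments, it helps to check which of the listed command-line tools are actually on `PATH`. The executable names are assumptions (`bamclipper.sh` for bamclipper, `iqtree2` for IQ-TREE 2), and `TOOLS` and `check_tools` are hypothetical names.

```bash
#!/usr/bin/env bash
# Report which command-line tools from the Dependencies list are not on PATH.
# ASSUMPTION: executable names; e.g. bamclipper ships as bamclipper.sh and
# IQ-TREE 2 as iqtree2. vcfR is an R package and is not checked here.
TOOLS=(fastqc fastp multiqc minimap2 samtools bamclipper.sh freebayes
       vcftools bcftools president pangolin mafft iqtree2)

# Usage: check_tools NAME...   (returns non-zero if any are missing)
check_tools() {
  local tool
  local -a missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    printf 'missing: %s\n' "${missing[*]}"
    return 1
  fi
  echo "all tools found"
}
```

Run it as `check_tools "${TOOLS[@]}"`. Because vcfR is loaded from R rather than called as a command, it needs a separate check, for example `Rscript -e 'quit(status = if (requireNamespace("vcfR", quietly = TRUE)) 0 else 1)'`.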
## Documentation
Complete documentation is available in the `docs/` folder and is rendered with [MkDocs](https://www.mkdocs.org/). It includes:
* [Pipeline overview](docs/workflow.md)
* [Process details](docs/processes.md)
* [Parameters](docs/parameters.md)
* [Container usage](docs/containers.md)
* [Lineage QC](docs/lineage_qc.md)
## License
This project is licensed under the **BSD 3-Clause License**. See [LICENSE](LICENSE).
## Author
**Abhinav Mishra**
Email: [mishraabhinav@gmail.com](mailto:mishraabhinav@gmail.com)
## Acknowledgments
Developed during the [SARS-2 Bioinformatics & Data Science](https://github.com/rki-mf1/2023-SC2-Data-Science) course at FU Berlin & RKI, under the guidance of **Max von Kleist** and **Martin Hölzer**.
---