https://github.com/dridk/sars-cov-2-ngs-pipeline
A simple Snakemake pipeline to call variants from NGS data of the SARS-CoV-2 genome
- Host: GitHub
- URL: https://github.com/dridk/sars-cov-2-ngs-pipeline
- Owner: dridk
- License: mit
- Created: 2021-01-29T20:06:17.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-12-30T12:26:09.000Z (over 3 years ago)
- Last Synced: 2024-05-01T21:19:20.765Z (12 months ago)
- Language: Python
- Size: 77.1 KB
- Stars: 4
- Watchers: 2
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Sars-CoV-2-NGS-pipeline
A simple Snakemake pipeline to call variants from NGS data of the SARS-CoV-2 genome.
It depends on [bwa](http://bio-bwa.sourceforge.net/), [freebayes](https://github.com/freebayes/freebayes), [SnpEff/SnpSift](https://pcingola.github.io/SnpEff/) and [samtools](http://www.htslib.org/).

## Installation
You can use conda to install dependencies.
1. ``git clone https://github.com/dridk/Sars-CoV-2-NGS-pipeline.git``
2. ``conda env create -f environment.yml``
3. ``conda activate covid``

You can test the pipeline with our toy dataset:

``snakemake -p A.results.csv B.results.csv -j4``
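
If the test run fails, one thing worth checking is whether the dependencies are visible in the active environment. The following is a minimal sketch using only the Python standard library. The executable names (`snpEff`, `SnpSift`, and so on) are assumptions based on how these tools are commonly packaged, not something this repository specifies, so adjust the list to your installation.

```python
import shutil

# Assumed executable names; they may differ in your environment.
REQUIRED_TOOLS = ["snakemake", "bwa", "samtools", "freebayes", "snpEff", "SnpSift"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Run it after ``conda activate covid`` so that the lookup uses the environment's PATH.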
## Configuration and execution
In ``config.yml``, set the **FASTQ_DIR** variable to the folder containing your FASTQ files.
These files must follow this naming pattern:

- SAMPLENAME_1.fastq.gz
- SAMPLENAME_2.fastq.gz

To get the results for a specific SAMPLENAME:

``snakemake -p SAMPLENAME.results.csv``
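
When a folder holds many samples, the naming pattern above can be used to list every SAMPLENAME that has both read files and to build the matching command. This is a rough sketch and not part of the pipeline itself: the helper names `find_paired_samples` and `snakemake_command` and the folder name `data` are hypothetical, and the `-j4` value simply mirrors the test command.

```python
from pathlib import Path


def find_paired_samples(fastq_dir):
    """Return sorted sample names that have both _1 and _2 FASTQ files."""
    fastq_dir = Path(fastq_dir)
    suffix = "_1.fastq.gz"
    samples = []
    for forward in fastq_dir.glob("*" + suffix):
        # Strip the suffix to recover SAMPLENAME, then look for its mate.
        name = forward.name[: -len(suffix)]
        reverse = fastq_dir / (name + "_2.fastq.gz")
        if reverse.exists():
            samples.append(name)
    return sorted(samples)


def snakemake_command(samples, jobs=4):
    """Build a snakemake command requesting one results file per sample."""
    targets = [f"{sample}.results.csv" for sample in samples]
    return ["snakemake", "-p", *targets, f"-j{jobs}"]


if __name__ == "__main__":
    # "data" is a placeholder; use the value of FASTQ_DIR from config.yml.
    samples = find_paired_samples("data")
    print(" ".join(snakemake_command(samples)))
```

Samples with only one of the two files are skipped, which makes unpaired or misnamed files easy to spot before launching a run.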
To get the FASTA genome of a specific SAMPLENAME:

``snakemake -p SAMPLENAME.fa``

You can pass this consensus sequence to [Pangolin](https://github.com/cov-lineages/pangolin) to get the lineage.

## Results
Each sample comes with a CSV file with the following columns:
1. Gene Name
2. Feature ID
3. Variant position
4. Reference bases
5. Alternative bases
6. HGVS coding name
7. HGVS protein name
8. Impact
9. Effect

```
ANN[*].GENE ANN[*].FEATUREID POS REF ALT ANN[*].HGVS_C ANN[*].HGVS_P ANN[*].IMPACT ANN[*].EFFECT
ORF1ab GU280_gp01 490 T A c.225T>A p.Asp75Glu MODERATE missense_variant
ORF1ab YP_009725297.1 490 T A c.225T>A p.Asp75Glu MODERATE missense_variant
ORF1ab YP_009742608.1 490 T A c.225T>A p.Asp75Glu MODERATE missense_variant
ORF1ab GU280_gp01.2 490 T A c.225T>A p.Asp75Glu MODERATE missense_variant
ORF1ab YP_009725298.1 490 T A c.-316T>A MODIFIER upstream_gene_variant
ORF1ab YP_009742609.1 490 T A c.-316T>A MODIFIER upstream_gene_variant
ORF1ab YP_009725299.1 490 T A c.-2230T>A MODIFIER upstream_gene_variant
ORF1ab YP_009742610.1 490 T A c.-2230T>A MODIFIER upstream_gene_variant
```
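
The example output lists one row per annotated feature, so the same variant appears several times. As an illustration, the sketch below loads such a table and keeps only rows whose impact is not `MODIFIER`. It assumes the file is tab-delimited despite the `.csv` extension, which the README does not state, so pass a different `delimiter` if your files differ. The two embedded rows are copied from the example, and the helper names are hypothetical.

```python
import csv
import io

# Column name as shown in the example output.
IMPACT = "ANN[*].IMPACT"

# Two rows copied from the example, written with explicit tabs.
# Assumption: the real file uses tabs as the field separator.
EXAMPLE = (
    "ANN[*].GENE\tANN[*].FEATUREID\tPOS\tREF\tALT\t"
    "ANN[*].HGVS_C\tANN[*].HGVS_P\tANN[*].IMPACT\tANN[*].EFFECT\n"
    "ORF1ab\tGU280_gp01\t490\tT\tA\tc.225T>A\tp.Asp75Glu\tMODERATE\tmissense_variant\n"
    "ORF1ab\tYP_009725298.1\t490\tT\tA\tc.-316T>A\t\tMODIFIER\tupstream_gene_variant\n"
)


def read_results(handle, delimiter="\t"):
    """Read a results table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def drop_modifier(rows):
    """Keep only rows whose impact is something other than MODIFIER."""
    return [row for row in rows if row[IMPACT] != "MODIFIER"]


if __name__ == "__main__":
    rows = read_results(io.StringIO(EXAMPLE))
    for row in drop_modifier(rows):
        print(row["POS"], row["REF"], row["ALT"],
              row["ANN[*].HGVS_P"], row["ANN[*].EFFECT"])
```

To read a real file, replace `io.StringIO(EXAMPLE)` with `open("SAMPLENAME.results.csv", newline="")`.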