{"id":18448835,"url":"https://github.com/sequana/variant_calling","last_synced_at":"2025-04-08T01:32:41.164Z","repository":{"id":43257174,"uuid":"223250401","full_name":"sequana/variant_calling","owner":"sequana","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-28T15:46:42.000Z","size":761,"stargazers_count":0,"open_issues_count":9,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-30T23:18:09.162Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sequana.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-21T19:35:46.000Z","updated_at":"2025-02-28T15:36:46.000Z","dependencies_parsed_at":"2024-06-12T16:39:27.826Z","dependency_job_id":"b153eae8-d224-4583-a357-065841cb5e13","html_url":"https://github.com/sequana/variant_calling","commit_stats":{"total_commits":73,"total_committers":3,"mean_commits":"24.333333333333332","dds":0.04109589041095896,"last_synced_commit":"2070740e62eaa609a101d150add2286013084a2d"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fvariant_calling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fvariant_calling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fvariant_calling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%
2Fvariant_calling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sequana","download_url":"https://codeload.github.com/sequana/variant_calling/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247760641,"owners_count":20991520,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T07:17:25.071Z","updated_at":"2025-04-08T01:32:41.144Z","avatar_url":"https://github.com/sequana.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n.. image:: https://badge.fury.io/py/sequana-variant-calling.svg\n     :target: https://pypi.python.org/pypi/sequana_variant_calling\n\n.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg\n    :target: http://joss.theoj.org/papers/10.21105/joss.00352\n    :alt: JOSS (journal of open source software) DOI\n\n.. image:: https://github.com/sequana/variant_calling/actions/workflows/main.yml/badge.svg\n   :target: https://github.com/sequana/variant_calling/actions\n\n.. 
image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg\n    :target: https://pypi.python.org/pypi/sequana\n    :alt: Python 3.8 | 3.9 | 3.10\n\nThis is the **variant_calling** pipeline from the `Sequana \u003chttps://sequana.readthedocs.org\u003e`_ project.\n\n:Overview: Variant calling from FASTQ files\n:Input: FASTQ files from an Illumina sequencing instrument\n:Output: VCF and HTML files\n:Status: production\n:Citation: Cokelaer et al, (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352\n\n\nInstallation\n~~~~~~~~~~~~\n\nYou can install the sequana_variant_calling pipeline using::\n\n    pip install sequana_variant_calling --upgrade\n\nI would recommend setting up a *sequana_variant_calling* conda environment by executing::\n\n    conda env create -f environment.yml\n\nwhere the environment.yml file can be found in the https://github.com/sequana/variant_calling repository.\n\nLater, you can activate the environment as follows::\n\n    conda activate sequana_variant_calling\n\nNote, however, that the recommended method is to use singularity/apptainer as explained below.\n\n\nUsage\n~~~~~\n\n::\n\n    sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa\n\nThis creates a directory **variant_calling**. You just need to move into the directory and execute the script::\n\n    cd variant_calling\n    sh variant_calling.sh\n\nThis launches a snakemake pipeline. 
If you are familiar with snakemake, you can\nretrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::\n\n    snakemake -s variant_calling.rules -c config.yaml --cores 4 --stats stats.txt\n\nYou can also edit the profile file in .sequana/profile/config.yaml.\n\nOr use the `sequanix \u003chttps://sequana.readthedocs.io/en/main/sequanix.html\u003e`_ interface.\n\nUsage with singularity\n~~~~~~~~~~~~~~~~~~~~~~\n\nWith singularity, initiate the working directory as follows::\n\n    sequana_variant_calling --use-singularity --singularity-prefix ~/.sequana/apptainers\n\nImages are downloaded into a global directory (here ~/.sequana/apptainers) so that you can reuse them later.\n\nThen proceed as before::\n\n    cd variant_calling\n    sh variant_calling.sh\n\nIf you decide to use snakemake manually, do not forget to add the singularity options::\n\n    snakemake -s variant_calling.rules -c config.yaml --cores 4 --stats stats.txt --use-singularity --singularity-prefix ~/.sequana/apptainers --singularity-args \"-B /home:/home\"\n\nRequirements\n~~~~~~~~~~~~\n\nIf you rely on singularity/apptainer, no extra dependencies are required (except python and\nhttps://damona.readthedocs.io). If you cannot use apptainer, you will need to install the following software:\n\n- bwa\n- freebayes\n- picard (picard-tools)\n- sambamba\n- minimap2\n- samtools\n- snpEff: you will need version 5.0 or 5.1d (note the d); 5.1 does not work.\n\n\n.. image:: https://raw.githubusercontent.com/sequana/sequana_variant_calling/main/sequana_pipelines/variant_calling/dag.png\n\nDetails\n~~~~~~~~\n\nThe Snakemake variant calling pipeline is based on a\n`tutorial \u003chttps://github.com/ekg/alignment-and-variant-calling-tutorial\u003e`_\nwritten by Erik Garrison. 
Input reads (paired or single) are mapped using\n`bwa \u003chttp://bio-bwa.sourceforge.net/\u003e`_ and sorted with\n`sambamba-sort \u003chttp://lomereiter.github.io/sambamba/docs/sambamba-sort.html\u003e`_.\nPCR duplicates are marked with\n`sambamba-markdup \u003chttp://lomereiter.github.io/sambamba/docs/sambamba-sort.html\u003e`_.\n`Freebayes \u003chttps://github.com/ekg/freebayes\u003e`_ is used to detect SNPs and short\nINDELs. INDEL realignment and base quality recalibration are not necessary\nwith Freebayes. For more information, please refer to a post by Brad Chapman on\n`minimal BAM preprocessing methods\n\u003chttps://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/\u003e`_.\n\nThe pipeline provides an analysis of the mapping coverage using\n`sequana coverage \u003chttp://www.biorxiv.org/content/early/2016/12/08/092478\u003e`_.\nIt automatically detects and characterises regions of low and high genome coverage.\n\nDetected variants are annotated with `SnpEff \u003chttp://snpeff.sourceforge.net/\u003e`_ if a\nGenBank file is provided. The pipeline builds the database automatically.\nAlthough most species should be handled automatically, some special cases\nsuch as a particular codon table will require editing the snpeff configuration file.\n\nFinally, joint calling is also available and can be switched on if desired.\n\nTutorial\n~~~~~~~~\n\nLet us download an E. coli reference genome and the data set used to create the assembly. 
All tools used below can be\ninstalled with damona (or your favorite environment manager)::\n\n    pip install damona\n    damona create TEST\n    damona activate TEST\n    damona install pigz\n    damona install sratoolkit # for fasterq-dump\n    damona install datasets\n\nThen, download the data::\n\n    fasterq-dump SRR13921546\n    pigz SRR*fastq\n\nand the reference genome with its annotation::\n\n    datasets download genome accession GCF_000005845.2 --include gff3,rna,cds,protein,genome,seq-report,gbff\n    unzip ncbi_dataset.zip\n    ln -s ncbi_dataset/data/GCF_000005845.2/GCF_000005845.2_ASM584v2_genomic.fna ecoli.fa\n    ln -s ncbi_dataset/data/GCF_000005845.2/genomic.gff ecoli.gff\n\n\nInitiate the pipeline::\n\n    sequana_variant_calling --input-directory . --reference-file ecoli.fa --aligner-choice bwa_split \\\n        --do-coverage --annotation-file ecoli.gff \\\n        --use-apptainer --apptainer-prefix ~/.sequana/apptainers \\\n        --input-readtag \"_[12].\"\n\nExplanation:\n\n- we use apptainer/singularity\n- we use the reference genome ecoli.fa (--reference-file) and its annotation for SnpEff (--annotation-file)\n- we use the sequana_coverage tool (True by default) to get coverage plots.\n- we use --input-directory to indicate where to find the input files\n- This data set is paired. In NGS, it is common to have _R1_ and _R2_ tags to differentiate the 2 files. Here the tags\n  are _1 and _2. In sequana we define a wildcard for the read tag. 
So here we tell the software that the expected tags\n  follow this pattern: \"_[12].\" and everything is then automatic.\n\nThen follow the instructions (prepare and execute the pipeline).\n\nYou should end up with a summary.html report.\n\n\nYou can browse the different samples (only one in this example) and get a table with variant calls:\n\n    https://raw.githubusercontent.com/sequana/variant_calling/refs/heads/main/doc/table.png\n\nIf you switch the coverage on (not recommended for eukaryotes), you should see this kind of plot:\n\n    https://raw.githubusercontent.com/sequana/variant_calling/refs/heads/main/doc/coverage.png\n\n\n\n\n\nChangelog\n~~~~~~~~~\n\n========= ======================================================================\nVersion   Description\n========= ======================================================================\n1.3.0     * Updated version to use latest damona containers and latest\n            sequana version 0.19.1. Added plot in HTML report with distribution\n            of variants. Added tutorial. Added bwa_split and freebayes split to\n            process ultra deep sequencing.\n1.2.0     * -Xmx8g option previously added is not robust. 
Does not work with\n            snpEff 5.1 for instance.\n          * add minimap aligner\n          * add --nanopore and --pacbio to automatically set minimap2 as the\n            aligner and the minimap options (map-pb or map-ont)\n          * add minimap2 container.\n          * add missing resources in snpeff section\n1.1.2     * add -Xmx8g option in snpeff rule at the build stage.\n          * add resources (8G) in the snpeff rule at run stage\n          * fix missing output_directory in sequana_coverage rule\n          * fix joint calling (regression) input function and inputs\n1.1.1     * Fix regression in coverage rule\n1.1.0     * add specific apptainer for freebayes (v1.2.0)\n          * Update API to use click\n1.0.2     * Fixed failure in multiqc if coverage and snpeff are off\n1.0.1     * automatically fill the bwa index algorithm and fix bwa_index rule to\n            use the options in the config file (not the hardcoded one)\n1.0.0     * use latest wrappers and graphviz apptainer\n0.12.0    * set all apptainers containers and add vcf to bcf conversions\n          * Update rule sambamba to use latest wrappers\n0.11.0    * Add singularity containers\n0.10.0    * fully integrated sequana wrappers and simplification of HTML reports\n0.9.10    * Uses new sequana_pipetools and wrappers\n0.9.5     * fix typo in the onsuccess and update sequana requirements to use\n            most up-to-date snakemake rules\n0.9.4     * fix typo related to the reference-file option new name not changed\n            everywhere in the pipeline.\n0.9.3     * use new framework (faster --help, --from-project option)\n          * rename --reference into --reference-file and --annotation to\n            --annotation-file\n          * add custom summary page\n          * add multiqc config file\n0.9.2     * snpeff output files are renamed sample.snpeff (instead of\n            samplesnpeff)\n          * add multiqc to show sequana_coverage and snpeff summary sections\n          * 
cleanup onsuccess section\n          * more options sanity checks and options (e.g.,\n          * genbank_file renamed into annotation_file in the config\n          * use --legacy in freebayes options\n          * fix coverage section to use new sequana api\n          * add the --do-coverage, --do-joint-calling options as well as\n            --circular and --freebayes-ploidy\n0.9.1     * Fix input-readtag, which was not populated\n0.9.0     First release\n========= ======================================================================\n\nContribute \u0026 Code of Conduct\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo contribute to this project, please take a look at the\n`Contributing Guidelines \u003chttps://github.com/sequana/sequana/blob/main/CONTRIBUTING.rst\u003e`_ first. Please note that this project is released with a\n`Code of Conduct \u003chttps://github.com/sequana/sequana/blob/main/CONDUCT.md\u003e`_. By contributing to this project, you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsequana%2Fvariant_calling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsequana%2Fvariant_calling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsequana%2Fvariant_calling/lists"}