{"id":41312795,"url":"https://github.com/jts/ncov-tools","last_synced_at":"2026-01-23T05:26:40.568Z","repository":{"id":37835703,"uuid":"259666307","full_name":"jts/ncov-tools","owner":"jts","description":"Small collection of tools for performing quality control on coronavirus sequencing data and genomes","archived":false,"fork":false,"pushed_at":"2024-04-25T20:37:20.000Z","size":299,"stargazers_count":45,"open_issues_count":7,"forks_count":16,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-04-25T21:27:21.537Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jts.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-04-28T14:49:52.000Z","updated_at":"2024-04-25T21:27:25.322Z","dependencies_parsed_at":"2023-01-25T07:30:16.257Z","dependency_job_id":"2318ceb4-bafd-46a1-8eab-a4574a045bf6","html_url":"https://github.com/jts/ncov-tools","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/jts/ncov-tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fncov-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fncov-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fncov-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fncov-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jts","download_url":"https://codeload.github
.com/jts/ncov-tools/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fncov-tools/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28680692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T04:33:33.518Z","status":"ssl_error","status_checked_at":"2026-01-23T04:33:30.433Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-23T05:26:40.508Z","updated_at":"2026-01-23T05:26:40.563Z","avatar_url":"https://github.com/jts.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ncov-tools\n\nTools and plots for performing quality control on coronavirus sequencing results.\n\n## Installation\n\nDownload the package:\n```\ngit clone https://github.com/jts/ncov-tools\ncd ncov-tools\n```\n\nTo use this package, install the dependencies using conda:\n```\nconda env create -f workflow/envs/environment.yml\n```\n\nAlternatively, if install times are very slow using conda, we recommend using\nthe conda wrapper: [mamba](https://github.com/TheSnakePit/mamba).\n\nInstall mamba as follows:\n```\nconda install -c conda-forge mamba\n```\n\nThen create the ncov-tools environment using mamba:\n\n```\nmamba env create -f workflow/envs/environment.yml\n```\n\nEither way, if you used conda directly or mamba, activate the conda 
environment:\n\n```\nconda activate ncov-qc\n```\n\n## Required Configuration\n\nThis package is implemented as a snakemake pipeline, so it requires a `config.yaml` file to describe where the input files are. To generate QC plots, a bam file with reads mapped to a reference genome is required. Consensus sequences (FASTA) are needed to generate a phylogenetic tree with associated mutations.\n\nAs an example, let's say your data is laid out in the following structure:\n\n```\n   run_200430/\n     sampleA.sorted.bam\n     sampleA.consensus.fasta\n     sampleB.sorted.bam\n     sampleB.consensus.fasta\n   resources/\n     artic_reference.fasta\n     V3/\n        nCoV-2019.bed\n```\n\nThen your config.yaml should look like:\n\n```\n# path to the top-level directory containing the analysis results\ndata_root: run_200430\n\n# optionally the plots can have a \"run name\" prefix. If this is not defined the prefix will be \"default\"\nrun_name: my_run\n\n# path to the nCov reference genome\nreference_genome: resources/artic_reference.fasta\n\n# the sequencing platform used, can be \"oxford-nanopore\" or \"illumina\"\nplatform: \"oxford-nanopore\"\n\n# path to the BED file containing the primers, this should follow the format downloaded from\n# the ARTIC repository\nprimer_bed: resources/V3/nCoV-2019.bed\n```\n\nThe pipeline is designed to work with the results of `ivar` (illumina) or the artic-ncov2019/fieldbioinformatics workflow (oxford nanopore). It will automatically detect the names of the output files (BAMs, consensus fasta, variants) from these workflows using the `platform` value. 
If you used a different workflow, you can set the following options to help the pipeline find your files:\n\n```\n# the naming convention for the bam files\n# this can use the variables {data_root} (as above) and {sample}\n# As per the example above, this will expand to run_200430/sampleA.sorted.bam for sampleA\nbam_pattern: \"{data_root}/{sample}.sorted.bam\"\n\n# the naming convention for the consensus sequences\nconsensus_pattern: \"{data_root}/{sample}.consensus.fasta\"\n\n# the naming convention for the variants file, NF illumina runs typically use\n# \"{data_root}/{sample}.variants.tsv\" and oxford nanopore runs use \"{data_root}/{sample}.pass.vcf.gz\"\nvariants_pattern: \"{data_root}/{sample}.variants.tsv\"\n```\n\n## Metadata (optional)\n\nSome plots and QC statistics can be augmented with metadata like the qPCR Ct values, or the date the sample was collected. To enable this feature, add the path to the metadata to config.yaml:\n\n```\nmetadata: \"/path/to/metadata.tsv\"\n```\n\nThe expected metadata file is a simple TSV with a `sample` field and optional `ct` and `date` fields. Other fields can be provided but will be ignored.\n\n```\nsample   ct     date\nsampleA  20.8   2020-05-01\nsampleB  27.1   2020-06-02\n```\n\nWhen providing the metadata, the value `NA` can be used for missing data.\n\n## Other optional configuration\n\nAdditional features can be turned on by adding to the config if desired:\n\n```\n#\n# if a list of sample IDs for negative controls is provided, a report containing the amount\n# of coverage detected in the negative controls can be generated\n#\nnegative_control_samples: [ \"NTC-1\", \"NTC-2\" ]\n\n#\n# when building a tree of the consensus genomes you can optionally include other sequences\n# in the tree by providing a fasta file here\n#\ntree_include_consensus: some_genomes_from_gisaid.fasta\n\n# list the type of amplicon BED file that will be created from the \"primer_bed\".  
This can include:\n# full -- amplicons including primers and overlaps listed in the primer BED file\n# no_primers -- amplicons including overlaps but with primers removed\n# unique_amplicons -- distinct amplicons regions with primers and overlapping regions removed\nbed_type: unique_amplicons\n\n# minimum completeness threshold for inclusion to the SNP tree plot, if no entry\n# is provided the default is set to 0.75\ncompleteness_threshold: 0.9\n\n# the set of mutations to automatically flag in the QC reports\n# this can be the name of one of the watchlists built into ncov-watch\n# or the path to a local VCF file. \n# Built in lists: https://github.com/jts/ncov-watch/tree/master/ncov_watch/watchlists\nmutation_set: spike_mutations\n\n# user specifiable output directory \n# defaults to just current working directory but otherwise\n# will write output files to the specified directory\noutput_directory: run1_output\n\n# primer name prefix used in the primer scheme BED file, the default\n# value is \"nCoV-2019\" which is used for ARTIC V3, note that\n# ARTIC V4.1 uses `SARS-CoV-2`\nprimer_prefix: \"SARS-CoV-2\"\n```\n\n## Running\n\nAfter configuration, you can run the pipeline using Snakemake\n\n```\n# Build the sequencing QC plots (coverage, allele frequencies)\nsnakemake -s workflow/Snakefile all_qc_sequencing\n\n# Build the analysis QC plots (tree with annotated mutations)\nsnakemake -s workflow/Snakefile all_qc_analysis\n\n# Build the quality report tsv files (in qc_reports directory) \nsnakemake -s workflow/Snakefile all_qc_reports\n```\n\nThere is also an  `all` rule that executes the three rules noted above in one `snakemake` command:\n```\n# Build all the reports and plots\nsnakemake -s workflow/Snakefile all\n```\n\nYou can also build a single PDF summary with the main plots and results. 
This requires a working \ninstallation of pdflatex, which is not provided through the environment\n\n```\nsnakemake -s workflow/Snakefile all_final_report\n```\n\n\n## Output\n\n```\n# A plot containing the coverage depth across the SARS-CoV-2 reference genome for each sample in the run\nplots/run_name_depth_by_position.pdf\n\n# A plot containing the coverage across all samples, plotted as a heatmap across amplicons\nplots/run_name_amplicon_coverage_heatmap.pdf\n\n# A plot with the variation found within each sample, plotted as a tree with associated SNP matrix\nplots/run_name_tree_snps.pdf\n\n# A report on per-sample quality metrics and pass/warn/fail criteria\nqc_reports/run_name_summary_qc.tsv\n\n# A report on coverage within each negative control\nqc_reports/run_name_negative_control_report.tsv\n\n# A report on positions within the genome that are consistently ambiguous across samples (an indicator of possible contamination)\nqc_reports/run_name_ambiguous_report.tsv\n\n# A report on samples that have evidence for a mixture of alleles at multiple positions (this code is experimental and still being tested)\nqc_reports/run_name_mixture_report.tsv\n```\n\n## Variant Annotation\nSNVs and Indels are annotated using SNPEff.  The `MN908947.3` SNPEff database\nis part of the standard set of genomes.\n\nCurrently the database is not available for download and requires building.  To\ndownload the NCBI gene file and build the database, run the following:\n\n```\nsnakemake -s workflow/Snakefile --cores 1 build_snpeff_db\n```\n\nOnce the database has been built, the workflow can be run using:\n\n```\nsnakemake -s workflow/Snakefile --cores 2 all_qc_annotation\n```\n\nVariant annotation output can be found in `qc_annotation` and the recurrent\namino acid change heatmap can be found in `plots/\u003cprefix\u003e_aa_mutation_heatmap.pdf`.\n\n\n## Pangolin Version 4\nPangolin version 4 included several changes which required updates\nto the `ncov-tools` environment.  
By default, `ncov-tools` will run pangolin\n4 and will require changes to `ncov-parser` version 1.9 to parse the output\nand populate the summary QC file.\n\nBackward compatibility with Pangolin 3 is available and will require the following\nparameter addition in the `config.yaml` file:\n```\npangolin_version: \"3\"\n```\nNote that the exact version number is not required, only the major version: \"3\" or \"4\".\n\nSupport for option `--analysis-mode` for `pangolin 4` has been provided as of `ncov-tools`\nversion 1.9.1.  The `config.yaml` file should contain the following entry:\n```\npango_analysis_mode: \"accurate\"\n```\nThe available options are `accurate` (the default) and `fast`.  See the `pangolin`\n[documentation](https://cov-lineages.org/resources/pangolin/usage.html) for further details.\n\n## Credit and Acknowledgements\n\n* The tree-with-SNPs plot was inspired by a plot shared by Mads Albertsen.\n\n* The script to convert `variants.tsv` files into `.vcf` files was obtained\n  from: `https://github.com/nf-core/viralrecon/blob/dev/bin/ivar_variants_to_vcf.py`\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjts%2Fncov-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjts%2Fncov-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjts%2Fncov-tools/lists"}