{"id":42623437,"url":"https://github.com/scaramir/covid-assembly","last_synced_at":"2026-01-29T04:29:34.997Z","repository":{"id":197091376,"uuid":"694811233","full_name":"Scaramir/Covid-Assembly","owner":"Scaramir","description":"Project 2 from SC2 @ FUB - Snakemake implementation to compare Illumina and Nanopore sequence assembly quality of the same sample.","archived":false,"fork":false,"pushed_at":"2025-03-13T17:25:53.000Z","size":184,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-13T18:35:46.857Z","etag":null,"topics":["assembly","consenus","illumina","nanopore","snakemake","university-project"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Scaramir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-09-21T18:36:53.000Z","updated_at":"2025-03-13T17:25:57.000Z","dependencies_parsed_at":"2023-12-31T00:13:05.034Z","dependency_job_id":"08ec425c-6a8f-47ff-96fe-2c1ac1ea16bd","html_url":"https://github.com/Scaramir/Covid-Assembly","commit_stats":{"total_commits":76,"total_committers":4,"mean_commits":19.0,"dds":0.6052631578947368,"last_synced_commit":"c8a5219905b06e2fa104331e8c808939c1e8d874"},"previous_names":["scaramir/covid-assembly"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Scaramir/Covid-Assembly","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scaramir%2FCovid-Assembly","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scaramir%2FCovid-Assembly/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scaramir%2FCovid-Assembly/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scaramir%2FCovid-Assembly/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Scaramir","download_url":"https://codeload.github.com/Scaramir/Covid-Assembly/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scaramir%2FCovid-Assembly/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28862184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-28T22:56:21.783Z","status":"online","status_checked_at":"2026-01-29T02:00:06.714Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembly","consenus","illumina","nanopore","snakemake","university-project"],"created_at":"2026-01-29T04:29:34.926Z","updated_at":"2026-01-29T04:29:34.987Z","avatar_url":"https://github.com/Scaramir.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![DOI](https://zenodo.org/badge/694811233.svg)](https://doi.org/10.5281/zenodo.15020719)\n\n# Project 2 - SARS-CoV-2 genome assembly from Illumina \u0026 Nanopore data\n#### Project 2 from SC2 @ FUB\n#### Jule Brenningmeyer, Maximilian Otto\n\nThis repository contains a workflow to assemble the SARS-CoV-2 genome from Illumina and Nanopore data and finally 
compare the assembly quality between the two techniques.   \nThe workflow is based on Snakemake and uses Conda to manage the software dependencies.  \nBesides assembling the genomes of the sample files, the workflow also performs quality control on the raw reads as well as on the assembled consensus sequences.  \nTo perform the assembly on both sequencing technologies, the Illumina data must be paired-end and the Nanopore data must be single-end amplicon data. Additionally, the Illumina data needs to be demultiplexed and the Nanopore data needs to be basecalled.  \nNOTE: \"illumina\" or \"nanopore\" should be included in the corresponding file names. The file naming convention this script is built upon can be derived from the example data set.    \nThe workflow is designed to be run on a Linux system with an active conda environment in which `snakemake` is installed (`conda activate snakemake`).  \n\nThe presentation, including the results, can be found [here](https://docs.google.com/presentation/d/139hQzr9hJuHUSIcze_c5MqbzL4uuP447vb-4YYWZpiE/edit?usp=sharing).\n\n## Workflow\nThe workflow consists of the following rules:\n\n![Workflow](snakemake_rules_workflow_graphviz.png)\n\n\n## Usage\n### Environment\n\n```bash\n\n# download the repository to the current working directory using git \ngit clone https://github.com/Scaramir/Covid-Assembly.git\n\ncd Covid-Assembly/\n```\n\n### Example Data\n\n```bash\n\nmkdir data\n\n# Illumina and nanopore data\nwget --no-check-certificate https://osf.io/yz4ad/download -O data/sc2-nanopore-illumina-reads.tar.gz\ntar -xzvf data/sc2-nanopore-illumina-reads.tar.gz -C data/\n\n# Reference data\nwget \"https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=NC_045512.2\u0026db=nuccore\u0026report=fasta\u0026retmode=text\u0026withmarkup=on\u0026tool=portal\u0026log$=seqview\u0026maxdownloadsize=1000000\" -O data/NC_045512.2.fasta\n\n# Python Script to Convert bed to bedpe files\nmkdir scripts\nwget 
--no-check-certificate https://osf.io/3295h/download -O scripts/primerbed2bedpe.py\n\n# primer scheme folder\nmkdir data/primer_scheme\n\n# Illumina\n# Download the primer BED scheme\n# V3\nwget https://raw.githubusercontent.com/artic-network/artic-ncov2019/master/primer_schemes/nCoV-2019/V3/nCoV-2019.scheme.bed -O data/primer_scheme/V3-nCoV-2019.scheme.bed\n\n# Nanopore\n# First, we download the primer BED scheme\n# ARTIC V4.1 primer kit\nwget https://raw.githubusercontent.com/artic-network/artic-ncov2019/master/primer_schemes/nCoV-2019/V4.1/SARS-CoV-2.scheme.bed -O data/primer_scheme/V4.1-SARS-CoV-2.scheme.bed\n\n# ! ATTENTION ! remove the files starting with \"._\" from the data folder\nrm data/*/**/._*\n\n```\n\n### Run\nTo run the workflow, you need to have `snakemake` installed and its environment activated before you can execute it like so:\n```bash\nsnakemake --cores 16 --use-conda\n```\nSet `cores` to the maximum number of threads you want Snakemake to distribute jobs across.   \nMost jobs use 4 threads, so 16 cores should be fine.\n\nTo perform quality control on the Illumina data using FastQC in addition to fastp, run Snakemake like this: \n```bash\nsnakemake --cores 16 --use-conda -p fastqc_illumina\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscaramir%2Fcovid-assembly","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscaramir%2Fcovid-assembly","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscaramir%2Fcovid-assembly/lists"}