{"id":26903967,"url":"https://github.com/quinlan-lab/snakemake_tutorial","last_synced_at":"2025-04-01T10:49:17.817Z","repository":{"id":188345513,"uuid":"676368659","full_name":"quinlan-lab/Snakemake_Tutorial","owner":"quinlan-lab","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-24T20:42:34.000Z","size":411,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-24T21:34:47.395Z","etag":null,"topics":["joshua","pipeline","tutorial"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/quinlan-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-09T03:46:20.000Z","updated_at":"2025-03-24T20:42:39.000Z","dependencies_parsed_at":"2023-08-15T00:25:54.556Z","dependency_job_id":null,"html_url":"https://github.com/quinlan-lab/Snakemake_Tutorial","commit_stats":null,"previous_names":["quinlan-lab/snakemake_tutorial"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quinlan-lab%2FSnakemake_Tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quinlan-lab%2FSnakemake_Tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quinlan-lab%2FSnakemake_Tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quinlan-lab%2FSnakemake_Tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/quinlan-lab","download_url":"https://codeload.github.com/quinlan-lab/Snake
make_Tutorial/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246628408,"owners_count":20808106,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["joshua","pipeline","tutorial"],"created_at":"2025-04-01T10:49:17.265Z","updated_at":"2025-04-01T10:49:17.804Z","avatar_url":"https://github.com/quinlan-lab.png","language":"Python","readme":"# Snakemake Tutorial\nJoshua L. Major-Mincer \nLast updated: 03/24/2025\n\n## WARNING\n**Please ensure that you are on a compute or interactive node if you are testing this on the CHPC, NOT on a login node!**\n\n## Quickstart\n```\n# Get the Snakemake tutorial. \ngit clone https://github.com/quinlan-lab/Snakemake_Tutorial.git\ncd Snakemake_Tutorial \n\n# Download the necessary files. \nwget https://snakemake-tutorial.s3.us-west-1.amazonaws.com/data.tar.gz -O data.tar.gz\n\nwget https://snakemake-tutorial.s3.us-west-1.amazonaws.com/reference.tar.gz -O reference.tar.gz\n\n# Uncompress files. \ntar -zxvf data.tar.gz\ntar -zxvf reference.tar.gz\n\n# Remove the archives if desired. \nrm data.tar.gz \nrm reference.tar.gz \n\n# Activate environment using either conda or environment modules.\n# ====== OPTION 1 - Conda\nconda env create -f envs/snakemake_env.yaml\nconda activate snakemake_tutorial\n\n# ====== OPTION 2 - Env Modules\nmodule load bwa/2020_03_19 fastp/0.20.1 R/4.4.0 samtools/1.16 snakemake/8.16.0\n\n```\n\n## Description\nThis repository serves as an introduction to Snakemake for the members of the Quinlan lab. 
Each subdirectory is a module meant to provide information and practice with concepts that can help with the development and portability of Snakemake workflows. The concepts covered here are meant to provide an introductory lesson on how Snakemake workflows are generally structured and function, along with 'tips-n-tricks' to make the most out of reproducible workflows. Each module uses the same input files, those that are downloaded and extracted into `data/` and `reference/`. As such, if you want to run the modules' Snakefiles to practice, **download the data.tar.gz and reference.tar.gz files before proceeding!**\n\nThere may be more to follow, but for now, advanced concepts such as rule output aggregation, parallel SLURM job execution, etc. will not be included. **If you would like any such concepts to be covered, or have any suggestions, please let me know!**\n\n## Modules\nEach module introduces one or a few new concepts that build upon the previous ones, in numerical order. Each module contains one or more Snakefiles, and the inputs referenced are `data/` and `reference/`. Each comes with a README describing the new concepts introduced and giving the command line to run the pipeline. \n1. Module 1 - Basics: Shows the bare-bones skeleton of Snakemake workflows and how to get a minimally functional workflow running. \n2. Module 2 - Multiple Rules: Illustrates the chaining of inputs-outputs of many rules to make a longer, more comprehensive workflow. \n3. Module 3 - Workflow Decorations: Some rules may need specific resources, such as CPU cores or RAM limits. This module shows how to define these. \n4. Module 4 - Branching: An intermediate output might need to have two separate operations performed on it. In this instance, it is useful to know how to \"branch\" the workflow into diverging paths. \n5. Module 5 - Config: Configurations are `.yaml` files that provide flexibility to the workflow using defined dictionary options. \n6. 
Module 6 - Conda or Modules: Snakemake provides the ability to define software environments to run the rule with. In this module, we demonstrate the usage of either conda environments or installed modules on a high performance computing environment (e.g. `module load tool/1.0.0`).\n7. Module 7 - ScriptsAndOOP: More complicated rules can be put into Python or R scripts, which is covered here. It also illustrates the object-oriented nature of Snakemake and how to leverage it to make your workflow more legible. \n8. Module 8 - ProfilesAndSlurm: Shows how to run a pipeline on a cluster using a scheduler such as SLURM. Since this requires many more command-line arguments, the module also covers profiles, which store command-line arguments in reusable files.\n9. Module 9 - Full Pipeline: This workflow ties together all of these concepts, with a few minor changes that can make your workflows neater and more legible. \n\n\n## Acknowledgement\n1000 Genomes Acknowledgement for deep coverage of the 2504 phase 3 genomes (or subset thereof):\n\nThe following cell lines/DNA samples were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: [NA06984, NA06985, NA06986, NA06989, NA06994, NA07000, NA07037, NA07048, NA07051, NA07056, NA07347, NA07357, NA10847, NA10851, NA11829, NA11830, NA11831, NA11832, NA11840, NA11843, NA11881, NA11892, NA11893, NA11894, NA11918, NA11919, NA11920, NA11930, 
NA11931, NA11932, NA11933, NA11992, NA11994, NA11995, NA12003, NA12004, NA12005, NA12006, NA12043, NA12044, NA12045, NA12046, NA12058, NA12144, NA12154, NA12155, NA12156, NA12234, NA12249, NA12272, NA12273, NA12275, NA12282, NA12283, NA12286, NA12287, NA12340, NA12341, NA12342, NA12347, NA12348, NA12383, NA12399, NA12400, NA12413, NA12414, NA12489, NA12546, NA12716, NA12717, NA12718, NA12748, NA12749, NA12750, NA12751, NA12760, NA12761, NA12762, NA12763, NA12775, NA12776, NA12777, NA12778, NA12812, NA12813, NA12814, NA12815, NA12827, NA12828, NA12829, NA12830, NA12842, NA12843, NA12872, NA12873, NA12874, NA12878, NA12889, NA12890]. These data were generated at the New York Genome Center with funds provided by NHGRI Grant 3UM1HG008901-03S1.\n\n1000 Genomes Acknowledgement for deep coverage of the extended 3202 genomes (or subset thereof):\n \nThe following cell lines/DNA samples were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: [NA06984, NA06985, NA06986, NA06989, NA06991, NA06993, NA06994, NA06995, NA06997, NA07000, NA07014, NA07019, NA07022, NA07029, NA07031, NA07034, NA07037, NA07045, NA07048, NA07051, NA07055, NA07056, NA07340, NA07345, NA07346, NA07347, NA07348, NA07349, NA07357, NA07435, NA10830, NA10831, NA10835, NA10836, NA10837, NA10838, NA10839, NA10840, NA10842, NA10843, NA10845, NA10846, NA10847, NA10850, NA10851, NA10852, NA10853, NA10854, NA10855, NA10856, NA10857, NA10859, NA10860, NA10861, NA10863, NA10864, NA10865, NA11829, NA11830, NA11831, NA11832, NA11839, NA11840, NA11843, NA11881, NA11882, NA11891, NA11892, NA11893, NA11894, NA11917, NA11918, NA11919, NA11920, NA11930, NA11931, NA11932, NA11933, NA11992, NA11993, NA11994, NA11995, NA12003, NA12004, NA12005, NA12006, NA12043, NA12044, NA12045, NA12046, NA12056, NA12057, NA12058, NA12144, NA12145, NA12146, NA12154, NA12155, NA12156, NA12234, NA12239, NA12248, NA12249, NA12264, NA12272, NA12273, NA12274, NA12275, NA12282, NA12283, NA12286, 
NA12287, NA12329, NA12335, NA12336, NA12340, NA12341, NA12342, NA12343, NA12344, NA12347, NA12348, NA12375, NA12376, NA12383, NA12386, NA12399, NA12400, NA12413, NA12414, NA12485, NA12489, NA12546, NA12707, NA12708, NA12716, NA12717, NA12718, NA12739, NA12740, NA12748, NA12749, NA12750, NA12751, NA12752, NA12753, NA12760, NA12761, NA12762, NA12763, NA12766, NA12767, NA12775, NA12776, NA12777, NA12778, NA12801, NA12802, NA12812, NA12813, NA12814, NA12815, NA12817, NA12818, NA12827, NA12828, NA12829, NA12830, NA12832, NA12842, NA12843, NA12864, NA12865, NA12872, NA12873, NA12874, NA12875, NA12877, NA12878, NA12889, NA12890, NA12891, NA12892]. These data were generated at the New York Genome Center with funds provided by NHGRI Grants 3UM1HG008901-03S1 and 3UM1HG008901-04S2.\n\nPending publication of our manuscript, please cite our biorxiv preprint (Byrska-Bishop et al.) as the marker paper for these data: https://www.biorxiv.org/content/10.1101/2021.02.06.430068v1 or doi: https://doi.org/10.1101/2021.02.06.430068.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquinlan-lab%2Fsnakemake_tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquinlan-lab%2Fsnakemake_tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquinlan-lab%2Fsnakemake_tutorial/lists"}