{"id":32596223,"url":"https://github.com/schlosslab/mikropml-snakemake-workflow","last_synced_at":"2025-10-30T04:58:19.791Z","repository":{"id":41810683,"uuid":"292886119","full_name":"SchlossLab/mikropml-snakemake-workflow","owner":"SchlossLab","description":"Snakemake template for building reusable and scalable machine learning pipelines with mikropml ","archived":false,"fork":false,"pushed_at":"2025-02-26T02:40:45.000Z","size":7087,"stargazers_count":12,"open_issues_count":9,"forks_count":4,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-02-26T03:26:50.570Z","etag":null,"topics":["machine-learning","rstats","snakemake"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SchlossLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":".github/SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-04T15:42:46.000Z","updated_at":"2025-02-26T02:40:47.000Z","dependencies_parsed_at":"2025-02-26T03:23:06.955Z","dependency_job_id":"76e07d98-181f-4698-9c31-df25d7fdbffd","html_url":"https://github.com/SchlossLab/mikropml-snakemake-workflow","commit_stats":null,"previous_names":[],"tags_count":4,"template":true,"template_full_name":null,"purl":"pkg:github/SchlossLab/mikropml-snakemake-workflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SchlossLab%2Fmikropml-snakemake-workflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SchlossLab%2Fmikropml-snakemake-workflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SchlossLab%2Fmikropml-snakemake-workflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SchlossLab%2Fmikropml-snakemake-workflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SchlossLab","download_url":"https://codeload.github.com/SchlossLab/mikropml-snakemake-workflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SchlossLab%2Fmikropml-snakemake-workflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281748722,"owners_count":26554822,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","rstats","snakemake"],"created_at":"2025-10-30T04:58:04.994Z","updated_at":"2025-10-30T04:58:19.786Z","avatar_url":"https://github.com/SchlossLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Run mikropml with snakemake \u003cimg src='figures/mikropml-snakemake-workflow.png' align=\"right\" height=\"120\" /\u003e\n\n\u003c!--[![tests](https://github.com/SchlossLab/mikropml-snakemake-workflow/actions/workflows/tests.yml/badge.svg)](https://github.com/SchlossLab/mikropml-snakemake-workflow/actions/workflows/tests.yml)--\u003e\n[![build](https://github.com/SchlossLab/mikropml-snakemake-workflow/actions/workflows/build.yml/badge.svg)](https://github.com/SchlossLab/mikropml-snakemake-workflow/actions/workflows/build.yml)\n[![tests](https://github.com/SchlossLab/mikropml-snakemake-workflow/actions/workflows/tests.yml/badge.svg)](https://github.com/SchlossLab/mikropml-snakemake-workflow/actions/workflows/tests.yml)\n[![License](https://img.shields.io/badge/license-MIT-blue)](/LICENSE.md)\n[![DOI](https://zenodo.org/badge/292886119.svg)](https://zenodo.org/badge/latestdoi/292886119)\n\n[Snakemake](https://snakemake.readthedocs.io/en/stable) is a workflow manager\nthat enables massively parallel and reproducible\nanalyses.\nSnakemake is a suitable tool to use when you can break a workflow down into\ndiscrete steps, with each step having input and output files.\n\n[mikropml](http://www.schlosslab.org/mikropml/) is an R package for supervised machine learning pipelines.\nWe provide this example workflow as a template to get started running mikropml with snakemake.\nWe hope you then customize the code to meet the needs of your particular ML task.\n\nFor more details on these tools, see the\n[Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html)\nand read the [mikropml docs](http://www.schlosslab.org/mikropml/).\n\n## The Workflow\n\nThe [`Snakefile`](workflow/Snakefile) contains rules which define the output files we want and how to make them.\nSnakemake automatically builds a directed acyclic graph (DAG) of jobs to figure\nout the dependencies of each of the rules and what order to run them in.\nThis workflow preprocesses the example dataset, calls `mikropml::run_ml()`\nfor each seed and ML method set in the config file,\ncombines the results files, plots performance results\n(cross-validation and test AUROCs, hyperparameter AUROCs from cross-validation, and benchmark performance),\nand renders a simple [R Markdown report](report.Rmd) as a GitHub-flavored markdown file ([see example here](report-example.md)).\n\n\u003c!-- snakemake make_graph_figures --\u003e\n![rulegraph](figures/graphviz/rulegraph.png)\n\nThe DAG shows how calls to `run_ml` can run in parallel if\nsnakemake is allowed to run more than one job at a time.\nIf we use 100 seeds and 4 ML methods, snakemake would call `run_ml` 400 times.\nHere's a small example DAG if we were to use only 2 seeds and 1 ML method:\n\n\u003c!-- snakemake make_graph_figures --\u003e\n![dag](figures/graphviz/dag.png)\n\n## Usage\n\nFull usage instructions recommended by snakemake are available in the \n[snakemake workflow catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=SchlossLab/mikropml-snakemake-workflow).\nSnakemake recommends using `snakedeploy` to use this workflow as a module in \nyour own project.\n\nAlternatively, you can download this repo and modify the code \ndirectly to suit your needs. See instructions [here](/quick-start.md).\n\n## Help \u0026 Contributing\n\nIf you come across a bug, [open an\nissue](https://github.com/SchlossLab/mikropml-snakemake-workflow/issues)\nand include a minimal reproducible example.\n\nIf you have questions, create a new post in\n[Discussions](https://github.com/SchlossLab/mikropml-snakemake-workflow/discussions).\n\nIf you’d like to contribute, see our guidelines\n[here](.github/CONTRIBUTING.md).\n\n## Code of Conduct\n\nPlease note that the mikropml-snakemake-workflow is released with a\n[Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md).\nBy contributing to this project, you agree to abide by its terms.\n\n## More resources\n\n- [mikropml docs](http://www.schlosslab.org/mikropml/)\n- [Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html)\n- [conda user guide](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschlosslab%2Fmikropml-snakemake-workflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fschlosslab%2Fmikropml-snakemake-workflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschlosslab%2Fmikropml-snakemake-workflow/lists"}