{"id":20456551,"url":"https://github.com/krassowski/nbpipeline","last_synced_at":"2025-04-13T04:05:22.098Z","repository":{"id":57445137,"uuid":"188075188","full_name":"krassowski/nbpipeline","owner":"krassowski","description":"Snakemake-like pipeline manager for reproducible Jupyter Notebooks","archived":false,"fork":false,"pushed_at":"2021-10-04T19:36:41.000Z","size":431,"stargazers_count":17,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-13T04:05:06.394Z","etag":null,"topics":["jupyter","jupyter-notebook","snakemake"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/krassowski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-22T16:24:15.000Z","updated_at":"2023-07-07T01:39:18.000Z","dependencies_parsed_at":"2022-09-26T17:30:47.071Z","dependency_job_id":null,"html_url":"https://github.com/krassowski/nbpipeline","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krassowski%2Fnbpipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krassowski%2Fnbpipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krassowski%2Fnbpipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krassowski%2Fnbpipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/krassowski","download_url":"https://codeload.github.com/krassowski/nbpipeline/tar.gz/refs/heads/master","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248661707,"owners_count":21141450,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["jupyter","jupyter-notebook","snakemake"],"created_at":"2024-11-15T11:23:03.059Z","updated_at":"2025-04-13T04:05:22.071Z","avatar_url":"https://github.com/krassowski.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# nbpipeline\n[![Build Status](https://travis-ci.org/krassowski/nbpipeline.svg?branch=master)](https://travis-ci.org/krassowski/nbpipeline)\n[![DOI](https://zenodo.org/badge/188075188.svg)](https://zenodo.org/badge/latestdoi/188075188)\n\nSnakemake-like pipelines for Jupyter Notebooks, producing interactive pipeline reports like this:\n\n\u003cimg src=\"https://raw.githubusercontent.com/krassowski/nbpipeline/master/examples/screenshots/example_interactive_result.png\" width=400\u003e  \u003cimg src=\"https://raw.githubusercontent.com/krassowski/nbpipeline/master/examples/screenshots/example_diff.png\" width=400\u003e\n\n### Install \u0026 general remarks\n\nThese are still early days of this software so please bear in mind that it is not ready for production yet.\nNote: for simplicity I assume that you are using a recent Ubuntu with git installed.\n\n\n```bash\npip install nbpipeline\n```\n\nGraphiz is required for static SVG plots:\n\n```bash\nsudo apt-get install graphviz libgraphviz-dev graphviz-dev\n```\n\n#### Development install\n\nTo install the latest development version you may use:\n\n```bash\ngit clone https://github.com/krassowski/nbpipeline\ncd 
nbpipeline\npip install -r requirements.txt\nln -s $(pwd)/bin/nbpipeline-dev ~/bin/nbpipeline\n```\n\n### Quickstart\n\nCreate a `pipeline.py` file with a list of rules for your pipeline. For example:\n\n```python\nfrom nbpipeline.rules import NotebookRule\n\n\nNotebookRule(\n    'Extract protein data',  # a nice name for the step\n    input={'protein_data_path': 'data/raw/data_from_wetlab.xlsx'},\n    output={'output_path': 'data/clean/protein_levels.csv'},\n    notebook='analyses/Data_extraction.ipynb',\n    group='Proteomics'  # this is optional\n)\n\nNotebookRule(\n    'Quality control and PCA on proteins',\n    input={'protein_levels_path': 'data/clean/protein_levels.csv'},\n    output={'qc_report_path': 'reports/proteins_failing_qc.csv'},\n    notebook='analyses/Exploration_and_quality_control.ipynb',\n    group='Proteomics'\n)\n```\n\nThe keys of the input and output dictionaries should correspond to variables in one of the first cells\nin the corresponding notebook, which should be tagged as \"parameters\". This can be done easily in JupyterLab:\n\n\u003cimg src=\"https://raw.githubusercontent.com/krassowski/nbpipeline/master/examples/screenshots/tags_in_JupyterLab_2.0.png\" width=550\u003e\n\nIf you forget to add them, a warning will be displayed.\n\nAlternatively, you can create a dedicated cell for input path definitions and tag it \"inputs\", and a separate one for output path definitions, tagging it \"outputs\", which allows you to omit the input and output keywords when creating a `NotebookRule`. 
However, only simple variable definitions will be deduced (parsing uses regular expressions to avoid the potential dangers of `eval`).\n\nFor more details, please see the example [pipeline](https://github.com/krassowski/nbpipeline/blob/master/examples/pipeline.py) and [notebooks](https://github.com/krassowski/nbpipeline/tree/master/examples/analyses) in the [examples](https://github.com/krassowski/nbpipeline/tree/master/examples) directory.\n\n\n#### Run the pipeline:\n\n```bash\nnbpipeline\n```\n\nOn any subsequent run, notebooks that did not change will not be run again.\nTo disable this cache, use the `--disable_cache` switch.\n\nTo generate an interactive diagram of the rules graph, together with a reproducibility report, add the `-i` switch:\n\n```bash\nnbpipeline -i\n```\n\nThe software defaults to `google-chrome` for graph visualization display, which can be changed with a CLI option.\n\nIf you named your definitions file differently (e.g. `my_rules.py` instead of `pipeline.py`), use:\n\n```bash\nnbpipeline --definitions_file my_rules.py\n```\n\n\nTo display all command line options use:\n\n```bash\nnbpipeline -h\n```\n\n\n#### Troubleshooting\n\nIf you see `ModuleNotFoundError: No module named 'name_of_your_local_module'`, you may need to set the path explicitly by running nbpipeline with:\n\n```bash\nPYTHONPATH=/path/to/the/parent/of/local/module:$PYTHONPATH nbpipeline\n```\n\nOftentimes the path is the same as the current directory, so the following command may work:\n\n\n```bash\nPYTHONPATH=$(pwd):$PYTHONPATH nbpipeline\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrassowski%2Fnbpipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrassowski%2Fnbpipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrassowski%2Fnbpipeline/lists"}