{"id":37069876,"url":"https://github.com/binste/pipecutter","last_synced_at":"2026-01-14T08:02:58.124Z","repository":{"id":44429192,"uuid":"231474630","full_name":"binste/pipecutter","owner":"binste","description":"pipecutter provides a few tools for luigi such that it works better with data science libraries and environments such as pandas, scikit-learn, and Jupyter notebooks. ","archived":true,"fork":false,"pushed_at":"2022-08-05T21:57:56.000Z","size":162,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-17T09:34:46.794Z","etag":null,"topics":["jupyter-notebook","luigi","luigi-targets","luigi-tasks","pandas","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/binste.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-02T23:13:24.000Z","updated_at":"2023-10-16T18:00:24.000Z","dependencies_parsed_at":"2022-08-12T11:10:53.879Z","dependency_job_id":null,"html_url":"https://github.com/binste/pipecutter","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/binste/pipecutter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binste%2Fpipecutter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binste%2Fpipecutter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binste%2Fpipecutter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binste%2Fpipecutter/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/binste","download_url":"https://codeload.github.com/binste/pipecutter/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binste%2Fpipecutter/sbom","scorecard":{"id":238995,"data":{"date":"2025-08-11","repo":{"name":"github.com/binste/pipecutter","commit":"18cac9340ea9f192e524b8a1b8f351cba972d45b"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.3,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"project is archived","details":["Warn: Repository is archived."],"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies 
found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":4,"reason":"6 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2022-288 / GHSA-6hrg-qmvc-2xh8","Warn: Project is vulnerable to: PYSEC-2024-159 / GHSA-8qch-vj6m-2694","Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2023-238 / GHSA-5wvp-7f3h-6wmm","Warn: Project is vulnerable to: GHSA-8cw2-jv5c-c825","Warn: Project is vulnerable to: 
GHSA-cjw4-2w9r-r8mv"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-17T06:17:54.543Z","repository_id":44429192,"created_at":"2025-08-17T06:17:54.543Z","updated_at":"2025-08-17T06:17:54.543Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["jupyter-notebook","luigi","luigi-targets","luigi-tasks","pandas","scikit-learn"],"created_at":"2026-01-14T08:02:57.443Z","updated_at":"2026-01-14T08:02:58.107Z","avatar_url":"https://github.com/binste.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pipecutter \u003c!-- omit in toc --\u003e\n[![PyPI version](http://img.shields.io/pypi/v/pipecutter.svg?style=flat-square\u0026color=blue)](https://pypi.python.org/pypi/pipecutter/) [![Python versions](https://img.shields.io/pypi/pyversions/pipecutter.svg?style=flat-square\u0026color=blue)]() [![build 
status](http://img.shields.io/travis/binste/pipecutter/master.svg?style=flat)](https://travis-ci.org/binste/pipecutter) [![coverage](https://img.shields.io/codecov/c/github/binste/pipecutter/master.svg?style=flat)](https://codecov.io/gh/binste/pipecutter?branch=master)\n\npipecutter provides a few tools for luigi such that it works better with data science libraries and environments such as pandas, scikit-learn, and Jupyter notebooks.\n\n# Table of contents \u003c!-- omit in toc --\u003e\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Debug in an interactive environment](#debug-in-an-interactive-environment)\n  - [Targets](#targets)\n  - [Full example](#full-example)\n\n# Installation\n```bash\npip install pipecutter\n```\n\nPython 3.6+ is required. pipecutter follows [semantic versioning](https://semver.org/).\n\n# Usage\npipecutter currently provides\n\n* a more convenient way to run and debug luigi tasks in interactive environments such as Jupyter notebooks\n* some luigi targets for saving pandas dataframes to parquet, scikit-learn models with joblib, ...\n\n## Debug in an interactive environment\nWith luigi, you can already run tasks in a Python script/Jupyter notebook/Python console by using the `luigi.build` function (probably with `local_scheduler=True` as an argument). However, if a task throws an exception, luigi catches it and you cannot drop into a post-mortem debugging session. 
`pipecutter.run` is a light wrapper around `luigi.build` which disables this exception handling.\n\n```python\nIn [1]: import luigi\nIn [2]: import pipecutter\n\nIn [3]: class TaskWhichFails(luigi.Task):\n   ...:     def run(self):\n   ...:         raise Exception(\"Something is wrong\")\n\n# Traceback below is shortened for readability\nIn [4]: pipecutter.run(TaskWhichFails())\n---------------------------------------------------------------------------\nException                                 Traceback (most recent call last)\n\u003cipython-input-5-a970d52d810a\u003e in \u003cmodule\u003e\n----\u003e 1 pipecutter.run(TaskWhichFails())\n\n...\n\n\u003cipython-input-3-4e27674090fa\u003e in run(self)\n      1 class TaskWhichFails(luigi.Task):\n      2     def run(self):\n----\u003e 3         raise Exception\n\nException: Something is wrong\n\n# Drop straight into the debugger\nIn [5]: %debug\n\u003e \u003cipython-input-6-e7528a27d82e\u003e(3)run()\n      1 class TaskWhichFails(luigi.Task):\n      2     def run(self):\n----\u003e 3         raise Exception\n      4\nipdb\u003e\n```\nThis should lower the barrier to using luigi tasks while you are still developing a model, which in turn makes it easier to move into production later on.\n\nAdditionally, you can print the dependencies of tasks with `pipecutter.print_tree` (a wrapper around `luigi.tools.deps_tree.print_tree`) or build a graphviz graph with `pipecutter.build_graph`, which you can save as .png, .pdf, etc. or view directly in your Jupyter notebook. See the [Full example](#full-example) for a screenshot of how this looks. The `build_graph` function requires you to have [graphviz installed](https://graphviz.readthedocs.io/en/stable/manual.html#installation).\n\n## Targets\nIn `pipecutter.targets` you will find a few targets which build on luigi's `LocalTarget` but additionally have a `load` and a `dump` method. 
A convenient way to name the targets is to use the `task_id` in the name, which is unique with respect to the task name and the parameters passed to it.\n\n```python\nimport luigi\nimport pipecutter\nfrom pipecutter.targets import JoblibTarget\nfrom sklearn.ensemble import RandomForestClassifier\n\n\nclass TrainModel(luigi.Task):\n    n_estimators = luigi.IntParameter()\n\n    def output(self):\n        return JoblibTarget(self.task_id + \".joblib\")\n\n    def run(self):\n        model = RandomForestClassifier(n_estimators=self.n_estimators)\n        self.output().dump(model)\n\n\npipecutter.run(TrainModel(n_estimators=100))\n# -\u003e Produces a file called TrainModel_100_0b0ec0cdea.joblib\n```\n\nIf you use `task_id` in the filename, the above task can be written more concisely with the `pipecutter.targets.outputs` decorator, which adds the `output` method. By default it puts the files in a folder called `data`. This can be adjusted with the optional `folder` argument.\n\n```python\nfrom pipecutter.targets import outputs\n\n\n@outputs(JoblibTarget)\nclass TrainModel(luigi.Task):\n    n_estimators = luigi.IntParameter()\n\n    def run(self):\n        model = RandomForestClassifier(n_estimators=self.n_estimators)\n        self.output().dump(model)\n```\n\nYou can also pass a dictionary to `pipecutter.targets.outputs` with a string as the key and a target which inherits from `pipecutter.targets.TargetBase` as the value.\n\n## Full example\n```python\nimport luigi\nimport pandas as pd\nimport numpy as np\nimport pipecutter\nfrom luigi.util import requires\nfrom pipecutter.targets import outputs, JoblibTarget, ParquetTarget\nfrom sklearn.ensemble import RandomForestClassifier\n\n\n@outputs(ParquetTarget)\nclass PrepareData(luigi.Task):\n    drop_missings = luigi.BoolParameter()\n\n    def run(self):\n        train_df = pd.DataFrame.from_dict({\"A\": [0, 1, np.nan], \"B\": [5, 1, 2], \"label\": [0, 1, 1]})\n        if self.drop_missings:\n            train_df = 
train_df.dropna()\n\n        self.output().dump(train_df)\n\n\n@requires(PrepareData)\n@outputs(JoblibTarget)\nclass TrainModel(luigi.Task):\n    n_estimators = luigi.IntParameter()\n\n    def run(self):\n        train_df = self.input().load()\n        X, y = train_df.drop(\"label\", axis=1), train_df[\"label\"]\n\n        model = RandomForestClassifier(n_estimators=self.n_estimators)\n        model.fit(X, y)\n\n        self.output().dump(model)\n\n\ntrain_model = TrainModel(n_estimators=100, drop_missings=True)\npipecutter.build_graph(train_model)\n```\n\nThe last command can be used to visualize the dependency tree, which is especially useful if your pipelines are more complex. It returns a `graphviz.Digraph` object which will render in a Jupyter Notebook as\n\n![build graph example](https://raw.githubusercontent.com/binste/pipecutter/master/images/build_graph_example.png)\n\nFinally, run the tasks with:\n```\npipecutter.run(train_model)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinste%2Fpipecutter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbinste%2Fpipecutter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinste%2Fpipecutter/lists"}