{"id":16707288,"url":"https://github.com/tamasgal/thepipe","last_synced_at":"2025-03-21T20:32:33.300Z","repository":{"id":47897232,"uuid":"177125275","full_name":"tamasgal/thepipe","owner":"tamasgal","description":"A simplistic, general purpose pipeline framework.","archived":false,"fork":false,"pushed_at":"2022-07-21T13:54:06.000Z","size":106,"stargazers_count":14,"open_issues_count":3,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-16T22:53:25.386Z","etag":null,"topics":["data-processing","data-processing-pipelines","data-science","hacktoberfest","pipelines","provenance","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tamasgal.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-22T11:05:25.000Z","updated_at":"2024-06-27T15:34:23.000Z","dependencies_parsed_at":"2022-08-12T14:00:31.145Z","dependency_job_id":null,"html_url":"https://github.com/tamasgal/thepipe","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamasgal%2Fthepipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamasgal%2Fthepipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamasgal%2Fthepipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamasgal%2Fthepipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tamasgal","download_url":"https://codeload.github.com/tamasgal/thepipe/tar.gz/refs
/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244865996,"owners_count":20523445,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-processing","data-processing-pipelines","data-science","hacktoberfest","pipelines","provenance","python"],"created_at":"2024-10-12T19:38:30.411Z","updated_at":"2025-03-21T20:32:32.967Z","avatar_url":"https://github.com/tamasgal.png","language":"Python","readme":"thepipe\n=======\n\n.. image:: https://readthedocs.org/projects/thepipe/badge/?version=latest\n    :target: https://thepipe.readthedocs.io/en/latest/?badge=latest\n    :alt: Documentation Status\n\n.. image:: https://api.codacy.com/project/badge/Grade/20a35727ae364e08845b60bdeb4b233a\n    :alt: Codacy Badge\n    :target: https://www.codacy.com/app/tamasgal/thepipe?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=tamasgal/thepipe\u0026amp;utm_campaign=Badge_Grade\n\n.. image:: https://travis-ci.org/tamasgal/thepipe.svg?branch=master\n    :alt: Travis-CI Build Status\n    :target: https://travis-ci.org/tamasgal/thepipe\n\n.. image:: http://codecov.io/github/tamasgal/thepipe/coverage.svg?branch=master\n    :alt: Test-coverage\n    :target: http://codecov.io/github/tamasgal/thepipe?branch=master\n\n.. 
image:: https://img.shields.io/pypi/v/thepipe.svg?style=flat\n    :alt: PyPI Package latest release\n    :target: https://pypi.python.org/pypi/thepipe\n\nA simplistic, general-purpose pipeline framework, which can easily be\nintegrated into existing (analysis) chains and workflows.\n\nInstallation\n------------\n``thepipe`` can be installed via ``pip``::\n\n    pip install thepipe\n\nFeatures\n--------\n\n- Easy-to-use interface and integration into existing workflows\n- Automatic provenance tracking (set ``Provenance().outfile`` to dump it upon\n  program termination)\n- Modules can be either subclasses of ``Module`` or bare Python functions\n- Data is passed via a simple Python dictionary from module to module (wrapped\n  in a class called ``Blob`` which adds some visual candy and error reporting)\n- Integrated hierarchical logging system\n- Colour-coded log and print messages (``self.log()`` and ``self.cprint()`` in\n  ``Modules``)\n- Performance statistics for the whole pipeline and each module individually\n- Clean exit when interrupting the pipeline with CTRL+C\n\nThe Pipeline\n------------\n\nHere is a basic example of how to create a pipeline, attach some modules to it,\npass some parameters and drain the pipeline.\n\nNote that pipeline modules can be either plain Python functions that take a\nsingle argument (the blob) or classes that derive from ``thepipe.Module``.\n\n.. 
code-block:: python\n\n    import thepipe as tp\n\n\n    class AModule(tp.Module):\n        def configure(self):\n            self.cprint(\"Configuring AModule\")\n            self.max_count = self.get(\"max_count\", default=23)\n            self.index = 0\n\n        def process(self, blob):\n            self.cprint(\"This is cycle #%d\" % self.index)\n            blob['index'] = self.index\n            self.index += 1\n\n            if self.index \u003e self.max_count:\n                self.log.critical(\"That's enough...\")\n                raise StopIteration\n            return blob\n\n        def finish(self):\n            self.cprint(\"I'm done!\")\n\n\n    def a_function_based_module(blob):\n        print(\"Here is the blob:\")\n        print(blob)\n        return blob\n\n\n    pipe = tp.Pipeline()\n    pipe.attach(AModule, max_count=5)  # pass any parameters to the module\n    pipe.attach(a_function_based_module)\n    pipe.drain()  # without arguments it will drain until a StopIteration is raised\n\nThis will produce the following output:\n\n.. 
code-block:: shell\n\n    2020-05-26 12:43:12 ++ AModule: Configuring AModule\n    Pipeline and module initialisation took 0.001s (CPU 0.001s).\n    2020-05-26 12:43:12 ++ AModule: This is cycle #0\n    Here is the blob:\n    Blob (1 entries):\n    'index' =\u003e 0\n    2020-05-26 12:43:12 ++ AModule: This is cycle #1\n    Here is the blob:\n    Blob (1 entries):\n    'index' =\u003e 1\n    2020-05-26 12:43:12 ++ AModule: This is cycle #2\n    Here is the blob:\n    Blob (1 entries):\n    'index' =\u003e 2\n    2020-05-26 12:43:12 ++ AModule: This is cycle #3\n    Here is the blob:\n    Blob (1 entries):\n    'index' =\u003e 3\n    2020-05-26 12:43:12 ++ AModule: This is cycle #4\n    Here is the blob:\n    Blob (1 entries):\n    'index' =\u003e 4\n    2020-05-26 12:43:12 ++ AModule: This is cycle #5\n    2020-05-26 12:43:12 CRITICAL ++ AModule: That's enough...\n    2020-05-26 12:43:12 ++ AModule: I'm done!\n    ============================================================\n    5 cycles drained in 0.001284s (CPU 0.001475s). Memory peak: 27.01 MB\n    wall  mean: 0.000070s  medi: 0.000052s  min: 0.000042s  max: 0.000122s  std: 0.000031s\n    CPU   mean: 0.000070s  medi: 0.000052s  min: 0.000042s  max: 0.000124s  std: 0.000032s\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftamasgal%2Fthepipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftamasgal%2Fthepipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftamasgal%2Fthepipe/lists"}