{"id":15692560,"url":"https://github.com/pipefunc/pipefunc","last_synced_at":"2025-12-16T09:23:51.519Z","repository":{"id":181539901,"uuid":"666932451","full_name":"pipefunc/pipefunc","owner":"pipefunc","description":"Lightweight fast function pipeline (DAG) creation in pure Python for scientific workflows 🕸️🧪","archived":false,"fork":false,"pushed_at":"2025-05-15T08:57:57.000Z","size":2230,"stargazers_count":358,"open_issues_count":67,"forks_count":14,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-15T09:42:32.825Z","etag":null,"topics":["dag","hpc","parallel-computing","pipeline-framework","pipelines","reproducible-research","slurm","workflow-engine"],"latest_commit_sha":null,"homepage":"https://pipefunc.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pipefunc.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-16T04:34:50.000Z","updated_at":"2025-05-15T08:57:35.000Z","dependencies_parsed_at":"2024-02-28T02:47:56.763Z","dependency_job_id":"bfa40ac8-acfb-409f-9ead-95441d14c67e","html_url":"https://github.com/pipefunc/pipefunc","commit_stats":null,"previous_names":["basnijholt/pipefunc","pipefunc/pipefunc"],"tags_count":126,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pipefunc%2Fpipefunc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pipefunc%2Fpipefunc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pipef
unc%2Fpipefunc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pipefunc%2Fpipefunc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pipefunc","download_url":"https://codeload.github.com/pipefunc/pipefunc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254436944,"owners_count":22070946,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dag","hpc","parallel-computing","pipeline-framework","pipelines","reproducible-research","slurm","workflow-engine"],"created_at":"2024-10-03T18:35:29.084Z","updated_at":"2025-12-16T09:23:46.471Z","avatar_url":"https://github.com/pipefunc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PipeFunc: Structure, Automate, and Simplify Your Computational Workflows 🕸\n\n\u003e **_Stop_** micromanaging execution. Focus on the **science**. 
Capture your workflow's essence with **function pipelines**, represent **computations as DAGs**, and **automate parallel sweeps**.\n\n[![Python](https://img.shields.io/pypi/pyversions/pipefunc)](https://pypi.org/project/pipefunc/)\n[![PyPi](https://img.shields.io/pypi/v/pipefunc?color=blue)](https://pypi.org/project/pipefunc/)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![pytest](https://github.com/pipefunc/pipefunc/actions/workflows/pytest-micromamba.yml/badge.svg)](https://github.com/pipefunc/pipefunc/actions/workflows/pytest-micromamba.yml)\n[![Conda](https://img.shields.io/badge/install%20with-conda-green.svg)](https://anaconda.org/conda-forge/pipefunc)\n[![Coverage](https://img.shields.io/codecov/c/github/pipefunc/pipefunc)](https://codecov.io/gh/pipefunc/pipefunc)\n[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/pipefunc/pipefunc)\n[![Documentation](https://readthedocs.org/projects/pipefunc/badge/?version=latest)](https://pipefunc.readthedocs.io/en/latest/?badge=latest)\n[![Downloads](https://img.shields.io/conda/dn/conda-forge/pipefunc.svg)](https://anaconda.org/conda-forge/pipefunc)\n[![GitHub](https://img.shields.io/github/stars/pipefunc/pipefunc.svg?style=social)](https://github.com/pipefunc/pipefunc/stargazers)\n[![Discord](https://img.shields.io/discord/1320459922596565103.svg?label=Discord\u0026logo=discord)](https://discord.gg/wUXg2drsNN)\n\n![](https://user-images.githubusercontent.com/6897215/253785642-cf2a6941-2ea6-41b0-8225-b3e52e94c4de.png)\n\n\u003c!-- toc-start --\u003e\n\n## :books: Table of Contents\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n- [:thinking: What is this?](#thinking-what-is-this)\n- [:rocket: Key 
Features](#rocket-key-features)\n- [:test_tube: How does it work?](#test_tube-how-does-it-work)\n- [:notebook: Jupyter Notebook Example](#notebook-jupyter-notebook-example)\n- [:computer: Installation](#computer-installation)\n- [:hammer_and_wrench: Development](#hammer_and_wrench-development)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- toc-end --\u003e\n\n## :thinking: What is this?\n\n[![asciicast](https://asciinema.org/a/q5S3ffIxrAGmoLMOc0hOb3aod.svg)](https://asciinema.org/a/q5S3ffIxrAGmoLMOc0hOb3aod)\n\n**`pipefunc`** is a Python library designed for creating and executing **function pipelines**.\nBy simply annotating functions and specifying their outputs, it builds a pipeline that **automatically manages the execution order** based on dependencies.\nVisualize the pipeline as a directed graph, execute the pipeline for all (or specific) outputs, add multidimensional sweeps, automatically parallelize the pipeline, and get nicely structured data back.\n\n\u003e [!NOTE]\n\u003e A _*pipeline*_ is a sequence of interconnected functions, structured as a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG), where outputs from one or more functions serve as inputs to subsequent ones.\n\u003e pipefunc streamlines the creation and management of these pipelines, offering powerful tools to efficiently execute them.\n\nWhether you're working with data processing, scientific computations, machine learning (AI) workflows, or any other scenario involving interdependent functions, `pipefunc` helps you focus on the logic of your code while it handles the intricacies of function dependencies and execution order.\n\n## :rocket: Key Features\n\n1. 🚀 **Function Composition and Pipelining**: Create pipelines by using the `@pipefunc` decorator; execution order is automatically handled.\n1. 
📊 **Pipeline Visualization**: Generate visual graphs of your pipelines to better understand the flow of data.\n1. 👥 **Multiple Outputs**: Handle functions that return multiple results, allowing each result to be used as input to other functions.\n1. 🔁 **Map-Reduce Support**: Perform \"map\" operations to apply functions over data and \"reduce\" operations to aggregate results, allowing n-dimensional mappings.\n1. 👮 **Type Annotations Validation**: Validates the type annotations between functions to ensure type consistency.\n1. 🎛️ **Resource Usage Profiling**: Get reports on CPU usage, memory consumption, and execution time to identify bottlenecks and optimize your code.\n1. 🔄 **Automatic parallelization**: Automatically runs pipelines in parallel (local or remote) with shared memory and disk caching options.\n1. ⚡ **Ultra-Fast Performance**: Minimal overhead of [about 15 µs](https://pipefunc.readthedocs.io/en/latest/faq/#what-is-the-overhead-efficiency-performance-of-pipefunc) per function in the graph, ensuring blazingly fast execution.\n1. 🔍 **Parameter Sweep Utilities**: Generate parameter combinations for parameter sweeps and optimize the sweeps with result caching.\n1. 💡 **Flexible Function Arguments**: Call functions with different argument combinations, letting `pipefunc` determine which other functions to call based on the provided arguments.\n1. 🏗️ **Leverages giants**: Builds on top of [NetworkX](https://networkx.org/) for graph algorithms, [NumPy](https://numpy.org/) for multi-dimensional arrays, and optionally [Xarray](https://docs.xarray.dev/) for labeled multi-dimensional arrays, [Zarr](https://zarr.readthedocs.io/) to store results in memory/disk/cloud or any key-value store, and [Adaptive](https://adaptive.readthedocs.io/) for parallel sweeps.\n1. 
🤓 **Nerd stats**: \u003e1000 tests with 100% test coverage, fully typed, only 3 required dependencies, _all_ Ruff Rules, _all_ public API documented.\n\n## :test_tube: How does it work?\n\npipefunc provides a Pipeline class that you use to define your function pipeline.\nYou add functions to the pipeline using the `pipefunc` decorator, which also lets you specify the function's output name.\nOnce your pipeline is defined, you can execute it for specific output values, simplify it by combining function nodes, visualize it as a directed graph, and profile the resource usage of the pipeline functions.\nFor more detailed usage instructions and examples, please check the usage example provided in the package.\n\nHere is a simple example usage of pipefunc to illustrate its primary features:\n\n```python\nfrom pipefunc import pipefunc, Pipeline\n\n# Define three functions that will be a part of the pipeline\n@pipefunc(output_name=\"c\")\ndef f_c(a, b):\n    return a + b\n\n@pipefunc(output_name=\"d\")\ndef f_d(b, c):\n    return b * c\n\n@pipefunc(output_name=\"e\")\ndef f_e(c, d, x=1):\n    return c * d * x\n\n# Create a pipeline with these functions\npipeline = Pipeline([f_c, f_d, f_e], profile=True)  # `profile=True` enables resource profiling\n\n# Call the pipeline directly for different outputs:\nassert pipeline(\"d\", a=2, b=3) == 15\nassert pipeline(\"e\", a=2, b=3) == 75\n\n# Visualize the pipeline\npipeline.visualize()\n\n# Show resource reporting (only works if profile=True)\npipeline.print_profiling_stats()\n```\n\nThis example demonstrates defining a pipeline with the functions `f_c`, `f_d`, and `f_e`, calling the pipeline for specific outputs, visualizing the pipeline graph, and reporting on resource usage.\nThis basic example should give you an idea of how to use `pipefunc` to construct and manage function pipelines.\n\nThe following example demonstrates how to perform a map-reduce operation using 
`pipefunc`:\n\n```python\nfrom pipefunc import pipefunc, Pipeline\nfrom pipefunc.map import load_outputs\nimport numpy as np\n\n@pipefunc(output_name=\"c\", mapspec=\"a[i], b[j] -\u003e c[i, j]\")  # the mapspec is used to specify the mapping\ndef f(a: int, b: int):\n    return a + b\n\n@pipefunc(output_name=\"mean\")  # there is no mapspec, so this function takes the full 2D array\ndef g(c: np.ndarray):\n    return np.mean(c)\n\npipeline = Pipeline([f, g])\ninputs = {\"a\": [1, 2, 3], \"b\": [4, 5, 6]}\npipeline.map(inputs, run_folder=\"my_run_folder\", parallel=True)\nresult = load_outputs(\"mean\", run_folder=\"my_run_folder\")\nprint(result)  # prints 7.0\n```\n\nHere the `mapspec` argument specifies the mapping between the inputs and outputs of the `f` function: it creates the product of the `a` and `b` input lists and computes the sum of each pair. The `g` function then computes the mean of the resulting 2D array. The `map` method executes the pipeline for the `inputs`, and the `load_outputs` function is used to load the results of the `g` function from the specified run folder.\n\n## :notebook: Jupyter Notebook Example\n\nSee the detailed usage example and more in our [example.ipynb](https://github.com/pipefunc/pipefunc/blob/main/example.ipynb).\n\n\u003e [!TIP]\n\u003e Have [`uv` installed](https://docs.astral.sh/uv/)?\n\u003e Run `uvx --with \"pipefunc[docs]\" -p 3.13 opennb pipefunc/pipefunc/example.ipynb` to open the example notebook in your browser without the need to set up anything!\n\n## :computer: Installation\n\nInstall the **latest stable** version from conda (recommended):\n\n```bash\nconda install pipefunc\n```\n\nor from PyPI:\n\n```bash\npip install \"pipefunc[all]\"\n```\n\nor install **main** with:\n\n```bash\npip install -U https://github.com/pipefunc/pipefunc/archive/main.zip\n```\n\nor clone the repository and do a dev install (recommended for dev):\n\n```bash\ngit clone git@github.com:pipefunc/pipefunc.git\ncd pipefunc\npip 
install -e \".[dev]\"\n```\n\n## :hammer_and_wrench: Development\n\nWe use [`pre-commit`](https://pre-commit.com/) to manage pre-commit hooks, which helps us ensure that our code is always clean and compliant with our coding standards.\nTo set it up, install pre-commit with pip and then run the install command:\n\n```bash\npip install pre-commit\npre-commit install\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpipefunc%2Fpipefunc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpipefunc%2Fpipefunc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpipefunc%2Fpipefunc/lists"}