{"id":13533942,"url":"https://github.com/insitro/redun","last_synced_at":"2026-02-23T05:01:53.972Z","repository":{"id":37070467,"uuid":"424419603","full_name":"insitro/redun","owner":"insitro","description":"Yet another redundant workflow engine","archived":false,"fork":false,"pushed_at":"2025-11-20T14:29:24.000Z","size":6500,"stargazers_count":568,"open_issues_count":33,"forks_count":54,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-12-06T20:25:27.336Z","etag":null,"topics":["aws","bioinformatics","data-engineering","data-science","docker","etl","gcp","ml","python","workflow-engine"],"latest_commit_sha":null,"homepage":"https://insitro.github.io/redun/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/insitro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-11-04T00:14:09.000Z","updated_at":"2025-12-06T04:36:34.000Z","dependencies_parsed_at":"2025-04-01T22:32:12.064Z","dependency_job_id":"d019fb0b-34ba-4258-a245-55757a90e47c","html_url":"https://github.com/insitro/redun","commit_stats":{"total_commits":320,"total_committers":39,"mean_commits":8.205128205128204,"dds":0.675,"last_synced_commit":"729fb784ef6d9eac5b49598dace633cfbe3ad15f"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/insitro/redun","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insitro%2Fredun","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insitro%2Fredun/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insitro%2Fredun/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insitro%2Fredun/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/insitro","download_url":"https://codeload.github.com/insitro/redun/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insitro%2Fredun/sbom","scorecard":{"id":489745,"data":{"date":"2025-08-11","repo":{"name":"github.com/insitro/redun","commit":"c00dac7c162663de0c9eef11f9d56b51af7b6530"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.9,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/publish-docs.yml:1","Warn: no topLevel permission defined: .github/workflows/test_pull_request.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the 
project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":10,"reason":"13 commit(s) and 5 issue activity found in the last 90 days -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":1,"reason":"Found 5/30 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by 
integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"18 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2023-117 / GHSA-mrwq-x4v8-fh7p","Warn: Project is vulnerable to: PYSEC-2021-47 / GHSA-5jqp-qgf6-3pvh","Warn: Project is vulnerable to: GHSA-mr82-8j83-vxmv","Warn: Project is vulnerable to: PYSEC-2019-217 / GHSA-462w-v97r-4m45","Warn: Project is vulnerable to: PYSEC-2014-8 / GHSA-8r7q-cvjq-x353","Warn: Project is vulnerable to: GHSA-cpwx-vrp4-4pq7","Warn: Project is vulnerable to: PYSEC-2014-82 / GHSA-fqh9-2qgg-h84h","Warn: Project is vulnerable to: PYSEC-2021-66 / GHSA-g3rq-g295-4j3m","Warn: Project is vulnerable to: GHSA-h5c8-rqwp-cp95","Warn: Project is vulnerable to: GHSA-h75v-3vvj-5mfj","Warn: Project is vulnerable to: PYSEC-2019-220 / GHSA-hj2j-77xm-mc5v","Warn: Project is vulnerable to: GHSA-q2x7-8rv6-6q7h","Warn: Project is vulnerable to: PYSEC-2014-14 / GHSA-652x-xj99-gmcc","Warn: Project is vulnerable to: GHSA-9hjg-9r4m-mvj7","Warn: Project is vulnerable to: GHSA-9wx4-h78v-vm56","Warn: Project is vulnerable to: PYSEC-2014-13 / GHSA-cfj3-7x9c-4p3h","Warn: Project is vulnerable to: PYSEC-2018-28 / GHSA-x84v-xcm2-53pg","Warn: Project is vulnerable to: PYSEC-2020-73"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish-docs.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/insitro/redun/publish-docs.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish-docs.yml:21: update your workflow using https://app.stepsecurity.io/secureworkflow/insitro/redun/publish-docs.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test_pull_request.yml:27: update your workflow using https://app.stepsecurity.io/secureworkflow/insitro/redun/test_pull_request.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test_pull_request.yml:28: update your workflow using https://app.stepsecurity.io/secureworkflow/insitro/redun/test_pull_request.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test_pull_request.yml:37: update your workflow using https://app.stepsecurity.io/secureworkflow/insitro/redun/test_pull_request.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test_pull_request.yml:38: update your workflow using https://app.stepsecurity.io/secureworkflow/insitro/redun/test_pull_request.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test_pull_request.yml:15: update your workflow using https://app.stepsecurity.io/secureworkflow/insitro/redun/test_pull_request.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test_pull_request.yml:16: update your workflow using 
https://app.stepsecurity.io/secureworkflow/insitro/redun/test_pull_request.yml/main?enable=pin","Warn: containerImage not pinned by hash: db/Dockerfile:1: pin your Docker image by updating public.ecr.aws/bitnami/postgresql:11 to public.ecr.aws/bitnami/postgresql:11@sha256:bed52175c1be51f589fc5ab7cba211cdae09f9f4e534a09cf18fcc8de0e72e88","Warn: containerImage not pinned by hash: examples/05_aws_batch/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/06_bioinfo_batch/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/aws_batch_array_jobs/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/aws_batch_multinode/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/conda/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/docker/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/federated_task/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/gcp_batch/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/k8s/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/launch/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/setup_scheduler/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: containerImage not pinned by hash: examples/subrun/docker/Dockerfile:1: pin your Docker image by updating ubuntu:20.04 to ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2c428701f6682bd2fafe15388214","Warn: pipCommand not pinned by hash: examples/05_aws_batch/docker/Dockerfile:16","Warn: pipCommand not pinned by hash: examples/05_aws_batch/docker/Dockerfile:17","Warn: pipCommand not pinned by hash: examples/06_bioinfo_batch/docker/Dockerfile:18-20","Warn: pipCommand not pinned by hash: examples/aws_batch_array_jobs/docker/Dockerfile:16","Warn: pipCommand not pinned by hash: examples/aws_batch_array_jobs/docker/Dockerfile:17","Warn: pipCommand not pinned by hash: examples/aws_batch_multinode/docker/Dockerfile:16","Warn: pipCommand not pinned by hash: 
examples/aws_batch_multinode/docker/Dockerfile:17","Warn: downloadThenRun not pinned by hash: examples/conda/docker/Dockerfile:14","Warn: pipCommand not pinned by hash: examples/docker/docker/Dockerfile:16","Warn: pipCommand not pinned by hash: examples/docker/docker/Dockerfile:17","Warn: pipCommand not pinned by hash: examples/federated_task/Dockerfile:16","Warn: pipCommand not pinned by hash: examples/gcp_batch/docker/Dockerfile:14","Warn: pipCommand not pinned by hash: examples/gcp_batch/docker/Dockerfile:15","Warn: pipCommand not pinned by hash: examples/k8s/docker/Dockerfile:16","Warn: pipCommand not pinned by hash: examples/k8s/docker/Dockerfile:17","Warn: pipCommand not pinned by hash: examples/launch/docker/Dockerfile:12","Warn: pipCommand not pinned by hash: examples/launch/docker/Dockerfile:13","Warn: pipCommand not pinned by hash: examples/setup_scheduler/docker/Dockerfile:16","Warn: pipCommand not pinned by hash: examples/setup_scheduler/docker/Dockerfile:17","Warn: pipCommand not pinned by hash: examples/subrun/docker/Dockerfile:16","Warn: pipCommand not pinned by hash: examples/subrun/docker/Dockerfile:17","Warn: pipCommand not pinned by hash: .github/workflows/publish-docs.yml:17","Warn: pipCommand not pinned by hash: .github/workflows/test_pull_request.yml:20","Warn: pipCommand not pinned by hash: .github/workflows/test_pull_request.yml:22","Info:   0 out of   8 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of  23 pipCommand dependencies pinned","Info:   0 out of   1 downloadThenRun dependencies pinned","Info:   0 out of  13 containerImage dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 6 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-19T18:46:01.965Z","repository_id":37070467,"created_at":"2025-08-19T18:46:01.965Z","updated_at":"2025-08-19T18:46:01.965Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29738083,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-23T04:51:08.365Z","status":"ssl_error","status_checked_at":"2026-02-23T04:49:15.865Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","bioinformatics","data-engineering","data-science","docker","etl","gcp","ml","python","workflow-engine"],"created_at":"2024-08-01T07:01:24.510Z","updated_at":"2026-02-23T05:01:53.962Z","avatar_url":"https://github.com/insitro.png","language":"Python","readme":"\u003cimg src=\"docs/source/_static/redun.svg\" width=\"200\"/\u003e\n\n*yet another redundant workflow engine*\n\n**redun** aims to be a more expressive and efficient workflow framework, built on top of the popular Python programming language. It takes the somewhat contrarian view that writing dataflows directly is unnecessarily restrictive, and by doing so we lose abstractions we have come to rely on in most modern high-level languages (control flow, composability, recursion, high order functions, etc). redun's key insight is that workflows can be expressed as [lazy expressions](#whats-the-trick), which are then evaluated by a scheduler that performs automatic parallelization, caching, and data provenance logging.\n\nredun's key features are:\n\n- Workflows are defined by lazy expressions that when evaluated emit dynamic directed acyclic graphs (DAGs), enabling complex data flows.\n- Incremental computation that is reactive to both data changes as well as code changes.\n- Workflow tasks can be executed on a variety of compute backend (threads, processes, AWS batch jobs, Spark jobs, etc). 
To learn more, see our [Medium](https://insitro.medium.com/when-data-science-goes-with-the-flow-insitro-introduces-redun-8b06b707a14b) and [AWS HPC](https://aws.amazon.com/blogs/hpc/data-science-workflows-at-insitro-using-redun-on-aws-batch/) blog posts, as well as our [documentation](https://insitro.github.io/redun/design.html), [tutorial](examples/README.md), and [influences](https://insitro.github.io/redun/design.html#influences).

*About the name:* The name "redun" is self-deprecating (there are [A LOT](https://github.com/pditommaso/awesome-pipeline) of workflow engines), but it is also a reference to its original inspiration, the [redo](https://apenwarr.ca/log/20101214) build system.

## Install

```sh
pip install redun
```

See [developing](https://insitro.github.io/redun/developing.html) for more information on working with the code.

### Postgres backend

To use postgres as a recording backend, use

```sh
pip install redun[postgres]
```

The above assumes the following dependencies are installed:
* `pg_config` (in the `postgresql-devel` package; on Ubuntu: `apt-get install libpq-dev`)
* `gcc` (on Ubuntu or similar: `sudo apt-get install gcc`)

### Optional Visualization

To generate Graphviz images and dot files, use

```sh
pip install redun[viz]
```

The above assumes the following dependencies are installed:
* `graphviz` (on Ubuntu: `apt-get install graphviz graphviz-dev`; via Homebrew: `brew install graphviz`)
* `gcc` (on Ubuntu or similar: `sudo apt-get install gcc`)

On macOS, you may need to specify Graphviz paths explicitly. For example, if you installed Graphviz using Homebrew:

```sh
pip install --config-settings="--global-option=build_ext" \
            --config-settings="--global-option=-I/opt/homebrew/include/" \
            --config-settings="--global-option=-L/opt/homebrew/lib/" \
            pygraphviz
```

## Use cases

redun's general approach to defining workflows makes it a good choice for implementing workflows for a wide variety of use cases:

- [Bioinformatics](examples/06_bioinfo_batch/)
- [Cheminformatics](examples/aws_glue/rdkit_workflow.py)
- [Web or API data extraction](examples/scraping/)
- [General data science](examples/word_count/)
- [And much more](examples/)

## Small taste

Here is a quick example of using redun for a familiar workflow: compiling a C program ([full example](examples/02_compile/README.md)). In general, any kind of data processing could be done within each task (e.g. reading and writing CSVs, DataFrames, databases, APIs).
```py
# make.py

import os
from typing import Dict, List

from redun import task, File


redun_namespace = "redun.examples.compile"


@task()
def compile(c_file: File) -> File:
    """
    Compile one C file into an object file.
    """
    os.system(f"gcc -c {c_file.path}")
    return File(c_file.path.replace(".c", ".o"))


@task()
def link(prog_path: str, o_files: List[File]) -> File:
    """
    Link several object files together into one program.
    """
    o_files = " ".join(o_file.path for o_file in o_files)
    os.system(f"gcc -o {prog_path} {o_files}")
    return File(prog_path)


@task()
def make_prog(prog_path: str, c_files: List[File]) -> File:
    """
    Compile one program from its source C files.
    """
    o_files = [
        compile(c_file)
        for c_file in c_files
    ]
    prog_file = link(prog_path, o_files)
    return prog_file


# Definition of programs and their source C files.
files = {
    "prog": [
        File("prog.c"),
        File("lib.c"),
    ],
    "prog2": [
        File("prog2.c"),
        File("lib.c"),
    ],
}


@task()
def make(files: Dict[str, List[File]] = files) -> List[File]:
    """
    Top-level task for compiling all the programs in the project.
    """
    progs = [
        make_prog(prog_path, c_files)
        for prog_path, c_files in files.items()
    ]
    return progs
```

Notice that, besides the `@task` decorator, the code follows typical Python conventions and is organized like a sequential program.

We can run the workflow using the `redun run` command:

```
redun run make.py make

[redun] redun :: version 0.4.15
[redun] config dir: /Users/rasmus/projects/redun/examples/compile/.redun
[redun] Upgrading db from version -1.0 to 2.0...
[redun] Start Execution 69c40fe5-c081-4ca6-b232-e56a0a679d42:  redun run make.py make
[redun] Run    Job 72bdb973:  redun.examples.compile.make(files={'prog': [File(path=prog.c, hash=dfa3aba7), File(path=lib.c, hash=a2e6cbd9)], 'prog2': [File(path=prog2.c, hash=c748e4c7), File(path=lib.c, hash=a2e6cbd9)]}) on default
[redun] Run    Job 096be12b:  redun.examples.compile.make_prog(prog_path='prog', c_files=[File(path=prog.c, hash=dfa3aba7), File(path=lib.c, hash=a2e6cbd9)]) on default
[redun] Run    Job 32ed5cf8:  redun.examples.compile.make_prog(prog_path='prog2', c_files=[File(path=prog2.c, hash=c748e4c7), File(path=lib.c, hash=a2e6cbd9)]) on default
[redun] Run    Job dfdd2ee2:  redun.examples.compile.compile(c_file=File(path=prog.c, hash=dfa3aba7)) on default
[redun] Run    Job 225f924d:  redun.examples.compile.compile(c_file=File(path=lib.c, hash=a2e6cbd9)) on default
[redun] Run    Job 3f9ea7ae:  redun.examples.compile.compile(c_file=File(path=prog2.c, hash=c748e4c7)) on default
[redun] Run    Job a8b21ec0:  redun.examples.compile.link(prog_path='prog', o_files=[File(path=prog.o, hash=4934098e), File(path=lib.o, hash=7caa7f9c)]) on default
[redun] Run    Job 5707a358:  redun.examples.compile.link(prog_path='prog2', o_files=[File(path=prog2.o, hash=cd0b6b7e), File(path=lib.o, hash=7caa7f9c)]) on default
[redun]
[redun] | JOB STATUS 2021/06/18 10:34:29
[redun] | TASK                             PENDING RUNNING  FAILED  CACHED    DONE   TOTAL
[redun] |
[redun] | ALL                                    0       0       0       0       8       8
[redun] | redun.examples.compile.compile         0       0       0       0       3       3
[redun] | redun.examples.compile.link            0       0       0       0       2       2
[redun] | redun.examples.compile.make            0       0       0       0       1       1
[redun] | redun.examples.compile.make_prog       0       0       0       0       2       2
[redun]
[File(path=prog, hash=a8d14a5e), File(path=prog2, hash=04bfff2f)]
```
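The CLI is not the only entry point. As a hedged sketch (assuming the `Scheduler` class shown in redun's documentation; treat the exact calls as illustrative), the same workflow can be started from ordinary Python:

```py
# run_make.py -- a sketch of invoking the workflow programmatically,
# assuming redun's documented Scheduler API.
from redun import Scheduler

from make import make  # the module defined above

# A scheduler with default configuration evaluates the lazy expression
# returned by make() down to concrete File values.
scheduler = Scheduler()
progs = scheduler.run(make())
print(progs)  # e.g. [File(path=prog, hash=...), File(path=prog2, hash=...)]
```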
The `redun run` invocation above should have taken three C source files (`lib.c`, `prog.c`, and `prog2.c`), compiled them into three object files (`lib.o`, `prog.o`, `prog2.o`), and then linked them into two binaries (`prog` and `prog2`). Specifically, redun automatically determined the following dataflow DAG and performed the compile and link steps in separate threads:

<p align="center">
  <img width="400" src="examples/02_compile/images/compile-dag.svg">
</p>

Using the `redun log` command, we can see the full job tree of the most recent execution (denoted `-`):

```
redun log -

Exec 69c40fe5-c081-4ca6-b232-e56a0a679d42 [ DONE ] 2021-06-18 10:34:28:  run make.py make
Duration: 0:00:01.47

Jobs: 8 (DONE: 8, CACHED: 0, FAILED: 0)
--------------------------------------------------------------------------------
Job 72bdb973 [ DONE ] 2021-06-18 10:34:28:  redun.examples.compile.make(files={'prog': [File(path=prog.c, hash=dfa3aba7), File(path=lib.c, hash=a2e6cbd9)], 'prog2': [File(path=prog2.c, hash=c748e4c7), Fil
  Job 096be12b [ DONE ] 2021-06-18 10:34:28:  redun.examples.compile.make_prog('prog', [File(path=prog.c, hash=dfa3aba7), File(path=lib.c, hash=a2e6cbd9)])
    Job dfdd2ee2 [ DONE ] 2021-06-18 10:34:28:  redun.examples.compile.compile(File(path=prog.c, hash=dfa3aba7))
    Job 225f924d [ DONE ] 2021-06-18 10:34:28:  redun.examples.compile.compile(File(path=lib.c, hash=a2e6cbd9))
    Job a8b21ec0 [ DONE ] 2021-06-18 10:34:28:  redun.examples.compile.link('prog', [File(path=prog.o, hash=4934098e), File(path=lib.o, hash=7caa7f9c)])
  Job 32ed5cf8 [ DONE ] 2021-06-18 10:34:28:  redun.examples.compile.make_prog('prog2', [File(path=prog2.c, hash=c748e4c7), File(path=lib.c, hash=a2e6cbd9)])
    Job 3f9ea7ae [ DONE ] 2021-06-18 10:34:28:  redun.examples.compile.compile(File(path=prog2.c, hash=c748e4c7))
    Job 5707a358 [ DONE ] 2021-06-18 10:34:29:  redun.examples.compile.link('prog2', [File(path=prog2.o, hash=cd0b6b7e), File(path=lib.o, hash=7caa7f9c)])
```

Notice that redun automatically detected that `lib.c` only needed to be compiled once and that its result can be reused (a form of [common subexpression elimination](https://en.wikipedia.org/wiki/Common_subexpression_elimination)).

Using the `--file` option, we can see all files (or URLs) that were read, `r`, or written, `w`, by the workflow:

```
redun log --file

File 2b6a7ce0 2021-06-18 11:41:42 r  lib.c
File d90885ad 2021-06-18 11:41:42 rw lib.o
File 2f43c23c 2021-06-18 11:41:42 w  prog
File dfa3aba7 2021-06-18 10:34:28 r  prog.c
File 4934098e 2021-06-18 10:34:28 rw prog.o
File b4537ad7 2021-06-18 11:41:42 w  prog2
File c748e4c7 2021-06-18 10:34:28 r  prog2.c
File cd0b6b7e 2021-06-18 10:34:28 rw prog2.o
```

We can also look at the provenance of a single file, such as the binary `prog`:
```
redun log prog

File 2f43c23c 2021-06-18 11:41:42 w  prog
Produced by Job a8b21ec0

  Job a8b21ec0-e60b-4486-bcf4-4422be265608 [ DONE ] 2021-06-18 11:41:42:  redun.examples.compile.link('prog', [File(path=prog.o, hash=4934098e), File(path=lib.o, hash=d90885ad)])
  Traceback: Exec 4a2b624d > (1 Job) > Job 2f8b4b5f make_prog > Job a8b21ec0 link
  Duration: 0:00:00.24

    CallNode 6c56c8d472dc1d07cfd2634893043130b401dc84 redun.examples.compile.link
      Args:   'prog', [File(path=prog.o, hash=4934098e), File(path=lib.o, hash=d90885ad)]
      Result: File(path=prog, hash=2f43c23c)

    Task a20ef6dc2ab4ed89869514707f94fe18c15f8f66 redun.examples.compile.link

      def link(prog_path: str, o_files: List[File]) -> File:
          """
          Link several object files together into one program.
          """
          o_files = " ".join(o_file.path for o_file in o_files)
          os.system(f"gcc -o {prog_path} {o_files}")
          return File(prog_path)


    Upstream dataflow:

      result = File(path=prog, hash=2f43c23c)

      result <-- <6c56c8d4> link(prog_path, o_files)
        prog_path = <ee510692> 'prog'
        o_files   = <f1eaf150> [File(path=prog.o, hash=4934098e), File(path=lib.o, hash=d90885ad)]

      prog_path <-- argument of <a4ac4959> make_prog(prog_path, c_files)
                <-- origin

      o_files <-- derives from
        compile_result   = <d90885ad> File(path=lib.o, hash=d90885ad)
        compile_result_2 = <4934098e> File(path=prog.o, hash=4934098e)

      compile_result <-- <45054a8f> compile(c_file)
        c_file = <2b6a7ce0> File(path=lib.c, hash=2b6a7ce0)

      c_file <-- argument of <a4ac4959> make_prog(prog_path, c_files)
             <-- argument of <a9a6af53> make(files)
             <-- origin

      compile_result_2 <-- <8d85cebc> compile(c_file_2)
        c_file_2 = <dfa3aba7> File(path=prog.c, hash=dfa3aba7)

      c_file_2 <-- argument of <74cceb4e> make_prog(prog_path, c_files)
               <-- argument of <45400ab5> make(files)
               <-- origin
```

This output shows the original `link` task source code responsible for creating the program `prog`, as well as the full derivation, denoted "upstream dataflow". See the full example for a [deeper explanation](examples/02_compile#data-provenance-for-files) of this output. To understand more about the data structure that powers these kinds of queries, see [call graphs](https://insitro.github.io/redun/design.html#call-graphs).

We can change one of the input files, such as `lib.c`, and rerun the workflow.
Due to redun's automatic incremental computation, only the minimal set of tasks is rerun:

```
redun run make.py make

[redun] redun :: version 0.4.15
[redun] config dir: /Users/rasmus/projects/redun/examples/compile/.redun
[redun] Start Execution 4a2b624d-b6c7-41cb-acca-ec440c2434db:  redun run make.py make
[redun] Run    Job 84d14769:  redun.examples.compile.make(files={'prog': [File(path=prog.c, hash=dfa3aba7), File(path=lib.c, hash=2b6a7ce0)], 'prog2': [File(path=prog2.c, hash=c748e4c7), File(path=lib.c, hash=2b6a7ce0)]}) on default
[redun] Run    Job 2f8b4b5f:  redun.examples.compile.make_prog(prog_path='prog', c_files=[File(path=prog.c, hash=dfa3aba7), File(path=lib.c, hash=2b6a7ce0)]) on default
[redun] Run    Job 4ae4eaf6:  redun.examples.compile.make_prog(prog_path='prog2', c_files=[File(path=prog2.c, hash=c748e4c7), File(path=lib.c, hash=2b6a7ce0)]) on default
[redun] Cached Job 049a0006:  redun.examples.compile.compile(c_file=File(path=prog.c, hash=dfa3aba7)) (eval_hash=434cbbfe)
[redun] Run    Job 0f8df953:  redun.examples.compile.compile(c_file=File(path=lib.c, hash=2b6a7ce0)) on default
[redun] Cached Job 98d24081:  redun.examples.compile.compile(c_file=File(path=prog2.c, hash=c748e4c7)) (eval_hash=96ab0a2b)
[redun] Run    Job 8c95f048:  redun.examples.compile.link(prog_path='prog', o_files=[File(path=prog.o, hash=4934098e), File(path=lib.o, hash=d90885ad)]) on default
[redun] Run    Job 9006bd19:  redun.examples.compile.link(prog_path='prog2', o_files=[File(path=prog2.o, hash=cd0b6b7e), File(path=lib.o, hash=d90885ad)]) on default
[redun]
[redun] | JOB STATUS 2021/06/18 11:41:43
[redun] | TASK                             PENDING RUNNING  FAILED  CACHED    DONE   TOTAL
[redun] |
[redun] | ALL                                    0       0       0       2       6       8
[redun] | redun.examples.compile.compile         0       0       0       2       1       3
[redun] | redun.examples.compile.link            0       0       0       0       2       2
[redun] | redun.examples.compile.make            0       0       0       0       1       1
[redun] | redun.examples.compile.make_prog       0       0       0       0       2       2
[redun]
[File(path=prog, hash=2f43c23c), File(path=prog2, hash=b4537ad7)]
```

Notice that two of the compile jobs are cached (`prog.c` and `prog2.c`), but compiling the library `lib.c` and the downstream link steps are correctly rerun.

Check out the [examples](examples/) for more example workflows and features of redun. Also, see the [design notes](https://insitro.github.io/redun/design.html) for more information on redun's design.

## Data provenance exploration

All workflow executions are recorded into a database that can be explored using the [Console (TUI)](https://insitro.github.io/redun/design.html#call-graphs). The Console is convenient for debugging large, complex workflows, as well as for understanding how to reproduce and extend past work.

<a href="docs/source/_static/console-execution.svg"><img width="45%" src="docs/source/_static/console-execution.svg"></a> <a href="docs/source/_static/console-job.svg"><img width="45%" src="docs/source/_static/console-job.svg"></a>

## Mixed compute backends

In the above example, each task ran in its own thread. More generally, however, each task can run in its own process, Docker container, [AWS Batch job](examples/05_aws_batch), or [Spark job](examples/aws_glue). With [minimal configuration](examples/05_aws_batch/.redun/redun.ini), users can lightly annotate where they would like each task to run. redun will automatically handle the data and code movement as well as backend scheduling:

```py
@task(executor="process")
def a_process_task(a):
    # This task runs in its own process.
    b = a_batch_task(a)
    c = a_spark_task(b)
    return c

@task(executor="batch", memory=4, vcpus=5)
def a_batch_task(a):
    # This task runs in its own AWS Batch job.
    ...

@task(executor="spark")
def a_spark_task(b):
    # This task runs in its own Spark job.
    sc = get_spark_context()
    ...
```

See the [executor documentation](https://insitro.github.io/redun/executors.html) for more.
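For reference, executors are declared in `.redun/redun.ini`. The snippet below is a hypothetical sketch loosely modeled on the AWS Batch example linked above; the image, queue, and bucket values are placeholders, and the executor documentation is the authority on option names:

```ini
# .redun/redun.ini -- hypothetical sketch; all values are placeholders.
[executors.batch]
type = aws_batch
image = YOUR_ACCOUNT.dkr.ecr.us-west-2.amazonaws.com/redun_example
queue = YOUR_BATCH_QUEUE
s3_scratch = s3://YOUR_BUCKET/redun/
```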
## What's the trick?

How did redun automatically perform parallel compute, caching, and data provenance in the example above? The trick is that redun builds up an [expression graph](https://en.wikipedia.org/wiki/Abstract_semantic_graph) representing the workflow and evaluates the expressions using [graph reduction](https://en.wikipedia.org/wiki/Graph_reduction). For example, the workflow above went through the following evaluation process:

<img width="800" src="examples/02_compile/images/expression-graph.svg">

For a more in-depth walk-through, see the [scheduler tutorial](examples/03_scheduler).

## Why not another workflow engine?

redun focuses on making multi-domain scientific pipelines easy to develop and deploy. Its automatic parallelism, caching, code and data reactivity, and data provenance features make it a great fit for such work. However, redun does not attempt to solve all possible workflow problems, so it is perfectly reasonable to supplement it with other tools. For example, while redun provides a very expressive way to define [task parallelism](https://en.wikipedia.org/wiki/Task_parallelism), it does not attempt to perform the kind of fine-grained [data parallelism](https://en.wikipedia.org/wiki/Data_parallelism) more commonly provided by Spark or Dask. Fortunately, redun does not perform any "dirty tricks" (e.g. complex static analysis or call stack manipulation), and so we have found it possible to safely combine redun with other frameworks (e.g. PySpark, PyTorch, Dask) to achieve the benefits of each tool.

Lastly, redun does not provide its own compute cluster but instead builds upon other systems that do, such as cloud provider services for batch Docker jobs or Spark jobs.

For more details on how redun compares to other related ideas, see the [influences](https://insitro.github.io/redun/design.html#influences) section.
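To see the graph-reduction trick from "What's the trick?" directly, note what a task call returns when no scheduler is running. This is a sketch assuming an interactive session alongside the `make.py` example; the exact expression class name may vary between redun versions:

```py
# Task calls build expression nodes; they do not execute anything.
from redun import File
from make import compile

expr = compile(File("prog.c"))
# `expr` is a node in the workflow's expression graph, not a File.
# Only the scheduler, via graph reduction, turns it into a concrete value.
print(type(expr))  # something like <class 'redun.expression.TaskExpression'>
```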