{"id":21202029,"url":"https://github.com/radiantone/entangle","last_synced_at":"2025-07-20T02:08:13.138Z","repository":{"id":49435788,"uuid":"353935875","full_name":"radiantone/entangle","owner":"radiantone","description":"A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs.","archived":false,"fork":false,"pushed_at":"2022-08-24T08:27:01.000Z","size":2440,"stargazers_count":104,"open_issues_count":1,"forks_count":7,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-10-10T14:48:21.632Z","etag":null,"topics":["artificial-intelligence","containers","dataflow","dataflow-engine","decorator-composition","devops","gpu","gpu-computing","hpc","parallel","parallel-processes","parallel-workflows","python3","scripting","supercomputing","workflow-composition","workflow-managers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/radiantone.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["radiantone"],"patreon":null}},"created_at":"2021-04-02T07:03:28.000Z","updated_at":"2024-09-19T09:35:59.000Z","dependencies_parsed_at":"2022-08-26T12:50:20.432Z","dependency_job_id":null,"html_url":"https://github.com/radiantone/entangle","commit_stats":null,"previous_names":[],"tags_count":39,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radiantone%2Fentangle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radiantone%2Fentangle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/radiantone%2Fentangle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radiantone%2Fentangle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/radiantone","download_url":"https://codeload.github.com/radiantone/entangle/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225622846,"owners_count":17498168,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","containers","dataflow","dataflow-engine","decorator-composition","devops","gpu","gpu-computing","hpc","parallel","parallel-processes","parallel-workflows","python3","scripting","supercomputing","workflow-composition","workflow-managers"],"created_at":"2024-11-20T20:12:31.011Z","updated_at":"2024-11-20T20:12:31.744Z","avatar_url":"https://github.com/radiantone.png","language":"Python","readme":"*This version: 0.2.3*\n\n![logo](./images/logo.png)\n\n*Current development version is here: [0.2.4](https://github.com/radiantone/entangle/tree/0.2.4)*\n\nA lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs, supporting both *control flow* and *dataflow* execution paradigms as well as de-centralized CPU \u0026 GPU scheduling. \n\n\u003e For a quick look at what makes Entangle special, take a look at [Design Goals](#design-goals).\n\n## New In This Release\n\n- Bug fixes to process.py, ssh.py\n- Distributed dataflow example\n- Dataflow decorator re-write. Now works with ssh for distributed dataflow. 
Fixes prior issues with local dataflows.\n- Retry usage example \n- Dockerfile provided for quick and easy experimentation.\n- Workflows can now return the call graph structure upon completion. See [Graph Example](#graph-example)\n- Support for workflow futures (if that's your thing) See [Workflow Future Example](#workflow-future-example)\n\n## Quick Usage\n\nWith Entangle you can run simple, hardware-parallelized code with conditional logic that looks like this.\n\n```python\nresult = add(\n            add(\n                num(6),\n                two() if False else one()\n            ),\n            subtract(\n                five(),\n                two()\n            )\n)\nprint(result())\n```\nor train two AI models in parallel in TensorFlow containers, each with dedicated CPU and GPU resources.\n```python\n@process\n@docker(image=\"tensorflow/tensorflow:latest-gpu\", packages=['tensorflow_datasets'])\ndef train_modelA():\n    # train it\n    return\n\n@process\n@docker(image=\"tensorflow/tensorflow:latest-gpu\", packages=['tensorflow_datasets'])\ndef train_modelB():\n    # train it\n    return\n\n@workflow\ndef train_models(*args):\n    # I'm training a bunch of models in parallel!\n    return\n\nworkflow = train_models(\n    train_modelA(),\n    train_modelB()\n)\n\nresult = workflow()\n```\n### Docker\nTo quickly get started with Entangle, build and run a docker container from the included Dockerfile.\n\n```bash\n$ docker build -t entangle .\n$ docker run -it entangle:latest\nroot@9579336b3e34:/# python -m entangle.examples.example\n```\nOr if you have the [NVIDIA Docker Environment](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) set up, you can test the numba GPU vector example.\n\n```bash\n$ docker run -it --gpus all entangle\nroot@13428af4a37b:/# python -m entangle.examples.example3\n(0.2957176749914652, 0.41134210501331836)\n```\n## Wiki Articles\n- [How Does Entangle Compare to Other Parallel Compute 
Frameworks?](https://github.com/radiantone/entangle/wiki)\n- [Development Roadmap \u0026 Contributions](https://github.com/radiantone/entangle/wiki/Development)\n\n## Outline\n\n* [Overview](#overview)\n  * [What does \"Entangle\" mean?](#what-does-entangle-mean)\n  * [Important Notes](#important-notes)\n* [Installation](#installation)\n* [Design Goals](#design-goals)  \n* [Architecture](#architecture)\n* [Declarative Infrastructure](#declarative-infrastructure)\n* [Execution](#execution)  \n* [Workflows](#workflows)  \n* [Process Behavior](#process-behavior)  \n* [Composition](#composition)  \n  * [Decorator Composition](#decorator-composition)\n  * [Workflow Composition](#workflow-composition)\n* [Containers](#containers)  \n* [Dataflows](#dataflows)\n  * [Dataflow vs Workflows](#dataflows-vs-workflows)\n  * [DAG Dataflow](#dag-dataflow)\n  * [Results Comparison](#results-comparison)\n  * [Advantages of Strict Dataflow](#advantages-of-strict-dataflow)\n* [Schedulers](#schedulers)\n* [Distributed Flows](#distributed-flows)  \n* [Examples](#examples)\n    * [GPU Example](#gpu-example)\n    * [Shared Memory Example](#shared-memory-example)\n    * [AI Example](#ai-example)\n    * [Dataflow Examples](#dataflow-examples)\n      * [Data-Driven Branching](#data-driven-branching)\n      * [Distributed Dataflow](#distributed-dataflow)\n    * [Docker Example](#docker-example)\n    * [Scheduler Example](#scheduler-example)\n    * [Graph Example](#graph-example)\n    * [Workflow Future Example](#workflow-future-example)\n* [Logging](#logging)\n* [Design Tool](#design-tool)  \n\n## Overview\n\nEntangle is a *different* kind of parallel compute framework for multi-CPU/GPU environments. \nIt allows for simple workflow design using *plain old python* and special decorators that control the type of parallel compute and infrastructure needed.\n\nOne key feature of entangle is fine-grained control over individual functions in a workflow. 
You could easily describe multiple functions running across multiple compute environments all interacting as if they were simple local python functions.\nNo central scheduler or workflow manager is needed, allowing you to choose where and how functions operate with *declarative infrastructure*.\n\nAnother unique quality is the use of composition to build parallel workflows dynamically.\n\n### What does \"Entangle\" mean?\n\nThe term is derived from a quantum physics phenomenon called *[quantum entanglement](https://en.wikipedia.org/wiki/Quantum_entanglement#:~:text=Quantum%20entanglement%20is%20a%20physical,including%20when%20the%20particles%20are)* which involves the state of a pair or group of particles affecting one another *at a distance* instantaneously.\n\nIn this context, it is a metaphor for how tasks send data (particles) to one another in the context of a connected microflow.\n\n### IMPORTANT NOTES!\n\nPlease keep in mind that Entangle is *in development* and is classified as `Pre-Alpha`. Some of the functionality shown here is incomplete. If you clone this repo and want to experiment, be sure to update often as things break, improve, and get fixed quite frequently. The `main` branch will always contain the most current release. All development for the next version is done on the development branch for the next release listed at the top of this document.\n\n## Installation\n\nNOTE: At the moment entangle only works with python 3.8, due to how coroutines work there and its shared memory features.\n\nFrom PyPi\n\n```shell\n$ pip install --upgrade py-entangle\n$ python -m entangle.examples.example\n```\nFrom repo root\n\n*python3.8*\n\n```shell\n\n$ virtualenv --python=python3.8 venv\n$ source venv/bin/activate\n(venv) $ python setup.py install\n(venv) $ python -m entangle.examples.example\n```\n\n*miniconda3*\n1. 
Install [miniconda3](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html) with python3.8 for linux\n```shell\n$ conda init   \n$ python setup.py install\n$ python -m entangle.examples.example\n```\n\n### Installing Numba\n\n\u003e NOTE: Numba package is disabled by default in `setup.py`. If you want this package, just uncomment it; however some OS specific steps might be required.\n\nOn some systems you might encounter the following error when trying to install  `numba`.\n```bash\nRuntimeError: Could not find a `llvm-config` binary.\n```\n\nTry the following remedy (for ubuntu systems)\n\n```bash\n$ sudo apt-get install -y --no-install-recommends  llvm-10 llvm-10-dev\n$ export LLVM_CONFIG=/usr/bin/llvm-config-10\n$ pip3 install numba\n```\n\n### Testing\n\n```shell\n$ pytest --verbose --color=yes --disable-pytest-warnings --no-summary --pyargs entangle.tests\n```\nor if you don't have GPU\n```shell\n$ pytest --verbose --color=yes --pyargs entangle.tests.test_entangle\n```\nor just do this\n```shell\n$ python setup.py test\n```\n### Cleaning\nClean all build files, directories, temp files and any files created by examples and tests.\n\n```shell\n$ python setup.py clean\n```\n### Miniconda\n\nIf you are planning to run or use GPU enabled code it is recommended to set up a [miniconda3](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html) virtualenv.\n\n## Design Goals\n\n* Small \u0026 Simple\n* Easy to Understand\n* API-less\n* Plain Old Python\n* True Parallelism\n* Pluggable \u0026 Flexible\n* Composition Based\n* Shared-Nothing\n* Serverless \u0026 Threadless\n* True Dataflow Support\n* CPU/GPU Scheduling\n* Distributed Dataflow\n\n## Architecture\n\nEntangle is designed without a central scheduler or workflow manager. 
Rather, each function is decorated with special decorators that turn it into its own workflow manager.\nThese *decorators* implement logic to parallelize and *gather* values from their dependent arguments, which are executed as separate processes. As each function is assigned a dedicated CPU, the workflow is thus an ensemble of parallel, independent micro-flows that resolve themselves and pass their values into queues until the workflow completes.\n\nThis offers an extreme *shared nothing* design that maximizes CPU usage in a multi-CPU environment.\n\nEach function (or task) is given a process and scheduled to a CPU by the operating system. Since python Processes are natively bound OS processes, this inherits the benefit of the operating system scheduler, which is optimized for the underlying hardware.\nArguments that satisfy the function are run in parallel in the same fashion. The parent function then uses asyncio coroutines to monitor queues for the results from the processes. This keeps the CPU usage down while the dependent functions produce their results and eliminates the need for monitor threads.\n\nAs a workflow executes, it fans out over CPUs, each process acting as its own scheduler, spawning new processes and resolving arguments while also monitoring queues for incoming results asynchronously.\nThis makes the workflow a truly emergent, dynamic computing construct vs a monolithic service managing all the pieces. Of course, this is not to say one approach is better, just that entangle takes a different approach based on its preferred tradeoffs.\n![arch](./images/arch.png)\n\n### Tradeoffs\n\nEvery design approach is a balance of tradeoffs. Entangle favors CPU utilization and *true* parallelism over resource managers, centralized (which is to say network-centric) schedulers or other shared services.\nIt favors simplicity over behavior - leaving specific extensions to you, attempting to be minimal and un-opinionated. 
It tries to be *invisible* to the end user as much as possible. It strives for the basic principle that, *\"if it looks like it should work, it should work.\"*\n\nEntangle leans on the OS scheduler to prioritize processes based on the behavior of those processes and underlying resource utilizations. It therefore does not provide its own redundant (which is to say *centralized*) scheduler or task manager. Because of this, top-down visibility or control of workflow processes is not as easy as with centralized task managers.\n\nEntangle prefers the non-API approach, where it looks like regular python expressions, over strict API's or invocation idioms. This makes it easier to pick up and use and plays well with 3rd party frameworks too.\n\n### Use Cases\n\nBecause of these tradeoffs, there are certain use cases that align with entangle and others that probably do not.\n\nIf you want top-down visibility \u0026 control of workflows and tasks, Entangle is probably not ready for you.\n\nIf you have lots of CPUs, entangle could be for you! 
If you want easy python workflows that span local and remote cloud resources, entangle could be for you.\nIf you want to write custom handlers that enrich or execute code in custom ways for your needs, entangle makes this easy for you.\n\n#### Orchestration\n\nOne focused use case for entangle is when you want to orchestrate across different compute nodes, remote APIs and other disparate endpoints in a single workflow, with inherent parallelism.\n\n![workflow](./images/workflow.png)\n\nEach step of the workflow has different parameters, needs and protocols used to communicate with it.\nSuch a workflow might simply look like:\n\n```python\ndata = data_refinement(\n    get_source_data()\n)\nresult = measure_vectors(\n    vector1(\n        data(\"vector1\")\n    ),\n    vector2(\n        data(\"vector2\")\n    ),\n    vector3(\n        data(\"vector3\")\n    )\n)\n```\n\n#### GPU Processing\nAnother use case is the need to run multiple parallel tasks that operate on matrix data using a GPU.\nEntangle makes this quite easy as seen in [GPU Example](#gpu-example), [Docker Example](#docker-example) and [Shared Memory Example](#shared-memory-example)\n\n#### DevOps\nFor devops use cases, Entangle allows you to write simple, parallel workflow graphs using *plain old python*. This lets you write efficient parallel devops pipelines with ease. Build simple workflows that do powerful things like orchestrating across multiple clouds, services, repositories, etc. in an efficient dataflow parallel design.\n\n\n### What Entangle is not\nHere are some things entangle is not, *out-of-the-box*. This isn't to say entangle can't do these things. 
In fact, entangle is designed to be a low level framework for implementing these kinds of things.\n\n* Entangle does not yet perform fail over (TBD)\n* Entangle is not a batch process framework (TBD)\n* Entangle is not map/reduce\n* Entangle is not a centralized task manager\n\n## Declarative Infrastructure\n\nEntangle allows you to target specific infrastructure environments or needs using simple decorators.\n\nFor example, to specify a process run on local hardware you can use the @local decorator\n\n```python\n@process\n@local\ndef myfunc():\n    return\n```\n\nIf you want to execute a function in AWS EC2 or fargate, you could write it as:\n\n```python\n@process\n@aws(keys=[])\n@ec2(ami='ami-12345')\ndef myfunc():\n    return\n\n@process\n@aws(keys=[])\n@fargate(ram='2GB', cpu='Xeon')\ndef myfunc():\n    return\n```\n\nor using docker containers\n```python\n@process\n@docker(image=\"tensorflow/tensorflow:latest-gpu\")\ndef reduce_sum():\n    import tensorflow as tf\n    return tf.reduce_sum(tf.random.normal([1000, 1000]))\n```\n![infrastructure](./images/infrastructure.png)\n\nIf you have a custom on-prem environment you can write a simple decorator that deploys the task to that and use it alongside other infrastructure decorators.\n\n### Where, What \u0026 How: Using Mixins\n\nEntangle uses the concept of *mixins* to associate infrastructure needs (where) with compute (what) and concurrency needs (how). 
\nThus, it allows you to mix and match combinations of these within a single workflow or dataflow.\n\nFor example, you might need to get data from `AWS Lambda`, run a `GPU algorithm` on that data inside a container, then send those results to 10 CPUs in a `compute cloud` for *parallel analysis*, then gather those results and send them to a `web service` on your network for storage or rendering.\nAll these steps have different locations, infrastructure requirements, compute needs, processing times, and protocols.\n\n## Execution\n\nAs we mentioned above, entangle workflows will fan out during execution and occupy CPUs throughout the workflow. The OS will determine the priority of processes based on their resource needs at the time of execution.\nHere is a simple workflow and diagram showing how the parallel execution unfolds.\n\n```python\nresult = add(\n   mult(\n      one(),\n      two()\n   ),\n   three()\n)\n```\n\nThis execution order applies to workflows in entangle. If you use the `@dataflow` decorator, the execution follows that of a dataflow compute model. Refer to the section [Dataflows](#dataflows) for more information.\n\n![execution](./images/execution.png)\n\n\n### Threads vs Processes\n\nIn Python, threads do not execute in parallel with one another; the interpreter only gives the illusion of parallelism. Python handles the context switching between threads and is limited by the GIL.\nProcesses, on the other hand, are not controlled by a GIL and can thus truly run in parallel. The host operating system governs the scheduling of processes, and entangle is designed to exploit this benefit.\n\n\n## Workflows\n\nA workflow in Entangle is just a fancy term for a call graph of function processes. The example above in [Execution](#execution) is a simple workflow.\nWorkflows execute in the *natural dependency ordering* that you'd expect from any python function call. 
The innermost dependencies are invoked first so they can return their values to parent functions.\n\nNote that this paradigm is pretty much the way most imperative languages operate today, but it does differ from *dataflows*, which we discuss below in [Dataflows](#dataflows).\n\n### Imperative vs Structured Declaration\n\nIn Entangle, there are two ways you can write your workflows, depending on which is more convenient for you. Both produce the same execution sequence and results.\n\nLet's look at the example below:\n\n```python\nresult = add(\n            add(\n                num(6),\n                two() if False else one()\n            ),\n            subtract(\n                five(),\n                two()\n            )\n)\n```\nThis represents the *structured* paradigm, based on a stricter, lambda-calculus-style notation where functions invoke functions until the top-most value is produced.\n\nWe can also write this as a sequence of *imperative* declarations\n\n```python\n_five = five()\n_two = two()\n_sub = subtract(_five,_two)\n_num = num(6)\n_two2 = two() if False else one()\n_add1 = add(_num,_two2)\nresult = add(_add1,_sub)\n```\n\n## Process Behavior\n\nKeyword parameters on the `@process` decorator allow you to control some meta-behavior of the process.\n\n### Wait\n\nWait indicates how long a function should wait for its arguments to arrive. It is a sibling to *timeout*, but the two differ.\n```python\n@process(wait=20)\ndef values(*args):\n    values = [arg for arg in args]\n\n    return values\n\n\no = values(\n    one(),\n    train()\n)\n```\nIn the above example, it is saying that the `values` function will wait up to 20 seconds for *both* the `one()` and `train()` functions to complete and return values; otherwise it will throw a `ProcessTimeoutException`.\n### Timeout\n\nTimeout is more self-evident. 
It is the maximum time, in seconds, that entangle will allow a process to run.\n\n```python\n\n# Wait at most 3 seconds for this task to complete\n@process(timeout=3)\ndef task():\n    return True\n\n# Wait indefinitely for this task to complete\n@process\ndef taskB():\n    return False\n```\n\nWhen a process times out, a `ProcessTimeoutException` will be thrown by Entangle and the process will be terminated if it is still alive.\n\n## Composition\n\nEntangle offers a couple of different ways to use composition effectively: with *decorators* and with *workflows*.\n\n### Decorator Composition\n\nYou can compose your tasks by combining process and infrastructure decorators.\n\nAgain, in the example below, we are declaring a *process* and *local* infrastructure for our task to run by composing two decorators together.\n```python\n@process\n@local\ndef taskA():\n    return\n```\nor, specifying that the task run as a process inside AWS fargate, unchanged.\n```python\n@process\n@aws(keys=[])\n@fargate(ram='2GB', cpu=4)\ndef taskA():\n    return\n```\n### Workflow Composition\nComposing workflows is just as simple. 
You can write code that itself constructs workflows on the fly easily.\n\n```python\nfrom entangle.process import process\nfrom entangle.http import request\nfrom entangle.workflow import workflow\n\n@process\n@request(url='https://datausa.io/api/data', method='GET')\ndef mydata(data):\n    import json\n    data = json.loads(data)\n    return int(data['data'][0]['Year'])\n\n@process\ndef two():\n    return 2\n\n@process\ndef add(a, b):\n    v = int(a) + int(b)\n    print(\"ADD: *\"+str(v)+\"*\")\n    return v\n\n@workflow\ndef workflow1():\n    return add(\n        mydata(drilldowns='Nation', measures='Population'),\n        two()\n    )\n\n@workflow\ndef workflow2(value):\n    return add(\n        value(),\n        two()\n    )\n\nresult = workflow2(workflow1)\n\nprint(result())\n```\n\nThe key to making this work is Entangle's *deferred execution* trait, which we will discuss in a later section.\nEssentially, it allows for separation of workflow *declaration* from *execution*. This lets you treat workflows as objects and pass them around anywhere a normal python function (or workflow) is expected, prior to execution.\n\n## Containers\n\nEntangle supports two container technologies: *docker* and *singularity* (TBD). These are used with the associated decorators `@docker` and `@singularity`. \nUsing containers allows you to run functions that have complex OS or python dependencies not native to your hosting environment.\n\nFor a complete example, please see [Docker Example](#docker-example)\n\n## Dataflows \n\n\nEntangle supports two kinds of execution flow, *dataflow*[[8]](#references) and *workflow* (or what is more traditionally called *control flow*). 
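The *deferred execution* trait mentioned under Workflow Composition can be sketched generically in a few lines. This is a simplified illustration of the declare-then-execute pattern, not Entangle's actual decorator implementation (`deferred`, `declare`, and `execute` are hypothetical names):

```python
def deferred(fn):
    # Hypothetical decorator: calling fn(...) only *captures* arguments;
    # calling the returned object actually executes, resolving any
    # nested deferred arguments first.
    def declare(*args, **kwargs):
        def execute():
            resolved = [a() if callable(a) else a for a in args]
            return fn(*resolved, **kwargs)
        return execute
    return declare

@deferred
def add(a, b):
    return a + b

@deferred
def one():
    return 1

wf = add(one(), 2)   # declaration only; nothing has executed yet
print(wf())          # executes the call graph on demand -> 3
```

Because `wf` is just a callable object, it can be passed to other functions or workflows before anything runs, which is the separation of declaration from execution described above.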
Both execute a DAG-based graph, but in slightly different ways and with different advantages to the programmer.\n\nAs Wikipedia states[[8]](#references):\n\n\u003e Dataflow is a software paradigm based on the idea of disconnecting computational actors into stages (pipelines) that can execute concurrently. Dataflow can also be called stream processing or reactive programming.[1]\n\nHowever, Merriam-Webster's simple definition[[9]](#references) illuminates a key trait of dataflows - \"...*as data becomes available*\"\n\n\u003e : a computer architecture that utilizes multiple parallel processors to perform simultaneous operations as data becomes available\n\n### Data Readiness\n\nIn many parallel data computations the arrival or readiness of some data might lag behind other data, perhaps coming from longer computations or farther away.\nTrue dataflow models allow the computation to proceed on a parallel path *as far as it can go* with the currently available data.\nThis means dependent operations are not held up by control flow execution order in some cases, and the overall computation is optimized.\n\n![dataflow](./images/dataflow2.png)\n\n### Dataflows vs Workflows\n\nAuthor slikts [[1]](#references) describes these differences very nicely (from [[1]](#references)).\n\n\u003e Control flow refers to the path the point of execution takes in a program, and sequential programming that focuses on explicit control flow using control structures like loops or conditionals is called imperative programming. In an imperative model, data may follow the control flow, but the main question is about the order of execution.\n\u003e\n\u003eDataflow abstracts over explicit control flow by placing the emphasis on the routing and transformation of data and is part of the declarative programming paradigm. 
In a dataflow model, control follows data and computations are executed implicitly based on data availability.\n\u003e\n\u003e Concurrency control refers to the use of explicit mechanisms like locks to synchronize interdependent concurrent computations. Dataflow is also used to abstract over explicit concurrency control.\n### Simple Example\nLet's start with a simple workflow example:\n\n`A(B(),C())`\n\nIn traditional *control flow* or what I call *lambda based*[[10]](#references) execution, the programming language's *dependency analysis* will determine the order of execution. In this example `B()` and `C()` are dependencies of `A()` and thus need to complete *before* `A()` can be executed. In other words, they are *inputs* to `A()`. Basic stuff.\n\nThis means the execution of each compute function is aware of the specific dependent functions it must resolve first.\nWe call this *control dependency* [[2]](#references).\n\nLet's say the dependency was reversed, whereby a value computed by `A()` was a dependency of *both* `B()` and `C()`. How would we write this in conventional *control flow*?\n\nWe might do something like this.\n\n```python\nB(A())\nC(A())\n```\n\nBreaking our previous single expression into multiple expressions. However, in this case, `A()` is being invoked twice, which could produce different values.\nSo we might introduce a variable\n\n```python\na = A()\nb = B(a)\nc = C(a)\n```\n\nNow we have 3 expressions that must run in a proper order. We have done some of the work by making a separate expression for our dependent value `a`. 
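The compute-once, fan-out pattern that the variable `a` introduces can be written with plain `multiprocessing`, independent of Entangle. This is a minimal, POSIX-only sketch (`A`, `B`, and `C` are stand-ins for real computations):

```python
import multiprocessing as mp

def A():
    return 7  # upstream value, computed exactly once

def B(a, q):
    q.put(("B", a + 1))

def C(a, q):
    q.put(("C", a * 2))

def run_dataflow():
    # the "fork" start method keeps this sketch self-contained on POSIX
    ctx = mp.get_context("fork")
    a = A()                      # single computation of the shared value
    q = ctx.Queue()
    procs = [ctx.Process(target=B, args=(a, q)),
             ctx.Process(target=C, args=(a, q))]
    for p in procs:
        p.start()
    # gather results as data becomes available, in arrival order
    results = dict(q.get() for _ in procs)
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_dataflow())        # {'B': 8, 'C': 14} (arrival order may vary)
```

`B` and `C` run in parallel processes yet both see the *same* value of `a`, which is exactly the dataflow semantics discussed in this section.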
But for large dataflows this can be a bigger burden on the programmer to unravel all the dependencies and put them in proper order.\n\nWhat if the execution of an expression was not computed using the traditional *dependency analysis* most languages use today but instead was defined by stricter *dataflow* semantics?\n\n### DAG Dataflow\nIn dataflow, a DAG represents the flow of values from compute nodes where each node computes its value once and the value is *emitted* or sent to directionally connected nodes in the DAG.\n\n\n![dataflow](./images/dataflow.png)\n\nThis paradigm makes it easier to express our intentions of sharing values from `A()` by computing it once and sending the results to `B()` and `C()`. Neither `B()` nor `C()` explicitly depend on `A()`. The dataflow DAG provides the dependency structure for all the compute nodes.\n\nNow let's rewrite our expression if it were executed in strict *dataflow* order.\n\n```python\nA(\n   B(),\n   C()\n)\n```\n\nHere, the dataflow engine executing this expression understands the intention to compute `A()` first, then *in parallel* compute `B()` and `C()` with the *same* result computed only once from `A()` as their input.\nWritten imperatively, this would equate to:\n\n```python\na = A()\nB(a)\nC(a)\n```\nwhere `B(a)` and `C(a)` run in parallel.\n\n**IMPORTANT!** The dataflow syntax provides the necessary graph structure for a dataflow engine to know explicitly which functions can operate in parallel.\n\n### Results Comparison\n\nSo what are the differences in the results from our *workflow* version and our *dataflow* version? It should be clear that the workflow version takes as input 2 values (B(),C()) and produces 1 value, A().\n\nHowever, our dataflow version is different. It takes as input 1 value A() and produces two results, B() and C(), in parallel. 
So the computations are different!\n\n### Advantages of Strict Dataflow\n\nAs was pointed out in the intro to this section, dataflow provides declarative *data dependency* modelling for a computation. This is sometimes a more natural way of thinking about a problem for the human programmer.\nIt allows a clean separation between the initial state of a dataflow and various desired outcomes that would be more difficult to model using *control flow* programming, since the programmer must use multiple imperative steps to impose the proper execution order, and linear ordering does not make clear which computations can run in parallel.\nDataflow has improved efficiencies when it comes to data-centric computations as well, because it only computes nodes once per DAG execution.\nThis approach requires no *caching* or *variables* that might be required with imperative-based control flow.\n\n#### Naturally Parallel\n\nA dataflow DAG is a naturally and implicitly parallel model - by its declarative structure. For CPU-bound, data-centric tasks it is simple and easy to understand for this reason.\n\n#### Detailed Example\n\nFor a more detailed example of using `@dataflow` in entangle see [Dataflow Examples](#dataflow-examples).\n### References\n\n1. Concurrency Glossary - https://slikts.github.io/concurrency-glossary/\n2. Dependency Graphs - https://en.wikipedia.org/wiki/Dependency_graph\n3. Dataflow Programming - https://en.wikipedia.org/wiki/Dataflow_programming\n4. Data-Flow vs Control-Flow for Extreme Level Computing - https://ieeexplore.ieee.org/document/6919190\n5. Advances in Dataflow Programming Languages - https://futureofcoding.org/notes/dataflow/advances-in-dataflow-programming-langauges.html\n6. Data dependency - https://en.wikipedia.org/wiki/Data_dependency\n7. An introduction to a formal theory of dependence analysis - https://link.springer.com/article/10.1007/BF00128174\n8. Dataflow - https://en.wikipedia.org/wiki/Dataflow\n9. 
Dataflow - https://www.merriam-webster.com/dictionary/dataflow\n10. Functional Programming/Lambda Calculus - https://www.tutorialspoint.com/functional_programming/functional_programming_lambda_calculus.htm\n\n## Schedulers\n\nEntangle supports a composition-based mechanism for attaching schedulers to workflows and functions.\nThe scheduler class will control access to CPU resources based on its constraints. For example, if you want to run a workflow with potentially 20 parallel tasks, but only want to allocate 4 CPUs to execute the workflow, the scheduler class can ensure entangle doesn't spawn more processes than requested.\nSchedulers wrap individual functions and pull CPU \"cookies\" off a scheduler queue to hand off to processes. Each cookie contains a CPU identifier that the process then binds to. When a cookie is placed on the queue by a process it means that CPU (id) is available for use.\n\nParallel processes thus use the queue mechanism to *self-organize* around the allocated CPUs by requesting cookies, assigning their CPU affinity to that cpu id, running their behaviors and returning the cookie to the queue when complete.\nThis approach requires no centralized scheduler server, as the workflow processes all use the same multiprocessing.Queue to retrieve CPU cookies.\n\n![scheduler](./images/scheduler.png)\n\n### Pluggable Schedulers\n\nEntangle allows you to provide your own scheduler class using the `@scheduler` decorator.\n\n```python\n@scheduler(cpus=4, impl='my.package.MyScheduler')\ndef myfunc():\n    return\n```\nCurrently, the scheduler class need only implement one method.\n\n`def register(self, f, cpus=12):`\n\nand return a `function` or `partial` that wraps the provided function with scheduler behavior.\n\nTo see the implementation of `DefaultScheduler` click [here](https://github.com/radiantone/entangle/blob/main/entangle/scheduler.py).\n\nFor a workflow example using a scheduler, see [Scheduler Example](#scheduler-example) below.\n\n## Distributed 
Flows\n\nEntangle allows you to pass a workflow (or dataflow) to a remote machine for execution. When combined with `@scheduler` decorators, this also forwards scheduler behavior to the remote machine, where it manages the received workflow.\nThis type of propagation requires no centralized (i.e. shared) scheduler or services and thus scales very well.\n\nMoreover, parts of a workflow can be sent to different machines for a truly distributed workflow.\n\n### SSH Decorator\n\nFunctions or flows (graphs of functions) are remoted by using the `@ssh` decorator as in the example below.\n\n```python\n@ssh(user='me', host='radiant', key='/home/me/.ssh/id_rsa.pub', python='/home/me/venv/bin/python')\n@scheduler(**scheduler_config)\n@thread\ndef workflow2():\n    pass\n```\n\nIn this example, we have declared a (contrived) workflow that composes the `@scheduler` and `@thread` decorators.\nThe `@ssh` decorator indicates that this workflow is to be copied and executed on the remote server `radiant` as user `me` using the Python executable at `/home/me/venv/bin/python`.\nEntangle will marshal the code to the remote server and execute it there.\n\n### Scheduler Propagation\n\nIf any dependent or subsequent functions are invoked on the remote server, any decorators that apply to those will be enforced.\nIf you use `@scheduler` then it will utilize the *scheduler queue* to request CPU cookies. 
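The CPU-cookie mechanism described in the Schedulers section can be sketched with a plain `multiprocessing.Queue`. This is a simplified illustration of the pattern, not Entangle's actual implementation; the helpers `make_cookie_queue` and `run_with_cookie` are invented for this sketch.

```python
import multiprocessing
import os

def make_cookie_queue(cpus):
    # One cookie (a CPU id) per allocated CPU; a task must hold a
    # cookie before it runs, so at most `cpus` tasks run at once.
    queue = multiprocessing.Queue()
    for cpu_id in range(cpus):
        queue.put(cpu_id)
    return queue

def run_with_cookie(queue, func, *args):
    cpu_id = queue.get()  # block until a CPU is available
    try:
        if hasattr(os, 'sched_setaffinity') and cpu_id in os.sched_getaffinity(0):
            os.sched_setaffinity(0, {cpu_id})  # bind to that CPU (Linux only)
        return func(*args)
    finally:
        queue.put(cpu_id)  # return the cookie so another process can use the CPU

queue = make_cookie_queue(cpus=2)
print(run_with_cookie(queue, lambda a, b: a + b, 1, 2))
```

Because every worker pulls cookies from, and returns them to, the same queue, the processes self-organize with no central scheduler server, as described above.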
If you also use another `@ssh` decorator then that dependent function will be shipped to a 3rd remote server and the process repeated there.\n\n*diagram here*\n\nEach time a workflow decorated with `@scheduler` is sent to a remote machine, that scheduler then manages its portion of the workflow and any dependent functions that it might resolve.\nThis pattern forms a sort of *distributed tree* of schedulers that work in parallel across multiple machines, yet fully resolve to complete the root workflow.\n\nLet's take a closer look at this example, which uses 3 different machines to solve its workflow.\n\n```python\n\n@ssh(user='darren', host='miko', key='/home/darren/.ssh/id_rsa.pub', python='/home/darren/venv/bin/python')\n@scheduler(**config)\n@process\ndef two():\n    # run some codes\n    return 2\n\n@ssh(user='darren', host='radiant', key='/home/darren/.ssh/id_rsa.pub', python='/home/darren/venv/bin/python')\n@scheduler(**config)\n@process\ndef three():\n    # run some codes\n    return 3\n\n@scheduler(**config)\n@process\ndef add(a, b):\n    v = int(a.get_result()) + int(b)\n    return v\n\n@ssh(user='darren', host='phoenix', key='/home/darren/.ssh/id_rsa.pub', python='/home/darren/venv/bin/python')\n@scheduler(**config)\n@process\ndef workflow():\n\n    _add = add(\n        three(),\n        two()\n    )\n    return _add()\n```\n\nIn the above example, the `workflow()` is first sent to machine `phoenix` and executed there. It wraps the function `add` which also executes on `phoenix` because it has no `@ssh` decorator and the workflow is already there.\n\nThe `add()` function requires the functions `three()` and `two()` be solved first. 
These two functions are sent to machines `radiant` and `miko`, respectively, to be solved.\nThe results are returned to the `add` function running on `phoenix` and the result of the `workflow()` is returned to the calling machine, i.e. the machine that invoked the workflow.\n\nOnce the `workflow()` reaches `phoenix`, the `@scheduler` attached to the workflow manages the CPUs there according to its constraints.\nSince the `add()` function has two dependencies that can run in parallel, the `@scheduler` can request 2 CPUs and run them in parallel.\n\n\n*diagram*\n\n## Examples\n\n* [GPU Example](#gpu-example)\n* [Shared Memory Example](#shared-memory-example)\n* [AI Example](#ai-example)\n* [Docker Example](#docker-example)\n* [Dataflow Examples](#dataflow-examples)  \n  * [Data-Driven Branching](#data-driven-branching)\n  * [Distributed Dataflow](#distributed-dataflow)\n* [Scheduler Example](#scheduler-example)  \n* [Graph Example](#graph-example)\n* [Workflow Future Example](#workflow-future-example)\n* [Retry Example](#retry-example)\n* [General Example](#general-example)\n\nThere are a variety of example workflows and dataflows you can run. 
In addition to the sample code provided below, you can run these using the following commands.\n```shell\n$ python -m entangle.examples.example\n$ python -m entangle.examples.example2\n$ python -m entangle.examples.example3\n$ python -m entangle.examples.example4\n$ python -m entangle.examples.example5\n$ python -m entangle.examples.example6\n$ python -m entangle.examples.example_graph\n$ python -m entangle.examples.example_graph_future\n$ python -m entangle.examples.example_with_future\n$ python -m entangle.examples.lambdaexample\n$ python -m entangle.examples.listexample\n$ python -m entangle.examples.listexample2\n$ python -m entangle.examples.dataflowexample\n$ python -m entangle.examples.dataflowexample2\n$ python -m entangle.examples.dockerexample\n$ python -m entangle.examples.aiexample\n$ python -m entangle.examples.retry_example\n$ python -m entangle.examples.schedulerexample\n$ python -m entangle.examples.schedulerexample2\n$ python -m entangle.examples.sshdatafloweexample\n$ python -m entangle.examples.sshschedulerexample\n$ python -m entangle.examples.timeoutexample\n```\nFor a complete list of the examples' source code and binders to run them, please visit the wiki.\n\n### GPU Example\nThis example assumes you have installed `nvidia-cuda-toolkit` and associated Python packages along with `numba`.\n\nIn this example, two vectorized functions with different-sized matrices are run in parallel, and their times are gathered.\n```python\nimport numpy as np\nfrom entangle.process import process\nfrom timeit import default_timer as timer\nfrom numba import vectorize\n\n@process\ndef dovectors1():\n\n    @vectorize(['float32(float32, float32)'], target='cuda')\n    def pow(a, b):\n        return a ** b\n\n    vec_size = 100\n\n    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)\n    c = np.zeros(vec_size, dtype=np.float32)\n\n    start = timer()\n    pow(a, b)\n    duration = timer() - start\n    return duration\n\n@process\ndef 
dovectors2():\n\n    @vectorize(['float32(float32, float32)'], target='cuda')\n    def pow(a, b):\n        return a ** b\n\n    vec_size = 100000000\n\n    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)\n    c = np.zeros(vec_size, dtype=np.float32)\n\n    start = timer()\n    pow(a, b)\n    duration = timer() - start\n    return duration\n\n@process\ndef durations(*args):\n\n    times = [arg for arg in args]\n\n    return times\n\ndp = durations(\n    dovectors1(),\n    dovectors2()\n)\n\nprint(dp())\n\n```\nWhich outputs something like\n```python\n[0.21504536108113825, 0.3445616390090436]\n```\n\n### Shared Memory Example\n\nThe default return value conduit in Entangle is the Queue. Task return values are marshalled back through queues where they are gathered and provided as function parameters to the parent process task.\nThis method is not desirable for very large data sets such as matrices in GPU computations. The example below shows how Entangle uses Python 3.8's shared memory feature to implicitly share volatile memory across native parallel processes.\n\n\n![sharedmemory](./images/memory.png)\n```python\nimport numpy as np\nfrom entangle.process import process\nfrom timeit import default_timer as timer\nfrom numba import vectorize\n\n@process(shared_memory=True)\ndef dopow(names, smm=None, sm=None):\n    (namea, nameb, shapea, shapeb, typea, typeb) = names\n\n    start = timer()\n    shma = sm(namea)\n    shmb = sm(nameb)\n\n    # Get matrices from shared memory\n    np_shma = np.frombuffer(shma.buf, dtype=typea)\n    np_shmb = np.frombuffer(shmb.buf, dtype=typeb)\n\n    @vectorize(['float32(float32, float32)'], target='cuda')\n    def pow(a, b):\n        return a ** b\n\n    pow(np_shma, np_shmb)\n    duration = timer() - start\n    print(\"Powers Time: \", duration)\n\n@process(shared_memory=True)\ndef createvectors(smm=None, sm=None):\n\n    vec_size = 100000000\n\n    start = timer()\n    a = b = np.array(np.random.sample(vec_size), 
dtype=np.float32)\n    c = np.zeros(vec_size, dtype=np.float32)\n\n    # create shared memory for matrices\n    shma = smm.SharedMemory(a.nbytes)\n    shmb = smm.SharedMemory(b.nbytes)\n\n    names = (shma.name, shmb.name, a.shape, b.shape, a.dtype, b.dtype)\n\n    duration = timer() - start\n    print(\"Create Vectors Time: \", duration)\n    return names\n\ndp = dopow(\n    createvectors()\n)\n\ndp()\n```\n\nOutputs\n\n```bash\nCreate Vectors Time:  0.8577492530457675\nPowers Time:  0.8135421359911561\n```\n\n#### SharedMemoryManager \u0026 SharedMemory\n\nIn the example above, you will notice two special keyword arguments being passed into the functions,\n\n`def createvectors(smm=None, sm=None):`\n\n`smm` is a handle to the `SharedMemoryManager` being used for this workflow and `sm` is a handle to the `SharedMemory` class needed to acquire the shared memory segments by name.\nIf you set `shared_memory=True` then you *must* include these keyword arguments in your method or an error will occur.\n\n\n---\n\nNow, you might be asking yourself: if one of the design goals was *shared-nothing*, then why are we talking about *shared memory*?\nWhen we say \"shared\" (in shared-nothing) we refer to resources that have to be synchronized or locked when accessed by parallel processes, thereby creating bottlenecks in the execution.\nThe shared memory example here does not introduce any contention; rather, it is used in a pipeline fashion.\nIn this approach, a given shared memory address is only updated by one process at a time (e.g. using it to return its data to the waiting process). 
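This pipeline-style handoff can be sketched directly with Python 3.8's `multiprocessing.shared_memory` module. It is a minimal illustration of the pattern that Entangle manages for you (a single writer, a named segment, a reader attaching by name); the segment here holds raw bytes rather than a numpy matrix.

```python
from multiprocessing import shared_memory

# Producer side: create a named segment and write into it.
# Only one process writes at a time, so no locks are needed.
shm = shared_memory.SharedMemory(create=True, size=5)
shm.buf[:5] = b'hello'

# Consumer side: attach to the same segment by name and read it,
# without copying the data through a queue.
attached = shared_memory.SharedMemory(name=shm.name)
payload = bytes(attached.buf[:5])
print(payload)

attached.close()
shm.close()
shm.unlink()
```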
Multiple shared memory segments can be created during the course of a workflow\nfor parallel running processes.\n\n\n### AI Example\nHere is an example that uses tensorflow to train a model and return the summary.\n```python\nfrom entangle.logging.debug import logging\nfrom entangle.docker import docker\nfrom entangle.process import process\n\n@process\n@docker(image=\"tensorflow/tensorflow:latest-gpu\", packages=['tensorflow_datasets'])\ndef train():\n    import tensorflow as tf\n    import tensorflow_datasets as tfds\n\n    (ds_train, ds_test), ds_info = tfds.load(\n        'mnist',\n        split=['train', 'test'],\n        shuffle_files=True,\n        as_supervised=True,\n        with_info=True,\n    )\n\n    def normalize_img(image, label):\n        return tf.cast(image, tf.float32) / 255., label\n\n    ds_train = ds_train.map(\n        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)\n    ds_train = ds_train.cache()\n    ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)\n    ds_train = ds_train.batch(128)\n    ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)\n\n    ds_test = ds_test.map(\n        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)\n    ds_test = ds_test.batch(128)\n    ds_test = ds_test.cache()\n    ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)\n\n    model = tf.keras.models.Sequential([\n        tf.keras.layers.Flatten(input_shape=(28, 28)),\n        tf.keras.layers.Dense(128, activation='relu'),\n        tf.keras.layers.Dense(10)\n    ])\n\n    model.compile(\n        optimizer=tf.keras.optimizers.Adam(0.001),\n        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],\n    )\n\n    model.fit(\n        ds_train,\n        epochs=6,\n        verbose=0,\n        validation_data=ds_test,\n    )\n\n    return model.summary()\n\nmodel = train()\nprint(model())\n\n```\n\n### Docker Example\nIn this 
example we are running a process that spawns the decorated function inside a Docker container and waits for the result.\nWe compose this using the `@process` and `@docker` decorators to achieve the design. The function `reduce_sum` is run *inside* the Docker container using the image `tensorflow/tensorflow:latest-gpu` and the result is returned seamlessly to the workflow.\n```python\nfrom entangle.docker import docker\nfrom entangle.process import process\n\nimport logging\nlogging.basicConfig(\n    format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)\n\n@process\n@docker(image=\"tensorflow/tensorflow:latest-gpu\")\ndef reduce_sum():\n    import tensorflow as tf\n    return tf.reduce_sum(tf.random.normal([1000, 1000]))\n\nrs = reduce_sum()\nprint(rs())\n\n```\n\n#### Tensorflow GPU Container\n\nThe above example launches a GPU-enabled Docker container on the `nvidia docker` platform (running on your local machine). Tensorflow by default will consume the entire GPU for its processing; however, if you want to run parallel GPU jobs that only consume GPU memory *as needed*, then you need to use:\n\n`@docker(image=\"tensorflow/tensorflow:latest-gpu\", consume_gpu=False)`\n\n---\n\n![docker](./images/docker.png)\n\n### Dataflow Examples\n\nThe example below demonstrates the dataflow capability of Entangle. This is a different compute paradigm from workflows. Please read the section on [Dataflows vs Workflows](#dataflows-vs-workflows) for a complete explanation of the difference.\n\n\u003e NOTE: We use threads as our execution unit in this example as it makes seeing the output possible. With `@process` you won't see the aggregate output on your console; instead it will be logged to the `entangle.log` file.\n\u003e With Entangle you decide whether to use concurrency (threads) or parallelism (processes). 
Entangle itself is threadless.\n```python\nimport threading\nimport time\nfrom entangle.logging.debug import logging\nfrom entangle.dataflow import thread\nfrom entangle.dataflow import process\nfrom entangle.dataflow import dataflow\n\ndef triggered(func, result):\n    print(\"triggered: {} {}\".format(func.__name__, result))\n\n@dataflow(callback=triggered)\n@thread\ndef printx(x):\n    print('printx: {}'.format(threading.current_thread().name))\n    return(\"X: {}\".format(x))\n\n@dataflow(callback=triggered)\n@thread\ndef printy(y):\n    print('printy: {}'.format(threading.current_thread().name))\n    return(\"Y: {}\".format(y))\n\n@dataflow(callback=triggered)\n@thread\ndef printz(z):\n    print('printz: {}'.format(threading.current_thread().name))\n    return(\"Z: {}\".format(z))\n\n@dataflow(callback=triggered)\n@thread\ndef echo(e):\n    print('echo: {}'.format(threading.current_thread().name))\n    return \"Echo! {}\".format(e)\n\n@dataflow(executor='thread', callback=triggered, maxworkers=3)\ndef emit(a, **kwargs):\n    print('emit: {}'.format(threading.current_thread().name))\n    return a+\"!\"\n\n# Create the dataflow graph\nflow = emit(\n    printx(\n        printz(\n            echo()\n        )\n    ),\n    printy(\n        printz()\n    ),\n    printy()\n)\n\n# Invoke the dataflow graph with initial input\nflow('emit')\n\ntime.sleep(2)\n\n# Call flow again with different input value\nflow('HELLO')\n```\n#### Data-Driven Branching\n\nIt's useful to have a dataflow that routes to different paths depending on input data.\nEntangle makes this relatively easy. 
The example below embeds a lambda expression directly in the dataflow structure that chooses either `printx()` or `printy()` as the next compute node depending on what the input value is - *after emit has generated the value*.\n\nIn the snippet below, `emit` first produces a value based on some input; the result is then emitted to either `printx()` or `printy()` depending on the value of the result.\nNote that this is computed during the execution of the DAG, not at declaration time.\n```python\nflow = emit(\n    lambda x: printx() if x == 'emit' else printy()\n)\n\nflow('emit')\n```\n\n![lambda](./images/lambda.png)\n\nFull example:\n```python\nimport threading\nimport time\nfrom entangle.dataflow import thread\nfrom entangle.dataflow import dataflow\n\ndef triggered(func, result):\n    print(\"triggered: {} {}\".format(func.__name__, result))\n\n@dataflow(callback=triggered)\n@thread\ndef printx(x):\n    print('printx: {}'.format(threading.current_thread().name))\n    return(\"X: {}\".format(x))\n\n@dataflow(callback=triggered)\n@thread\ndef printy(y):\n    print('printy: {}'.format(threading.current_thread().name))\n    return(\"Y: {}\".format(y))\n\n@dataflow(executor='thread', callback=triggered, maxworkers=3)\ndef emit(a, **kwargs):\n    print('emit: {}'.format(threading.current_thread().name))\n    return a+\"!\"\n\n# Create the dataflow graph\n# Defer whether we will forward to printx() or printy() depending\n# on the result received from emit. 
This won't be known until the data is ready.\nflow = emit(\n    lambda x: printx() if x == 'emit' else printy()\n)\n\n# Invoke the dataflow graph with initial input\nflow('emit')\n\ntime.sleep(2)\n\n# Call flow again with different input value\nflow('HELLO')\n```\n\nWhich outputs:\n\n```text\n   printx: emit MainThread\ntriggered: inner X: emit\nprintz: ThreadPoolExecutor-3_0\ntriggered: inner Z: X: emit\n----------------------------\nprinty: MainThread\ntriggered: inner Y: HELLO\n```\n### Distributed Dataflow\n\nIn the example below, we combine `@dataflow` with `@ssh` to get instant distributed dataflow!\n\n```python\nimport threading\nimport time\n\nfrom entangle.logging.debug import logging\nfrom entangle.ssh import ssh\nfrom entangle.process import process\nfrom entangle.dataflow import dataflow\n\ndef triggered(func, result):\n    print(\"triggered: {} {}\".format(func.__name__, result))\n\n@dataflow(callback=triggered)\n@ssh(user='darren', host='miko', key='/home/darren/.ssh/id_rsa.pub', python='/home/darren/venv/bin/python')\n@process\ndef printz(z):\n    print('printz: {}'.format(threading.current_thread().name))\n    with open('/tmp/printz.out', 'w') as pr:\n        pr.write(\"Z: {}\".format(z))\n    return \"Z: {}\".format(z)\n\n@dataflow(callback=triggered)\n@ssh(user='darren', host='radiant', key='/home/darren/.ssh/id_rsa.pub', python='/home/darren/venv/bin/python')\n@process\ndef printx(x):\n    print('printx: {}'.format(threading.current_thread().name))\n    with open('/tmp/printx.out', 'w') as pr:\n        pr.write(\"X: {}\".format(x))\n    return \"X: {}\".format(x)\n\n@dataflow(callback=triggered)\n@process\ndef printy(y):\n    print('printy: {}'.format(threading.current_thread().name))\n    return \"Y: {}\".format(y)\n\n@dataflow(callback=triggered)\n@ssh(user='darren', host='radiant', key='/home/darren/.ssh/id_rsa.pub', python='/home/darren/venv/bin/python')\n@process\ndef echo(e):\n    print('echo: {}'.format(threading.current_thread().name))\n  
  with open('/tmp/echo.out', 'w') as pr:\n        pr.write(\"Echo! {}\".format(e))\n    return \"Echo! {}\".format(e)\n\n@dataflow(callback=triggered, maxworkers=3)\ndef emit(value):\n    print('emit: {}'.format(threading.current_thread().name))\n    return value+\"!\"\n\nif __name__ == '__main__':\n    results = []\n\n    # Create the dataflow graph\n    flow = emit(\n        printx(\n            printz(\n                echo()\n            )\n        ),\n        printy(\n            printz()\n        ),\n        printy()\n    )\n\n    result = flow('emit')\n```\n### Scheduler Example\n\n```python\nfrom entangle.logging.debug import logging\nfrom entangle.process import process\nfrom entangle.http import request\nfrom entangle.workflow import workflow\nfrom entangle.scheduler import scheduler\n\nscheduler_config = {'cpus': 2,\n                    'impl': 'entangle.scheduler.DefaultScheduler'}\n\n@scheduler(**scheduler_config)\n@process\ndef two():\n    return 2\n\n@scheduler(**scheduler_config)\n@process\ndef three():\n    return 3\n\n@scheduler(**scheduler_config)\n@process\ndef add(a, b):\n    print(\"add: {} {}\".format(a, b))\n    v = int(a) + int(b)\n    print(\"ADD: *\"+str(v)+\"*\")\n    return v\n\n@scheduler(**scheduler_config)\n@workflow\ndef workflow2():\n    return add(\n        three(),\n        two()\n    )\n\nresult = workflow2()\n\nprint(result())\n```\n\n### Graph Example\n\nThe following example shows how we can collect the call graph trace for our workflow and display it.\n\n```python\nimport json\nimport time\nimport asyncio\nfrom entangle.logging.debug import logging\nfrom entangle.process import process\n\n@process\ndef one():\n    return 1\n\n@process\ndef two():\n    return 2\n\n@process\ndef five():\n    return 5\n\n@process\ndef num(n):\n    return n\n\n@process\ndef add(a, b):\n    v = int(a) + int(b)\n    print(\"ADD: *\"+str(v)+\"*\")\n    return v\n\n@process\ndef subtract(a, b):\n    return int(a) - int(b)\n\nif __name__ == 
'__main__':\n\n    workflow = add(\n        add(\n            num(6),\n            two() if False else one()\n        ),\n        subtract(\n            five(),\n            add(\n                subtract(\n                    num(8),\n                    two()\n                ),\n                one()\n            )\n        )\n    )\n    result = workflow()\n    print(result)\n\n    graph = workflow.graph(wait=True)\n    print(\"GRAPH:\",json.dumps(graph, indent=4))\n```\n\nWhich outputs this graph structure\n\n```python\nGRAPH: {\n    \"add\": [\n        {\n            \"add\": {\n                \"num\": {\n                    \"6\": []\n                },\n                \"one\": []\n            }\n        },\n        {\n            \"subtract\": {\n                \"five\": [],\n                \"add\": {\n                    \"subtract\": {\n                        \"num\": {\n                            \"8\": []\n                        },\n                        \"two\": []\n                    },\n                    \"one\": []\n                }\n            }\n        }\n    ]\n}\n```\n\nWe can also use futures to wait for the graph data to arrive as a callback.\n\n```python\nfuture = workflow.graph(wait=False)\n\ndef show_graph(graph):\n    print(\"GRAPH:\", graph.result())\n\nfuture.add_done_callback(show_graph)\n\nfuture.entangle()\n```\n\n### Workflow Future Example\n\nEntangle allows you to use future results for workflows if the blocking method doesn't meet your use case.\nTo do this, we alter the invocation of the workflow slightly.\n\n```python\ndef callback(result):\n    print(\"CALLBACK:\", result.result())\n\n# set up future callbacks\nfuture = workflow.future(callback=callback)\nprint('Future:', future)\n\n# Trigger workflow. 
Does not block\nworkflow(proc=True)\n\n# Notify results when available\nfuture.entangle() # Does not block\n```\n### Retry Example\n\nYou can specify how many times a function should be retried before an exception is thrown, along with a sleep value in seconds between retries.\n\n```python\n@process(retry=5, sleep=1)\ndef five():\n    import time\n    val = int(str(time.time()).split('.')[1]) % 5\n    if val != 0:\n        raise Exception(\"Not a FIVE!\")\n    return 5\n```\n\n### General Example\nAn example of how entangle will be used (still in development)\n```python\n\nfrom entangle.process import process\nfrom entangle.thread import thread\nfrom entangle.task import task\nfrom entangle.local import local\nfrom entangle.aws import ec2\nfrom entangle.aws import lmbda\nfrom entangle.http import request\n\n@process(timeout=60)\n@local(cpus=4)\ndef add(a, b):\n    return a + b\n\n@process(cache=True)\n@aws(keys=[])\n@ec2\ndef one():\n    return 1\n\n@thread\n@local\ndef two():\n    return 2\n\n@lmbda(function='name')\n@aws(keys=[])\ndef proxy():\n    # lambda proxy\n    pass\n\n@process\ndef subtract(a, b):\n    return int(a) - int(b)\n\n@process\ndef five():\n    return 5\n\n@process\ndef num(n):\n    return n\n\n@process\n@request(url='http://..../', method='POST')\ndef request(data):\n    # Manipulate http response data here and return new result\n    return data\n\n# 1,2,3 get passed to lambda function and result returned\nresult = proxy(1,2,3)\n# Pass key:value params and get result from your function\nresult = request(key1=value, key2=value)\n\n# parallel workflow is just \"plain old python\"\nresult = add(\n            add(\n                num(6),\n                two() if False else one()\n            ),\n            subtract(\n                five(),\n                two()\n            )\n        )\n\nprint(result())\n\n```\n## Logging\n\nLogging in Entangle is intended to be convenient and provide some useful out-of-the-box defaults that \"just work\".\n\nThere are 
3 default loggers you can import.\n\n```python\nfrom entangle.logging.info import logging\nfrom entangle.logging.debug import logging\nfrom entangle.logging.file import logging\n```\nAnd the details of each are:\n\n*info*\n```python\nimport logging\n\nlogging.basicConfig(\n    format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)\n```\n\n*debug*\n```python\nimport logging\n\nlogging.basicConfig(\n    format='%(asctime)s : %(levelname)s : %(message)s', level=logging.DEBUG)\n```\n\n*file*\n```python\nimport logging\n\nlogging.basicConfig(filename='entangle.log',\n                    format='%(asctime)s : %(levelname)s : %(message)s', level=logging.DEBUG)\n```\n\nYou can of course provide your own logging configuration, but be sure to include it at the top of your file so the various entangle modules pick it up.\n\n## Design Tool\n\nA prototype visual design tool for Entangle is shown below. More details will be posted on the wiki [here](https://github.com/radiantone/entangle/wiki/Design-Tool).\n\n\n![ui](./images/ui2.png)\n