{"id":18686719,"url":"https://github.com/wayfair-incubator/dagger","last_synced_at":"2025-09-25T23:29:37.569Z","repository":{"id":61718646,"uuid":"545717935","full_name":"wayfair-incubator/dagger","owner":"wayfair-incubator","description":"A simple Python-based distributed workflow engine","archived":false,"fork":false,"pushed_at":"2024-10-30T01:55:14.000Z","size":1320,"stargazers_count":56,"open_issues_count":3,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-30T04:01:40.680Z","etag":null,"topics":["aerospike","asyncio","faust-streaming","kafka","orchestrator","python","rocksdb","scalable","state-machine","workflow-engine"],"latest_commit_sha":null,"homepage":"https://wayfair-incubator.github.io/dagger/latest","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wayfair-incubator.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-04T21:31:18.000Z","updated_at":"2024-09-27T10:26:16.000Z","dependencies_parsed_at":"2023-10-02T16:53:47.210Z","dependency_job_id":"522c0895-f25e-4f88-9ca9-de9bf2117cf2","html_url":"https://github.com/wayfair-incubator/dagger","commit_stats":{"total_commits":122,"total_committers":6,"mean_commits":"20.333333333333332","dds":"0.21311475409836067","last_synced_commit":"bc0d4860ad077107c803880ec44fdb38f4205adf"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":"wayfair-incubator/oss-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wayfair-inc
ubator%2Fdagger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wayfair-incubator%2Fdagger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wayfair-incubator%2Fdagger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wayfair-incubator%2Fdagger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wayfair-incubator","download_url":"https://codeload.github.com/wayfair-incubator/dagger/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247464199,"owners_count":20942969,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aerospike","asyncio","faust-streaming","kafka","orchestrator","python","rocksdb","scalable","state-machine","workflow-engine"],"created_at":"2024-11-07T10:28:45.609Z","updated_at":"2025-09-25T23:29:32.509Z","avatar_url":"https://github.com/wayfair-incubator.png","language":"Python","funding_links":[],"categories":["Library (embedded usage)"],"sub_categories":[],"readme":"![CI pipeline status](https://github.com/wayfair-incubator/dagger/workflows/CI/badge.svg?branch=main)\n[![PyPI](https://img.shields.io/pypi/v/wf-dagger)](https://pypi.org/project/wf-dagger/)\n![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)\n![Maintainer](https://img.shields.io/badge/Maintainer-Wayfair-7F187F)\n![codecov](https://codecov.io/gh/wayfair-incubator/dagger/branch/main/graph/badge.svg)\n![Checked with mypy](https://img.shields.io/badge/mypy-checked-blue)\n![Code style: 
black](https://img.shields.io/badge/code%20style-black-black.svg)\n\n# Dagger\n\n**Dagger** is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous and\nsynchronous long-running business logic in a scalable and resilient way.\nDagger requires Python 3.7 or later for the `async/await` syntax and variable type annotations.\n\n## Pizza Ordering and Delivery Workflow Example\nHere's an example of how to use the library to build and run a Pizza Ordering Workflow:\n\n![Pizza Workflow](docs/images/workflow.jpg)\n\nThe PizzaWorkflow consists of 2 Processes: \n\n* Order: Responsible for communicating with the order service to place a pizza order (CommandTask) and wait for the order\nto be ready (ListenerTask)\n* Delivery: Once the order is ready, this process communicates with the delivery service to start delivery of the pizza\norder (CommandTask)\n\n### Step 1 Instantiate Dagger\n\n```python\nworkflow_engine = Dagger(\n                    broker=KAFKA_ADMIN_CLIENT_URL,\n                    store=\"aerospike://\",\n                    consumer_auto_offset_reset=\"latest\",\n                    task_update_topic=\"task_update_topic\",\n                    trigger_interval=600,\n                    aerospike_config=aerospike_config,\n                    enable_changelog=False,\n                    web_port=6066,\n                    serializer=\"raw\",\n                )\n```\n\n### Step 2 Define Leaf Command and Listener Tasks For Order Process\n\nLet's assume that the Order Microservice processes incoming orders over a Kafka Topic - `pizza_order_topic` with the \nJSON schema\n\n```json\n{\n  \"order_id\": \"id\",\n  \"pizza_type\": \"1\",\n  \"customer_id\": \"customer_id\"\n}\n```\n\nUsing this information, let's build out the `OrderCommandTask` by overriding the `execute` method, which implements the \nbusiness logic for sending the payload to the Order Service over a Kafka topic\n\n```python\nclass 
OrderCommandTask(KafkaCommandTask[str, str]):\n    async def execute(\n        self,\n        runtime_parameters: Dict[str, str],\n        workflow_instance: ITemplateDAGInstance,\n    ) -\u003e None:\n        payload = {\n            \"order_id\": runtime_parameters[\"order_id\"],\n            \"customer_id\": runtime_parameters[\"customer_id\"],\n            \"pizza_type\": runtime_parameters[\"pizza_type\"],\n        }\n        await workflow_engine.topics[self.topic].send(\n            value=json.dumps(payload)\n        )\n```\n\nAfter executing the `OrderCommandTask`, the workflow should enter a `WAIT_STATE` until it receives a message from the \nOrder Service about the status of the order. Let's assume that the Order Service sends a message on a Kafka Topic, \n`order_status_topic`, when the order is ready in the following JSON format\n\n```json\n{\n  \"order_id\": \"id\",\n  \"status\": \"READY\"\n}\n```\n\nLet's model the `PizzaWaitForReadyListener` to process this message on the `order_status_topic` by implementing the \n`get_correlatable_keys_from_payload` and `on_message` methods on the Listener. It also needs to specify the `correlatable_key`\nas `order_id` to look up the payload\n\n```python\nclass PizzaWaitForReadyListener(KafkaListenerTask[str, str]):\n    correlatable_key = \"order_id\"\n\n    async def get_correlatable_keys_from_payload(\n        self, payload: Any\n    ) -\u003e List[TaskLookupKey]:\n        tpayload = json.loads(payload)\n        key = tpayload[self.correlatable_key]\n        return [(self.correlatable_key, key)]\n\n    async def on_message(\n            self, runtime_parameters: Dict[str, VT], *args: Any, **kwargs: Any\n    ) -\u003e bool:\n        logger.info(\"Pizza Order is Ready\")\n        return True\n```\n\nWhen the order service sends a status message on the `order_status_topic`, Dagger invokes `get_correlatable_keys_from_payload`\nto determine which workflow instance that message belongs to. 
Once it determines the workflow instance, it invokes \n`on_message` on the corresponding ListenerTask\n\nNow that we have the LEAF tasks modeled, let's attach them to the parent `Order` Process\n\n```python\ndef pizza_ordering_process(\n    process_name: str = \"Order\"\n) -\u003e IProcessTemplateDAGBuilder:\n    dag_builder = DAGBuilderHelper(dagger_app=workflow_engine)\n    root_task = dag_builder.build_and_link_tasks(\n        [\n            dag_builder.generic_command_task_builder(\n                topic=\"pizza_order_topic\",\n                task_type=OrderCommandTask,\n                process_name=process_name,\n            ),\n            dag_builder.generic_listener_task_builder(\n                topic=\"PizzaWaitForReadyListener\",\n                task_type=PizzaWaitForReadyListener,\n                process_name=process_name,\n            ),\n        ]\n    )\n    return dag_builder.generic_process_builder(process_name=process_name, root_task=root_task)\n```\n\nThe Order Process is `COMPLETED` when both the CommandTask and the PizzaWaitForReadyListener are `COMPLETED`; the workflow\nthen transitions to execute the next Process, `Delivery`\n\n### Step 3 Define Leaf Command Tasks For Delivery Process\n\nLet's assume that the delivery service just requires an HTTP POST request with the following schema\n\n```json\n{\n  \"order_id\": \"id\",\n  \"customer_id\": \"customer_id\"\n}\n```\n\nWe can model the DeliveryCommandTask to POST this payload by implementing the `execute` method as follows\n\n```python\nclass DeliveryCommandTask(ExecutorTask[str, str]):\n    async def execute(\n        self, runtime_parameters: Dict[str, VT], workflow_instance: ITask = None\n    ) -\u003e None:\n        payload = {\n            \"order_id\": runtime_parameters[\"order_id\"],\n            \"customer_id\": runtime_parameters[\"customer_id\"],\n        }\n        async with aiohttp.ClientSession() as session:\n            async with 
session.post(url=\"http://www.deliverysvc.com\", json=payload):\n                pass\n```\n\nLet's attach this to the parent `Delivery` Process\n\n```python\ndef pizza_delivery_process(\n    process_name: str = \"Delivery\",\n) -\u003e IProcessTemplateDAGBuilder:\n    dag_builder = DAGBuilderHelper(dagger_app=workflow_engine)\n    root_task = dag_builder.build_and_link_tasks(\n        [\n            dag_builder.generic_executor_task_builder(\n                task_type=DeliveryCommandTask,\n                name=process_name,\n            )\n        ]\n    )\n    return dag_builder.generic_process_builder(\n        process_name=process_name, root_task=root_task\n    )\n```\n\n### Step 4 Define the Sequence of Process Execution and register the workflow definition using `register_template`\n\nBased on the workflow, we want the `Order` Process to execute first before the `Delivery` Process. The workflow ensures\nthat the `Delivery` tasks are executed only after both the tasks in the `Order` process are in a terminal state\n\n```python\n@Dagger.register_template(\"PizzaWorkflow\")\ndef register_pizza_workflow(template_name: str) -\u003e ITemplateDAG:\n    dag_builder_helper = DAGBuilderHelper(workflow_engine)\n    order_process = dag_builder_helper.build_and_link_processes(\n        [\n            pizza_ordering_process(process_name=\"Order\"),\n            pizza_delivery_process(process_name=\"Delivery\"),\n        ]\n    )\n    return dag_builder_helper.generic_template(\n        template_name=template_name, root_process=order_process\n    )\n```\n\n### Step 5 Define an API to instantiate and execute pizza order workflows\n\n```python\nasync def create_and_submit_pizza_delivery_workflow(\n    order_id: str, customer_id: str, pizza_type: int\n):\n    pizza_workflow_template = workflow_engine.template_dags[\"PizzaWorkflow\"]\n    pizza_workflow_instance = await pizza_workflow_template.create_instance(\n        uuid.uuid1(),\n        repartition=False,  # Create this 
instance on the current worker\n        order_id=order_id,\n        customer_id=customer_id,\n        pizza_type=pizza_type,\n    )\n    await workflow_engine.submit(pizza_workflow_instance, repartition=False)\n```\n\n### Step 6 Start the worker\n\n```python\nworkflow_engine.main()\n```\n\n## Dagger is\n\n### Simple\n\nDagger is extremely easy to use. To get started, applications need to install this library, define a DAG using the\ndefault templates or extend them based on the use case, create instances of these DAGs, and schedule them for\nexecution. The library hides all the complexity of producing to and consuming from Kafka, maintaining Kafka Streams\ntopology processing, and persisting and recovering created tasks\n\n### Highly Available\n\nDagger is highly available and can survive network problems and server crashes. In the case of node failure, it can\nautomatically recover the state store (representing task data)\nor fail over to a standby node\n\n### Distributed\n\nStart more instances of your application as needed to distribute the load on the system\n\n### Fast\n\nA single-core worker instance can already process tens of thousands of tasks every second. Dagger uses a fast key-value\nlookup store based on RocksDB replicated to Kafka topics for fault tolerance\n\n## Installation\n\nYou can install dagger via the Wayfair artifactory or from source.\n\nTo install using `pip`:\n\n```shell\npip install wf-dagger\n```\n\ndagger has a dependency on `faust-streaming` for Kafka stream processing\n\n## FAQ\n\n### Which version of Python is supported?\n\ndagger supports Python version \u003e= 3.7\n\n### What Kafka versions are supported?\n\ndagger supports Kafka with version \u003e= 0.10.\n\n## Roadmap\n\nSee the [open issues](https://github.com/wayfair-incubator/dagger/issues) for a list of proposed features (and known\nissues).\n\n## Contributing\n\nContributions are what make the open source community such an amazing place to learn, inspire, and create. 
Any\ncontributions you make are **greatly appreciated**. For detailed contributing guidelines, please\nsee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\nDistributed under the MIT License. See `LICENSE` for more information.\n\n## Contact\n\nVikram Patki - vpatki@wayfair.com\n\nProject Link: [https://github.com/wayfair-incubator/dagger](https://github.com/wayfair-incubator/dagger)\n\n## Acknowledgements\n\nThis template was adapted from\n[https://github.com/othneildrew/Best-README-Template](https://github.com/othneildrew/Best-README-Template).\n\n## Documentation\n\nCheck out the [project documentation][dagger-docs].\n\nFor an overview of the repository structure and how to work with the code base, read the\n[Development Guide][development-docs].\n\n[dagger-docs]: https://wayfair-incubator.github.io/dagger/latest/\n[development-docs]: https://wayfair-incubator.github.io/dagger/latest/development-guide/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwayfair-incubator%2Fdagger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwayfair-incubator%2Fdagger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwayfair-incubator%2Fdagger/lists"}