{"id":21020952,"url":"https://github.com/awto/kafka-workflow","last_synced_at":"2025-05-15T08:31:55.840Z","repository":{"id":148888947,"uuid":"453429807","full_name":"awto/kafka-workflow","owner":"awto","description":"Simple Workflow As Code on Kafka","archived":false,"fork":false,"pushed_at":"2024-12-06T21:21:44.000Z","size":151,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-03T05:51:13.511Z","etag":null,"topics":["business-process","business-process-automation","business-process-management","distributed-systems","kafka","kafka-streams","microservice-framework","microservice-orchestration","microservices-architecture","workflow","workflow-as-code","workflow-automation","workflow-management","workflow-management-system"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-29T15:02:16.000Z","updated_at":"2024-12-06T21:21:48.000Z","dependencies_parsed_at":null,"dependency_job_id":"d9731ec1-c802-4a3c-a1f4-179beca1e3f4","html_url":"https://github.com/awto/kafka-workflow","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awto%2Fkafka-workflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awto%2Fkafka-workflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awto%
2Fkafka-workflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awto%2Fkafka-workflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awto","download_url":"https://codeload.github.com/awto/kafka-workflow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254304639,"owners_count":22048445,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business-process","business-process-automation","business-process-management","distributed-systems","kafka","kafka-streams","microservice-framework","microservice-orchestration","microservices-architecture","workflow","workflow-as-code","workflow-automation","workflow-management","workflow-management-system"],"created_at":"2024-11-19T10:44:07.678Z","updated_at":"2025-05-15T08:31:55.783Z","avatar_url":"https://github.com/awto.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Workflow-as-code on Kafka\n\n[![CI](https://github.com/awto/kafka-workflow/actions/workflows/main.yml/badge.svg)](https://github.com/awto/kafka-workflow/actions/workflows/main.yml)\n\nThere is also an alternative JVM version: [javactrl-kafka](https://github.com/javactrl/javactrl-kafka).\n\nThis project is a minimalistic but feature-complete implementation of the workflow-as-code approach.\n\nWorkflows are defined as ordinary JavaScript/TypeScript async functions, except that their `await` expressions may wait for events for much longer (hours, days, months, etc.). 
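\n\nTo make the idea concrete, here is a minimal in-memory sketch in TypeScript. It is not the library's implementation: `suspend`, `resume`, `pending`, and `ResumeMessage` are hypothetical names chosen for illustration. An `await` on a suspension simply stays pending until a resume message with a matching `ref` arrives, loosely mirroring the `\"ref\"`/`\"value\"`/`\"error\"` record format described below. Unlike the real runtime, the sketch keeps closures in memory and does not serialize the suspended program, so it cannot survive a restart.\n\n```typescript\n// Hypothetical in-memory model of a long-lived `await`. NOT the\n// @effectful/kafka-workflow implementation, which serializes the suspended\n// program into Kafka Streams state instead of keeping closures in memory.\ntype ResumeMessage = { ref: number; value?: unknown; error?: string };\ntype Waiter = { resolve: (v: unknown) =\u003e void; reject: (e: Error) =\u003e void };\n\nconst pending = new Map\u003cnumber, Waiter\u003e();\nlet nextId = 0;\n\n// Creates a suspension point: a promise plus the id an event must reference.\nfunction suspend(): { id: number; promise: Promise\u003cunknown\u003e } {\n  const id = nextId++;\n  const promise = new Promise\u003cunknown\u003e((resolve, reject) =\u003e {\n    pending.set(id, { resolve, reject });\n  });\n  return { id, promise };\n}\n\n// Handles an incoming resume event (in a real deployment, possibly days later).\nfunction resume(msg: ResumeMessage): boolean {\n  const waiter = pending.get(msg.ref);\n  if (waiter === undefined) return false; // unknown or already resumed\n  pending.delete(msg.ref);\n  if (msg.error !== undefined) waiter.reject(new Error(msg.error));\n  else waiter.resolve(msg.value);\n  return true;\n}\n\n// A tiny \"workflow\" that waits for an approval event before continuing.\nconst log: string[] = [];\nasync function approvalWorkflow(): Promise\u003cstring\u003e {\n  const approval = suspend();\n  log.push(`waiting on ref ${approval.id}`);\n  const approved = await approval.promise;\n  return approved === true ? \"approved\" : \"rejected\";\n}\n\nconst outcome = approvalWorkflow();\nresume({ ref: 0, value: true });\noutcome.then((result) =\u003e console.log(result)); // prints \"approved\"\n```\n\nFrom the workflow's point of view, `await approval.promise` is an ordinary `await`; what the real system adds is that the whole program state is persisted while it waits, so the wait can last for hours or months.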
\n\nWorkflow scripts run on [Kafka Streams](https://kafka.apache.org/documentation/streams/) clusters. Each script keeps its state in the stream's local state store and listens for events on a dedicated continuation topic (`\"workflow-resume\"`).\n\nKafka takes on the burden of making such workflows highly scalable, elastic, fault-tolerant, distributed, and more. In addition, the workflows themselves are simple: easy to read, write, and maintain, and easy to integrate with other components of a Kafka-based infrastructure.\n\nTypical use cases include:\n\n  * Business Process Automation\n  * Microservices Orchestration\n  * Distributed Transactions\n  * Infrastructure Provisioning\n  * Monitoring and Polling\n  * Data Pipelines\n\nA workflow written in JavaScript looks like this:\n\n```javascript\nexport async function main() {\n  const compensations = [];\n  try {\n    // each successful step registers an action that undoes it;\n    // `bind` is used instead of a closure so the state stays serializable\n    const car = await reserveCar();\n    compensations.push(cancelCar.bind(undefined, car));\n    const hotel = await reserveHotel();\n    compensations.push(cancelHotel.bind(undefined, hotel));\n    const flight = await reserveFlight();\n    compensations.push(cancelFlight.bind(undefined, flight));\n  } catch (e) {\n    // on failure, run all registered compensations in parallel\n    await Promise.all(compensations.map(i =\u003e i()));\n  }\n}\n```\n\nThere are workflow examples in [src/main/js/packages](src/main/js/packages), e.g., [workflow-ecommerce](src/main/js/packages/workflow-ecommerce).\n\nThe Kafka Workflow tool uses the [effectful.js](https://github.com/awto/effectfuljs) transpiler to compile JavaScript workflow definitions into lower-level JavaScript with its own implementation of async functions. This also makes it possible to serialize and deserialize the entire script state using the [@effectful/serialization](https://github.com/awto/effectfuljs/tree/main/packages/serialization) library.\n\nThe Kafka Streams processor runs the resulting low-level JavaScript file on the GraalVM JavaScript engine. 
The engine has a high level of Node.js compatibility, so most npm packages can be used in the scripts.\n\n## Usage\n\nThis project is currently at the proof-of-concept stage. However, since it is tiny, it isn't hard to reproduce for the specific needs of your project, either using this one as a template or writing your own from scratch.\n\nBuilding it currently requires JDK 17. This is mainly for code readability, and the requirement can easily be lowered to any earlier JDK version supported by Kafka Streams.\n\nTo run a workflow, execute the `org.js.effectful.kafka.workflow.Engine` class. It expects the `\"workflow-resume\"` topic and any other topics the workflow uses to already exist. The first argument is the path to the built `.js` file. The optional second argument is a properties file passed to Kafka Streams.\n\nThe examples also use a Scheduler stream for running delayed jobs. It is just a simple demo class and isn't fit for production use. In production, it is better to use something based on a third-party scheduler, such as Quartz, a cloud service, or cron; your message broker may also already provide some scheduling. To run this demo scheduler, execute the `org.js.effectful.kafka.workflow.Scheduler` class.\n\n## How to write workflow scripts\n\nA workflow script is a TypeScript/JavaScript file that exports an async `main` function, for example [workflow-ecommerce/src/index.ts](src/main/js/packages/workflow-ecommerce/src/index.ts) and [workflow-trip-booking-saga/src/index.ts](src/main/js/packages/workflow-trip-booking-saga/src/index.ts).\n\nCreate a plain Node.js package and add the `@effectful/kafka-workflow` dependency (along with [a few other third-party dependencies](https://github.com/awto/kafka-workflow/blob/main/src/main/js/packages/workflow-ecommerce/package.json)). Use webpack to transpile the script into a single self-contained JavaScript file. There is a helper for TypeScript projects in `@effectful/kafka-workflow/webpack-config-ts`. 
It takes two arguments: an index file and an output directory. There is an example in [workflow-ecommerce/webpack.config.js](https://github.com/awto/kafka-workflow/blob/main/src/main/js/packages/workflow-ecommerce/webpack.config.js).\n\nImport the runtime library:\n\n```javascript\nimport * as W from \"@effectful/kafka-workflow\";\n```\n\nEvents are passed to the workflow program through the dedicated `\"workflow-resume\"` topic.\n\nTo start a new workflow, send a record to the `workflow-resume` topic whose value is a string starting with the `\"new:\"` prefix; the rest of the string is JSON passed to the `main` function as its argument. The key is a unique thread identifier (a string). The same key identifies the thread in subsequent records (whose values should not start with `\"new:\"`).\n\nTo output a record to a topic, use the `W.output` function. Its first parameter is the string to put in the record's value, the second is the topic name, and the third is an optional key, which defaults to the current thread identifier. To use a topic in `W.output`, add its name to the `W.config.outputTopics` set.\n\nThe `W.outputJSON` shortcut wraps its first argument with `JSON.stringify`.\n\nThe most important function here is `W.suspend`. It returns a `W.Suspend` object, which can be passed to an `await` expression to suspend the whole program execution and save its state into local storage. `W.Suspend` is thenable, but in the current version it is better not to call its `then` method directly, because that can produce non-serializable state.\n\nThe suspended program resumes when the `\"workflow-resume\"` topic receives a record whose key is the current thread identifier and whose value is a JSON object with a `\"ref\"` field equal to the id of the `W.Suspend` object the code is currently suspended on (in an `await` expression).\n\nIf the JSON has an `\"error\"` field, it is raised as an exception from the `await` expression. 
Otherwise, the JSON's `\"value\"` field becomes the result of the `await` expression.\n\nThe program can be suspended at many points simultaneously, and, as in ordinary JavaScript, it can use `Promise.all`/`Promise.race`.\n\nSo if we have code like this:\n\n```javascript\nconst car = await reserveCar();\nconst hotel = await reserveHotel();\n```\n\nand we want to start reserving the hotel immediately, without waiting for the car, we can change the code to:\n\n```javascript\nconst [car, hotel] = await Promise.all([reserveCar(), reserveHotel()]);\n```\n\nNote that `Promise.all`/`Promise.race` don't return a promise here; they are monkey-patched versions that support suspensions.\n\n## Cancelation\n\nCalling `W.cancel(asyncValue)` cancels the execution of `asyncValue`. The innermost `await` expression that currently blocks the async value from settling throws an exception of class `W.CancelToken`. This obviously won't cancel an external job that is already running, but you can write a `try`/`catch` to cancel it properly (if possible).\n\nFor example:\n\n```typescript\nasync function timeout(ms: number) {\n  const resume = W.suspend();\n  // ask the demo scheduler to resume this suspension after `ms`\n  W.output(`${ms}`, \"workflow-scheduler\", `${W.threadId}|{\"ref\":\"${resume.id}\"}`);\n  try {\n    await resume;\n  } catch (e: any) {\n    // on cancelation, retract the scheduled job, then rethrow\n    if (e instanceof W.CancelToken)\n      W.output(null, \"workflow-scheduler\", `${W.threadId}|{\"ref\":\"${resume.id}\"}`);\n    throw e;\n  }\n  return { type: \"timeout\" };\n}\n```\n\nHere, writing `null` to the topic with the same key cancels the previously scheduled job.\n\n`Promise.all`/`Promise.race` are also adapted to benefit from cancelation: if any argument of `Promise.all` is rejected, the implementation cancels the other arguments that have not settled yet, and for `Promise.race`, once any argument settles, the others are canceled.\n\nCancelation is essential for avoiding some concurrency bugs. 
\n\nConsider this example:\n\n```javascript\nconst compensations = [];\ntry {\n  await Promise.race([\n    (async () =\u003e {\n      await timeout(100);\n      throw new Error(\"timeout\");\n    })(),\n    (async () =\u003e {\n      const hotel = await reserveHotel();\n      compensations.push(async () =\u003e { await cancelHotel(hotel); });\n    })(),\n  ]);\n} catch (e) {\n  await Promise.all(compensations.map(i =\u003e i()));\n}\n```\n\nSuppose the timeout fires before `reserveHotel` returns. Without cancelation, the hotel reservation would never be canceled: the `catch` block runs before `compensations.push`, so `compensations` is still an empty array. The adapted `Promise.race` instead cancels the still-pending `reserveHotel` branch as soon as the timeout settles the race.\n\nThe `W.all`/`W.any` functions are versions of `Promise.all`/`Promise.any` without this cancelation (although they still propagate cancelation signals to their arguments).\n\n## Debugging\n\nCurrently, there is no dedicated debugger for workflow scripts, though there will probably be one soon. However, a workflow script is an ordinary async TS/JS program, so before transpiling it into a workflow definition, we can debug it like any other TS/JS program with any debugger of choice (I would recommend my productivity-boosting [effectful debugger](https://marketplace.visualstudio.com/items?itemName=effectful.debugger) for this: it has time traveling, data breakpoints, and more).\n\nIf the `EFFECTFUL_KAFKA_WORKFLOW_MOCK` environment variable isn't empty, the import loads the library with mocked API functions for testing and debugging.\n\n## Running multiple workflows\n\nThe current version supports only one index file, which means we can run only one workflow. However, we can add a workflow dispatcher to this master index file that uses some argument of the `\"new:\"` messages to select which workflow to run. There are many better options that aren't implemented yet but are simple to implement.\n\n---\n\n## Possible extensions\n\nThe project's current goal is to provide a simple example of workflow definitions. 
However, many possible (and easy-to-implement) extensions could make workflows even simpler and more reliable.\n\n### TODO: Better serialization\n\nThe whole script state is stored as schema-less JSON using the [@effectful/serialization](https://github.com/awto/effectfuljs/tree/main/packages/serialization) library. It would be worth adding support for typed binary serialization, especially for data-processing workflows.\n\nIn the current version, many values are still not serializable. For function references to be serializable, the functions must be registered in the serialization library. However, this limitation is easy to solve because the workflow code already goes through a transpiler. The Effectful debugger already makes all functions with captured variables serializable by default (including many runtime objects that aren't serializable here, such as `Promise`).\n\nNon-serializable values must be registered with `S.regOpaqueObject` (or `S.regConstructor` for class constructors), and functions should be created with the `bind` method instead of closures. Here `S` is an import of `@effectful/serialization`. You can also use any third-party serialization library instead.\n\n### TODO: Implicit parallelism\n\nEffectfulJS has support for implicit parallelism, but the implementation is highly experimental and doesn't yet support state serialization. Nevertheless, it can make the resulting code significantly cleaner. However, using it efficiently requires a debugger.\n\n### TODO: Debugger\n\nEffectfulJS is already used to implement a debugger, namely [Effectful Debugger](https://marketplace.visualstudio.com/items?itemName=effectful.debugger). Its primary (not yet finished) goal is to add debugging features to effectful programs. However, it already works for plain JS/TS and has a few extra productivity-boosting features, such as time traveling, persistent state, and data breakpoints.\n\nWith Kafka, we'll get time traveling for free. 
This is because topics (if compaction is disabled) keep the whole history.\n\n### TODO: Conflict resolution\n\nAsync computations are notorious for introducing non-determinism and concurrency-related problems in JavaScript. For example, say we have code like this:\n\n```javascript\nconst car = await availableCar();\nif (account.balance \u003e= car.cost) {\n  await reserve(car);\n  account.balance -= car.cost;\n}\n```\n\nThis contains a common concurrency bug: the balance may become negative if something else reduces it while this thread is suspended in `await reserve(car)`. The problem isn't specific to long-running workflow scripts; ordinary JavaScript async functions behave the same way.\n\nHowever, we can leverage Kafka again to fix this in workflow scripts and make them deterministic. We can use a technique similar to Software Transactional Memory, which is based on the same log concept. It is simple to record changes to script objects and local variables and to detect conflicts. If there is a conflict, we roll back the whole transaction and start it from scratch. We can also handle side effects here by simulating exceptions at the places where output happens, thus forcing those outputs to be canceled.\n\nWe can use this feature to define even tidier workflows:\n\n```javascript\nconst car = await availableCar();\nif (account.balance \u003e= car.cost) {\n  await reserve(car);\n  account.balance -= car.cost;\n} else {\n  // proposed (not yet implemented) API: retry the whole transaction later\n  await W.retry;\n}\n```\n\nIf the balance isn't sufficient, we retry the whole transaction, hoping there will be more money next time. However, it isn't replayed immediately, only when any variable read in this thread changes.\n\n### TODO: Always-running workflows\n\nInstead of long-running workflow scripts, we can have always-running scripts. 
In this case, once the workflow starts, it never ends, so we can write code like this:\n\n```javascript\nlet paid = 0;\nfor (const i of subscriptions) {\n  for (const j of i.payments) {\n    paid += j;\n  }\n}\n```\n\nThe script runs like ordinary JS. First, we load subscriptions and payments from some DB. With plain JS, we would need to re-execute the whole script to keep the `paid` variable up to date. Instead, we can derive which part of the execution trace needs to be recalculated; in this case, that can be just a single iteration.\n\nThis is, however, a big task, with quite a few complex problems to solve, e.g., how to update a script to a new version (there, too, we don't want to recalculate the whole program, only the affected parts). Moreover, the Kafka log doesn't fit well here either. For example, we need a higher-level tree with logarithmic-time access instead of a constant-time-access sequence. However, we could probably implement this kind of data structure on top of the Kafka log.\n\n### TODO: Other runners\n\nThis approach doesn't require Kafka and will work on any stream processor. It only needs the ability to join a stream with state. The system inherits all its reliability and scalability from the runner.\n\nSome RDBMSs may serve as runners too, since tables are dual to streams.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawto%2Fkafka-workflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawto%2Fkafka-workflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawto%2Fkafka-workflow/lists"}