{"id":20516368,"url":"https://github.com/chloro-pn/tunnel","last_synced_at":"2025-04-14T00:35:45.846Z","repository":{"id":176643725,"uuid":"657685200","full_name":"chloro-pn/tunnel","owner":"chloro-pn","description":"Tunnel is a Pipeline Execution Engine based on C++20 coroutine","archived":false,"fork":false,"pushed_at":"2023-08-17T07:47:44.000Z","size":463,"stargazers_count":29,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-27T14:51:51.164Z","etag":null,"topics":["asynchronous-programming","clickhouse","coroutines","cpp20-coroutine","flink","olap","parallel-computing","pipeline","taskflow","workflow"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chloro-pn.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-06-23T15:47:59.000Z","updated_at":"2025-03-11T01:48:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"542568f3-49fc-4526-8341-4da860bf49ad","html_url":"https://github.com/chloro-pn/tunnel","commit_stats":{"total_commits":69,"total_committers":1,"mean_commits":69.0,"dds":0.0,"last_synced_commit":"32462310435f8ee876a632b516fceeb810a810be"},"previous_names":["chloro-pn/tunnel"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Ftunnel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Ftunnel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Ftunnel/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/chloro-pn%2Ftunnel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chloro-pn","download_url":"https://codeload.github.com/chloro-pn/tunnel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248802501,"owners_count":21163856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asynchronous-programming","clickhouse","coroutines","cpp20-coroutine","flink","olap","parallel-computing","pipeline","taskflow","workflow"],"created_at":"2024-11-15T21:28:32.257Z","updated_at":"2025-04-14T00:35:45.825Z","avatar_url":"https://github.com/chloro-pn.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"![tunnel icon](https://github.com/chloro-pn/draw_io_repo/blob/master/tunnel.svg)\n## Tunnel | [中文](./README_CN.md)\n\n![](https://tokei.rs/b1/github/chloro-pn/tunnel) ![](https://tokei.rs/b1/github/chloro-pn/tunnel?category=files) ![Static Badge](https://img.shields.io/badge/c%2B%2B-20-blue)\n\nTunnel is a cross-platform, lightweight, and highly adaptable task execution framework based on `C++20 coroutine`. You can use it to build task execution engines with complex dependencies, or pipeline execution engines. The idea for this project comes from the execution engine of `ClickHouse`.\n\nThis project has the following features:\n\n* The user's processing logic does not need to deal with scheduling, synchronization, or mutual exclusion. 
You only need to design a reasonable DAG structure to get **multi-core parallel execution**;\n\n* Thanks to the powerful customization capabilities of C++20 coroutines, you can **easily integrate with other asynchronous systems or network I/O** (which means that `tunnel` can be extended into a distributed task execution framework, which is also one of the long-term goals of this project);\n\n* Thanks to the well-designed interface of async_simple, you can **control which Executor each node in the Pipeline is scheduled on**, which helps with resource isolation and management;\n\n* Supports **passing parameters between nodes**, although each `Pipeline` only supports one parameter type. If you need to pass different types of data between different nodes, use `std::any` or `void *` and convert at runtime.\n\n## Compiler Requirement\n* This project is based on the C++20 standard.\n* This project uses `bazel` as its build system.\n* This project is based on `async_simple`, so first ensure that your compiler (`clang`, `gcc`, `Apple clang`) can compile `async_simple`.\n* This project supports the `MacOS`, `Linux`, and `Windows` operating systems.\n\n## Dependencies\n* async_simple\n* googletest\n* chriskohlhoff/asio\n* rigtorp/MPMCQueue\n* gflags\n* spdlog\n\n## Design\nFirst, you need to understand several basic concepts:\n\n* **`Processor`**: a `Processor` is the basic unit of scheduling and execution. Each `Processor` holds zero or more `input_port`s and zero or more `output_port`s, but never zero of both at the same time.\n\n* **`port`**：a `port` transfers data between `Processor`s, and several `port`s can share the same queue. A `port` is either an `input_port`, which reads data from the queue, or an `output_port`, which writes data to the queue.\n\n* **`pipeline`**：a `pipeline` is composed of multiple `processors`. 
These `processors` are connected through queues and form a Directed Acyclic Graph. The `pipeline` can be submitted to an `Executor` for scheduling and execution.\n\n* **`Executor`**：the `Executor` concept in `async_simple`.\n\nThe above are the four most basic concepts in this project. The following concepts are derived from them:\n* `Source`：`Source` is a type of `Processor` that has no `input_port`; it is the node that generates data.\n* `EmptySource`：`EmptySource` is a type of `Source` that only generates EOF information.\n* `ChannelSource`：`ChannelSource` is a type of `Source` that reads data from bind_channel.\n* `Sink`：`Sink` is a type of `Processor` that has no `output_port`; it is the node that consumes data.\n* `DumpSink`：`DumpSink` is a type of `Sink` that reads and discards data.\n* `ChannelSink`：`ChannelSink` is a type of `Sink` that reads data and writes it to bind_channel.\n* `TransForm`：`TransForm` is a type of `Processor` that exists only to provide a `Processor` type distinct from `Source` and `Sink`.\n* `SimpleTransForm`：`SimpleTransForm` is a type of `TransForm` that has exactly one `input_port` and one `output_port` and is used to perform simple transformations. 
Most user logic should be implemented by inheriting from this class.\n* `NoOpTransform`：`NoOpTransform` is a type of `SimpleTransForm` that is only used as a placeholder.\n* `Concat`：`Concat` is a type of `Processor` that has one or more `input_ports` and one `output_port`; it can be used to consume its inputs sequentially.\n* `Dispatch`：`Dispatch` is a type of `Processor` that has one `input_port` and one or more `output_ports`; it can be used to divide its input among them.\n* `Filter`：`Filter` is a type of `TransForm` that can be used for filtering.\n* `Accumulate`：`Accumulate` is a type of `TransForm` that can be used for accumulation.\n* `Fork`：`Fork` is a type of `Processor` that has one `input_port` and one or more `output_ports`; it can be used to replicate its input to each of them.\n\n**NOTE**：This project does not have a `Merge` node, but provides the `Merge` function by other means. The reason is that a `Merge` node would require multiple `input_ports`, but it cannot know which `input_port` currently has data available, so it would have to suspend while waiting on one particular `input_port`, which is unreasonable. Instead, this project achieves merging by having multiple `port`s share a queue, as detailed in the `Merge` interface of the `Pipeline`.\n\n---\nThe inheritance relationship of node types is as follows. 
Types marked in **red** must be subclassed before use, while types marked in **blue** can be used directly：\n\n---\n![node_type](https://github.com/chloro-pn/draw_io_repo/blob/master/nodes.drawio.svg)\n\n## Doc\n\n**hello world**\n\nHere is a Hello World program:\n```c++\n#include \u003cfunctional\u003e\n#include \u003ciostream\u003e\n#include \u003cmemory\u003e\n#include \u003coptional\u003e\n#include \u003cstring\u003e\n#include \u003cutility\u003e\n\n#include \"async_simple/coro/SyncAwait.h\"\n#include \"async_simple/executors/SimpleExecutor.h\"\n#include \"tunnel/pipeline.h\"\n#include \"tunnel/sink.h\"\n#include \"tunnel/source.h\"\n\nusing namespace tunnel;\n\n// Sink node: prints every value it receives.\nclass MySink : public Sink\u003cstd::string\u003e {\n public:\n  virtual async_simple::coro::Lazy\u003cvoid\u003e consume(std::string \u0026\u0026value) override {\n    std::cout \u003c\u003c value \u003c\u003c std::endl;\n    co_return;\n  }\n};\n\n// Source node: emits one string, then an empty optional to signal EOF.\nclass MySource : public Source\u003cstd::string\u003e {\n public:\n  virtual async_simple::coro::Lazy\u003cstd::optional\u003cstd::string\u003e\u003e generate() override {\n    if (eof == false) {\n      eof = true;\n      co_return std::string(\"hello world\");\n    }\n    co_return std::optional\u003cstd::string\u003e{};\n  }\n  bool eof = false;\n};\n\nint main() {\n  Pipeline\u003cstd::string\u003e pipe;\n  pipe.AddSource(std::make_unique\u003cMySource\u003e());\n  pipe.SetSink(std::make_unique\u003cMySink\u003e());\n  // Run the pipeline on a two-thread executor and block until it finishes.\n  async_simple::executors::SimpleExecutor ex(2);\n  async_simple::coro::syncAwait(std::move(pipe).Run().via(\u0026ex));\n  return 0;\n}\n```\n\nAs you can see, users inherit from some Processors to implement custom processing logic, combine these Processors into a structure through Pipeline, and finally start executing the Pipeline.\n\nFor example, for the Source node, only the generate() method needs to be overridden to generate data. 
Users need to ensure that an empty `std::optional\u003cT\u003e{}` representing EOF information is eventually returned, otherwise the Pipeline will not stop executing. For Sink nodes, the consume() method needs to be overridden to consume data.\n\nTo learn how to use more Processor types, read the source files in the tunnel directory.\n\n**about exceptions**\n\nIf a Processor throws an exception while the pipeline is running, tunnel either calls `std::abort` to abort the process (when `bind_abort_channel == false`) or catches the exception and passes the exit information to the other Processors. A Processor that receives the exit information enters managed mode, in which user logic is no longer called. It simply reads data from upstream and discards it. After all upstream data has been read, it writes EOF information downstream and execution ends.\n\n**about expanding a pipeline at runtime**\n\nUsers can construct and run a new pipeline inside a Processor's processing logic, and connect the data streams of the two pipelines through `ChannelSource` and `ChannelSink`. This is useful, for example, when you need to decide how to handle the remaining data based on data generated while the pipeline is executing.\n\nThere is a simple example in example/embedpipeline.cc.\n\n**about the pipeline interface**\n\ntunnel assigns a unique ID to each Processor instance, and users and tunnel exchange pipeline structure information through these IDs.\n\nThe pipeline API follows one principle: post (downstream) nodes can only be added to leaf nodes. 
Leaf nodes are non-sink nodes whose output_port has not yet been specified. For example, starting from an empty pipeline:\n* First, add a source node (id == 1) through AddSource, so the pipeline has a single leaf node, 1.\n* Then, use AddTransform to add a post transform node (id == 2) to the source node, so the leaf node in the pipeline becomes 2.\n* Next, add another source node (id == 3) through AddSource, so the pipeline now has two leaf nodes, 2 and 3.\n* Finally, add a shared sink post node (id == 4) to all current leaf nodes (2 and 3) through SetSink. At this point, no leaf nodes exist in the pipeline. A pipeline without leaf nodes is called complete, and only complete pipelines can be executed.\n\nPlease read the contents of the doc and example directories to learn about the API usage of this project.\n\n## Todo\n1. Support for more types of nodes [**doing**]\n2. Support Pipeline Merge [done]\n3. Topology detection\n4. Schedule event collection [**doing**]\n5. Support active stop of execution [done, by throwing an exception]\n6. Exception handling during execution [done]\n7. Implementing a high-performance Executor [done]\n8. Support for extension of Pipeline at runtime [done]\n9. Support for distributed scheduling (support for network I/O based on async_simple first)\n\n## License\ntunnel is distributed under the Apache License (Version 2.0).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchloro-pn%2Ftunnel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchloro-pn%2Ftunnel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchloro-pn%2Ftunnel/lists"}