# Iris
Iris is an extensible, asynchronous, header-only framework written in pure modern C++, including an M:N task scheduler (with optional coroutine support for C++ 20) and an advanced DAG-based task dispatcher.
## Build
Iris is header-only; all you need to do is include the corresponding header files.
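For example, pulling in the warp system together with the built-in thread pool (both used throughout this README) takes a single include:
```C++
#include "iris_dispatcher.h" // warp system + built-in thread pool
```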
Most Iris classes work with C++ 11 compatible compilers, except for some optional features:
* Lua binding support requires the C++ 17 if-constexpr feature. (Visual Studio 2017+, GCC 7+, Clang 3.9+)
* Coroutine support for the thread pool scheduler requires the C++ 20 coroutine feature. (Visual Studio 2019+, GCC 11+, Clang 14+)
All examples can be built with the [CMake build system](https://cmake.org/); see CMakeLists.txt for more details.
## License
Iris is distributed under the MIT License.
## Concepts
Iris provides a simple M:N task scheduler called the **Warp System**, inspired by the strand concept in [Boost](https://www.boost.org/). Let's illustrate it starting from the basic concepts.
#### Task
A task is the **logical** unit of execution from the viewpoint of application development. It is usually represented by a function pointer or function object.
#### Thread
A thread is a **native** execution unit provided by the operating system. **Tasks must run in threads.** Different threads may run at the same time.
**Multi-threading**, which runs several threads within one program, is an effective approach to making full use of the CPUs in a many-core system. It is usually hard to code and debug, so many data structures, programming patterns, and frameworks exist to simplify the process for developers. This project is one of them.
#### Thread Pool
Threads are heavy; it is not efficient to spawn a brand-new thread for every task. A thread pool is a multi-threading facility that makes this more efficient: it maintains a set of threads called "worker threads" that are **reused** for running tasks. When a new task needs to run, the pool schedules it onto an idle worker thread, or queues it until a worker becomes idle.
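The idea can be sketched in portable C++ 11. This is an illustration of the concept only, not the Iris implementation; mini_pool_t and its members are made-up names:
```C++
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct mini_pool_t {
	explicit mini_pool_t(size_t n) {
		for (size_t i = 0; i < n; i++) {
			workers.emplace_back([this]() {
				while (true) {
					std::function<void ()> task;
					{
						std::unique_lock<std::mutex> lock(mutex);
						cond.wait(lock, [this]() { return stopped || !tasks.empty(); });
						if (stopped && tasks.empty())
							return; // drained and shut down
						task = std::move(tasks.front());
						tasks.pop();
					}
					task(); // run outside the lock
				}
			});
		}
	}

	void queue(std::function<void ()> task) {
		{
			std::lock_guard<std::mutex> lock(mutex);
			tasks.push(std::move(task));
		}
		cond.notify_one(); // wake an idle worker
	}

	~mini_pool_t() {
		{
			std::lock_guard<std::mutex> lock(mutex);
			stopped = true;
		}
		cond.notify_all();
		for (size_t i = 0; i < workers.size(); i++)
			workers[i].join();
	}

	std::mutex mutex;
	std::condition_variable cond;
	std::queue<std::function<void ()>> tasks;
	std::vector<std::thread> workers;
	bool stopped = false;
};
```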
#### Warp
Some tasks read/write the same objects, or visit the same thread-unsafe interfaces, which means they must not run at the same time; see [race condition](https://en.wikipedia.org/wiki/Race_condition) for details. We call them **conflicting** tasks.
To make our programs run correctly, we must use some technique that prevents unexpected conflicts. Here we introduce a new concept: the **Warp**.
A warp is a logical container for a series of conflicting tasks. Tasks belonging to the **same warp** are **mutually excluded** automatically, so **no two of them** can run at the same time, preventing race conditions by construction. This feature is called the **warp restriction**. To make coding easier, we can bind all tasks concerning a given object to a specific warp; in that case we say the object is bound to a warp context.
Tasks in **different** warps, by contrast, **can** run at the same time.
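For contrast, this is what two conflicting tasks look like in plain C++ with an explicit lock; a warp grants the same mutual exclusion implicitly, with no lock object per shared resource (the names below are illustrative):
```C++
#include <mutex>

std::mutex object_mutex; // one lock guarding one shared object
int shared_counter = 0;  // thread-unsafe state both tasks touch

void conflicting_task() {
	std::lock_guard<std::mutex> guard(object_mutex); // manual mutual exclusion
	shared_counter++; // safe only while the lock is held
}
```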
#### Warp System
The Warp System is a bridge between **warps** and the **thread pool**: programmers submit tasks labeled with a warp to the system, and the system schedules them onto the thread pool. With some magic applied internally, the result is a conflict-free task flow.
The thread count **M** of the Warp System is **fixed** when it starts, but the warp count **N** can be adjusted dynamically and freely by programmers. The warp system is therefore a flexible M:N task-mapping system.
## Quick Start
Let's start with the simple programs in [iris_dispatcher_demo.cpp](iris_dispatcher_demo.cpp).
#### Basic Example: simple explosion
The Warp System runs on a thread pool, so the first thing to do is create one. There is a built-in thread pool based on C++ 11 std::thread in [iris_dispatcher.h](iris_dispatcher.h); you can replace it with your own platform-specific implementation.
```C++
static const size_t thread_count = 4;
iris_async_worker_t<> worker(thread_count);
```
Then we initialize the warps. There is no "warp system class": each warp is an **individual** object, so we just create a vector of them and call them warp 0, warp 1, and so on.
Unlike boost strands, the tasks in a warp are **NOT** ordered by default, which means the **final execution order** may differ from the order of submission. You can still enable ordering if you like (see the declaration of "strand_t" in the following code), but it is not recommended, because ordering can be slightly less efficient than the default setting.
```C++
static const size_t warp_count = 8;
using warp_t = iris_warp_t<iris_async_worker_t<>>;
using strand_t = iris_warp_t<iris_async_worker_t<>, true>; // behaves like a strand
std::vector<warp_t> warps;
warps.reserve(warp_count);
for (size_t i = 0; i < warp_count; i++) {
	warps.emplace_back(worker); // calls iris_warp_t::iris_warp_t(iris_async_worker_t<>&)
}
```
Then we can schedule tasks onto whichever warp we want; just call **queue_routine**:
```C++
warps[0].queue_routine([]() { /* operation A on warps[0] */ });
warps[0].queue_routine([]() { /* operation B on warps[0] */ });
```
That's all you need to do. By the warp restriction, operation A and operation B are **never executed at the same time**, since they are in the **same** warp.
If instead we queue_routine tasks onto different warps, like:
```C++
warps[0].queue_routine([]() { /* do operation C */});
warps[1].queue_routine([]() { /* do operation D */});
```
then by the warp restriction, operation C and operation D **could be executed at the same time**.
Here is an "explosion" example. We write a function called "explosion" that randomly forks multiple recursive writing operations on the integer array described here:
```C++
static int32_t warp_data[warp_count] = { 0 };
```
The restriction is that warp 0 may only write warp_data[0], warp 1 may only write warp_data[1], and so on:
```C++
std::function<void ()> explosion;
static constexpr size_t split_count = 4;
static constexpr size_t terminate_factor = 100;
explosion = [&warps, &explosion, &worker]() {
	if (worker.is_terminated())
		return;
	warp_t& current_warp = *warp_t::get_current_warp();
	size_t warp_index = &current_warp - &warps[0];
	warp_data[warp_index]++;
	// simulate working
	std::this_thread::sleep_for(std::chrono::milliseconds(rand() % 40));
	warp_data[warp_index]++;
	if (rand() % terminate_factor == 0) {
		// randomly terminates
		worker.terminate();
	}
	warp_data[warp_index]++;
	// randomly dispatch to warp
	for (size_t i = 0; i < split_count; i++) {
		warps[rand() % warp_count].queue_routine(std::function<void ()>(explosion));
	}
	warp_data[warp_index] -= 3;
};
```
Although there are no locks or atomics protecting warp_data, we can still assert that the final value of each warp_data element is 0: executions within the same warp never overlap on the timeline.
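As a sketch, that claim could be checked after shutdown, assuming the worker has been terminated and joined and all warps drained, as in the Exiting section below:
```C++
worker.join(); // wait for every worker thread to finish
for (size_t i = 0; i < warp_count; i++) {
	assert(warp_data[i] == 0); // each task's +3/-3 nets to zero
}
```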
#### Advanced Example: garbage collection
There is a function named garbage_collection which simulates a multi-threaded mark-sweep [garbage collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)) process.
Garbage collection is a technique for collecting unreferenced objects and deleting them. Mark-sweep is a basic approach to garbage collection. It consists of three steps:
1. Scan all objects and mark them **unvisited**.
2. Traverse from the root objects through reference relationships, marking every object that is directly or indirectly referenced as **visited**.
3. Rescan all objects and delete those still marked **unvisited**. All objects not linked to the root objects (i.e. garbage) are thereby deleted.
Now suppose we have the definition of a basic object node as follows:
```C++
struct node_t {
	size_t warp_index = 0;
	size_t visit_count = 0; // we do not use std::atomic<> here.
	std::vector<size_t> references;
};
struct graph_t {
	std::vector<node_t> nodes;
};
```
To apply garbage collection, we need to record every **reference** from the current node, and traverse them from the root objects while collecting. We use **visit_count** to record whether the current node has been **visited**.
If you are experienced in multi-threaded programming, you may point out that **visit_count** should be of type std::atomic<size_t>, because several threads may modify it during the collection process.
But we have decided to make things different.
We split the node-visiting operations across multiple warps (recorded in the member **warp_index**). For example, nodes 1-10 are grouped into warp 0 and nodes 11-20 into warp 1, or they are simply assigned at random. Any task operating on nodes in the same warp is protected by the warp system. As a result, **visit_count** is guaranteed to **never** be touched by multiple threads at once, and no atomics or locks are required.
To obey the warp restriction, all we need to do is queue a task onto the related node's warp whenever we plan to do something with it:
```C++
warps[target_node.warp_index].queue_routine([&target_node]() {
	// operations on target_node
});
```
Whenever we visit a new node, all its **references** should be added to the next round of the collection process. To preserve the warp restriction, we schedule them onto their own warps (see the line commented with <------):
```C++
graph_t graph;
std::function<void (size_t)> collector;
std::atomic<size_t> collecting_count;
collecting_count.store(0, std::memory_order_release);
collector = [&warps, &collector, &worker, &graph, &collecting_count](size_t node_index) {
	warp_t& current_warp = *warp_t::get_current_warp();
	size_t warp_index = &current_warp - &warps[0];
	node_t& node = graph.nodes[node_index];
	assert(node.warp_index == warp_index);
	if (node.visit_count == 0) {
		node.visit_count++; // SAFE: only one thread can visit it
		for (size_t i = 0; i < node.references.size(); i++) {
			size_t next_node_index = node.references[i];
			size_t next_node_warp = graph.nodes[next_node_index].warp_index;
			collecting_count.fetch_add(1, std::memory_order_acquire);
			warps[next_node_warp].queue_routine(std::bind(collector, next_node_index)); // <------
		}
	}
	if (collecting_count.fetch_sub(1, std::memory_order_release) == 1) {
		// all work finished.
		// ...
	}
};
```
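One detail the snippet leaves out is the kick-off: each root task must be counted in collecting_count before it is queued, mirroring the fetch_add inside the loop above. A sketch, assuming node 0 is a root:
```C++
collecting_count.fetch_add(1, std::memory_order_acquire); // account for the root task
warps[graph.nodes[0].warp_index].queue_routine(std::bind(collector, (size_t)0));
```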
That's all: there are **no** explicit locks or atomics. All the dangerous multi-threaded work is handled by the Warp System. See the full source code of garbage_collection for more details.
#### Discussion
Now let's get back to the beginning: what is the point of warps? What if we just used atomics or locks?
The answer has three aspects:
1. Convenience: the only rule you must remember is to **always schedule tasks according to their warp**. There are no lock-ordering requirements, deadlocks, busy waiting, memory-order problems, or atomic myths.
2. High performance: if we scatter locks and atomics everywhere, for example allocating a separate lock for each object and locking every time we visit one, the program gets stuck on bus locking, kernel switches, and thread switches, leading to poor performance. The warp concept wraps a series of operations, or a group of objects, into a logical "scheduling package", reducing switching and busy-wait costs and making the workload friendlier to multi-threaded systems.
3. Flexibility: you can easily adjust the object/task-to-warp mapping as you like, for example allocating more warps and splitting objects at a finer granularity when you have more CPUs. The system also lets programmers move an object or a group of tasks from one warp to another dynamically, which is useful for dynamic load balancing, as sketched below.
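For instance, point 3 could look like the following sketch in the garbage-collection example, under the assumption that warp_index is only ever modified from inside the node's current warp; new_warp_index is a hypothetical target:
```C++
// Hypothetical migration sketch: this task runs inside node's current warp,
// so no other task can observe warp_index mid-change.
warps[node.warp_index].queue_routine([&node, new_warp_index]() {
	node.warp_index = new_warp_index; // later tasks queue to the new warp
});
```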
## Step further
### In-Warp Parallel
In the common case, only one thread runs in a warp context at a time. But what if we want to break that rule temporarily in some local code, doing parallelized operations there while the warp restriction still holds for all other code? It is unsafe, but sometimes we just want to do it.
Open [iris_dispatcher_demo.cpp](iris_dispatcher_demo.cpp) and you will find a piece of code in the function "simple_explosion":
```C++
static constexpr size_t parallel_factor = 11;
static constexpr size_t parallel_count = 6;
if (rand() % parallel_factor == 0) {
	// read-write lock example: multiple reading blocks writing
	std::shared_ptr<std::atomic<int32_t>> shared_value = std::make_shared<std::atomic<int32_t>>(-0x7fffffff);
	for (size_t i = 0; i < parallel_count; i++) {
		current_warp.queue_routine_parallel([shared_value, warp_index]() {
			// only read operations
			std::this_thread::sleep_for(std::chrono::milliseconds(rand() % 40));
			int32_t v = shared_value->exchange(warp_data[warp_index], std::memory_order_release);
			assert(v == warp_data[warp_index] || v == -0x7fffffff);
		});
	}
}
```
The function **queue_routine_parallel** queues a special parallelized task on current_warp; such tasks can run at the same time as one another. While any parallelized task is running, the normal tasks on current_warp remain **blocked**. After all parallelized tasks finish, normal tasks can be scheduled again.
**Parallelized tasks are to normal tasks what read locks are to write locks.** This is an advanced feature, and you must be careful when using it.
### Coroutines
In C++ 20, we can use coroutines to simplify asynchronous program development.
The warp system supports coroutine integration; you can find an example in [iris_coroutine_demo.cpp](iris_coroutine_demo.cpp).
To start a coroutine, just write a function whose return type is "iris_coroutine_t":
```C++
iris_coroutine_t<int> example(warp_t::async_worker_t& async_worker, warp_t* warp, int value) {}
```
Inside this coroutine function, you can co_await **iris_switch** to switch to another warp context:
```C++
if (warp != nullptr) {
	warp_t* current = co_await iris_switch(warp);
	printf("Switch to warp %p\n", warp);
	co_await iris_switch((warp_t*)nullptr);
	printf("Detached\n");
	co_await iris_switch(warp);
	printf("Attached\n");
	co_await iris_switch(current);
	assert(current == warp_t::get_current_warp());
}
```
co_await iris_switch returns the previous warp. Notice that we can switch to a nullptr warp, which means detaching from the current warp; switching from a nullptr warp to a valid warp is likewise allowed.
We can also create and await an asynchronous task on a target warp:
```C++
co_await iris_awaitable(warp, []() {});
```
It is equivalent to switching to the warp and switching back. But **iris_awaitable** also allows dispatching the task early, before awaiting it:
```C++
auto awaitable = iris_awaitable(warp, []() {});
awaitable.dispatch();
// do something other
co_await awaitable;
```
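The point of dispatching early is overlap: the task can run concurrently with the "do something other" code, so there may be little or nothing left to wait for by the time co_await is reached.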
iris_coroutine_t is not only a coroutine but also an awaitable object; you can co_await it to chain coroutines into a pipeline.
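A minimal chaining sketch; chain and its iris_coroutine_t<void> return type are assumptions here, and example is the coroutine declared above:
```C++
iris_coroutine_t<void> chain(warp_t::async_worker_t& async_worker, warp_t* warp) {
	int result = co_await example(async_worker, warp, 5); // await the child coroutine
	printf("example returned %d\n", result);
}
```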
### DAG-based Task Dispatcher
A DAG-based task dispatcher, also known as a Task Graph, is a widely used technique for dispatching tasks with partial-order dependencies.
Iris provides a DAG-based task dispatcher called iris_dispatcher_t (see the function "graph_dispatch" in [iris_dispatcher_demo.cpp](iris_dispatcher_demo.cpp)):
You can create a dispatcher with:
```C++
iris_dispatcher_t<warp_t> dispatcher(worker);
```
The constructor's second parameter is an optional function, called after all tasks in the dispatcher graph have finished; it is omitted here.
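A sketch of passing that completion function; the callback's exact signature is an assumption here, so consult the declaration of iris_dispatcher_t for the real form:
```C++
iris_dispatcher_t<warp_t> dispatcher(worker, [](iris_dispatcher_t<warp_t>& d) {
	std::cout << "all tasks in the graph finished" << std::endl; // assumed callback form
});
```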
To add a task to the dispatcher, call **allocate**:
```C++
auto d = dispatcher.allocate(&warps[2], []() { std::cout << "Warp 2 task [4]" << std::endl; });
auto a = dispatcher.allocate(&warps[0], []() { std::cout << "Warp 0 task [1]" << std::endl; });
auto b = dispatcher.allocate(&warps[1], []() { std::cout << "Warp 1 task [2]" << std::endl; });
```
Notice that each call returns a handle of internal type routine_t*. You can pass these handles to the **order** function to impose an execution order:
```C++
dispatcher.order(a, b);
// dispatcher.order(b, a); // will trigger validate assertion
auto c = dispatcher.allocate(nullptr, []() { std::cout << "Warp nil task [3]" << std::endl; });
dispatcher.order(b, c);
// dispatcher.order(c, a); // will trigger validate assertion
dispatcher.order(b, d);
```
Then call **dispatch** to run them.
```C++
dispatcher.dispatch(a);
dispatcher.dispatch(b);
dispatcher.dispatch(c);
dispatcher.dispatch(d);
```
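Given the orders declared above (a before b; b before c and d), task [1] always runs before [2], and [2] before both [3] and [4]; since no order relates [3] to [4], those two may run concurrently.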
For more flexible dispatching, you can **defer**/**dispatch** a task dynamically. Notice that **defer** must be called while the dispatcher is running and **BEFORE** the target task actually runs:
```C++
auto b = dispatcher.allocate(&warps[1], [&dispatcher, d]() {
	dispatcher.defer(d);
	std::cout << "Warp 1 task [2]" << std::endl;
	dispatcher.dispatch(d);
});
```
### Polling from external thread
It is common for a thread to block while waiting for some signal to arrive. For example, suppose you are spinning while waiting for an atomic variable to reach an expected value (a spin lock, say), and there is nothing to do but spin. In this case, we can "borrow" some tasks from the thread pool and execute them while our atomic variable is not ready yet:
```C++
while (some_variable.load(std::memory_order_acquire) != expected_value) {
	// delay at most 20ms or poll tasks with priority 0 if possible
	worker.poll_one(0, std::chrono::milliseconds(20));
}
```
### Exiting
Use iris_warp_t::poll to drain all remaining tasks from all warps (including their async_worker's tasks) while exiting:
```C++
async_worker.terminate();
async_worker.join();
while (iris_warp_t::poll({ &warp1, &warp2, ... })) {
	std::this_thread::sleep_for(std::chrono::milliseconds(20));
}
```