{"id":15047485,"url":"https://github.com/alugowski/poolstl","last_synced_at":"2025-04-10T00:51:07.604Z","repository":{"id":206891286,"uuid":"717651324","full_name":"alugowski/poolSTL","owner":"alugowski","description":"Light and self-contained implementation of C++17 parallel algorithms.","archived":false,"fork":false,"pushed_at":"2024-11-18T18:49:24.000Z","size":154,"stargazers_count":34,"open_issues_count":5,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-10T00:51:01.352Z","etag":null,"topics":["concurrency","cpp11","cpp14","cpp17","emscripten","multithreading","parallel","parallel-computing","parallel-sort","parallel-sorting","sorting","sorting-algorithms","sorting-algorithms-implemented","stl","stl-algorithms","thread","threading","wasm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alugowski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-BSD.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-12T05:39:22.000Z","updated_at":"2025-02-22T11:36:49.000Z","dependencies_parsed_at":"2024-01-18T05:42:45.345Z","dependency_job_id":"bd7cc650-f5fb-4fc5-8128-af95300ad499","html_url":"https://github.com/alugowski/poolSTL","commit_stats":{"total_commits":71,"total_committers":1,"mean_commits":71.0,"dds":0.0,"last_synced_commit":"be169357f9b350be2ddda4fdef6be56c3c478c03"},"previous_names":["alugowski/poolstl"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alugowski%2FpoolSTL","tags_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alugowski%2FpoolSTL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alugowski%2FpoolSTL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alugowski%2FpoolSTL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alugowski","download_url":"https://codeload.github.com/alugowski/poolSTL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137997,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concurrency","cpp11","cpp14","cpp17","emscripten","multithreading","parallel","parallel-computing","parallel-sort","parallel-sorting","sorting","sorting-algorithms","sorting-algorithms-implemented","stl","stl-algorithms","thread","threading","wasm"],"created_at":"2024-09-24T20:59:03.418Z","updated_at":"2025-04-10T00:51:07.580Z","avatar_url":"https://github.com/alugowski.png","language":"C++","readme":"[![tests](https://github.com/alugowski/poolSTL/actions/workflows/tests.yml/badge.svg)](https://github.com/alugowski/poolSTL/actions/workflows/tests.yml)\n[![codecov](https://codecov.io/gh/alugowski/poolSTL/branch/main/graph/badge.svg?token=zB7yN8NwUc)](https://codecov.io/gh/alugowski/poolSTL)\n\n# poolSTL\n\nLight, self-contained, thread pool-based implementation of [C++17 parallel standard library algorithms](https://en.cppreference.com/w/cpp/algorithm).\n\nC++17 introduced parallel overloads of standard library algorithms that accept an [*Execution 
Policy*](https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag) as the first argument.\nPolicies specify limits on how the implementation may parallelize the algorithm, enabling methods like threads, vectorization, or even GPU.\nPolicies can be supplied by the compiler or by libraries like this one.\n\n```c++\nstd::sort(std::execution::par, vec.begin(), vec.end());\n    //    ^^^^^^^^^^^^^^^^^^^ native C++17 parallel Execution Policy      \n```\n\nUnfortunately compiler support [varies](https://en.cppreference.com/w/cpp/compiler_support/17). Quick summary of compilers' default standard libraries:\n\n|                   |    Linux     |    macOS     |   Windows    |\n|:------------------|:------------:|:------------:|:------------:|\n| GCC 9+            | TBB Required | TBB Required | TBB Required |\n| GCC 8-            |      ❌      |      ❌      |      ❌      |\n| Clang (libc++)    |      ❌      |      ❌      |      ❌      |\n| Clang (libstdc++) | TBB Required | TBB Required | TBB Required |\n| Apple Clang       |              |      ❌      |              |\n| MSVC 15.7+ (2017) |              |              |      ✅      |\n| [Parallel STL](https://www.intel.com/content/www/us/en/developer/articles/guide/get-started-with-parallel-stl.html) | TBB Required | TBB Required | TBB Required |\n| **poolSTL**       |      ✅*     |      ✅*     |      ✅*     |\n\nPoolSTL is a *supplement* to fill in the support gaps. It is not a full implementation; only the basics are covered.\nHowever, it is small, easy to integrate, and has no external dependencies. A good backup to the other options.\n\nUse poolSTL exclusively, or only on platforms lacking native support,\nor only if [TBB](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html) is not present.\n\nSupports C++11 and higher. Algorithms introduced in C++17 require C++17 or higher.  
\nTested in CI on GCC 7+, Clang/LLVM 5+, Apple Clang, MSVC, MinGW, and Emscripten.\n\n## Implemented Algorithms\nAlgorithms are added on an as-needed basis. If you need one [open an issue](https://github.com/alugowski/poolSTL/issues) or contribute a PR.  \n**Limitations:** All iterators must be random access. No nested parallel calls.\n\n### `\u003calgorithm\u003e`\n* [`all_of`](https://en.cppreference.com/w/cpp/algorithm/all_of), [`any_of`](https://en.cppreference.com/w/cpp/algorithm/any_of), [`none_of`](https://en.cppreference.com/w/cpp/algorithm/none_of)\n* [`copy`](https://en.cppreference.com/w/cpp/algorithm/copy), [`copy_n`](https://en.cppreference.com/w/cpp/algorithm/copy_n)\n* [`count`](https://en.cppreference.com/w/cpp/algorithm/count), [`count_if`](https://en.cppreference.com/w/cpp/algorithm/count_if)\n* [`fill`](https://en.cppreference.com/w/cpp/algorithm/fill), [`fill_n`](https://en.cppreference.com/w/cpp/algorithm/fill_n)\n* [`find`](https://en.cppreference.com/w/cpp/algorithm/find), [`find_if`](https://en.cppreference.com/w/cpp/algorithm/find_if), [`find_if_not`](https://en.cppreference.com/w/cpp/algorithm/find_if_not)\n* [`for_each`](https://en.cppreference.com/w/cpp/algorithm/for_each), [`for_each_n`](https://en.cppreference.com/w/cpp/algorithm/for_each_n)\n* [`partition`](https://en.cppreference.com/w/cpp/algorithm/partition)\n* [`sort`](https://en.cppreference.com/w/cpp/algorithm/sort), [`stable_sort`](https://en.cppreference.com/w/cpp/algorithm/stable_sort)\n* [`transform`](https://en.cppreference.com/w/cpp/algorithm/transform)\n\n### `\u003cnumeric\u003e`\n* [`exclusive_scan`](https://en.cppreference.com/w/cpp/algorithm/exclusive_scan) (C++17 only)\n* [`reduce`](https://en.cppreference.com/w/cpp/algorithm/reduce)\n* [`transform_reduce`](https://en.cppreference.com/w/cpp/algorithm/transform_reduce) (C++17 only)\n\nAll in `std::` namespace.\n\n### Other\n* [`poolstl::iota_iter`](include/poolstl/iota_iter.hpp) - Iterate over integers. 
Same as iterating over output of [`std::iota`](https://en.cppreference.com/w/cpp/algorithm/iota) but without materializing anything. Iterator version of [`std::ranges::iota_view`](https://en.cppreference.com/w/cpp/ranges/iota_view).\n* `poolstl::for_each_chunk` - Like `std::for_each`, but explicitly splits the input range into chunks then exposes the chunked parallelism. A user-specified chunk constructor is called for each parallel chunk then its output is passed to each loop iteration. Useful for workloads that need an expensive workspace that can be reused between iterations, but not simultaneously by all iterations in parallel.\n* `poolstl::pluggable_sort` - Like `std::sort`, but allows specification of sequential sort method. To parallelize [pdqsort](https://github.com/orlp/pdqsort): `pluggable_sort(par, v.begin(), v.end(), pdqsort)`.\n\n## Usage\n\nPoolSTL provides:\n* `poolstl::par`: Substitute for [`std::execution::par`](https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag). Parallelized using a [thread pool](https://github.com/alugowski/task-thread-pool).\n* `poolstl::seq`: Substitute for `std::execution::seq`. Simply calls the regular (non-policy) overload.\n* `poolstl::par_if()`: Choose parallel or sequential at runtime. See below.\n\nIn short, use `poolstl::par` to make your code parallel. Complete example:\n```c++\n#include \u003ciostream\u003e\n#include \u003cvector\u003e\n#include \u003cpoolstl/poolstl.hpp\u003e\n\nint main() {\n    std::vector\u003cint\u003e vec = {0, 1, 2, 3, 4, 5};\n    auto sum = std::reduce(poolstl::par, vec.cbegin(), vec.cend());\n    //                     ^^^^^^^^^^^^\n    //                     Add this to make your code parallel.\n    std::cout \u003c\u003c \"Sum=\" \u003c\u003c sum \u003c\u003c std::endl;\n    return 0;\n}\n```\n\n### Controlling Thread Pool Size with `par.on(pool)`\n\nThe thread pool used by `poolstl::par` is managed internally by poolSTL. It is started on first use.  
\nUse your own [thread pool](https://github.com/alugowski/task-thread-pool)\nwith `poolstl::par.on(pool)` for control over thread count, startup/shutdown, etc.:\n\n```c++\ntask_thread_pool::task_thread_pool pool{4};  // 4 threads\n\nstd::reduce(poolstl::par.on(pool), vec.begin(), vec.end());\n```\n\n### Choosing Parallel or Sequential at Runtime with `par_if`\n\nSometimes the choice whether to parallelize or not should be made at runtime. For example, small datasets may not amortize\nthe cost of starting threads, while large datasets do and should be parallelized.\n\nUse `poolstl::par_if` to select between `par` and `seq` at runtime:\n```c++\nbool is_parallel = vec.size() \u003e 10000;\n\nstd::reduce(poolstl::par_if(is_parallel), vec.begin(), vec.end());\n```\n\nUse `poolstl::par_if(is_parallel, pool)` to control the thread pool used by `par`, if selected.\n\n# Examples\n\n### Parallel `for (auto\u0026 value : vec)`\n\n```c++\nstd::vector\u003cint\u003e vec = {0, 1, 2, 3, 4, 5};\n\n// Parallel for-each\nstd::for_each(poolstl::par, vec.begin(), vec.end(), [](auto\u0026 value) {\n    std::cout \u003c\u003c value;  // loop body\n});\n```\n\n### Parallel `for (int i = 0; i \u003c 100; ++i)`\n\n```c++\nusing poolstl::iota_iter;\n\n// parallel for loop\nstd::for_each(poolstl::par, iota_iter\u003cint\u003e(0), iota_iter\u003cint\u003e(100), [](auto i) {\n    std::cout \u003c\u003c i;  // loop body\n});\n```\n\n\n\n### Parallel Sort\n\n```c++\nstd::vector\u003cint\u003e vec = {5, 2, 1, 3, 0, 4};\n\nstd::sort(poolstl::par, vec.begin(), vec.end());\n```\n\n# Installation\n\n### Single File\n\nEach [release](https://github.com/alugowski/poolSTL/releases/latest) publishes a single-file amalgamated `poolstl.hpp`. Simply copy this into your project.\n\n**Build requirements:**\n - Clang and GCC 8 or older: require `-lpthread` to use C++11 threads.\n - Emscripten: compile and link with `-pthread` to use C++11 threads. 
[See docs](https://emscripten.org/docs/porting/pthreads.html).\n\n### CMake\n\n```cmake\ninclude(FetchContent)\nFetchContent_Declare(\n        poolSTL\n        GIT_REPOSITORY https://github.com/alugowski/poolSTL\n        GIT_TAG main\n        GIT_SHALLOW TRUE\n)\nFetchContent_MakeAvailable(poolSTL)\n\ntarget_link_libraries(YOUR_TARGET poolSTL::poolSTL)\n```\n\nAlternatively copy or checkout the repo into your project and:\n```cmake\nadd_subdirectory(poolSTL)\n```\n\n# Benchmark\n\nSee [benchmark/](benchmark) to compare poolSTL against the standard sequential implementation, and (if available) the\nnative `std::execution::par` implementation.\n\nResults on an M1 Pro (6 power, 2 efficiency cores), with GCC 13:\n```\n-------------------------------------------------------------------------------------------------------\nBenchmark                                                             Time             CPU   Iterations\n-------------------------------------------------------------------------------------------------------\nall_of()/real_time                                                 19.9 ms         19.9 ms           35\nall_of(poolstl::par)/real_time                                     3.47 ms        0.119 ms          198\nall_of(std::execution::par)/real_time                              3.45 ms         3.25 ms          213\nfind_if()/needle_percentile:5/real_time                           0.988 ms        0.987 ms          712\nfind_if()/needle_percentile:50/real_time                           9.87 ms         9.86 ms           71\nfind_if()/needle_percentile:100/real_time                          19.7 ms         19.7 ms           36\nfind_if(poolstl::par)/needle_percentile:5/real_time               0.405 ms        0.050 ms         1730\nfind_if(poolstl::par)/needle_percentile:50/real_time               1.85 ms        0.096 ms          393\nfind_if(poolstl::par)/needle_percentile:100/real_time              3.64 ms        0.102 ms          
193\nfind_if(std::execution::par)/needle_percentile:5/real_time        0.230 ms        0.220 ms         3103\nfind_if(std::execution::par)/needle_percentile:50/real_time        1.75 ms         1.60 ms          410\nfind_if(std::execution::par)/needle_percentile:100/real_time       3.51 ms         3.24 ms          204\nfor_each()/real_time                                               94.6 ms         94.6 ms            7\nfor_each(poolstl::par)/real_time                                   18.7 ms        0.044 ms           36\nfor_each(std::execution::par)/real_time                            15.3 ms         12.9 ms           46\nsort()/real_time                                                    603 ms          602 ms            1\nsort(poolstl::par)/real_time                                        112 ms         6.64 ms            6\nsort(std::execution::par)/real_time                                 113 ms          102 ms            6\npluggable_sort(poolstl::par, ..., pdqsort)/real_time               71.7 ms         6.67 ms           10\ntransform()/real_time                                              95.0 ms         94.9 ms            7\ntransform(poolstl::par)/real_time                                  17.4 ms        0.037 ms           38\ntransform(std::execution::par)/real_time                           15.3 ms         13.2 ms           45\nexclusive_scan()/real_time                                         33.7 ms         33.7 ms           21\nexclusive_scan(poolstl::par)/real_time                             11.6 ms        0.095 ms           55\nexclusive_scan(std::execution::par)/real_time                      19.8 ms         15.3 ms           32\nreduce()/real_time                                                 15.2 ms         15.2 ms           46\nreduce(poolstl::par)/real_time                                     4.06 ms        0.044 ms          169\nreduce(std::execution::par)/real_time                              3.38 ms         3.16 ms          
214\n```\n\n# poolSTL as `std::execution::par`\n**USE AT YOUR OWN RISK! THIS IS A HACK!**\n\nTwo-line hack for missing compiler support. A no-op on compilers with support.\n\nIf `POOLSTL_STD_SUPPLEMENT` is defined then poolSTL will check for native compiler support.\nIf not found then poolSTL will alias its `poolstl::par` as `std::execution::par`:\n\n```c++\n#define POOLSTL_STD_SUPPLEMENT\n#include \u003cpoolstl/poolstl.hpp\u003e\n```\n\nNow just use `std::execution::par` as normal, and poolSTL will fill in as necessary. See [supplement_test.cpp](tests/supplement_test.cpp).\n\n**Example use case:** You *can* link against TBB, so you'll use native support on GCC 9+, Clang, MSVC, etc.\nPoolSTL will fill in automatically on GCC \u003c9 and Apple Clang.\n\n**Example use case 2:** You'd *prefer* to use the TBB version, but don't want to fail on systems that don't have it.\nSimply use the supplement as above, but have your build system (CMake, meson, etc.) check for TBB.\nIf not found, define `POOLSTL_STD_SUPPLEMENT_NO_INCLUDE` and the supplement will not `#include \u003cexecution\u003e` (and neither should your code!),\nthus dropping the TBB link requirement. The poolSTL supplement fills in.  \nSee the supplement section of [tests/CMakeLists.txt](tests/CMakeLists.txt) for an example.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falugowski%2Fpoolstl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falugowski%2Fpoolstl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falugowski%2Fpoolstl/lists"}