{"id":15047427,"url":"https://github.com/rigtorp/spscqueue","last_synced_at":"2025-05-16T13:07:58.958Z","repository":{"id":37444612,"uuid":"54936219","full_name":"rigtorp/SPSCQueue","owner":"rigtorp","description":"A bounded single-producer single-consumer wait-free and lock-free queue written in C++11","archived":false,"fork":false,"pushed_at":"2024-01-04T14:24:22.000Z","size":169,"stargazers_count":977,"open_issues_count":6,"forks_count":134,"subscribers_count":30,"default_branch":"master","last_synced_at":"2025-04-12T10:57:49.791Z","etag":null,"topics":["concurrency","concurrent-data-structure","cpp","cpp11","header-only","lock-free","queue","spsc-queue"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rigtorp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-03-29T01:30:58.000Z","updated_at":"2025-04-12T06:16:46.000Z","dependencies_parsed_at":"2023-02-10T12:01:46.041Z","dependency_job_id":"402d25ba-e2e6-4193-b8a4-1163cfc40072","html_url":"https://github.com/rigtorp/SPSCQueue","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigtorp%2FSPSCQueue","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigtorp%2FSPSCQueue/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigtorp%2FSPSCQueue/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigtorp%2FSPSCQueue/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/rigtorp","download_url":"https://codeload.github.com/rigtorp/SPSCQueue/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254535829,"owners_count":22087399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concurrency","concurrent-data-structure","cpp","cpp11","header-only","lock-free","queue","spsc-queue"],"created_at":"2024-09-24T20:58:09.506Z","updated_at":"2025-05-16T13:07:58.920Z","avatar_url":"https://github.com/rigtorp.png","language":"C++","readme":"# SPSCQueue.h\n\n[![C/C++ CI](https://github.com/rigtorp/SPSCQueue/workflows/C/C++%20CI/badge.svg)](https://github.com/rigtorp/SPSCQueue/actions)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/rigtorp/SPSCQueue/master/LICENSE)\n\nA single-producer single-consumer wait-free and lock-free fixed-size queue\nwritten in C++11. 
This implementation is faster than both\n[*boost::lockfree::spsc*](https://www.boost.org/doc/libs/1_76_0/doc/html/boost/lockfree/spsc_queue.html)\nand [*folly::ProducerConsumerQueue*](https://github.com/facebook/folly/blob/master/folly/docs/ProducerConsumerQueue.md).\n\n## Example\n\n```cpp\nSPSCQueue\u003cint\u003e q(1);\nauto t = std::thread([\u0026] {\n  while (!q.front());\n  std::cout \u003c\u003c *q.front() \u003c\u003c std::endl;\n  q.pop();\n});\nq.push(1);\nt.join();\n```\n\nSee `src/SPSCQueueExample.cpp` for the full example.\n\n## Usage\n\n- `SPSCQueue\u003cT\u003e(size_t capacity);`\n\n  Create an `SPSCQueue` holding items of type `T` with capacity\n  `capacity`. Capacity needs to be at least 1.\n\n- `void emplace(Args \u0026\u0026... args);`\n\n  Enqueue an item using in-place construction. Blocks if queue is full.\n\n- `bool try_emplace(Args \u0026\u0026... args);`\n\n  Try to enqueue an item using in-place construction. Returns `true` on\n  success and `false` if queue is full.\n\n- `void push(const T \u0026v);`\n\n  Enqueue an item using copy construction. Blocks if queue is full.\n\n- `template \u003ctypename P\u003e void push(P \u0026\u0026v);`\n\n  Enqueue an item using move construction. Participates in overload\n  resolution only if `std::is_constructible\u003cT, P\u0026\u0026\u003e::value == true`.\n  Blocks if queue is full.\n\n- `bool try_push(const T \u0026v);`\n\n  Try to enqueue an item using copy construction. Returns `true` on\n  success and `false` if queue is full.\n\n- `template \u003ctypename P\u003e bool try_push(P \u0026\u0026v);`\n\n  Try to enqueue an item using move construction. Returns `true` on\n  success and `false` if queue is full. Participates in overload\n  resolution only if `std::is_constructible\u003cT, P\u0026\u0026\u003e::value == true`.\n\n- `T *front();`\n\n  Return pointer to front of queue. Returns `nullptr` if queue is\n  empty.\n\n- `void pop();`\n\n  Dequeue first item of queue. 
You must ensure that the queue is non-empty\n  before calling `pop()`. This means that `front()` must have returned a\n  non-`nullptr` value before each call to `pop()`. Requires\n  `std::is_nothrow_destructible\u003cT\u003e::value == true`.\n\n- `size_t size();`\n\n  Return the number of items available in the queue.\n\n- `bool empty();`\n\n  Return `true` if queue is currently empty.\n\nOnly a single writer thread can perform enqueue operations and only a\nsingle reader thread can perform dequeue operations. Any other usage\nis invalid.\n\n## Huge page support\n\nIn addition to supporting custom allocation through the [standard custom\nallocator interface](https://en.cppreference.com/w/cpp/named_req/Allocator), this\nlibrary also supports standard proposal [P0401R3 Providing size feedback in the\nAllocator\ninterface](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0401r3.html).\nThis allows convenient use of [huge\npages](https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html)\nwithout wasting any allocated space. 
Using size feedback is only supported when\nC++17 is enabled.\n\nThe library currently doesn't include a huge page allocator since the APIs for\nallocating huge pages are platform-dependent and handling of huge page size and\nNUMA awareness is application-specific.\n\nBelow is an example huge page allocator for Linux:\n\n```cpp\n#include \u003cnew\u003e\n#include \u003csys/mman.h\u003e\n\ntemplate \u003ctypename T\u003e struct Allocator {\n  using value_type = T;\n\n  struct AllocationResult {\n    T *ptr;\n    size_t count;\n  };\n\n  // Round up to a multiple of the 2 MiB huge page size.\n  size_t roundup(size_t n) { return (((n - 1) \u003e\u003e 21) + 1) \u003c\u003c 21; }\n\n  AllocationResult allocate_at_least(size_t n) {\n    size_t count = roundup(sizeof(T) * n);\n    auto p = static_cast\u003cT *\u003e(mmap(nullptr, count, PROT_READ | PROT_WRITE,\n                                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,\n                                   -1, 0));\n    if (p == MAP_FAILED) {\n      throw std::bad_alloc();\n    }\n    return {p, count / sizeof(T)};\n  }\n\n  void deallocate(T *p, size_t n) { munmap(p, roundup(sizeof(T) * n)); }\n};\n```\n\nSee `src/SPSCQueueExampleHugepages.cpp` for the full example on how to use huge\npages on Linux.\n\n## Implementation\n\n![Memory layout](https://github.com/rigtorp/SPSCQueue/blob/master/spsc.svg)\n\nThe underlying implementation is based on a [ring\nbuffer](https://en.wikipedia.org/wiki/Circular_buffer).\n\nCare has been taken to avoid any issues with [false\nsharing](https://en.wikipedia.org/wiki/False_sharing). The head and tail indices\nare aligned and padded to the false sharing range (cache line size).\nAdditionally, the slots buffer is padded with the false sharing range at the\nbeginning and end; this prevents false sharing with any adjacent allocations.\n\nThis implementation has higher throughput than a typical concurrent ring buffer\nby locally caching the head and tail indices in the writer and reader\nrespectively. 
The caching increases throughput by reducing the amount of cache\ncoherency traffic.\n\nTo understand how that works, first consider a read operation in the absence of\ncaching: the head index (read index) needs to be updated and thus that cache\nline is loaded into the L1 cache in exclusive state. The tail (write index)\nneeds to be read in order to check that the queue is not empty and is thus\nloaded into the L1 cache in shared state. Since a queue write operation needs to\nread the head index, it's likely that a write operation requires some cache\ncoherency traffic to bring the head index cache line back into exclusive state.\nIn the worst case there will be one cache line transition from shared to\nexclusive for every read and write operation.\n\nNext consider a queue reader that caches the tail index: if the cached tail\nindex indicates that the queue is empty, the tail index is reloaded into the\ncached tail index. If the queue was non-empty, multiple read operations up until\nthe cached tail index can complete without stealing the writer's tail index\ncache line's exclusive state. Cache coherency traffic is therefore reduced. An\nanalogous argument can be made for the queue write operation.\n\nThis implementation allows for arbitrary non-power-of-two capacities, instead\nallocating an extra queue slot to indicate a full queue. If you don't want to waste\nstorage for an extra queue slot you should use a different implementation.\n\nReferences:\n\n- *Intel*. [Avoiding and Identifying False Sharing Among Threads](https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads).\n- *Wikipedia*. [Ring buffer](https://en.wikipedia.org/wiki/Circular_buffer).\n- *Wikipedia*. [False sharing](https://en.wikipedia.org/wiki/False_sharing).\n\n## Testing\n\nTesting lock-free algorithms is hard. 
I'm using two approaches to test\nthe implementation:\n\n- A single-threaded test verifies that the functionality works as intended,\n  including that the item constructor and destructor are invoked\n  correctly.\n- A multi-threaded fuzz test verifies that all items are enqueued and dequeued\n  correctly under heavy contention.\n\n## Benchmarks\n\nThe throughput benchmark measures throughput between 2 threads for a queue of `int`\nitems.\n\nThe latency benchmark measures round-trip time between 2 threads communicating using\n2 queues of `int` items.\n\nBenchmark results for an AMD Ryzen 9 3900X 12-Core Processor; the 2 threads are\nrunning on different cores on the same chiplet:\n\n| Queue                        | Throughput (ops/ms) | Latency RTT (ns) |\n| ---------------------------- | ------------------: | ---------------: |\n| SPSCQueue                    |              362723 |              133 |\n| boost::lockfree::spsc        |              209877 |              222 |\n| folly::ProducerConsumerQueue |              148818 |              147 |\n\n## Cited by\n\nSPSCQueue has been cited by the following papers:\n\n- Peizhao Ou and Brian Demsky. 2018. Towards understanding the costs of avoiding\n  out-of-thin-air results. Proc. ACM Program. Lang. 2, OOPSLA, Article 136\n  (October 2018), 29 pages. DOI: \u003chttps://doi.org/10.1145/3276506\u003e\n\n## About\n\nThis project was created by [Erik Rigtorp](http://rigtorp.se)\n\u003c[erik@rigtorp.se](mailto:erik@rigtorp.se)\u003e.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frigtorp%2Fspscqueue","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frigtorp%2Fspscqueue","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frigtorp%2Fspscqueue/lists"}