# Let's write synchronization primitives!

This is a project for me to explore the internals of various synchronization
primitives, as well as generic patterns around atomics and futexes.

# Design decisions

All of the synchronization primitives are implemented using atomic variables
and (with the exception of the spinlock) futexes, as supported by the Linux
kernel and some other systems. All operations feature a *fast path*, which
avoids calling into the kernel for futex operations and only uses atomics, and
a *slow path*, which makes full use of futexes. In a way, each primitive
actually contains two different implementations, one based on atomics and one
based on futexes, and switches between them transparently when
needed/possible.

The other idea that the primitives could benefit from, optimistic spinning
(that is, transparently upgrading from a spinlock to a mutex), is
intentionally *not* implemented here. This means the code is somewhat simpler
than it could be, but it also means that this implementation is knowingly not
as fast as it could possibly be.
After all, this is a learning project and not a production-grade library.

Most of the blocking operations (such as `semaphore.down()` and `mutex.lock()`)
have a corresponding `try_xxx()` version that never blocks the calling thread
and returns a boolean describing whether the operation could be (and has been)
performed immediately, without blocking. Semantically, a `try_xxx()` operation
succeeding has the same effect (with respect to happens-before relationships
and synchronization guarantees) as executing the corresponding `xxx()`
operation, and a `try_xxx()` operation failing has the same effect as if it
had not been executed at all. In either case, the calling thread is not
blocked. It might be a good idea to use the non-blocking operations whenever
your thread has other useful work it can do without waiting for the other
threads.

None of the locks implemented here are reentrant: a thread already holding a
lock cannot claim it again. In fact, none of the primitives track which thread
is holding the lock.

The implementation of the synchronization primitives includes some assertions.
These have been quite helpful during development, but they add unnecessary
overhead at runtime. Define the standard `NDEBUG` macro to disable assertions
(see below).

# Primitives

The following synchronization primitives are implemented.

## Spinlock

This is the only primitive here that does not use futexes. Instead of sleeping
properly when it cannot acquire the lock, it just spins in a loop.

## Mutex

A mutual exclusion lock. It has the same API as the spinlock, but uses a futex
to sleep when the lock cannot be taken immediately. If there's no contention,
locking the mutex succeeds immediately, without calling into the kernel.
Unlocking an uncontended mutex likewise does not call into the kernel.

A mutex establishes a total order among executions of critical sections.
Everything written by one execution of a critical section will be seen by the
following ones.

A mutex can be seen as a special case of a semaphore, or as a special case of
a readers-writer lock. However, the mutex is faster than either, thanks to a
far simpler implementation.

## Event

An event primitive can be used to wait for some sort of event. Multiple
threads can wait for the event by calling `event.wait()`, and multiple threads
can concurrently announce the event by calling `event.notify()`. Calling
`event.wait()` is fast if the event has already been announced, and similarly,
calling `event.notify()` is fast if nobody is waiting.

Note that many other event implementations additionally provide an
`event.clear()` method that "clears" the event so that it can be waited for
and announced again. This implementation does not, because this usage pattern
is inherently racy. Use a condition variable for that instead.

Everything that happened before the event was announced for the first time
will be seen after an `event.wait()` call returns.

Barriers, condition variables, and also semaphores can all be regarded as
generalizations of events.

## Once

A once primitive can be used to execute a critical section only once, even if
multiple threads reach it at the same time. This is typically used for lazy
initialization. The same effect can be trivially achieved using a mutex and a
boolean variable, but a once primitive is more efficient.
In particular, once the critical section has been executed, subsequent calls
to `once.perform()` are very fast and don't block each other.

The once primitive establishes a happens-before relationship between the
completion of the critical section and the `once.perform()` calls returning.
After the call returns, the calling thread will see everything written by the
critical section.

## Barrier

A barrier is similar to an event, except that a barrier waits for *several*
threads to "check in" (and unlike with an event, it's invalid to
over-check-in). You have to declare the number of required check-ins when
constructing a barrier.

Here's a common way to use a barrier:

```cpp
Barrier barrier { num_threads };
for (size_t i = 0; i < num_threads; i++) {
    spawn_thread([&] {
        some_work_1();
        barrier.check_in();
        some_work_2();
        barrier.wait();
        some_work_3();
    });
}
```

Here, `some_work_3()` will see the results of the `some_work_1()` calls (but
not of the `some_work_2()` calls!) made by all the threads in the group. It's
highly beneficial to perform some useful work in between calling
`barrier.check_in()` and `barrier.wait()`, because the `barrier.check_in()`
call will be very fast if nobody is waiting, and similarly, `barrier.wait()`
will be very fast if everyone has already checked in (so there's no need to
actually wait). In case you have absolutely nothing to do there and would call
`barrier.wait()` immediately after `barrier.check_in()`, there's a combined
`barrier.check_in_and_wait()` call that is slightly faster than making the two
calls separately.

Note that waiting can be done from any thread, not just those that have
checked in.
For example, a barrier can also be used in the following manner:

```cpp
Barrier barrier { num_threads };
for (size_t i = 0; i < num_threads; i++) {
    spawn_thread([&] {
        some_work_1();
        barrier.check_in();
    });
}
barrier.wait();
some_work_2();
```

Here, `some_work_2()` will see the results of `some_work_1()` of all the
threads.

## Readers-writer lock

A readers-writer lock is a generalization of a mutex. Either a single writer
or multiple readers can hold the lock at any one time; readers do not block
each other. If most of the accesses are reads with only occasional writes,
it's probably a good idea to use a readers-writer lock instead of a plain
mutex; all reader operations are very fast if no writers are involved. That
being said, the implementation is more complex and a bit less efficient than
that of a plain mutex.

There are two caveats to using a readers-writer lock (at least as implemented
here):
* The lock prefers writers over readers: if there are writers waiting to
  acquire the lock, newly arriving readers will not be allowed to take it.
  This means that readers can't completely starve the writers, and everyone
  will get the lock eventually. But it also means that slow readers can
  actually block other readers.
* It is not possible to "upgrade" a held lock from reading to writing (that
  is, to lock the lock for writing while already holding it for reading). To
  see why, consider what would happen if that were allowed and two readers
  both tried to upgrade the lock at the same time.
  Instead, you should drop the reading lock and then re-acquire it for
  writing, and be prepared that something might have changed while you were
  not holding the lock.

  That being said, you can call `lock.try_upgrade()` to try to upgrade the
  lock from reading to writing; this is slightly faster than dropping and
  re-acquiring the lock and, if successful, guarantees that the lock has been
  held "at least for reading" the whole time. It is also always possible to
  downgrade a lock from writing to reading.

A readers-writer lock establishes a happens-before relationship between a
writer unlocking the lock and a reader or a writer subsequently locking it, as
well as between a reader or a writer unlocking the lock and a writer
subsequently locking it. It does not, however, establish any happens-before
relationships between several readers locking and unlocking the lock if no
writer locks it in between them.

Here's an example of using a readers-writer lock to protect a rarely updated
value:

```cpp
std::string hostname;
RWLock lock;

std::string get_hostname() {
    lock.lock_read();
    std::string hostname_copy = hostname;
    lock.unlock_read();
    return hostname_copy;
}

void set_hostname(std::string new_hostname) {
    lock.lock_write();
    hostname = new_hostname;
    lock.unlock_write();
}
```

And here's an example of how to properly handle the case where you would want
to upgrade the lock:

```cpp
SomeCache cache;
RWLock lock;

value_t get(key_t key) {
    lock.lock_read();
    value_t value = cache[key];
    // If there's no value and we're
    // the only reader, proceed to
    // calculating the value.
    if (!value && lock.try_upgrade()) {
        goto calculate;
    }
    lock.unlock_read();

    if (value) {
        return value;
    }

    lock.lock_write();
    // Somebody might have already
    // put it there, so recheck.
    value = cache[key];
    if (!value) {
calculate:
        value = cache[key] = calculate_value(key);
    }
    lock.unlock_write();
    return value;
}
```

## Semaphore

A semaphore is a different generalization of a mutex. A semaphore keeps an
internal counter, which can be incremented with `semaphore.up()` and
decremented with `semaphore.down()`. The counter cannot become negative, so if
the counter is zero when `semaphore.down()` is called, the call blocks until
somebody else calls `semaphore.up()` and thus allows the `semaphore.down()`
call to proceed.

If the initial value of the counter is one, a semaphore functions like a
mutex, with `semaphore.down()` acting like `mutex.lock()` and `semaphore.up()`
acting like `mutex.unlock()`. If the initial value is zero, a semaphore
functions like an event primitive (only supporting a single thread that waits
and a single thread that announces the event). A semaphore with a higher
initial value can be used to model a shared resource with a limited number of
access slots.

A semaphore is by far the most complex of the primitives implemented here. You
should probably use a specialized synchronization primitive such as a mutex or
an event instead. That being said, both `semaphore.down()` and
`semaphore.up()` should be fast as long as no thread has to wait.

It's not entirely clear which happens-before relationships a semaphore should
establish, but it should, at least, establish one between someone incrementing
the counter from zero and someone subsequently decrementing it. This
implementation, in addition to that, establishes a happens-before relationship
between anyone incrementing the counter (not necessarily from zero) and
someone subsequently decrementing it.

## Condition variable

A condition variable can be seen as another generalization of the event
primitive.
Unlike an event, a condition variable can be used to properly track a
condition that repeatedly changes between being true and false. Also unlike an
event, a condition variable doesn't internally store whether the condition is
true or false; instead, it's up to the user to store and check the condition.
The only requirement, as far as the condition variable is concerned, is that
the condition must only be checked and modified while holding a mutex, and the
condition variable must be given access to this mutex.

The `condvar.wait()` method must be called with the mutex held; it atomically
unlocks the mutex and starts waiting for some other thread to notify this
thread (spurious wake-ups are allowed). After completing the wait, it locks
the mutex again and returns to the caller with the mutex held; the caller
should then inspect the state. Waiting threads can be woken up with either
`condvar.notify_one()` (typically used for conditions that should be handled
or consumed by the waiting threads, such as a queue of events that should be
drained by worker threads) or `condvar.notify_all()` (typically used for
announcing events that the waiting threads are not expected to consume).
Neither `condvar.notify_one()` nor `condvar.notify_all()` needs to be called
with the mutex held; while it's correct to call them either with or without
the mutex held, it's much faster to call them without holding the mutex.

A condition variable itself does not establish any happens-before
relationships. However, it must be used with a mutex, which does establish
such relationships.

# Building

Let's write synchronization primitives is built with
[Meson](https://mesonbuild.com/).
Here are a few useful build configurations:

* The default debug build:
  ```
  $ meson build
  ```
* A debug build with ThreadSanitizer enabled, to check for data races:
  ```
  $ meson build -Db_sanitize=thread
  ```
* A release build with static linking, optimizations, and assertions disabled:
  ```
  $ meson build -Dbuildtype=release -Ddefault_library=static -Db_lto=true -Db_ndebug=true
  ```

Use `ninja` to build and `ninja test` to run the tests.

# Resources

* [`futex(2)`](https://man7.org/linux/man-pages/man2/futex.2.html) and
  [`futex(7)`](https://man7.org/linux/man-pages/man7/futex.7.html) manual pages
* [Futexes Are Tricky](https://akkadia.org/drepper/futex.pdf)
* [Locking in WebKit](https://webkit.org/blog/6161/locking-in-webkit/)
* [`std::memory_order`](https://en.cppreference.com/w/cpp/atomic/memory_order)
* [Mutexes Are Faster Than Spinlocks](https://matklad.github.io/2020/01/04/mutexes-are-faster-than-spinlocks.html)