{"id":19148322,"url":"https://github.com/nicowilliams/ctp","last_synced_at":"2025-09-09T01:30:58.723Z","repository":{"id":142162765,"uuid":"49340180","full_name":"nicowilliams/ctp","owner":"nicowilliams","description":"C Thread Primitives","archived":false,"fork":false,"pushed_at":"2023-09-19T21:04:11.000Z","size":214,"stargazers_count":7,"open_issues_count":6,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-09T11:12:16.680Z","etag":null,"topics":["c","concurrent-data-structure","concurrent-data-structures","lock-free","lock-less","lockless","rcu"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nicowilliams.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-01-09T20:30:04.000Z","updated_at":"2025-01-22T23:11:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"cbb00d36-7d64-470a-9dc6-7c3edce75ef4","html_url":"https://github.com/nicowilliams/ctp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nicowilliams/ctp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicowilliams%2Fctp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicowilliams%2Fctp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicowilliams%2Fctp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicowilliams%2Fctp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/owners/nicowilliams","download_url":"https://codeload.github.com/nicowilliams/ctp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicowilliams%2Fctp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274232031,"owners_count":25245855,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","concurrent-data-structure","concurrent-data-structures","lock-free","lock-less","lockless","rcu"],"created_at":"2024-11-09T07:53:43.130Z","updated_at":"2025-09-09T01:30:58.455Z","avatar_url":"https://github.com/nicowilliams.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003e NOTE: This repo is mirrored at https://github.com/cryptonector/ctp and https://github.com/nicowilliams/ctp\n\n# Q: What is it?  A: A user-land-RCU-like API for C, permissively licensed\n\nThis repository's only current feature is a read-copy-update (RCU) like,\nthread-safe variable (TSV) for C.  More C thread-primitives may be added\nin the future, thus the repository's name being quite generic.\n\nA TSV lets readers safely keep using a value read from the TSV until\nthey read the next value.  
Memory management is automatic: values are\nautomatically destroyed when the last reference to a value is released\nwhether explicitly, or implicitly at the next read, or when a reader\nthread exits.  References can also be relinquished manually.  Reads are\n_lock-less_ and fast, and _never block writers_.  Writers are serialized\nbut otherwise interact with readers without locks, thus writes *do not\nblock reads*.\n\n\u003e In one of the two included implementations, readers only execute atomic\n\u003e memory loads and stores, though they loop over that when racing with a\n\u003e writer.  As aligned loads and stores are typically atomic on modern\n\u003e architectures, this means no expensive atomic operations are needed --\n\u003e not even a single atomic increment or decrement.\n\nThis is not unlike a Clojure `ref`, or like a Haskell `msync`.  It's\nalso similar to RCU, but unlike RCU, this has a very simple API with\nnothing like `synchronize_rcu()`, and doesn't require any cross-CPU\ncalls nor the ability to make CPUs/threads run, and it has no\napplication-visible concept of critical sections, therefore it works in\nuser-land with no special kernel support.\n\n - One thread needs to create the variable (as many as desired) once by\n   calling `thread_safe_var_init()` and providing a value destructor.\n\n   \u003e There is currently no static initializer, though one could be\n   \u003e added.  
One would typically do this early in `main()` or in a\n   \u003e `pthread_once()` initializer.\n\n - Most threads only ever need to call `thread_safe_var_get()`.\n\n   \u003e Reader threads _may_ also call `thread_safe_var_release()` to allow\n   \u003e a value to be freed sooner than otherwise.\n\n - One or more threads may call `thread_safe_var_set()` to set new\n   values on the TSVs.\n\nThe API is:\n\n```C\n    typedef struct thread_safe_var *thread_safe_var;    /* TSV */\n    typedef void (*thread_safe_var_dtor_f)(void *);     /* Value destructor */\n\n    /* Initialize a TSV with a given value destructor */\n    int  thread_safe_var_init(thread_safe_var *, thread_safe_var_dtor_f);\n\n    /* Get the current value of the TSV and a version number for it */\n    int  thread_safe_var_get(thread_safe_var, void **, uint64_t *);\n\n    /* Set a new value on the TSV (outputs the new version) */\n    int  thread_safe_var_set(thread_safe_var, void *, uint64_t *);\n\n    /* Optional functions follow */\n\n    /* Destroy a TSV */\n    void thread_safe_var_destroy(thread_safe_var);\n\n    /* Release the reference to the last value read by this thread from the TSV */\n    void thread_safe_var_release(thread_safe_var);\n\n    /* Wait for a value to be set on the TSV */\n    int  thread_safe_var_wait(thread_safe_var);\n```\n\nValue version numbers increase monotonically when values are set.\n\n# Why?  Because read-write locks are terrible\n\nSo you have rarely-changing typically-global data (e.g., loaded\nconfiguration, plugin lists, ...), and you have many threads that read\nthis, and you want reads to be fast.  
Worker threads need stable\nconfiguration/whatever while doing work, then when they pick up another\ntask they can get a newer configuration if there is one.\n\nHow would one implement that?\n\nA safe answer is: read-write locks around reading/writing the variable,\nand reference count the data.\n\nBut read-write locks are inherently bad: readers either can starve\nwriters or can be blocked by writers.  Either way read-write locks are a\nperformance problem.\n\nA \"thread-safe variable\", on the other hand, is always fast to read,\neven when there's an active writer, and reading does not starve writers.\n\n# How?\n\nTwo implementations are included at this time.\n\nThe two implementations have slightly different characteristics.\n\n - One implementation (\"slot pair\") has O(1) lock-less and spin-less\n   reads and O(1) serialized writes.\n\n   But readers call free() and the value destructor, and, sometimes have\n   to signal a potentially-waiting writer, which involves acquiring a\n   mutex -- a blocking operation, yes, though on an uncontended\n   resource, so not really blocking.\n\n   This implementation has a pair of slots, one containing the \"current\"\n   value and one containing the \"previous\"/\"next\" value.  Writers make the\n   \"previous\" slot into the next \"current\" slot, and readers read from\n   whichever slot appears to be the current slot.  Values are wrapped\n   with a wrapper that includes a reference count, and they are released\n   when the reference count drops to zero.\n\n   The trick is that writers will wait until the number of active\n   readers of the previous slot is zero.  Thus the last reader of a\n   previous slot must signal a potentially-awaiting writer (which\n   requires taking a lock that the awaiting writer should have\n   relinquished in order to wait).  
Thus reading is mostly lock-less and\n   never blocks on contended resources.\n\n   Values are reference counted and so released immediately when the\n   last reference is dropped.\n\n - The other implementation (\"slot list\") has O(1) lock-less reads, with\n   unreferenced values garbage collected by serialized writers in `O(N\n   log(M))` (where N is the maximum number of live threads that have read\n   the variable and M is the number of values that have been set and\n   possibly released).  If writes are infrequent and readers make use of\n   `thread_safe_var_release()`, then garbage collection is `O(1)`.\n   \n   Readers never call the allocator after the first read in any given\n   thread, and writers never call the allocator while holding the writer\n   lock.\n\n   Readers have to loop over their fast path, a loop that could run\n   indefinitely if there were infinitely many higher-priority writers\n   who starve the reader of CPU time.  To help avoid this, writers yield\n   the CPU before relinquishing the write lock, thus ensuring that some\n   readers will have the CPU ahead of any awaiting higher-priority\n   writers.\n\n   This implementation has a list of referenced values, with the head of\n   the list always being the current one, and a list of \"subscription\"\n   slots, one slot per-reader thread.  Readers allocate a slot on first\n   read, and thence copy the head of the values list to their slots.\n   Writers have to perform garbage collection on the list of referenced\n   values.\n\n   Subscription slot allocation is lock-less.  Indeed, everything is\n   lock-less in the reader, and unlike the slot-pair implementation\n   there is no case where the reader has to acquire a lock to signal a\n   writer.\n\n   Values are released at the first write after the last reference is\n   dropped, as values are garbage collected by writers.\n\nThe first implementation written was the slot-pair implementation.  
The\nslot-list design is much easier to understand on the read-side, but it\nis significantly more complex on the write-side.\n\n# Requirements\n\nC89, POSIX threads (though TSV should be portable to Windows),\ncompilers with atomics intrinsics and/or atomics libraries.\n\nIn the future this may be upgraded to a C99 or even C11 requirement.\n\n# Testing\n\nA test program is included that hammers the implementation.  Run it in a\nloop, with or without helgrind, TSAN (thread sanitizer), or other thread\nrace checkers, to look for data races.\n\nBoth implementations perform similarly well on the included test.\n\nThe test pits 20 reader threads waiting various small amounts of time\nbetween reads (one not waiting at all), against 4 writer threads waiting\nvarious small amounts of time between writes.  This test found a variety\nof bugs during development.  In both cases writes are, on average, 5x\nslower than reads, and reads are in the ten microseconds range on an old\nlaptop, running under virtualization.\n\n# Performance\n\nOn an old i7 laptop, virtualized, reads on idle thread-safe variables\n(i.e., no writers in sight) take about 15ns.  This is because the fast\npath in both implementations consists of reading a thread-local variable\nand then performing a single acquire-fenced memory read.\n\nOn that same system, when threads write very frequently then reads slow\ndown to about 8us (8000ns).  
(But the test had eight times more threads\nthan CPUs, so the cost of context switching is included in that number.)\n\nOn that same system writes on a busy thread-safe variable take about\n50us (50000ns), but non-contending writes on an otherwise idle\nthread-safe variable take about 180ns.\n\nI.e., this is blindingly fast, especially for the intended use case\n(infrequent writes).\n\n# Install\n\nClone this repo, select a configuration, and make it.\n\nFor example, to build the slot-pair implementation, use:\n\n    $ make clean slotpair\n\nTo build the slot-list implementation, use:\n\n    $ make CPPDEFS=-DHAVE_SCHED_YIELD clean slotlist\n\nA GNU-like make(1) is needed.\n\nConfiguration variables:\n\n - `COPTFLAG`\n - `CDBGFLAG`\n - `CC`\n - `ATOMICS_BACKEND`\n\n   Values: `-DHAVE___ATOMIC`, `-DHAVE___SYNC`, `-DHAVE_INTEL_INTRINSICS`, `-DHAVE_PTHREAD`, `-DNO_THREADS`\n\n - `TSV_IMPLEMENTATION`\n\n   Values: `-DUSE_TSV_SLOT_PAIR_DESIGN`, `-DUSE_TSV_SUBSCRIPTION_SLOTS_DESIGN`\n\n - `CPPDEFS`\n\n   `CPPDEFS` can also be used to set `NDEBUG`.\n\nA build configuration system is needed, in part to select an atomic\nprimitive backend.\n\nSeveral atomic primitives implementations are available:\n\n - gcc/clang `__atomic`\n - gcc/clang `__sync`\n - Win32 `Interlocked*()`\n - Intel compiler intrinsics (`_Interlocked*()`)\n - global pthread mutex\n - no synchronization (watch the test blow up!)\n\n# Thread Sanitizer (TSAN) Data Race Reports\n\nCurrently TSAN (GCC and Clang both) produces no reports.\n\nIt is trivial to cause TSAN to produce reports of data races by\nreplacing some atomic operations with non-atomic operations, therefore\nit's clearly the case that TSAN works to find many data races.  That is\nnot proof that TSAN would catch all possible data races, or that the\ntests exercise all possible data races.  A formal approach to proving\nthe correctness of TSVs would add value.\n\n# Helgrind Data Race Reports\n\nCurrently Helgrind produces no race reports.  
Using the\n`ANNOTATE_HAPPENS_BEFORE()` and `ANNOTATE_HAPPENS_AFTER()` macros in\n`\u003cvalgrind/helgrind.h\u003e` provides Helgrind with the information it needs.\n\nUse `make ... CPPDEFS=-DUSE_HELGRIND CSANFLAG=` to enable those macros.\n\n\u003e It is important to not use TSAN and Helgrind at the same time, as that\n\u003e makes Helgrind crash.  To disable TSAN set `CSANFLAG=` on the `make`\n\u003e command-line.\n\nAs with TSAN, it is trivial to make Helgrind report data races by not\nusing `CPPDEFS=-DUSE_HELGRIND` (but still using `CSANFLAG=`) then\nrunning `helgrind ./t`, which then reports data races at places in the\nsource code that use atomic operations to... avoid data races.\n\n# TODO\n\n - Don't create a pthread-specific variable for each TSV.  Instead share\n   one pthread-specific for all TSVs.  This would require having the\n   pthread-specific values be a pointer to a structure that has a\n   pointer to an array of per-TSV elements, with\n   `thread_safe_var_init()` allocating an array index for each TSV.\n\n   This is important because there can be a maximum number of\n   pthread-specifics and we must not be the cause of exceeding that\n   maximum.\n\n - Add an attributes optional input argument to the init function.\n\n   Callers should be able to express the following preferences:\n\n    - OK for readers to spin, yes or no.        (No  -\u003e slot-pair)\n    - OK for readers to alloc/free, yes or no.  (No  -\u003e slot-list, GC)\n    - Whether version waiting is desired.       (Yes -\u003e slot-pair)\n\n   On conflict give priority to functionality.\n\n - Add a version predicate to set or a variant that takes a version\n   predicate.  
(A version predicate -\u003e do not set the new value unless\n   the current value's version number is the given one.)\n\n - Add an API for waiting for values older than some version number to\n   be released?\n  \n   This is tricky for the slot-pair case because we don't have a list of\n   extant values, but we need it in order to determine what is the\n   oldest live version at any time.  Such a list would have to be\n   doubly-linked and updating the double links to remove items would be\n   rather difficult to do lock-less-ly and thread-safely.  We could\n   defer freeing of list elements so that only tail elements can be\n   removed.  When a wrapper's refcount falls to zero, signal any waiters\n   who can then garbage collect the lists with the writer lock held and\n   find the oldest live version.\n\n   For the slot-list case the tricky part is that unreferenced values\n   are only detected when there's a write.  We could add a refcount to\n   the slot-list case, so that when refcounts fall to zero we signal any\n   waiter(s), but because of the way readers find a current value...\n   reference counts could go down to zero then back up, so we must still\n   rely on GC to actually free, and we can only rely on refcounts to\n   signal a waiter.\n\n   It seems we need a list and refcounts, so that the slot-pair and\n   slot-list cases become quite similar, and the only difference\n   ultimately is that slot-list can spin while slot-pair cannot.  Thus\n   we might want to merge the two implementations, with attributes of\n   the variable (see above) determining which codepaths get taken.\n\n   Note too that both implementations can (or do) defer calling of the\n   value destructor so that reading is fast.  
This should be an option.\n\n - Add a static initializer?\n\n - Add a better build system.\n\n - Add an implementation using read-write locks to compare performance\n   with.\n\n - Use symbol names that don't conflict with any known atomics libraries\n   (so those can be used as an atomics backend).  Currently the atomics\n   symbols are loosely based on Illumos atomics primitives.\n\n - Support Win32 (perhaps by building a small pthread compatibility\n   library; only mutexes and condition variables are needed).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicowilliams%2Fctp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnicowilliams%2Fctp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicowilliams%2Fctp/lists"}