{"id":19112134,"url":"https://github.com/ocaml-multicore/dscheck","last_synced_at":"2025-04-16T14:36:48.972Z","repository":{"id":41998982,"uuid":"422352035","full_name":"ocaml-multicore/dscheck","owner":"ocaml-multicore","description":"Experimental model checker for testing concurrent algorithms","archived":false,"fork":false,"pushed_at":"2024-12-17T10:09:25.000Z","size":135,"stargazers_count":33,"open_issues_count":10,"forks_count":5,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-29T05:23:24.505Z","etag":null,"topics":["model-checker","ocaml"],"latest_commit_sha":null,"homepage":"","language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ocaml-multicore.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-28T20:50:54.000Z","updated_at":"2025-03-02T16:34:45.000Z","dependencies_parsed_at":"2024-06-25T15:19:24.989Z","dependency_job_id":"5f0af0be-2671-4e7f-8d80-8d12325e937c","html_url":"https://github.com/ocaml-multicore/dscheck","commit_stats":null,"previous_names":["sadiqj/dscheck"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocaml-multicore%2Fdscheck","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocaml-multicore%2Fdscheck/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocaml-multicore%2Fdscheck/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocaml-multicore%2Fdscheck/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/ocaml-multicore","download_url":"https://codeload.github.com/ocaml-multicore/dscheck/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249250895,"owners_count":21237963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["model-checker","ocaml"],"created_at":"2024-11-09T04:31:44.578Z","updated_at":"2025-04-16T14:36:48.949Z","avatar_url":"https://github.com/ocaml-multicore.png","language":"OCaml","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DSCheck — tool for testing concurrent OCaml programs\n\nExperimental model checker for testing concurrent programs. DSCheck explores\ninterleavings of a user-provided program and helps ensure that its invariants\nare maintained regardless of scheduling decisions.\n\n# Contents\n\n1. [Motivation](#motivation)\n2. [Get DSCheck](#get-dscheck)\n3. [Usage](#usage)\n4. [Development](#development)\n5. [Contributions](#contributions)\n6. [References](#references)\n\n# Motivation\n\nAs experience shows, fine-grained concurrency is notoriously challenging to get\nright.\n\n- As the program grows, the number of possible interleavings increases\n  exponentially and quickly becomes too large for a human to reasonably\n  validate. That's exacerbated by the fact that different interleavings often\n  lead to the right outcome for different reasons.\n\n- Certain concurrency bugs manifest rarely and are borderline impossible to\n  reproduce. 
They may occur under specific system conditions only and disappear\n  when a debugging system is attached.\n\nDSCheck helps manage this complexity by letting us instrument a multicore test\nto explore relevant interleavings, thus ensuring that all terminal states are\nvalid and no edge cases have been missed.\n\n# Get DSCheck\n\nDSCheck can be installed from `opam`: `opam install dscheck`.\n\n# Usage\n\nSample usage on [naive counter](tests/test_naive_counter.ml) is shown below.\n\n```ocaml\nmodule Atomic = Dscheck.TracedAtomic\n(* the test needs to use DSCheck's atomic module *)\n\nlet test_counter () =\n  let counter = Atomic.make 0 in\n  let incr () = Atomic.set counter (Atomic.get counter + 1) in\n  Atomic.spawn incr;\n  Atomic.spawn incr;\n  Atomic.final (fun () -\u003e Atomic.check (fun () -\u003e Atomic.get counter == 2))\n```\n\nThe test spawns two domains (`Atomic.spawn`), each trying to increase the\ncounter. The assertion at the end validates that the counter has the expected\nvalue (`Atomic.final`). This is a classic example of a race condition with two\nthreads trying to perform a read-modify-write operation without synchronisation.\nIn effect, there is a risk of losing one of the updates. DSCheck finds and\nreports the offending interleaving to the user:\n\n```\nFound assertion violation at run 2:\n\nsequence 2\n----------------------------------------\nP0                      P1\n----------------------------------------\nstart\nget a\n                        start\n                        get a\nset a\n                        set a\n----------------------------------------\n```\n\n## Validation Soundness\n\nFor model-checking to be sound, the tested program must meet the following\nconditions:\n\n- Determinism. Otherwise DSCheck may encounter errors or (more dangerously)\n  terminate successfully without exploring all traces.\n- Tested programs cannot have races between non-atomic variables. DSCheck does\n  not explore different _behaviours_ (e.g. 
a non-atomic read may see the most\n  recently written value or a number of stale ones).\n- Domains can communicate through atomic variables only. Validation including\n  higher-level synchronisation primitives is possible but constitutes future\n  work.\n- Tested programs have to be at least lock-free. If any thread cannot finish on\n  its own, DSCheck will explore its transitions ad infinitum. As a workaround,\n  the space of traces can be covered partially by forcing the test to be\n  lock-free. For example, a spinlock can be modified to fail explicitly once\n  some artificial limit is reached.\n\n## Validation Logic\n\nAs highlighted in the [Motivation](#motivation), the number of interleavings\ntends to grow exponentially with the size of the program and the number of\nthreads. It follows that the interleavings of even small programs are not just\nimpossible for humans to walk through but also infeasible to enumerate in\nreasonable time.\n\nThe key advance that made DSCheck-like model-checkers possible is the emergence\nof dynamic partial-order reduction (DPOR) methods. The application to\nmodel-checking stems from the observation that in real-world programs many\ninterleavings are equivalent and if at least one is covered, so is the entire\nequivalence class. More formally, a particular interleaving is a total order\nconsistent with the causal relation between events of different domains\n(a partial order), and the interleavings sharing that partial order form one\ntrace. DSCheck aims to cover exactly one interleaving per trace.\n\nWhile the DPOR algorithms and formalism tend to be quite involved, the emergent\nbehavior is intuitive. Consider the following program:\n\n```ocaml\nlet a = Atomic.make 0 in\nlet b = Atomic.make 0 in\n\n(* Domain P *)\nDomain.spawn (fun () -\u003e\n    Atomic.set a 1;\n    Atomic.set b 1;\n    ) |\u003e ignore;\n\n(* Domain Q *)\nDomain.spawn (fun () -\u003e\n    Atomic.set a 2;\n    ) |\u003e ignore\n```\n\nThere are three possible interleavings: `P.P.Q`, `P.Q.P`, `Q.P.P`. 
Clearly, the\nordering between _Q_ and the second step of _P_ does not matter. Thus the\nexecution sequences `P.P.Q` and `P.Q.P` are different realizations of the same\ntrace.\n\nDPOR skips the redundant execution sequences and provides an exponential\nimprovement over the naive search. That in turn significantly expands the\nuniverse of checkable programs and makes this enumeration useful.\n\n### Reads\n\nThe leading example showcases reduction of the search space based on accesses to\ndisjoint locations. A similar approach can be taken for accesses to overlapping\nlocations that do not conflict. If _P_ and _Q_ had only read memory, there would\nhave been no race between them, in turn requiring DSCheck to explore only a\nsingle interleaving.\n\n```ocaml\nlet a = Atomic.make 0 in\n\n(* P *)\nAtomic.spawn (fun () -\u003e\n    ignore (Atomic.compare_and_set a 1 2));\n\n(* Q *)\nAtomic.spawn (fun () -\u003e\n    ignore (Atomic.compare_and_set a 2 3));\n```\n\nCompare and set is a read-write operation. In this particular case, however,\nboth CASes fail and leave the initial value untouched. DSCheck recognizes such\nspecial cases and avoids exploration of redundant interleavings.\n\n### Causal Ordering\n\n```ocaml\nlet a = Atomic.make 0 in\nlet b = Atomic.make 0 in\n\nAtomic.spawn (fun () -\u003e\n    Atomic.set a 1; (* P *)\n    Atomic.set b 1; (* Q *));\n\nAtomic.spawn (fun () -\u003e\n    Atomic.set a 2;\n    Atomic.set b 2);\n```\n\nIn a more general sense, DSCheck tracks the causal order between events of\ndifferent domains and tries to schedule sequences reversing it, where possible.\nAt times, the result may be counterintuitive. In the example above DSCheck\nexplores 4 interleavings. What if we swap the lines `P` and `Q`?\n\n# Development\n\n## Design Notes\n\nDSCheck sees a test as a graph, where an edge is a step in execution of one of\nthe active domains. The validation involves traversing the graph in a\ndepth-first fashion and running user assertions in the leaf states. 
For example, the graph\nfor [Example Reads](#reads) looks as follows:\n\n```\nStart (a=0) ---\u003e P: CompareAndSet a 1 2 ---\u003e Q: CompareAndSet a 2 3\n|                                                          |\n\\/                                                         \\/\nQ: CompareAndSet a 2 3 ---\u003e P: CompareAndSet a 1 2 ---\u003e Termination (a=0)\n```\n\nDSCheck begins by running the main function of the test, which registers domains\nP and Q. Then, the exploration starts by taking a step into execution of either\ndomain and follows with one step of the other, thus arriving at the terminal\nstate. In the case above both paths are equivalent (hence the shared leaf node),\nbut it naturally does not have to be the case, e.g. variable `a` could be\ninitialized with `1`, making both paths unique traces. Consider the following.\n\n```\nStart (a=1) --\u003e P: CompareAndSet a 1 2 --\u003e Q: CompareAndSet a 2 3\n|                                                       |\n|                                                       \\/\n|                                                       Termination (a=3)\n\\/\nQ: CompareAndSet a 2 3 ---\u003e P: CompareAndSet a 1 2 ---\u003e Termination (a=2)\n```\n\n_P_ and _Q_ are dependent operations and the two interleavings lead to different\noutcomes. Thus DSCheck has to explore both. Skimming over the details, DSCheck\nbegins by exploring the first branch, `P.Q`, to the end and schedules new\ntransitions on the way there. Here, it will notice that _P_ and _Q_ are\npotentially racing operations (since it's a read-write pair on the same\nlocation) and schedule transition _Q_ after start. We call that a _race\nreversal_.\n\nThe DFS exploration may look tricky at first. The key idea to realize is that at\nany step in the sequence, the model checker aims to explore all the traces\nproduced by the remaining events. 
For some events _c_-_z_ and an execution sequence `x.c.w`,\nDSCheck has to explore all traces produced by the remaining events. If _g_ and\n_h_ are dependent and _A_, _B_ are some sequences, it has to explore at least\n`x.c.w.A.g.h.B` and `x.c.w.A.h.g.B` (or equivalent). It does so by choosing a\nrandom path and continuously scheduling sequences reversing encountered races.\n\nThe key optimization techniques identify transitions leading to sequences that\nare equivalent to some already explored ones.\n\n- Persistent/source sets. A DPOR algorithm has to execute at least all the\n  transitions in the source set at a particular state to ensure that all relevant\n  interleavings are explored. Once such a set has been explored, there's no need\n  to explore any other transitions from that state. See\n  [Comparing Source Sets and Persistent Sets for Partial Order Reduction](https://user.it.uu.se/~bengt/Papers/Full/kimfest17.pdf).\n- Sleep sets. At times, we can suspend exploration of a transition until a\n  relevant event occurs. For example, if _x_ and _c_ are independent and we have\n  explored `x.c`, there's no need to explore `c.x` unless some other event\n  (dependent with _x_) occurs.\n\n## Testing\n\nThe formalism underpinning DPOR leads to a fairly straightforward testing setup.\nAny two execution sequences belong to the same trace if reordering of\ncommutative operations leads from one to the other. For example, operations\n`P:(read a), Q:(read b)` clearly commute (since their reordering leads to the\nsame outcome), while `P:(write a), Q:(write a)` may not. Thus, the two orderings\nof the read pair can be transformed into one another and are realizations of the\nsame trace.\n\nWhenever DSCheck explores multiple sequences for a single trace, it constitutes\nan inefficiency. Conversely, if a change to the DPOR logic leaves some traces\nunexplored, it is incorrect. Note, the assignment of execution sequences\nto traces uses the definition of the dependency relation. 
If the change improves\nthe dependency relation rather than DPOR, we would expect to see new pairs of\nequivalent execution sequences and thus groups of multiple traces collapsing\ninto one.\n\nTo facilitate the testing, DSCheck includes a random test generator and a trace\ndeduplication mechanism. For any proposed change, we can generate a large number\nof tests and ensure that the same traces have been explored. Furthermore, if\nthe reference implementation is itself suspect or too inefficient, the proposed\nchange can be asserted to explore a superset of traces explored by a random\nsearch.\n\nThe trace deduplication mechanism took a few iterations to get right. Generally,\nthe approach involving extracting traces (happens-before) from sequences and\ncomparing those turned out to be more robust than the attempts to bring the\nexecution sequences into some normal form and compare them directly.\n\n## Literature Glossary\n\nThe literature defines a lot of new terms. While the rigour is important for\nimplementation, here's a brief explanation in the context of DSCheck.\n\n- Event. Modification of shared state or communication between threads.\n- Transition. One step forward into execution of a particular domain. That\n  includes the atomic operation it suspended on and all non-atomic operations\n  preceding the next atomic call. In the case of DSCheck one transition is a\n  single event.\n- Execution sequence. A particular interleaving of a program.\n- Trace. A class of equivalent execution sequences. An optimal DPOR explores\n  exactly one execution sequence per trace.\n- Dependency relationship. A binary relation on transitions indicating whether\n  two events are dependent. If two adjacent events are\n  independent, then they commute, i.e. swapping two adjacent independent events\n  produces a new execution sequence that constitutes the same trace. Thus, DPOR\n  focuses on reordering pairs of dependent events.\n- Happens-before relationship. 
A superset of the dependency relationship, which\n  includes program order.\n- Reversible race. Two events executed by different domains, which are\n  hb-related directly and not transitively. The latter lets us avoid some\n  redundant exploration.\n- Maximal trace. A trace that terminates all domains.\n- Enabling/disabling transitions. Some transitions may enable or disable\n  transitions in other domains, e.g. domain A taking a lock renders any other\n  acquisition attempts disabled. Currently not implemented but worth mentioning\n  as it is often used in the literature.\n\n## Future Work\n\n- DSCheck was written with validation of lock-free structures in mind and\n  handles single-word atomic operations only. There is a wealth of other\n  thread-safe communication and synchronisation methods and, in principle, we\n  should be able to validate all of them.\n  - Non-atomic accesses. OCaml's memory model gives a precise semantics to\n    concurrent non-atomic accesses. These could be verified with DSCheck as\n    well. The key part seems to be the possibility of reading a stale value.\n    Thus, DSCheck should maintain the list of values that may be read from any\n    non-atomic location and ensure that the program works in all cases. See\n    [CDSCHECKER: Checking Concurrent Data Structures Written with C/C++ Atomics](http://plrg.eecs.uci.edu/publications/c11modelcheck.pdf)\n    for more details.\n  - High-level primitives, e.g. lock, channel, join. Currently, DSCheck cannot\n    terminate on any program weaker than lock-free. Blocking primitives need\n    explicit support. Section 5 of\n    [Source Sets: A Foundation for Optimal Dynamic Partial Order Reduction](https://user.it.uu.se/~parosh/publications/papers/jacm17.pdf)\n    includes a modification of Source- and Optimal-DPOR allowing blocking\n    operations.\n  - Kcas, to validate programs using\n    [kcas](https://github.com/ocaml-multicore/kcas) efficiently. 
That fits into\n    Source-DPOR and the existing implementation quite naturally as operations\n    with multiple happens-before and happens-after dependencies.\n- Further performance improvements. In particular, an implementation of wake-up\n  trees to eliminate sleep-set blocking or a leap towards\n  [TruST](https://plv.mpi-sws.org/genmc/popl2022-trust.pdf).\n- Support nested domain spawns and concurrent logic in the main test function.\n  DSCheck lets us spawn n domains in the main test function and validate their\n  interleavings, which is enough to test lock-free algorithms, but many\n  real-world programs are more complicated.\n\n## Reference Implementations\n\n- [CDSChecker](https://github.com/computersforpeace/model-checker) for the\n  original DPOR implementation.\n- [Nidhugg](https://github.com/nidhugg/nidhugg/) for Source-DPOR.\n\n### Nidhugg\n\nNidhugg may come in handy for troubleshooting DSCheck. It's based on the same\npublication and, although aimed at C/C++ programs, it does have a sequential\nconsistency mode as well. Install it as per the instructions in the repository.\n\nConsider the following program. It spawns two threads, each trying to increment\n`a` with CAS. 
Note that making the variable `zero` local reduces accesses to shared\nmemory and lowers the amount of noise.\n\n```C\n#include \u003cpthread.h\u003e\n#include \u003cstdatomic.h\u003e\n#include \u003cstdint.h\u003e /* uintptr_t */\n\natomic_int a;\n\nstatic void *t(void *arg)\n{\n  int zero = 0;\n  atomic_compare_exchange_strong(\u0026a, \u0026zero, 1);\n  return NULL;\n}\n\nint main()\n{\n  pthread_t tid[2];\n  atomic_init(\u0026a, 0);\n\n  pthread_create(\u0026tid[0], NULL, t, (void *)(uintptr_t)0);\n  pthread_create(\u0026tid[1], NULL, t, (void *)(uintptr_t)0);\n\n  pthread_join(tid[0], NULL);\n  pthread_join(tid[1], NULL);\n\n  return 0;\n}\n```\n\nSave the program as `test.c` and run it using the following command:\n`nidhuggc -- --debug-print-on-reset --sc --source test.c 2\u003e\u00261 | rg \"(Cmp|=)\"`.\n\n```\n === TSOTraceBuilder reset ===\n      (\u003c0.0\u003e,1-4)     CmpXhg(Global(1)(4),0x0,0x1)     SLP:{} - (\u003c0.1\u003e,1)\n          (\u003c0.1\u003e,1-5) CmpXhgFail(Global(1)(4),0x0,0x1) SLP:{}\n =============================\n === TSOTraceBuilder reset ===\n          (\u003c0.1\u003e,1)   CmpXhg(Global(1)(4),0x0,0x1)     SLP:{\u003c0.0\u003e}\n      (\u003c0.0\u003e,1-5)     CmpXhgFail(Global(1)(4),0x0,0x1) SLP:{}\n =============================\n```\n\nThe output shows the visited interleavings. It also displays the contents of the\n`sleep` and `backtrack` sets at any stage. To get a better understanding of how\nnidhugg takes a particular decision, consider adding log statements to\n`TSOTraceBuilder::race_detect`.\n\n# Contributions\n\nContributions are appreciated! 
Please create issues/PRs to this repo.\n\n# References\n\n- [Source Sets: A Foundation for Optimal Dynamic Partial Order Reduction](https://user.it.uu.se/~parosh/publications/papers/jacm17.pdf)\n- [CDSCHECKER: Checking Concurrent Data Structures Written with C/C++ Atomics](http://plrg.eecs.uci.edu/publications/c11modelcheck.pdf)\n- [Dynamic Partial-Order Reduction for Model Checking Software](https://users.soe.ucsc.edu/~cormac/papers/popl05.pdf)\n- [Comparing Source Sets and Persistent Sets for Partial Order Reduction](https://user.it.uu.se/~bengt/Papers/Full/kimfest17.pdf)\n- [Truly Stateless, Optimal Dynamic Partial Order Reduction](https://plv.mpi-sws.org/genmc/popl2022-trust.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focaml-multicore%2Fdscheck","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focaml-multicore%2Fdscheck","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focaml-multicore%2Fdscheck/lists"}