{"id":17862658,"url":"https://github.com/stevana/deterministic-scheduler","last_synced_at":"2025-04-02T21:15:09.646Z","repository":{"id":247743486,"uuid":"826303300","full_name":"stevana/deterministic-scheduler","owner":"stevana","description":"Parallel property-based testing with a deterministic thread scheduler","archived":false,"fork":false,"pushed_at":"2024-08-07T04:41:03.000Z","size":258,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-08T11:34:45.846Z","etag":null,"topics":["linearizability","testing"],"latest_commit_sha":null,"homepage":"https://stevana.github.io/parallel_property-based_testing_with_a_deterministic_thread_scheduler.html","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stevana.png","metadata":{"files":{"readme":"README-unprocessed.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-09T12:52:41.000Z","updated_at":"2024-08-31T04:56:35.000Z","dependencies_parsed_at":"2024-08-07T08:11:56.922Z","dependency_job_id":"b23ee5d8-5f83-45cf-90df-cece175b1642","html_url":"https://github.com/stevana/deterministic-scheduler","commit_stats":null,"previous_names":["stevana/deterministic-scheduler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevana%2Fdeterministic-scheduler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevana%2Fdeterministic-scheduler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/stevana%2Fdeterministic-scheduler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevana%2Fdeterministic-scheduler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stevana","download_url":"https://codeload.github.com/stevana/deterministic-scheduler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246892848,"owners_count":20850850,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["linearizability","testing"],"created_at":"2024-10-28T08:54:33.109Z","updated_at":"2025-04-02T21:15:09.618Z","avatar_url":"https://github.com/stevana.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Parallel property-based testing with a deterministic thread scheduler\n\nThis post is about how to write tests that can catch race conditions in a\nreproducible way. The approach is programming language agnostic, and should\nwork in most languages that have a decent multi-threaded story. It's a\nwhite-box testing approach, meaning you will have to modify the software under\ntest.\n\n## Background\n\nIn my previous\n[post](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html),\nwe had a look at how to mechanically derive parallel tests that can uncover\nrace conditions from a sequential fake[^1]. \n\nOne of the nice things about the approach is that it's a black-box testing\ntechnique, i.e. 
it doesn't require the user to change the software under test.\nOn the other hand, threads will interleave differently when we\nrerun the tests, thereby potentially causing different outcomes. This in turn\ncreates problems for the shrinking of failing test cases[^2].\n\nAs a workaround, I suggested that when a race condition is found in the\nunmodified code, one could swap the shared memory module for one that\nintroduces sleeps around the operations. This creates less non-determinism,\nbecause the jitter of each operation will have less of an impact, and therefore\nhelps shrinking. This isn't a satisfactory solution, of course, and I left a to\ndo item to implement a deterministic scheduler, like the authors do in the\n[paper](https://www.cse.chalmers.se/~nicsma/papers/finding-race-conditions.pdf)\nthat first introduced parallel property-based testing.\n\nThe idea of the deterministic scheduler is that it should be possible to rerun\na multi-threaded program and get exactly the same interleaving of threads each\ntime.\n\nThe deterministic scheduler from the above mentioned paper is called PULSE. It\nwas [supposedly](http://quviq.com/documentation/pulse/index.html) released\nunder the BSD license, however I've not been able to find it. PULSE is written\nin Erlang and the paper uses it to test Erlang code. In Erlang everything is\ntriggered by message passing, so I think that the correct way of thinking about\nwhat PULSE does is that it acts as a person-in-the-middle proxy. In other\nwords, an Erlang process doesn't send a message directly to another process,\nbut instead asks the scheduler to send it to the process. That way all messages\ngo via the scheduler and it can choose the message order. 
Note that a seed can\nbe used to introduce randomness, without introducing non-determinism.\n\nI implemented a proxy scheduler like this in Haskell (using\n`distributed-process`, think Haskell trying to be like Erlang) about 6 years\n[ago](https://github.com/advancedtelematic/quickcheck-state-machine-distributed#readme),\nbut I didn't know how to do it in a non-message-passing setting. I was\ntherefore happy to see that my previous post inspired matklad to write a\n[post](https://matklad.github.io/2024/07/05/properly-testing-concurrent-data-structures.html)\nwhere he shows how he'd do it in a multi-threaded shared memory setting.\n\nIn this post I'll port matklad's approach from Rust to Haskell and hook it up\nto the parallel property-based testing machinery from my previous post.\n\nAnother difference between matklad's approach and the one in this post is that\nmatklad uses an ad hoc correctness criterion, whereas I follow the parallel\nproperty-based testing paper and use\n[linearisability](https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf). An ad\nhoc criterion can be faster than linearisability checking, but depending\non how complicated your system is, it might be harder to find one.\nLinearisability checking on the other hand follows mechanically (for free) from\na sequential (single-threaded) model/fake. \n\nIf you know what you are doing, then by all means figure out an ad hoc\ncorrectness criterion like matklad does. If on the other hand you haven't tested\nmuch concurrent code before, then I'd recommend starting with the\nlinearisability checking approach that we are about to describe[^3].\n\n## Motivation and overview\n\nIn order to explain what we'd like to do, it's helpful to consider an example\nof a race condition. 
The text book\n[example](https://en.wikipedia.org/wiki/Race_condition#Example)\nof a race condition is a counter which is incremented by two threads at the\nsame time.\n\nOne possible interleaving of the two threads that yields the correct result is\nthe following:\n\n| Time | Thread 1       | Thread 2       |   | Integer value |\n|:-----|:---------------|:---------------|:-:|:--------------|\n| 0    |                |                |   | 0             |\n| 1    | read value     |                | ← | 0             |\n| 2    | increase value |                |   | 0             |\n| 3    | write back     |                | → | 1             |\n| 4    |                | read value     | ← | 1             |\n| 5    |                | increase value |   | 1             |\n| 6    |                | write back     | → | 2             |\n\nHowever there are other interleavings where one of the threads overwrites the\nother thread's increment, yielding an incorrect result:\n\n| Time | Thread 1       | Thread 2       |   | Integer value |\n|:-----|:---------------|:---------------|:-:|:--------------|\n| 0    |                |                |   | 0             |\n| 1    | read value     |                | ← | 0             |\n| 2    |                | read value     | ← | 0             |\n| 3    | increase value |                |   | 0             |\n| 4    |                | increase value |   | 0             |\n| 5    | write back     |                | → | 1             |\n| 6    |                | write back     | → | 1             |\n\nIn most programming languages the thread interleaving is non-deterministic, and\nso we get irreproducible failures also sometimes known as \"Heisenbugs\".\n\nWhat we'd like to do is to be able to start a program with some seed and if\nthe same seed is used then we get the same thread interleaving and therefore a\nreproducible result.\n\nThe idea, due to matklad, is to insert pauses around each shared memory\noperation (the reads and 
writes) and have a scheduler unpause one thread at a\ntime. The scheduler is parametrised by a seed, which is fed into a pseudorandom\nnumber generator which in turn lets the scheduler deterministically choose\nwhich thread to unpause.\n\nIn the rest of this post we will port matklad's deterministic scheduler from\nRust to Haskell, hopefully in a way that shows that this can be done in any\nother language with decent multi-threaded programming primitives. Then we'll do\na short recap of how parallel property-based testing works, and finally we'll\nhook up the deterministic scheduler to the parallel property-based testing\nmachinery.\n\n## Deterministic scheduler\n\nThe implementation of the deterministic scheduler can be split up into three\nparts. First we'll implement a way for the spawned threads to communicate with\nthe scheduler; this communication channel will be used to pause and unpause the\nthreads. After that we'll make a wrapper datatype around Haskell's threads\nwhich also includes the communication channel. Finally, we'll have all the\npieces to implement the deterministic scheduler itself.\n\n### Thread-scheduler communication\n\nThe scheduler needs to be able to communicate with the running threads, in\norder to be able to deterministically unpause, or \"step\", one thread at a time.\n\nWe'll use Haskell's `TMVar`s for this, but any kind of shared memory will do.\nHaskell's `MVar`s can be thought of as boxes that contain a value, where taking\nsomething out of a box that is empty blocks and putting something into a box\nthat is full blocks as well. Here \"blocks\" means that the run-time will\nsuspend the thread that tries the blocking action and only wake it up when the\n`MVar` changes, i.e. 
it's an efficient way of waiting compared to\n[busy-waiting](https://en.wikipedia.org/wiki/Busy_waiting) or spinning.\n\nThe `T` in `TMVar`s adds\n[STM](https://en.wikipedia.org/wiki/Software_transactional_memory) transactions\naround `MVar`s; we'll see an example of what these are useful for shortly.\n\nWe'll call our communication channel `Signal`:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=Signal .numberLines}\n```\n\nThere are two ways to create a `Signal`, one for single-threaded and another\nfor multi-threaded execution:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=newSignal .numberLines}\n```\n\nThe idea is that in the single-threaded case the scheduler shouldn't be\ndoing anything. In particular, pausing a thread in the single-threaded case is a\nno-op:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=pause .numberLines}\n```\n\nNotice that in the multi-threaded case the pause operation will try to take a\nvalue from the `TMVar` and also notice that the `TMVar` starts off being empty,\nso this will cause the thread to block.\n\nThe way the scheduler can unpause the thread is by putting a unit value into\nthe `TMVar`, which will cause the `takeTMVar` to finish:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=unpause .numberLines}\n```\n\nFor our scheduler implementation we'll also need a way to check if a thread is\npaused:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=isPaused .numberLines}\n```\n\nIt's also useful to be able to check if all threads are paused:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=waitUntilAllPaused .numberLines}\n```\n\nNotice that STM makes this easy as we can do this check atomically.\n\n### Managed threads\n\nHaving implemented the communication channel between the thread and the\nscheduler, we are now ready to introduce our \"managed\" threads (we call them\n\"managed\" because they are managed by the scheduler). 
These threads are\nbasically a wrapper around Haskell's `Async` threads that also includes our\ncommunication channel, `Signal`.\n\n``` {.haskell include=src/ManagedThread2.hs snippet=ManagedThreadId .numberLines}\n```\n\nOur managed thread can be spawned as follows:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=spawn .numberLines}\n```\n\nNotice that the spawned IO action gets access to the communication channel.\n\nThe `Async` thread API exposes a way to check if a thread is still executing,\nhas thrown an exception or has finished, yielding a result. We'll extend this by also\nbeing able to check if the thread is paused, as follows:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=getThreadStatus .numberLines}\n```\n\n### Scheduler\n\nWe now have all the pieces we need to implement our deterministic scheduler.\n\nThe idea is to wait until all threads are paused, then step one of them and\nwait until it either pauses again or finishes. If it pauses again, then repeat\nthe stepping. If it finishes, remove it from the list of stepped threads and\ncontinue stepping.\n\n``` {.haskell include=src/ManagedThread2.hs snippet=schedule .numberLines}\n```\n\nWe can now also implement a useful higher-level combinator:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=mapConcurrently .numberLines}\n```\n\n### Example: broken atomic counter\n\nTo show that our scheduler is indeed deterministic, let's implement the race\ncondition between two increments from the introduction.\n\nFirst let's introduce an interface for shared memory. The idea is that we will\nuse two different instances of this interface: a \"real\" one which just does\nwhat we'd expect from shared memory, and a \"fake\" one which pauses around the\nreal operations. 
\n\n``` {.haskell include=src/ManagedThread2.hs snippet=SharedMemory .numberLines}\n```\n\nThe real one will be used when we deploy the actual software and the fake one\nwhile we do our testing.\n\nWe can now implement our counter example against the shared memory interface:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=AtomicCounter .numberLines}\n```\n\nFinally, we can implement the race condition test using the counter and two\nthreads that do increments:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=test1 .numberLines}\n```\n\nThe test is parametrised by a seed for the scheduler. If we run it with\ndifferent seeds we get different outcomes:\n\n```\n\u003e\u003e\u003e test1\n(0,True,2)\n(1,True,2)\n(2,False,1)\n(3,True,2)\n(4,False,1)\n(5,True,2)\n(6,False,1)\n(7,False,1)\n(8,True,2)\n(9,True,2)\n(10,False,1)\n```\n\nIf we fix the seed to one which makes our test fail:\n\n``` {.haskell include=src/ManagedThread2.hs snippet=test2 .numberLines}\n```\n\nThen we get the same outcome every time:\n\n```\n\u003e\u003e\u003e test2\n(2,False,1)\n(2,False,1)\n(2,False,1)\n(2,False,1)\n(2,False,1)\n(2,False,1)\n(2,False,1)\n(2,False,1)\n(2,False,1)\n(2,False,1)\n```\n\nThese quick tests seem to suggest that our scheduler is in fact deterministic.\n\n## Parallel property-based testing recap\n\nIn our counter example above, we had two concurrent increments, in this case\nit's easy to see what the answer must be (the counter must have the value of\ntwo, if we start counting from zero and increment by one). \n\nHowever for more complicated scenarios it gets less clear, consider:\n\n* Two increments and a get operation all happening concurrently, what's the\n  right return value of the get? It depends, it can be 0, 1 or 2;\n* Consider the counter being on a remote server and clients doing the\n  increments and gets via some network. 
Imagine a client first does an\n  increment and this request times out or the client crashes, then another\n  client does a get operation, what's the return value of the get? It depends,\n  it can be 0 or 1 depending on whether the timeout or crash happened before or\n  after the server received the increment;\n* The above gets a lot more complicated with more operations involved or more\n  complicated data structures than a counter, e.g. a key-value store with deletes.\n\nLuckily there's a correctness criterion for concurrent programs like these which\nis based on a sequential model, [Linearizability: a correctness condition for\nconcurrent objects](https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf) by\nHerlihy and Wing (1990), which hides the complexity of non-determinism and\ncrashing threads and works on arbitrary data structures. This is what we use in\nparallel property-based testing.\n\nThe idea in a nutshell: execute commands in parallel, collect a concurrent\nhistory of when each command started and stopped executing, and then try to find an\ninterleaving of commands which satisfies the sequential model. 
For a more\ndetailed explanation see my [previous\npost](https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html#parallel-property-based-testing).\n\n## Integrating the scheduler into the testing\n\nThe point is not to reimplement the parallel property-based testing machinery\nfrom my previous post here, but merely show that integrating the deterministic\nscheduler isn't too much work.\n\nWe need to change the code from the previous post in three different places:\nthe sequential module, the parallel module and the counter example itself.\n\n### Changes to sequential module\n\nFirst we import the library code that we wrote above in this post:\n\n```diff\n+import qualified ManagedThread2 as Scheduler\n```\n\nThen we move the `runCommandMonad` method from the `ParallelModel` class into\nthe `StateModel` class and change it so that it has access to the communication\nchannel to the scheduler (`Signal`):\n\n```diff\n+  runCommandMonad :: proxy state -\u003e CommandMonad state a -\u003e Scheduler.Signal -\u003e IO a\n```\n\nWe then change the `runCommands` function to use `runCommandMonad` and use a\nsingle-threaded `Signal`, i.e. one that doesn't do any pauses:\n \n```diff\n runCommands :: forall state. StateModel state\n-            =\u003e Commands state -\u003e PropertyM (CommandMonad state) ()\n-runCommands (Commands cmds0) = go initialState emptyEnv cmds0\n+            =\u003e Commands state -\u003e PropertyM IO ()\n+runCommands (Commands cmds0) =\n+  hoist (flip (runCommandMonad (Proxy :: Proxy state)) Scheduler.newSingleThreadedSignal) $\n+    go initialState emptyEnv cmds0\n   where\n     go :: state -\u003e Env state -\u003e [Command state (Var (Reference state))]\n        -\u003e PropertyM (CommandMonad state) ()\n```\n\nIn order for this to typecheck we need a helper function that changes the\nunderlying monad of a `PropertyM` (QuickCheck's monadic properties):\n\n```diff\n+hoist :: Monad m =\u003e (forall x. 
m x -\u003e IO x) -\u003e PropertyM m a -\u003e PropertyM IO a\n+hoist nat (MkPropertyM f) = MkPropertyM $ \\g -\u003e\n+  let\n+    MkGen h = f (fmap (fmap (return . ioProperty)) g)\n+  in\n+    MkGen (\\r n -\u003e nat (h r n))\n```\n\n### Changes to parallel module\n\nAgain we import the deterministic scheduler that we defined in this post:\n\n```diff\n+import qualified ManagedThread2 as Scheduler\n``` \n\nAs we said above, the `runCommandMonad` method was moved into the sequential\ntesting module:\n\n```diff\n-  runCommandMonad :: proxy state -\u003e CommandMonad state a -\u003e IO a\n``` \n\nWe'll reuse QuickCheck's seed for our scheduler, the following helper function\nextracts the seed from QuickCheck's `PropertyM`:\n\n```diff  \n+getSeed :: PropertyM m QCGen\n+getSeed = MkPropertyM (\\f -\u003e MkGen (\\r n -\u003e unGen (f r) r n))\n``` \n\nWe now have all the pieces we need to rewrite `runParallelCommands` to use the\ndeterministic scheduler:\n\n```diff \n runParallelCommands :: forall state. ParallelModel state\n                     =\u003e ParallelCommands state -\u003e PropertyM IO ()\n runParallelCommands cmds0@(ParallelCommands forks0) = do\n+  gen \u003c- getSeed\n+  liftIO (putStrLn (\"Seed: \" ++ show gen))\n   forM_ (parallelCommands cmds0) $ \\cmd -\u003e do\n     let name = commandName cmd\n     monitor (tabulate \"Commands\" [name] . classify True name)\n   monitor (tabulate \"Concurrency\" (map (show . length . 
unFork) forks0))\n   q   \u003c- liftIO newTQueueIO\n   c   \u003c- liftIO newAtomicCounter\n-  env \u003c- liftIO (runForks q c emptyEnv forks0)\n+  env \u003c- liftIO (runForks q c emptyEnv gen forks0)\n   hist \u003c- History \u003c$\u003e liftIO (atomically (flushTQueue q))\n   let ok = linearisable env (interleavings hist)\n   unless ok (monitor (counterexample (show hist)))\n   assert ok\n   where\n-    runForks :: TQueue (Event state) -\u003e AtomicCounter -\u003e Env state -\u003e [Fork state]\n-             -\u003e IO (Env state)\n-    runForks _q _c env [] = return env\n-    runForks  q  c env (Fork cmds : forks) = do\n-      envs \u003c- liftIO $\n-        mapConcurrently (runParallelReal q c env) (zip [Pid 0..] cmds)\n+    runForks :: RandomGen g =\u003e TQueue (Event state) -\u003e AtomicCounter -\u003e Env state -\u003e g\n+             -\u003e [Fork state] -\u003e IO (Env state)\n+    runForks _q _c env _gen [] = return env\n+    runForks  q  c env gen (Fork cmds : forks) = do\n+      (envs, gen') \u003c- liftIO $\n+        Scheduler.mapConcurrently (runParallelReal q c env) (zip [Pid 0..] 
cmds) gen\n       let env' = combineEnvs (env : envs)\n-      runForks q c env' forks\n+      runForks q c env' gen' forks\n \n     runParallelReal :: TQueue (Event state) -\u003e AtomicCounter -\u003e Env state\n-                    -\u003e (Pid, Command state (Var (Reference state))) -\u003e IO (Env state)\n-    runParallelReal q c env (pid, cmd) = do\n+                    -\u003e Scheduler.Signal -\u003e (Pid, Command state (Var (Reference state))) -\u003e IO (Env state)\n+    runParallelReal q c env signal (pid, cmd) = do\n       atomically (writeTQueue q (Invoke pid cmd))\n-      eResp \u003c- try (runCommandMonad (Proxy :: Proxy state) (runReal (fmap (lookupEnv env) cmd)))\n+      eResp \u003c- try (runCommandMonad (Proxy :: Proxy state) (runReal (fmap (lookupEnv env) cmd)) signal)\n       case eResp of\n         Left (err :: SomeException) -\u003e\n           error (\"runParallelReal: \" ++ displayException err)\n```\n\n### Changes to the counter example\n\nWe start by replacing our sleeps (`threadDelay`s) and direct manipulation of\nshared memory (`{read,write}IORef`) with operations from the shared memory\ninterface:\n\n```diff\n+import qualified ManagedThread2 as Scheduler\n\n-incrRaceCondition :: IO ()\n-incrRaceCondition = do\n-  n \u003c- readIORef gLOBAL_COUNTER\n-  threadDelay 100\n-  writeIORef gLOBAL_COUNTER (n + 1)\n-  threadDelay 100\n+incrRaceCondition :: Scheduler.SharedMemory Int -\u003e IO ()\n+incrRaceCondition mem = do\n+  n \u003c- liftIO (Scheduler.memReadIORef mem gLOBAL_COUNTER)\n+  Scheduler.memWriteIORef mem gLOBAL_COUNTER (n + 1)\n \n-get :: IO Int\n-get = readIORef gLOBAL_COUNTER\n+get :: Scheduler.SharedMemory Int -\u003e IO Int\n+get mem = Scheduler.memReadIORef mem gLOBAL_COUNTER\n```\n\nWe need to pass in the `Signal` communication channel when constructing the\nshared memory interface. 
With our change to `runCommandMonad` we have access to the `sig`nal\nwhen translating the `CommandMonad` into `IO`, so we can simply pass the\n`sig`nal through using the reader monad (recall that `ReaderT Scheduler.Signal\nIO a` is isomorphic to `Scheduler.Signal -\u003e IO a`).\n\n```diff\n+  type CommandMonad Counter = ReaderT Scheduler.Signal IO\n+\n+  runCommandMonad _ m sig = runReaderT m sig\n```\n\nThe construction of the fake shared memory interface happens in the `runReal`\nfunction, where `ask` retrieves the `sig`nal via the reader monad:\n\n```diff\n   -- We also need to explain which part of the counter API each command\n   -- corresponds to.\n-  runReal :: Command Counter r -\u003e IO (Response Counter r)\n-  runReal Get  = Get_  \u003c$\u003e get\n-  runReal Incr = Incr_ \u003c$\u003e incrRaceCondition\n+  runReal :: Command Counter r -\u003e CommandMonad Counter (Response Counter r)\n+  runReal cmd = do\n+    sig \u003c- ask\n+    let mem = Scheduler.fakeMem sig\n+    case cmd of\n+      Get  -\u003e liftIO (Get_  \u003c$\u003e get mem)\n+      Incr -\u003e liftIO (Incr_ \u003c$\u003e incrRaceCondition mem)\n```\n\nThe final changes are in the parallel property itself. 
Where we can now\nremove the `replicateM_ 10`, which repeats the test 10 times, because the\nthread scheduling is now deterministic and we don't need to repeat the test in\norder to avoid being unlucky with only getting thread interleavings that don't\nreveal the bug.\n \n```diff\n prop_parallelCounter :: ParallelCommands Counter -\u003e Property\n prop_parallelCounter cmds = monadicIO $ do\n-  replicateM_ 10 $ do\n-    run reset\n-    runParallelCommands cmds\n+  run reset\n+  runParallelCommands cmds\n   assert True\n```\n\nRunning the parallel property gives output such as the following:\n\n```\n\u003e\u003e\u003e quickCheck prop_parallelCounter\nSeed: SMGen 14250666030628800360 1954972351745194697\nSeed: SMGen 13912848539649022280 6105520832690741705\nSeed: SMGen 11982463081258021613 5563494797767522969\nSeed: SMGen 3766496530906674898 8913882928510646053\nSeed: SMGen 9878140450988724144 11431408445192688375\nSeed: SMGen 10677049786290338516 2728325560351012375\nSeed: SMGen 8857820011662424543 17283242182436244785\nSeed: SMGen 8857820011662424543 17283242182436244785\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785rink)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785rinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785rinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: SMGen 8857820011662424543 17283242182436244785shrinks)...\nSeed: 
SMGen 8857820011662424543 17283242182436244785shrinks)...\n*** Failed! Assertion failed (after 7 tests and 3 shrinks):\nParallelCommands [Fork [Incr,Incr],Fork [Get]]\nHistory [Invoke (Pid 0) Incr,Invoke (Pid 1) Incr,Ok (Pid 0) (Incr_ ()),Ok (Pid 1) (Incr_ ()),Invoke (\nPid 0) Get,Ok (Pid 0) (Get_ 1)]\n```\n\nWe can see that different seeds are used up until the test fails, then\nshrinking is done with the same seed.\n\n## Conclusion and further work\n\nI hope I've managed to give a glimpse of how we can deterministically test\nmulti-threaded code using a deterministic scheduler, and how this technique can\nbe applied to parallel property-based testing.\n\nWhile this seems to work, there are several ways in which it can be improved\nupon:\n\n1. Some seeds don't give the minimal counterexample (the one we saw above with\n   two concurrent increments followed by a get). While shrinking can be improved\n   as already pointed out in my previous post, the problem could also be that\n   shrinking changes the interleavings. Let's say we generated three concurrent\n   increments followed by a get; this triggers the race condition if one of\n   those increments overwrites another's increment. It could be that trying to\n   shrink away any of the increments (to get to the minimal counterexample)\n   fails because removing any of them will cause the scheduler to unpause the\n   remaining ones in a different order, and thus potentially fail to trigger\n   the race condition.\n\n   One possible solution to this problem could be to \"tombstone\" the\n   shrunk commands/threads rather than removing them and then change `runReal`\n   so that tombstoned commands get run using an instance of the shared memory\n   interface in which the pauses happen but not the mutation of the memory. 
The\n   idea is that by doing so the scheduler will still use the pseudorandom\n   number generator for the shrunk commands and thus the original interleavings\n   will be preserved.\n\n   Another possible solution, due to Daniel Gustafsson, is to save more info\n   than merely the seed. In particular the scheduler could generate the first\n   interleaving using a seed, but then save the interleaving (the order in\n   which the threads were unpaused) and subsequent reruns can reuse the\n   interleaving (or shrunk subsets of it) instead of the seed and thereby\n   preserve the original interleaving.\n\n2. Currently random interleavings are checked, but we could also imagine\n   enumerating all interleavings up to some depth. This would be more in line\n   with what model checkers do. Perhaps\n   [SmallCheck](https://github.com/Bodigrim/smallcheck) could be used for this?\n   It would also be interesting to compare this approach to what the\n   [dejafu](https://github.com/barrucadu/dejafu) library does.\n\n3. While the approach in this post works in most languages due to its white-box\n   nature, it's interesting to consider what it would take to turn it\n   into a black-box approach. By black-box I mean that the programmer\n   doesn't need to change their code to get the deterministic testing.\n\n   Two black-box approaches that I'm aware of are:\n\n   + Intercepting and recording the syscalls that the multi-threaded program\n     does and then somehow using the recorded trace to deterministically\n     reproduce the same execution when the program is rerun. 
I believe this is\n     what Mozilla's time travelling debugger,\n     [rr](https://www.youtube.com/watch?v=ytNlefY8PIE), and Facebook's\n     [hermit](https://github.com/facebookexperimental/hermit) do;\n   + Antithesis' deterministic\n     [hypervisor](https://antithesis.com/blog/deterministic_hypervisor/).\n\n   Both of these approaches involve a lot of engineering work though, and I'm\n   curious if we can get there cheaper.\n\n   One thing I'm interested in is: what if we had a programming language that's\n   able to switch between the fake and real shared memory interface, depending\n   on whether we are testing or not? The multi-threaded code that the user writes in\n   that case doesn't need to be changed to get the deterministic testing, i.e.\n   a black-box approach.\n\n   Implementing a new language and rewriting all your code in that language is\n   also a lot of work though. Perhaps existing languages can be\n   incrementally changed to expose scheduler hooks or allow user defined\n   schedulers? Either way, it seems to me that this should be solved at the\n   language-level, rather than OS-level, but maybe that's partly because I\n   don't understand the OS-level solutions well enough. I'd be curious to hear\n   about other opinions or ideas.\n\n4. We've looked at linearisability, but what about other consistency models?\n   For example, eventual consistency? See John Hughes et al's [*Mysteries of\n   Dropbox: Property-Based Testing of a Distributed Synchronization\n   Service*](https://publications.lib.chalmers.se/records/fulltext/232551/local_232551.pdf)\n   (2016) as well as matklad's\n   [post](https://matklad.github.io/2024/07/05/properly-testing-concurrent-data-structures.html)\n   for hints.\n\n5. Partial-order reduction: during concurrent execution sometimes we can\n   commute two operations without changing the outcome, e.g. the interleaving\n   of two increments doesn't matter, they all end up in the same state. 
We can\n   exploit this fact to check fewer histories.\n\n6. We looked at shared memory, but there are other ways of getting data races,\n   e.g. via concurrent file system access, mmapped memory, etc. All these other\n   ways of concurrently mutating some state would require interfaces with fake\n   implementations that insert pauses around the actual mutation. It would be\n   interesting to take an example of a concurrent (and persisted?) data\n   structure, e.g. the LMAX Disruptor or Aeron's [log\n   buffers](https://github.com/real-logic/aeron/wiki/Data-Structures), and\n   implement and test it in the same way we tested the counter.\n\nIf you have feedback or comments, or are interested in working on any of the above,\nfeel free to get in [touch](https://stevana.github.io/about.html).\n\n## Acknowledgments\n\nThanks to Daniel Gustafsson for discussing the tombstone solution to the problem\nwhere shrinking can cause different thread interleavings.\n\n\n[^1]: If you haven't heard of\n    [fakes](https://martinfowler.com/bliki/TestDouble.html) before, think of\n    them as a more elaborate test double than a mock. A mock of a component expects\n    to be called in some particular way (i.e. exposes only some limited subset of\n    the component's API), and typically throws an exception when called in any\n    other way. A fake, on the other hand, exposes the full API and can be called just like the\n    real component, but unlike the real component it takes some shortcuts. For\n    example a fake might lose all data when restarted (i.e. keeps all data in\n    memory, while the real component persists the data to some stable storage).\n\n[^2]: The reason shrinking doesn't work so well with non-determinism is\n    that shrinking stops when the property passes. 
So if some input causes\n    the property to fail and then we rerun the property on a smaller input it might\n    be the case that the smaller input still contains the race condition, but\n    because the interleaving of threads is non-deterministic we are unlucky and the\n    race condition isn't triggered and the property passes, which stops the\n    shrinking process.\n\n[^3]: Jepsen's [Knossos\n    checker](https://aphyr.com/posts/314-computational-techniques-in-knossos)\n    also uses linearisability checking and has found many\n    [bugs](https://jepsen.io/analyses), so we are in good company. \n\n    Note that the most recent Jepsen analyses use the [Elle\n    checker](https://github.com/jepsen-io/elle), rather than the Knossos checker.\n    The Elle checker doesn't do linearisability checking, but rather looks for\n    cycles in the dependencies of database transactions. Checking for cycles is\n    less general than linearisability checking, but also more efficient. See the\n    Elle [paper](https://github.com/jepsen-io/elle/raw/master/paper/elle.pdf) for\n    details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevana%2Fdeterministic-scheduler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevana%2Fdeterministic-scheduler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevana%2Fdeterministic-scheduler/lists"}