{"id":17997410,"url":"https://github.com/thma/commutativemonoid","last_synced_at":"2025-06-19T01:33:34.562Z","repository":{"id":146285826,"uuid":"283505611","full_name":"thma/CommutativeMonoid","owner":"thma","description":"Trying to prove that commutative monoids are required for a parallel foldMap (aka. map/reduce)","archived":false,"fork":false,"pushed_at":"2023-12-15T08:36:54.000Z","size":64,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-06T22:43:03.084Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-29T13:24:37.000Z","updated_at":"2021-08-25T08:46:12.000Z","dependencies_parsed_at":"2023-12-05T03:27:06.611Z","dependency_job_id":"6e82e222-f6dd-4841-8b09-3e3f20db797c","html_url":"https://github.com/thma/CommutativeMonoid","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thma/CommutativeMonoid","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thma%2FCommutativeMonoid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thma%2FCommutativeMonoid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thma%2FCommutativeMonoid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thma%2FCommutativeMonoid/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thma","download_url":"https://codeload.github.com/thma/CommutativeMonoid/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thma%2FCommutativeMonoid/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260662718,"owners_count":23043989,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-29T21:18:27.020Z","updated_at":"2025-06-19T01:33:29.549Z","avatar_url":"https://github.com/thma.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Proving me wrong — How QuickCheck destroyed my favourite theory\n\n## Introduction\n\nQuite a while back I wrote a larger article on the algebraic foundation of software patterns,\nwhich also covered the [MapReduce algorithm](https://thma.github.io/posts/2018-11-24-lambda-the-ultimate-pattern-factory.html#map-reduce).\n\nDuring my research I dug up a paper on [algebraic properties of distributed big data analytics](https://pdfs.semanticscholar.org/0498/3a1c0d6343e21129aaffca2a1b3eec419523.pdf),\nwhich explained that a MapReduce will always work correctly when the intermediate data structure resulting from the\n`map`-phase is a Monoid under the `reduce`-operation.\n\nFor some reason, I was not convinced that this Monoid-condition was enough, because all the typical examples,\nlike word-frequency maps, are even **commutative** Monoids under the respective reduce operation.\n\nSo I came up with the following personal theory:\n\n\u003e Only if the 
intermediate data structure resulting from the `map`-phase is a **commutative Monoid** \n\u003e under the `reduce`-operation will a parallel MapReduce produce correct results.\n\nI tried to prove this property using the \n[QuickCheck test framework](https://wiki.haskell.org/Introduction_to_QuickCheck2).\n\nInterestingly, QuickCheck was able to find counterexamples!\nThis finally convinced me that my theory was wrong, and after some deeper thought, I could understand why.\n\nI was impressed by the power of QuickCheck, so I thought it would be a good idea to share \nthis lesson in falsification.\n\n## Commutative Monoids\n\nIn abstract algebra, a monoid is a *set* equipped with an *associative \nbinary operation* and an *identity element*. A monoid is *commutative* if, in addition, the order of the operands of its binary operation does not matter.\n\nThe simplest example of a *commutative Monoid* is the natural numbers under addition with 0 as the identity (or neutral) element. \nWe can use QuickCheck to test that the Monoid laws plus commutativity indeed hold.\n\nIf we want to use the `Natural` type from `GHC.Natural` to represent natural numbers, \nwe first have to make `Natural` an instance of the `Arbitrary` type class, which is\nused by QuickCheck to automatically generate test data:\n\n```haskell\nimport           Test.QuickCheck (Arbitrary, arbitrary, NonNegative (..))\nimport           GHC.Natural     (Natural, naturalFromInteger)\n\n-- generate a random non-negative Integer and convert it to a Natural\ninstance Arbitrary Natural where\n  arbitrary = do\n    NonNegative nonNegative \u003c- arbitrary\n    return $ naturalFromInteger nonNegative\n```\n\nNow we can start to write our property based tests. 
For algebraic structures it is\nstraightforward to come up with properties: we just write the required\nlaws (associativity, 0 as identity element, and commutativity) as properties.\n\nI am using Hspec as a wrapper around QuickCheck, as it provides a very nice testing DSL that makes\nboth the code and the output of the test suite easy to read:\n\n```haskell\nimport           Test.Hspec\nimport           Test.QuickCheck (property, (.\u0026\u0026.))\nimport           GHC.Natural     (Natural)\n\nspec :: Spec\nspec = do\n  describe \"The Monoid 'Natural Numbers under Addition'\" $ do\n    it \"is associative\" $\n      property $ \\x y z -\u003e ((x + y) + z) `shouldBe` ((x + (y + z)) :: Natural)\n      \n    it \"has 0 as left and right identity element\" $\n      property $ \\x -\u003e (x + 0 `shouldBe` (x :: Natural)) .\u0026\u0026. (0 + x `shouldBe` x)\n      \n    it \"is commutative\" $\n      property $ \\x y -\u003e x + y `shouldBe` (y + x :: Natural)\n```\n\nThe output of these tests will be as follows:\n\n```bash\nMonoid\n  The Monoid 'Natural Numbers under Addition'\n    is associative\n      +++ OK, passed 100 tests.\n    has 0 as left and right identity element\n      +++ OK, passed 100 tests.\n    is commutative\n      +++ OK, passed 100 tests.\n```\n\nSo behind the scenes, QuickCheck has generated random test data for 100 tests for each\nproperty under test, and the tests passed for all of these inputs.\n\nThis is definitely not a proof. But it gives us some confidence that our math textbooks\nare correct when giving natural numbers under addition as an example of a commutative Monoid.\n\nOK, that was easy! Now let's move on to non-commutative Monoids.\n\n## Non-commutative Monoids\n\nStrings (or any other lists) under concatenation are a typical example. 
\nIt's easy to see that `\"hello\" ++ (\"dear\" ++ \"people\")` equals `(\"hello\" ++ \"dear\") ++ \"people\"`,\nbut that `\"hello\" ++ \"world\"` differs from `\"world\" ++ \"hello\"`.\n\nNow let's try to formalize these intuitions as QuickCheck property based tests again.\n\nFirst I introduce an alias for `(++)`. As `(++)` is defined on any list type,\nwe would otherwise need type signatures in all properties (like all those `:: Natural` \nsignatures in the examples above). So I define an operation `(⊕)` which is\nonly defined on `String`s:\n\n```haskell\n-- list concatenation restricted to String, so properties need no type annotations\n(⊕) :: String -\u003e String -\u003e String\n(⊕) a b = a ++ b\n```\n\nNow we can extend our test suite with the following test cases:\n\n```haskell\n  describe \"The Monoid 'Strings under concatenation'\" $ do\n    \n    it \"is associative\" $ \n      property $ \\x y z -\u003e ((x ⊕ y) ⊕ z) `shouldBe` (x ⊕ (y ⊕ z))\n      \n    it \"has \\\"\\\" as left and right identity element\" $\n      property $ \\x -\u003e (x ⊕ \"\" `shouldBe` x) .\u0026\u0026. (\"\" ⊕ x `shouldBe` x)\n```\n\nThe output looks promising:\n\n```bash\n  The Monoid 'Strings under concatenation'\n    is associative\n      +++ OK, passed 100 tests.\n    has \"\" as left and right identity element\n      +++ OK, passed 100 tests.\n```\n\nNow let's try to test the non-commutativity:\n\n```haskell\n    it \"is NOT commutative\" $\n      property $ \\x y -\u003e x ⊕ y `shouldNotBe` y ⊕ x\n```\n\nBut unfortunately, the output tells us that this is not true:\n\n```bash\n    is NOT commutative FAILED [1]\n\n  1) Monoid, The Monoid 'Strings under concatenation', is NOT commutative\n       Falsifiable (after 1 test):\n         \"\"\n         \"\"\n       not expected: \"\"\n```\n\nWe formulated the property the wrong way. `(⊕)` *may be commutative for some*\nedge cases, e.g. 
when one or both of the arguments are `\"\"`, or when both arguments are equal.\nBut it is not commutative *in general*, that is, for all possible arguments.\n\nWe could rephrase this property as *\"There exists at least one pair of arguments\nfor which `(⊕)` is not commutative\"*.\n\nQuickCheck does not come with a mechanism for *existential quantification*. \nBut it does have `forAll`, which provides *universal quantification*. So we can build our own\ntool for existential quantification \n[based on a discussion on Stack Overflow](https://stackoverflow.com/questions/42764847/is-there-a-there-exists-quantifier-in-quickcheck).\n\n```haskell\nimport           Test.QuickCheck (Arbitrary (..), Gen, Property, Testable,\n                                  disjoin, forAll, once, resize)\n\n-- passes if at least one randomly generated value satisfies the predicate\nexists :: (Show a, Arbitrary a) \n       =\u003e (a -\u003e Bool) -\u003e Property\nexists = forSome $ resize 100 arbitrary\n    \n-- combines 100 forAll checks with disjoin (logical or);\n-- once makes QuickCheck run this disjunction as a single test\nforSome :: (Show a, Testable prop)\n        =\u003e Gen a -\u003e (a -\u003e prop) -\u003e Property\nforSome gen prop = once $ disjoin $ replicate 100 $ forAll gen prop\n```\n\nNow we can rewrite the property \"There exists at least one pair of arguments\nfor which `(⊕)` is not commutative\" as follows:\n\n```haskell\n    it \"is not commutative (via exists)\" $\n      exists $ \\(x,y) -\u003e x ⊕ y /= y ⊕ x\n```\n\nThe output now matches our intuitive understanding much better:\n\n```bash\n    is not commutative (via exists)\n      +++ OK, passed 1 test.\n\n```\n\n## Sequential MapReduce\n\n\u003e MapReduce is a programming model and an associated implementation for processing and generating large data sets. 
\n\u003e Users specify **a map function** that processes a key/value pair to generate a set of intermediate key/value pairs, \n\u003e **and a reduce function** that merges all intermediate values associated with the same intermediate key.\n\u003e \n\u003e [This] abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages. 
\n\u003e [Quoted from Google Research](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/16cb30b4b92fd4989b8619a61752a2387c6dd474.pdf)\n\nI won't go into more detail here, as you'll find detailed information on this approach and a\nworking example \n[in my original article](https://thma.github.io/posts/2018-11-24-lambda-the-ultimate-pattern-factory.html#map-reduce).\n\nHere is the definition of a sequential MapReduce:\n\n```haskell\nsimpleMapReduce \n  :: (a -\u003e b)   -- map function\n  -\u003e ([b] -\u003e c) -- reduce function\n  -\u003e [a]        -- list to map over\n  -\u003e c          -- result\nsimpleMapReduce mapFunc reduceFunc = reduceFunc . map mapFunc\n```\n\nWe can test the sequential MapReduce algorithm with the following property based test:\n\n```haskell\n    it \"works correctly with a sequential map-reduce\" $\n      property $ \\a b c d -\u003e (simpleMapReduce reverse (foldr (⊕) \"\") [a,b,c,d]) \n                     `shouldBe` (reverse a) ⊕ (reverse b) ⊕ (reverse c) ⊕ (reverse d)\n```\n\n## Parallel MapReduce\n\nNow we come to the tricky part that kicked off this whole discussion: parallelism.\n\nAs an example, consider a simple sequential MapReduce that takes an input list of `Int`s, computes their squares and\nsums up these squares:\n\n```haskell\nλ\u003e simpleMapReduce (^2) (foldr (+) 0) [1,2,3,4]\n30\n```\n\nLet's try to design this as a massively parallel algorithm:\n\n1. Mapping `(^2)` over the input list `[1,2,3,4]` would be started in parallel to the reduction of the intermediary \nlist of squares by `(foldr (+) 0)`. \n\n2. The mapping phase will be executed as a set of parallel computations (one for each element of the input list).\n\n3. The reduction phase will also be executed as a set of parallel computations.\n\nOf course, the reduction phase can only begin once at least one list element has been squared.\nSo in effect the mapping process would have to start first. 
The parallel computation of squares will result in a non-deterministic\nsequence of computations. In particular, it is not guaranteed that the elements of the input list are processed in the\noriginal list order.\nSo it might, for example, happen that `3` is squared first. The reduction phase would then receive its first input `9` and \nwould start the reduction, that is, compute `9 + 0`.\n\nLet's assume that the remaining mapping steps occur in the following random order:\nnext the first element of the input `1`, then the fourth `4` and finally the second element `2` would be squared,\nresulting in a reduction sequence of `4 + 16 + 1 + 9 + 0`. As this sums up to `30`, everything is fine. Addition is commutative, so\nchanging the order of the reduction steps does not affect the overall result.\n\nBut now imagine we would parallelize the following MapReduce in the same way:\n\n```haskell\nλ\u003e simpleMapReduce reverse (foldr (⊕) \"\") [\" olleh\",\" ym\",\" raed\",\" sklof\"]\n\"hello my dear folks \"\n```\n\nIf we assume the same sequence as above, the third element of the input list would be reversed first, resulting in a first reduction step `\"dear \" ⊕ \"\"`. 
Next the first, the fourth and finally the second element would be reversed, resulting\nin a reduction sequence of `\"my \" ⊕ \"folks \" ⊕ \"hello \" ⊕ \"dear \" ⊕ \"\" = \"my folks hello dear \"`.\nAs string concatenation is not commutative, it does not really come as a surprise that random changes to the reduction\nsequence will eventually result in wrong results.\n\nSo our conclusion is: \n\n\u003e If the MapReduce algorithm is parallelized in the way that I outlined above \u0026mdash; which may result in random changes of the \n\u003e sequence of list elements in the reduction phase \u0026mdash; it will only work correctly if the intermediary data structure is a \n\u003e commutative monoid under the reduce operation.\n\nIn the following section we will implement a parallel MapReduce in Haskell and try to validate our theory with property based testing.\n\n## Parallel MapReduce in Haskell\n\nWe can define a parallel MapReduce implementation as follows (for more details see \n[Real World Haskell, Chapter 24](http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html)):\n\n```haskell\nimport           Control.Parallel (par)\nimport           Control.Parallel.Strategies (using, parMap, rpar)\n\nparMapReduce \n  :: (a -\u003e b)   -- map function\n  -\u003e ([b] -\u003e c) -- reduce function\n  -\u003e [a]        -- list to map over\n  -\u003e c          -- result\nparMapReduce mapFunc reduceFunc input =\n    mapResult `par` reduceResult\n    where mapResult    = parMap rpar mapFunc input\n          reduceResult = reduceFunc mapResult `using` rpar\n```\n\nThis implementation starts computing `mapResult` and `reduceResult` in parallel and finally returns `reduceResult`.\nThe `mapResult` is computed with `parMap`, a parallelized version of `map`.\nThe `reduceResult` is evaluated with the `rpar` strategy, which sparks its evaluation in parallel.\n\nNext, we will write a property based test to validate our theory:\n\n```haskell\ntext = [\" olleh\",\" ym\",\" raed\",\" 
sklof\"]\n\n    it \"has some cases where parallel reduction deviates from sequential reduction\" $\n      exists $ \\() -\u003e parMapReduce reverse (foldr (⊕) \"\") text\n                /= simpleMapReduce reverse (foldr (⊕) \"\") text\n```\n\nBut it turns out that QuickCheck does not find any evidence for this assumption:\n\n```bash\n    has some cases where parallel reduction deviates from sequential reduction FAILED [1]\n\nFailures:\n\n  test\\MonoidSpec.hs:83:5: \n  1) Monoid, The Monoid 'Strings under concatenation', has some cases where parallel reduction deviates from sequential reduction\n       Falsified (after 1 test):\n```\n\nAfter seeing this result, I had to deal with some growing cognitive dissonance similar to [this flat earther](https://www.youtube.com/watch?v=EBtx1MDi5tY)...\n\nI began verifying my setup. I made sure that the `package.yaml` contains the right GHC options to enable parallel execution of the test suite:\n\n```yaml\n    ghc-options:\n    - -threaded\n    - -rtsopts\n    - -with-rtsopts=-N\n```\n\nI also made sure that all cores of my CPU were actually running at 100% utilization during the\nparallel tests.\n\nI also increased the number of test executions to improve the chances of hitting rare cases.\n\nBut to no avail.\n\nAs QuickCheck was consistently telling me \"you are wrong\", I finally began to admit: \"Well, maybe I'm wrong and should take a deeper look at the issue.\"\n\n## Rethinking parallel evaluation in Haskell\n\nTaking a closer look at the definition of the parallel MapReduce helps us to better\nunderstand what's actually going on:\n\n```haskell\nimport           Control.Parallel (par)\nimport           Control.Parallel.Strategies (using, parMap, rpar)\n\nparMapReduce \n  :: (a -\u003e b)   -- map function\n  -\u003e ([b] -\u003e c) -- reduce function\n  -\u003e [a]        -- list to map over\n  -\u003e c          -- result\nparMapReduce mapFunc reduceFunc input =\n    mapResult `par` reduceResult\n    where 
mapResult    = parMap rpar mapFunc input\n          reduceResult = reduceFunc mapResult `using` rpar\n\n-- and now an actual example usage:\nx = parMapReduce reverse (foldr (⊕) \"\") [\" olleh\",\" ym\",\" raed\",\" sklof\"]\n```\n\nIn this concrete example `mapResult` will be:\n\n```haskell\nmapResult    = parMap rpar reverse [\" olleh\",\" ym\",\" raed\",\" sklof\"]\n```\n\n`parMap` is defined in `Control.Parallel.Strategies` as follows:\n\n```haskell\nparMap :: Strategy b -\u003e (a -\u003e b) -\u003e [a] -\u003e [b]\nparMap strat f = (`using` parList strat) . map f\n```\n\nThe `parMap` evaluation strategy sparks a parallel evaluation for each element of the `input` list. \nNevertheless, the order of the elements is not changed, as internally the classical sequential\n`map` function is used, which preserves the order of elements. So the reduce phase will never receive a reordered list from the map phase,\neven if the `map`-computations for the individual list elements are executed in random order!\n\n`mapResult` will always be `[\"hello \", \"my \", \"dear \", \"folks \"]`.\n\nThus `reduceResult` will be:\n\n```haskell\nreduceResult = (foldr (⊕) \"\") [\"hello \", \"my \", \"dear \", \"folks \"] `using` rpar\n```\n\nAgain, the usual semantics of `foldr` are preserved; we merely allow parts of the reduction phase to be evaluated in parallel.\n\nSo the final output will always be `\"hello my dear folks \"`. 
This is exactly what the failed test case was telling us:\n\n\u003e There do not exist any cases where sequential and parallel MapReduce produce different results!\n\nWe can again test our improved theory with a QuickCheck test:\n\n```haskell\n    it \"parallel reduction always equals sequential reduction\" $\n      property $ \\a b c d -\u003e simpleMapReduce reverse (foldr (⊕) \"\") [a,b,c,d]\n                     `shouldBe` parMapReduce reverse (foldr (⊕) \"\") [a,b,c,d]\n```\n\nAnd \u0026mdash; not so surprisingly \u0026mdash; this test succeeds!\n\nIf you want to know more about parallel evaluation in Haskell, I highly recommend the excellent\n[Parallel and Concurrent Programming in Haskell by Simon Marlow](https://www.oreilly.com/library/view/parallel-and-concurrent/9781449335939/ch02.html).\n\n## Conclusions\n\n1. The parallelism provided by Haskell's `Control.Parallel.Strategies` module (from the `parallel` package) preserves the semantics of sequential code, and thus a parallel MapReduce has the same properties as its sequential counterpart.\nSo a parallel MapReduce will still work correctly if the intermediate data structure resulting from the `map`-phase is just a **Monoid** \u0026ndash; not necessarily a commutative Monoid.\n\n2. Nevertheless, there may be implementations that do not strictly maintain the original order of the input data during the `map`- and `reduce`-phases. With such implementations, the intermediate data structure resulting from the `map`-phase must be a **commutative Monoid** under the `reduce`-operation to produce correct results.\n\n3. Property based testing with QuickCheck is a powerful tool to verify assumptions about a given codebase. 
I really like using it in the spirit of [Karl Popper's Theory of Falsification](https://www.simplypsychology.org/Karl-Popper.html): \n- Derive hypotheses from your theory that can be tested experimentally.\n- Perform experiments that test your hypotheses.\n- If the experiments contradict a hypothesis, the theory is falsified.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthma%2Fcommutativemonoid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthma%2Fcommutativemonoid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthma%2Fcommutativemonoid/lists"}