{"id":16661407,"url":"https://github.com/bodigrim/chimera","last_synced_at":"2025-12-11T23:22:45.594Z","repository":{"id":47932702,"uuid":"88800441","full_name":"Bodigrim/chimera","owner":"Bodigrim","description":"Lazy infinite compact streams with cache-friendly O(1) indexing and applications for memoization","archived":false,"fork":false,"pushed_at":"2025-01-15T22:27:49.000Z","size":162,"stargazers_count":59,"open_issues_count":0,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-11T00:52:26.622Z","etag":null,"topics":["dynamic-programming","infinite-stream","lazy-streams","memoization","memoize","recursive-functions"],"latest_commit_sha":null,"homepage":"http://hackage.haskell.org/package/chimera","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bodigrim.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-19T23:38:00.000Z","updated_at":"2025-01-15T22:27:51.000Z","dependencies_parsed_at":"2024-02-11T10:25:19.730Z","dependency_job_id":"f3950215-ab3d-484f-bf71-5a3f687ea522","html_url":"https://github.com/Bodigrim/chimera","commit_stats":{"total_commits":150,"total_committers":4,"mean_commits":37.5,"dds":"0.033333333333333326","last_synced_commit":"cb18ef75bb4520c11359cda700d3a3c11d486bae"},"previous_names":["bodigrim/bit-stream"],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bodigrim%2Fchimera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/Bodigrim%2Fchimera/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bodigrim%2Fchimera/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bodigrim%2Fchimera/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bodigrim","download_url":"https://codeload.github.com/Bodigrim/chimera/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248322609,"owners_count":21084336,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dynamic-programming","infinite-stream","lazy-streams","memoization","memoize","recursive-functions"],"created_at":"2024-10-12T10:34:52.425Z","updated_at":"2025-12-11T23:22:45.552Z","avatar_url":"https://github.com/Bodigrim.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# chimera [![Hackage](http://img.shields.io/hackage/v/chimera.svg)](https://hackage.haskell.org/package/chimera) [![Stackage LTS](http://stackage.org/package/chimera/badge/lts)](http://stackage.org/lts/package/chimera) [![Stackage Nightly](http://stackage.org/package/chimera/badge/nightly)](http://stackage.org/nightly/package/chimera)\n\nLazy infinite compact streams with cache-friendly O(1) indexing\nand applications for memoization.\n\n## Introduction\n\nImagine having a function `f :: Word -\u003e a`,\nwhich is expensive to evaluate. 
We would like to _memoize_ it,\nreturning `g :: Word -\u003e a`, which does effectively the same,\nbut transparently caches results to speed up repetitive\nre-evaluation.\n\nThere are plenty of memoizing libraries on Hackage, but they\nusually fall into two categories:\n\n* Store cache as a flat array, enabling us\n  to obtain cached values in O(1) time, which is nice.\n  The drawback is that one must specify the size\n  of the array beforehand,\n  limiting the interval of inputs,\n  and actually allocate it at once.\n\n* Store cache as a lazy binary tree.\n  Thanks to laziness, one can freely use the full range of inputs.\n  The drawback is that obtaining values from a tree\n  takes logarithmic time and is unfriendly to CPU cache,\n  which kinda defeats the purpose.\n\nThis package intends to tackle both issues,\nproviding a data type `Chimera` for\nlazy infinite compact streams with cache-friendly O(1) indexing.\n\nAdditional features include:\n\n* memoization of recursive functions and recurrent sequences,\n* memoization of functions of several, possibly signed arguments,\n* efficient memoization of boolean predicates.\n\n## Example 1\n\nConsider the following predicate:\n\n```haskell\nisOdd :: Word -\u003e Bool\nisOdd n = if n == 0 then False else not (isOdd (n - 1))\n```\n\nIts computation is expensive, so we'd like to memoize it:\n\n```haskell\nisOdd' :: Word -\u003e Bool\nisOdd' = memoize isOdd\n```\n\nThis is fine to avoid re-evaluation for the same arguments.\nBut `isOdd` does not use this cache internally, recursing all the way\ndown to `n = 0`. 
We can do better,\nif we rewrite `isOdd` as a `fix` point of `isOddF`:\n\n```haskell\nisOddF :: (Word -\u003e Bool) -\u003e Word -\u003e Bool\nisOddF f n = if n == 0 then False else not (f (n - 1))\n```\n\nand invoke `memoizeFix` to pass the cache into recursive calls as well:\n\n```haskell\nisOdd' :: Word -\u003e Bool\nisOdd' = memoizeFix isOddF\n```\n\n## Example 2\n\nDefine a predicate that checks whether its argument is\na prime number, using trial division.\n\n```haskell\nisPrime :: Word -\u003e Bool\nisPrime n = n \u003e 1 \u0026\u0026 and [ n `rem` d /= 0 | d \u003c- [2 .. floor (sqrt (fromIntegral n))], isPrime d]\n```\n\nThis is certainly an expensive recursive computation and we would like\nto speed up its evaluation by wrapping it in a caching layer.\nConvert the predicate to an unfixed form such that `isPrime = fix isPrimeF`:\n\n```haskell\nisPrimeF :: (Word -\u003e Bool) -\u003e Word -\u003e Bool\nisPrimeF f n = n \u003e 1 \u0026\u0026 and [ n `rem` d /= 0 | d \u003c- [2 .. floor (sqrt (fromIntegral n))], f d]\n```\n\nNow create its memoized version for rapid evaluation:\n\n```haskell\nisPrime' :: Word -\u003e Bool\nisPrime' = memoizeFix isPrimeF\n```\n\n## Example 3\n\nNo manual on memoization is complete\nwithout Fibonacci numbers:\n\n```haskell\nfibo :: Word -\u003e Integer\nfibo = memoizeFix $ \\f n -\u003e if n \u003c 2 then toInteger n else f (n - 1) + f (n - 2)\n```\n\nNo cleverness involved: just write a recursive function\nand let `memoizeFix` take care of everything else:\n\n```haskell\n\u003e fibo 100\n354224848179261915075\n```\n\n## What about non-`Word` arguments?\n\n`Chimera` itself can memoize only `Word -\u003e a` functions, which sounds restrictive.\nThat is because we decided to outsource\nthe enumeration of user datatypes to other packages, e.g.,\n[`cantor-pairing`](http://hackage.haskell.org/package/cantor-pairing).\nUse `fromInteger . fromCantor` to convert data to `Word`\nand `toCantor . 
toInteger` to go back.\n\nAlso, `Data.Chimera.ContinuousMapping` covers several simple cases,\nsuch as `Int`, pairs and triples.\n\n## Benchmarks\n\nHow important is it to store cached data as a flat array instead of a lazy binary tree?\nLet us measure the maximal length of a [Collatz sequence](https://oeis.org/A006577),\nusing the `chimera` and `memoize` packages.\n\n```haskell\n#!/usr/bin/env cabal\n{- cabal:\nbuild-depends: base, chimera, memoize, time\n-}\n{-# LANGUAGE TypeApplications #-}\nimport Data.Chimera\nimport Data.Function.Memoize\nimport Data.Ord\nimport Data.List\nimport Data.Time.Clock\n\ncollatzF :: Integral a =\u003e (a -\u003e a) -\u003e (a -\u003e a)\ncollatzF f n = if n \u003c= 1 then 0 else 1 + f (if even n then n `quot` 2 else 3 * n + 1)\n\nmeasure :: (Integral a, Show a) =\u003e String -\u003e (((a -\u003e a) -\u003e (a -\u003e a)) -\u003e (a -\u003e a)) -\u003e IO ()\nmeasure name memo = do\n  t0 \u003c- getCurrentTime\n  print $ maximumBy (comparing (memo collatzF)) [0..1000000]\n  t1 \u003c- getCurrentTime\n  putStrLn $ name ++ \" \" ++ show (diffUTCTime t1 t0)\n\nmain :: IO ()\nmain = do\n  measure \"chimera\" Data.Chimera.memoizeFix\n  measure \"memoize\" (Data.Function.Memoize.memoFix @Int)\n```\n\nHere `chimera` appears to be 20x faster than `memoize`:\n\n```\n837799\nchimera 0.428015s\n837799\nmemoize 8.955953s\n```\n\n## Magic and its exposure\n\nInternally `Chimera` is represented as a _boxed_ vector\nof growing (possibly, _unboxed_) vectors `v a`:\n\n```haskell\nnewtype Chimera v a = Chimera (Data.Vector.Vector (v a))\n```\n\nAssuming a 64-bit architecture, the outer vector consists of 65 inner vectors\nof sizes 1, 1, 2, 2², ..., 2⁶³. 
Since the outer vector\nis boxed, inner vectors are allocated on-demand only: quite fortunately,\nthere is no need to allocate all 2⁶⁴ elements at once.\n\nTo access an element by its index it is enough to find out to which inner\nvector it belongs, which, thanks to the doubling pattern of sizes,\ncan be done instantly by the [`ffs`](https://en.wikipedia.org/wiki/Find_first_set)\ninstruction. The caveat here is\nthat accessing an inner vector for the first time will cause its allocation,\ntaking O(n) time. So to restore _amortized_ O(1) time we must assume\ndense access. `Chimera` is no good for sparse access\nover a thin set of indices.\n\nOne can argue that this structure is not infinite,\nbecause it cannot handle more than 2⁶⁴ elements.\nI believe that it is _infinite enough_ and no one would be able to exhaust\nits finiteness any time soon. Strictly speaking, to cope with indices out of\n`Word` range and `memoize` the\n[Ackermann function](https://en.wikipedia.org/wiki/Ackermann_function),\none could use more layers of indirection, raising access time\nto O([log ⃰](https://en.wikipedia.org/wiki/Iterated_logarithm) n).\nI still think that it is morally correct to claim O(1) access,\nbecause all asymptotic estimates of data structures\nare usually made under an assumption that they contain\nfewer than `maxBound :: Word` elements\n(otherwise you cannot even treat pointers as fixed-size data).\n\n## Additional resources\n\n* [Lazy streams with O(1) access](https://github.com/Bodigrim/my-talks/raw/master/londonhaskell2020/slides.pdf), London Haskell, 25.02.2020.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbodigrim%2Fchimera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbodigrim%2Fchimera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbodigrim%2Fchimera/lists"}