{"id":21253847,"url":"https://github.com/ebonnal/flexible-memoization","last_synced_at":"2026-05-19T14:07:01.012Z","repository":{"id":51953987,"uuid":"201510492","full_name":"ebonnal/flexible-memoization","owner":"ebonnal","description":"[Master's Thesis] Memoization library","archived":false,"fork":false,"pushed_at":"2024-02-11T15:57:03.000Z","size":505,"stargazers_count":3,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-07T02:49:46.720Z","etag":null,"topics":["apache-ignite","apache-spark","big-data","cache","caching","caffeine","memoization","scala","toolbox"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ebonnal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-09T17:10:23.000Z","updated_at":"2025-01-05T17:43:11.000Z","dependencies_parsed_at":"2022-08-23T15:30:36.840Z","dependency_job_id":null,"html_url":"https://github.com/ebonnal/flexible-memoization","commit_stats":null,"previous_names":["ebonnal/flexible-memoization"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/ebonnal/flexible-memoization","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ebonnal%2Fflexible-memoization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ebonnal%2Fflexible-memoization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ebonnal%2Fflexible-memoization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ebonnal%2Fflexible-memoization/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ebonnal","download_url":"https://codeload.github.com/ebonnal/flexible-memoization/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ebonnal%2Fflexible-memoization/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259421191,"owners_count":22854760,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-ignite","apache-spark","big-data","cache","caching","caffeine","memoization","scala","toolbox"],"created_at":"2024-11-21T03:53:15.943Z","updated_at":"2025-10-12T20:08:30.659Z","avatar_url":"https://github.com/ebonnal.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Flexible-memoization\n[![Actions Status](https://github.com/ebonnal/flexible-memoization/workflows/unit-tests/badge.svg)](https://github.com/ebonnal/flexible-memoization/actions) [![Actions Status](https://github.com/ebonnal/flexible-memoization/workflows/deploy-package/badge.svg)](https://github.com/ebonnal/flexible-memoization/actions)\n\nA *Scala* memoization library whose goals are to:\n\n- Be **simple** to use and to extend.\n- Handle **any function** signature.\n- Treat objects **equality by value** and content as much as possible.\n- Provide built-in cache management options **fetching the majority of the needs**, from medium size use cases needing a versatile eviction policy to cluster wide use cases needing to scale.\n\n## Powered by\n- [Apache Ignite](https://github.com/apache/ignite)\n- 
[Caffeine](https://github.com/ben-manes/caffeine)\n- [KeyChain](https://github.com/craiig/keychain-tools/tree/master/udf-hash)\n\n*Extract from [master thesis](https://bonnal-enzo.github.io/masterthesis.html):*\n\n## 3.2. Usage overview\nThis section will briefly explain how one can use *flexible-memoization*.  \n\n### 3.2.1. Install\n*Flexible-memoization* is [packaged as a *GitHub Maven package*](https://github.com/bonnal-enzo/flexible-memoization/packages). You can use it through the following dependency in your `pom.xml`:\n\n```xml\n\u003cproject ...\u003e\n    ...\n    \u003cdependencies\u003e  \n      [...] \n      \u003cdependency\u003e   \n        \u003cgroupId\u003ecom.bonnalenzo\u003c/groupId\u003e    \n        \u003cartifactId\u003eflexible-memoization\u003c/artifactId\u003e    \n        \u003cversion\u003e1.0.3-ignite2.7.5-s_2.11\u003c/version\u003e     \n      \u003c/dependency\u003e\n    ...\n    \u003c/dependencies\u003e \n    ... \n    \u003crepositories\u003e\n        ...\n        \u003crepository\u003e\n            \u003cid\u003egithub-bonnal-enzo\u003c/id\u003e\n            \u003curl\u003ehttps://maven.pkg.github.com/bonnal-enzo/flexible-memoization\u003c/url\u003e\n        \u003c/repository\u003e\n    \u003c/repositories\u003e\n    ...\n\u003c/project\u003e\n```  \n\nAnd in your `/path/to/.m2/settings.xml`\n\n\n```xml\n\u003csettings xmlns=\"http://maven.apache.org/SETTINGS/1.0.0\"\n          xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n          xsi:schemaLocation=\"http://maven.apache.org/xsd/settings-1.0.0.xsd\"\u003e\n    ...\n    \u003cservers\u003e\n        ...\n        \u003cserver\u003e\n              \u003cid\u003egithub-bonnal-enzo\u003c/id\u003e\n              \u003cusername\u003eYOUR_GITHUB_USERNAME\u003c/username\u003e\n              \u003cpassword\u003eYOUR_GITHUB_TOKEN\u003c/password\u003e\n        \u003c/server\u003e\n  \u003c/servers\u003e\n\u003c/settings\u003e\n```\n\nCheck out the [documentation about installing a 
*GitHub Maven package*](https://help.github.com/en/github/managing-packages-with-github-packages/configuring-apache-maven-for-use-with-github-packages#installing-a-package).\n\n\n### 3.2.2. Hello example\n1. Import the `Memo` class\n\n```scala\nimport com.bonnalenzo.flexiblememoization.memo.impl.Memo\n```\n\n2. Create your memoizer\n\n```scala\nval memo: Memoizer = new Memo()  \n```\n\n3. Memoize any Scala function you want to\n\n```scala\nval memoHello = memo((name: String) =\u003e s\"Hello $name !\")  \n```\n\n4. Use the memoized function\n\n```scala\nprintln(memoHello(\"Enzo\"))\n```\n\n### 3.2.3. Recursive examples\nHere are some classic examples computing Fibonacci series (not optimal $O(n)$ implementation) and factorial, using the `memo` instance created previously:\n\n```scala\nlazy val memoFibo: Int =\u003e Int = \n  memo({  \n    case 0 | 1 =\u003e 1  \n    case n: Int =\u003e memoFibo(n - 1) + memoFibo(n - 2)\n  })\n  \nlazy val memoFacto: Int =\u003e Int = \n  memo({  \n    case 0 | 1 =\u003e 1  \n    case n: Int =\u003e n*memoFacto(n - 1)\n  })\n```\n\nThe trick is to leverage Scala's lazy value declarations, which let the memoized function refer to itself inside its own definition.\n\n## 3.3. Structure\n\n### 3.3.1. UML\n*Flexible-memoization* mainly follows *Object Oriented Programming* principles; nevertheless, it also tries not to break too many *Functional Programming* principles. \n\nTo support the following explanations, here is a UML diagram summarizing the relations between *flexible-memoization*'s abstractions:\n\n![enter image description here](https://raw.githubusercontent.com/bonnal-enzo/flexible-memoization/master/figs/UML.png)\n\n### 3.3.2. Core abstractions\n*Flexible-memoization*'s core abstractions intend to:\n- Guarantee a memoization interface as simple and exhaustive as possible, allowing users to memoize any function with minimal effort.\n- Expose segregated interfaces that are easy to override, especially for custom cache integration.\n\nNote: In Scala the abstractions are defined as *traits*. 
One can view traits as an equivalent for both Java *interfaces* and *abstract classes*. A *mixin* is a trait containing concrete features used to add behaviors to a class inheriting it.\n\n#### 3.3.2.1. Cache sub-package\nThe cache package defines abstractions related to cache structure. It is independent from any memoization concern.\n\n#### *Gettable*\nThe `Gettable` trait defines the interface for the most central feature of a cache: the retrieval and insertion of *key-value* pairs. \nIt only forces its sub-classes to implement one method, `getOrElseUpdate(key: Int, value: =\u003e Any): Any`. This method takes an integer as a key and a value of any type. Note that the arrow in `value: =\u003e Any` means that the value argument is passed by name and therefore evaluated lazily: the code passed as value is only run when it is needed, *i.e.* when the key is missing from the cache.\n\n#### *NotifiableMixin*\nAn object mixing in `NotifiableMixin` gains the ability to be notified when another object starts to depend on it and when this dependency ends. It is an observer in the *Observer Pattern* that listens through its `notifyDependencyStart` and `notifyDependencyEnd` methods.\n\n`NotifiableMixin` extends `ClosableMixin`. It also keeps a counter of the number of objects depending on it. When this counter drops from 1 to 0, it calls its `ClosableMixin`'s `close` method.\n\n#### *Cache*\nThe `Cache` trait defines the minimal features needed to back a `Memoizer`. It is a `Gettable` mixing in `NotifiableMixin`.\n\n#### *HitCounterMixin*\n`HitCounterMixin` provides methods to update hit and miss counters, to retrieve them, and to compute a *hit-ratio* score.\n\n#### *CacheBuilder*\n`CacheBuilder` follows the *Builder Pattern*, forcing the implementation of a `build(): Cache` method. It gives the user a simple way to instantiate `Cache`s, hiding potentially complex or confusing details of construction.\n\n#### 3.3.2.2. 
Memo sub-package\nThe memo sub-package relies on the cache sub-package and contains the sources related to memoization.\n\n#### 3.3.2.3. Memoizer\n`Memoizer` defines the central interface of the library, the one used to memoize functions. \n\nIt enforces the exhaustive handling of any function by exposing overloaded `apply` methods. There are 22 of them, one for each function arity, from `I =\u003e R` to `(I1, I2, I3, ..., I22) =\u003e R`. Unlike in many other memoization libraries, the user is not forced to *tupleize* multi-arity functions (*i.e.* to change `(I1, I2, ..., In) =\u003e R` into `((I1, I2, ..., In)) =\u003e R`).\n\nThe `apply` methods return a function that has the same signature as the original function taken as input. It is in addition decorated with the behaviors of the `MemoizedFunction` class, which is introduced in the next point. This is a form of dynamic *Decorator Pattern*.\n\nEach of the 22 `apply` overloads is paired with a variant. These variants take as input a second function that has the same signature as the function being memoized, except that it must return a Boolean. This second function acts as a *trigger* for the memoization behavior:\n\n```scala\nval iLastChars = (i: Int, s: String) =\u003e\n  s.substring(Math.max(0, s.length - i))\n  \nval memoILastChars = new Memo()(\n  iLastChars, \n  trigger=(i: Int, _: String) =\u003e i \u003e 1000\n)\n```\n\nThe lines above create a memoized function `memoILastChars` that will only use its underlying cache to store or retrieve values if its `i` input argument is greater than 1000. \nThis feature lets the user choose to leverage caching only when it is worth it.\n\n#### 3.3.2.4. MemoizedFunction\nThe `MemoizedFunction` class defines the behaviors that decorate memoized functions. 
It owns an id and a `Cache` instance.\n\nIt is a subject in the *Observer Pattern* formed with `NotifiableMixin`: when a `MemoizedFunction` is created, it notifies its `Cache` attribute, and it notifies it again when it is *garbage collected*. This last notification is done in the `finalize()` method, which the *garbage collector* calls on each object just before cleaning it up. This notification system allows caches to close themselves, if needed, when no memoized function depends on them anymore.\n\n### 3.3.3. Provided implementation\nBuilt-in implementations of the previous abstractions can be found in `com.bonnalenzo.flexiblememoization.memo.impl` and `com.bonnalenzo.flexiblememoization.cache.impl`. The next sections introduce them.\n\n#### 3.3.3.1. Memo\n`Memo` is the concrete implementation of the `Memoizer` trait. We have already encountered it in the *usage overview* section.\n\n#### *Equality handling*\n`Memo` defines how the inputs of a memoized function are hashed to become a cache key. As previously discussed, this is a crucial point that needs to be implemented carefully, because it directly determines the extent to which results are shared. \n\nThis task is delegated to the method `getHashCode(args: Any*)` owned by `Memo`'s companion object (the Scala design equivalent of a static method). 
`getHashCode(args: Any*)` arguments are pattern matched and, depending on the case, a different hashing policy is applied:\n- `Iterator`s are hashed using built-in `.hashCode()`; they are matched first to ensure that automatic *iterator-function* conversions will not occur.\n- `Array`s are cast to `Seq`s and hashed using built-in `.hashCode()`.\n- The 22 different function arities are hashed using *KeyChain* [8], the previously presented tool that hashes functions based on their *bytecode*.\n- All other types are hashed using built-in `.hashCode()`.\n\nThe hashes of all the arguments are then put inside a tuple that is hashed in turn using `.hashCode()`.\n\nA memoized function generates the key for a given set of inputs by calling  `getHashCode(args: Any*)` on its *id* followed by every input element. Since the *id* is itself generated at instantiation time using `getHashCode`, we end up with cached results that are shared between memoized functions whose original functions have the same *bytecode*.\n\n#### *Underlying Cache*\n`Memo` takes as constructor parameter a `Cache` or  `CacheBuilder` instance at runtime. This convenient *Strategy Pattern* decouples `Memo`'s behavior from its underlying cache management.\n\n#### 3.3.3.2. Built-in Caches\nThe library comes with a range of caches whose goal is to cover the majority of needs. It addresses the \"no free lunch\" law by giving flexibility to the user. \n\nEach cache is structured in the same way:\n- One or more private classes implemented following the *Adapter Pattern*, which make the underlying cache technology comply with the `Cache` interface. Users should never have to deal with them.\n- A `CacheBuilder` that allows users to easily manipulate the parameters exposed to them and finally to build a `Cache` instance. 
All the builders are functionally oriented, in the sense that they are immutable and their parameter-setting methods always return new builder objects.\n\n*Flexible-memoization* features three built-in caches:\n- *Caffeine* based one.\n- *Ignite* based one.\n- Map based ones: These are non-optimized implementations based only on Scala collections. Still, they include an interesting purely computation-cost-based eviction policy.\n\n#### *Caffeine Based*\nA *Caffeine* based cache is integrated into *flexible-memoization* because it is the most used implementation of the WC-W-TinyLFU eviction algorithm, which provides state-of-the-art average performance. It is a very versatile choice for the user, addressing small and medium sized workloads.\n\nThe `CaffeineCacheAdapter` implements the `Cache` trait with `HitCounterMixin`, adapting `com.github.blemale.scaffeine.Cache` to the `Cache` interface. \n\nOne can instantiate a memoizer backed by a *Caffeine* cache simply with:\n```scala\nnew Memo(new CaffeineCacheBuilder().withMaxEntryNumber(100000))\n```\n\nNote that the adaptability and low overhead of the *Caffeine* based cache make it a first choice for many needs; it has therefore been set as the default cache for `Memo`: `new Memo()` is equivalent to `new Memo(new CaffeineCacheBuilder())`.\n\n#### *Ignite based*\nWhile the *Caffeine* based cache efficiently addresses small and medium sized workload needs, the *Ignite* based cache provides the horizontally scaling memoization solution that is our primary goal. 
\n\nAmong *Redis*, *Aerospike* and *Ehcache*, *Ignite* has been selected because it is the only one matching all our requirements: \n- Offer a good set of built-in eviction policies.\n- Leverage *off-heap* memory.\n- Have a peer-to-peer cluster mode that allows it to be easily integrated without having to deal with the asymmetric global state issues that *master-slave* or *client-server* models can introduce.\n- Be simple to embed entirely in our Scala library.\n- Be completely open-source.\n\nThe `IgniteCacheAdapter` implements the `Cache` trait with `HitCounterMixin`, adapting `org.apache.ignite.IgniteCache` to the `Cache` interface. \n\nThis *Ignite* cache leverages both on-heap and *off-heap* policies.\n\nOne can, for example, instantiate a memoizer backed by an *Ignite* cache with an initial *off-heap* size of 4Gb with:\n```scala\nnew Memo(new IgniteCacheBuilder()\n  .withOffHeapEviction(OffHeapEviction.RANDOM_LRU)\n  .withInitialOffHeapSize(4*1024*1024)\n  .build())\n```\n\n## 3.4. Answering the problematic\nWe will finally see how to solve our problem using *flexible-memoization*. This will be as quick as the solution is simple.\n\nGiven a `spark: SparkSession` and a registered table \"table\", here is a minimalist example:\n\n```scala\nval f: Int =\u003e Int = [...]  
// some expensive computation\nval igniteMemoizer = new Memo(new IgniteCacheBuilder().build())\nspark.udf.register(\"mf\", igniteMemoizer(f))  \nspark.sql(\"\"\"SELECT col2, sum(mf(col1)), avg(mf(col1)) FROM\n             global_temp.table GROUP BY col2\"\"\")\n```\n\nHere is what happens behind the scenes for a Spark job run in *Client Mode*:\n\n![enter image description here](https://raw.githubusercontent.com/bonnal-enzo/flexible-memoization/master/figs/igniteflowdiagram.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Febonnal%2Fflexible-memoization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Febonnal%2Fflexible-memoization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Febonnal%2Fflexible-memoization/lists"}