{"id":25802126,"url":"https://github.com/phantie/multilayer-cache","last_synced_at":"2025-07-19T20:34:37.173Z","repository":{"id":278968323,"uuid":"937155474","full_name":"phantie/multilayer-cache","owner":"phantie","description":"Nano framework for implementing multilayered caching","archived":false,"fork":false,"pushed_at":"2025-02-22T20:47:42.000Z","size":43,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-22T21:27:54.919Z","etag":null,"topics":["caching","framework","layering","pattern"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phantie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-22T13:22:12.000Z","updated_at":"2025-02-22T21:07:16.000Z","dependencies_parsed_at":"2025-02-22T21:38:00.056Z","dependency_job_id":null,"html_url":"https://github.com/phantie/multilayer-cache","commit_stats":null,"previous_names":["phantie/multilayer-cache"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phantie%2Fmultilayer-cache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phantie%2Fmultilayer-cache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phantie%2Fmultilayer-cache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phantie%2Fmultilayer-cache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/phantie","download_url":"https://codeload.github.com/phantie/multilayer-cache/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241036519,"owners_count":19898167,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["caching","framework","layering","pattern"],"created_at":"2025-02-27T16:46:49.135Z","updated_at":"2025-07-19T20:34:37.159Z","avatar_url":"https://github.com/phantie.png","language":"Python","readme":"# Multilayer caching\n\n\"Multilayer\" here means that one cache layer may depend on another, which, in turn, may depend on yet another. It is also capable of forming not only chains but also trees of dependent caches. Value retrievals update the local cache of all downstream layers.\n\n## Our example\n\nFor example, suppose we have files stored in an S3 bucket that we want to cache locally, and also cache the parsed data structures derived from these files. 
This results in a 2-layer cache structure, with the bucket serving as the data source.\n\n```\nBucket  \n ├── File Cache  \n │   ├── Parsed File Cache  \n```\n\nSay you want a parsed value, so you are concerned with the **Parsed File Cache**.\n\nWith both local caches empty, let's describe what happens during the first value retrieval.\n\n\nSince **Parsed File Cache** would *not* find it in the local cache, it would then try to retrieve it from its dependency - **File Cache**.\n**File Cache** would also *not* find a transformable value in its local cache, and to its dependency it goes - **Bucket**.\n\n\n**Bucket** may or may *not* have a value.\nIf it doesn’t, no local cache updates occur, and the result of retrieval from **Parsed File Cache** is a value representing *Key not found*.\nBut if it does, **File Cache** transforms the value retrieved from **Bucket**, stores the value in its local cache, and **Parsed File Cache** does the same.\n\nWhen values are found in local caches, they pop out immediately, and the caches do not contact their dependencies.\n\nIt's a simple recursive algorithm.\n\n### More elaborate cache structures\n\nFor the sake of brevity, we investigate this minimal example, but arbitrarily nested tree cache structures are possible nonetheless:\n\n```\nBucket  \n ├── File Cache  \n │   ├── Parsed File Cache  \n │   │   ├── Further Parsed File Cache  \n │   ├── Another Parsed File Cache  \n```\n\n## Implementation\n\n### Common problems\n\nImplementing such caching can still be a challenge. The implementation may suffer from:\n\n- spaghetti\n(having a recursive nature, but with finite nesting. 
For your purposes, you might have started with one layer, but after adding a layer or two more, the code started looking like this from afar:)\n```\n@@@@ outer layer get\n  @@@@ middle layer get\n    @@@@ inner layer get\n  @@@@\n@@@@\n```\n\n- imposing overly tight contracts and taking control of the local cache.\n\n- mixing in more logic than necessary (due to lack of formalization and restrictions).\n\n\n### Approach\n\nThe implementation's main concern is to provide the most flexible way to construct caches. This is achieved by imposing only the constraints essential to the problem, which at the same time provide freedom by enforcing uniformity across layers.\n\n### Python implementation\n\n```python\n# Represents the value [T]ype a cache returns\nT = TypeVar(\"T\")\n# Represents the [K]ey used for retrieving from the local cache or source\nK = TypeVar(\"K\")\n# Represents a unique (by identity, the \"is\" operation) [D]efault value returned when a key is not found\nD = TypeVar(\"D\")\n\n\ndef cache_layer(\n    # A way to get a cache key\n    get_cache_key: Callable[[], K],\n    # A way to use the key to get a value from the local cache\n    get_cache_value: Callable[[K, D], T | D],\n    # A way to update the local cache with the key and value\n    set_cache_value: Callable[[K, T], None],\n    # A way to get a value from the dependency source with the key\n    on_cache_miss_source: Callable[[K, D], T | D],\n    # A way to get a unique value the local cache and dependency source would return when the key is not found\n    get_default: Callable[[], D],\n    # A way to get an identifier for a cache layer\n    get_identifier: Callable[[], Any],\n    # Handler of generated events, for example for testing and logging\n    inspect: Callable[[CacheLayerInspect], None] = lambda _: None,\n) -\u003e T | D:\n    ...\n```\n\n### Constraints\n\nFor nesting of layers L(0..N) to be possible (where L_0 is the innermost layer and L_N is the outermost layer):\n\nT(0..N) must be such that there exists a one-way 
transformation (morphism) T_0 -\u003e T_N.\nSimply put, there must be a way to transform a **value** passing from an *inner* to an *outer* layer.\nFor example: bytes -\u003e decoded bytes -\u003e parsed JSON.\n\nK(0..N) must be such that there exists a one-way transformation (morphism) K_N -\u003e K_0.\nSimply put, there must be a way to reduce a **key** passing from an *outer* to an *inner* layer.\n\n### The [multilayer_cache](https://github.com/phantie/multilayer-cache) library contains cache_layer, among other things (an asynchronous and a type-hinted cache layer, plus examples)\n\nSo let's implement the 2-layer example.\n\n```python\nfrom multilayer_cache import cache_layer\nfrom multilayer_cache import type_hinted_cache_layer\nfrom multilayer_cache import KEY_NOT_FOUND\nfrom multilayer_cache import CacheLayerInspect\nfrom multilayer_cache import CacheLayerInspectHit\nfrom multilayer_cache import CacheLayerInspectMiss\n\nimport json\nfrom functools import partial\nfrom typing import TypeAlias\nfrom typing import TypeVar\nfrom typing import Any\nfrom typing import Optional\n\nimport pydantic\n\nD = TypeVar(\"D\")\n\n########################################################################\n\n### Define mock blob storage as a mapping from BlobId to FileContents.\n\nBlobId: TypeAlias = str\nFileContents: TypeAlias = str\n\nclass Bucket(pydantic.BaseModel):\n    files: dict[BlobId, FileContents]\n\n    def get(self, blob_id: BlobId, default: D) -\u003e FileContents | D:\n        return self.files.get(blob_id, default)\n\nbucket = Bucket(\n    files = {\n        \"a\": json.dumps({\"key\": \"a\", \"value\": \"a\"}),\n        \"b\": json.dumps({\"key\": \"b\", \"value\": \"b\"}),\n    }\n)\n\n########################################################################\n\n### Let's implement the first layer: an in-memory cache layer preserving raw files.\n\nFilesInnerCache: TypeAlias = dict[BlobId, FileContents]\n\n# The framework doesn't enforce local cache management.\n# You may provide any and manage 
it as you like.\n# An in-memory solution is the shortest to demonstrate.\nfiles_inner_cache: FilesInnerCache = {}\n\n# Let's match against the generated events.\nevents = []\n\ndef on_cache_miss_source(cache_key: BlobId, default: D) -\u003e FileContents | D:\n    blob_id = cache_key\n    # It's important to enforce a contract that lets you know when a value was not found,\n    # because most of the time a library would instead throw its own exception.\n    return bucket.get(blob_id, default)\n\n# Bake in the common parameters.\nfiles_cache_layer_partial = partial(\n    cache_layer,\n    # get_cache_key=\n    get_cache_value=lambda key, default: files_inner_cache.get(key, default),\n    set_cache_value=lambda key, value: files_inner_cache.update({key: value}),\n    on_cache_miss_source=on_cache_miss_source,\n    # get_default=\n    get_identifier=lambda: \"raw_files\",\n    inspect=lambda event: events.append(event),\n)\n\n# Make a call with the key \"a\" (we know the bucket has it).\nresult = files_cache_layer_partial(\n    get_cache_key=lambda: \"a\",\n    # do not bake in the default because outer layers provide their own\n    get_default=lambda: KEY_NOT_FOUND,\n)\n\n# As expected, we received the unchanged value from the blob and cached it locally.\n# The same call would return the already cached value.\nassert result == '{\"key\": \"a\", \"value\": \"a\"}'\n\n\nmatch events:\n    # One miss event was generated because the key was missing from the cache.\n    case [\n        CacheLayerInspect(identifier='raw_files', value=CacheLayerInspectMiss(key='a')),\n    ]:\n        pass\n    case _:\n        raise ValueError\n\nevents.clear()\n\n\n# Let's make the same call again\nresult = files_cache_layer_partial(\n    get_cache_key=lambda: \"a\",\n    get_default=lambda: KEY_NOT_FOUND,\n)\n\nmatch events:\n    # Now it's a hit\n    case [\n        CacheLayerInspect(identifier='raw_files', value=CacheLayerInspectHit(key='a')),\n    ]:\n        pass\n    case _:\n        raise 
ValueError\n\nevents.clear()\n\n\n# Make a call with the key \"c\" (we know the bucket does not have it).\nresult = files_cache_layer_partial(\n    get_cache_key=lambda: \"c\",\n    get_default=lambda: KEY_NOT_FOUND,\n)\n\n# As expected, the value was not found, and nothing has changed.\nassert result is KEY_NOT_FOUND\n\n\n########################################################################\n\n### Let's implement the second layer: an in-memory cache of parsed files.\n\n# To demonstrate more complex key usage,\n# we'll version a parser.\nParserVersion: TypeAlias = str\n\n# To demonstrate transformations to and from the local cache,\n# we'll serialize the model to a string and back.\nParsedFileCompressed: TypeAlias = str\n\n# To demonstrate the transformation of a value retrieved from a dependency source,\n# we'll parse it.\nclass ParsedFile(pydantic.BaseModel):\n    key: Any\n    value: Any\n\n\n# It's common for parsers to change, so data parsed with one version may not be compatible with another.\n\n# You are free to manage (invalidate) the local cache however you like in this regard.\n\n# You may clean it when a parser with a newer version is used,\n# keep all the data,\n# restrict it by size and keep the latest data,\n# or use a database or network.\n\n# It's still your choice and an exercise for the reader.\nclass JsonParser:\n    def version(self) -\u003e ParserVersion:\n        return \"0\"\n\n    def parse(self, value: FileContents) -\u003e ParsedFile:\n        return ParsedFile.model_validate_json(value)\n\n\nparser = JsonParser()\n\nParsedFilesKey: TypeAlias = tuple[BlobId, ParserVersion]\nParsedFilesInnerCache: TypeAlias = dict[ParsedFilesKey, ParsedFileCompressed]\n\nparsed_files_inner_cache: ParsedFilesInnerCache = {}\n\ndef on_cache_miss_source(cache_key: ParsedFilesKey, default: D) -\u003e ParsedFile | D:\n    # The inner layer requires only the blob_id.\n    blob_id, _parser_version = cache_key\n\n    # Use the raw files cache and provide a key and a 
default.\n    value = files_cache_layer_partial(\n        get_cache_key=lambda: blob_id,\n        get_default=lambda: default,\n    )\n\n    # Pop out the default.\n    if value is default:\n        return default\n\n    # Transform the found value to this cache's return type.\n    value = parser.parse(value)\n\n    # This value will be passed to be stored in the local cache.\n    return value\n\n\nparsed_files_cache_layer_partial = partial(\n    # The type_hinted_cache_layer allows you to type hint ahead of time,\n    # making it easier to work with lambdas.\n    type_hinted_cache_layer[ParsedFile, ParsedFilesKey, Any].new,\n    # get_cache_key=\n    on_cache_miss_source=on_cache_miss_source,\n    # The local cache stores serialized models, so deserialize on the way out.\n    get_cache_value=lambda key, default: (\n        ParsedFile.model_validate_json(cached)\n        if (cached := parsed_files_inner_cache.get(key, default)) is not default\n        else default\n    ),\n    set_cache_value=lambda key, value: parsed_files_inner_cache.update({key: value.model_dump_json(by_alias=True)}),\n    # get_default=\n    get_identifier=lambda: \"parsed_files\",\n)\n\n\nresult = parsed_files_cache_layer_partial(\n    # Provide both the blob_id and the parser version.\n    get_cache_key=lambda: (\"a\", parser.version()),\n    get_default=lambda: KEY_NOT_FOUND,\n)\n\n# As a result, we've got a parsed file cached on all layers.\nassert result == ParsedFile(key=\"a\", value=\"a\")\n\n\n########################################################################\n\n### The layers are implemented, but the composition is up to your imagination.\n\n# For example, we could provide a more user-friendly interface.\ndef get_parsed_file(blob_id: BlobId, parser: JsonParser) -\u003e Optional[ParsedFile]:\n    value = parsed_files_cache_layer_partial(\n        get_cache_key=lambda: (blob_id, parser.version()),\n        get_default=lambda: KEY_NOT_FOUND,\n    )\n\n    return None if value is KEY_NOT_FOUND else value\n```\n\n### Async capabilities\n\nThe 
[multilayer_cache](https://github.com/phantie/multilayer-cache) library also has an asynchronous cache layer (async_cache_layer). The difference is that it takes asynchronous functions as arguments instead of synchronous ones. See the [async_cached_files](https://github.com/phantie/multilayer-cache/blob/main/multilayer_cache/tests/test_async_cached_files.py) example.\n\nSince retrieving values from the cache parallelizes naturally across many keys, it works nicely with asyncio.gather or asyncio.Semaphore.\n\n## Conclusion\n\nCaching **can** be fun. (somewhat)","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphantie%2Fmultilayer-cache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphantie%2Fmultilayer-cache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphantie%2Fmultilayer-cache/lists"}