{"id":18621266,"url":"https://github.com/yiling-j/theine","last_synced_at":"2025-05-15T04:06:00.459Z","repository":{"id":65779311,"uuid":"597613673","full_name":"Yiling-J/theine","owner":"Yiling-J","description":"high performance in-memory cache","archived":false,"fork":false,"pushed_at":"2025-05-10T02:24:32.000Z","size":76441,"stargazers_count":393,"open_issues_count":2,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-15T04:05:51.162Z","etag":null,"topics":["cache","django","memory","python","ttl"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yiling-J.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-02-05T04:20:07.000Z","updated_at":"2025-05-09T16:31:43.000Z","dependencies_parsed_at":"2024-08-26T13:31:59.443Z","dependency_job_id":"57abe515-fbc5-4c52-8652-4c7001ea3b5b","html_url":"https://github.com/Yiling-J/theine","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yiling-J%2Ftheine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yiling-J%2Ftheine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yiling-J%2Ftheine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yiling-J%2Ftheine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yiling-J","download_url":"h
ttps://codeload.github.com/Yiling-J/theine/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270646,"owners_count":22042859,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cache","django","memory","python","ttl"],"created_at":"2024-11-07T04:10:06.198Z","updated_at":"2025-05-15T04:05:55.443Z","avatar_url":"https://github.com/Yiling-J.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Theine\n\n**IMPORTANT: Theine is currently undergoing a major rewrite and refactor to become a thread-safe, high-performance concurrent cache. V2 will support free-threading in Python and will focus on enhancing multi-threading performance. Some APIs will change in the update.**\n\nPlanned Updates in V2\n- **Single Policy**: V2 will feature only one caching policy, Adaptive Window-TinyLFU, so the policy option in the API will be removed.\n- **Improved Hit Ratio**: The current TinyLFU (tlfu) policy will be optimized to achieve a higher hit ratio.\n- **Unified Expiration Thread**: Instead of each cache instance using a separate thread for proactive expiration as in V1, V2 will utilize a single thread, with all cache instances scheduling expirations through asyncio.\n- **Enhanced Thread Safety and Concurrency**: Mutexes will be added to ensure thread safety, alongside advanced concurrency optimizations such as sharding to boost performance.\n- **Memory Optimization**: Memory usage per cached item will be reduced.\n\nPython’s free-threading support is still evolving rapidly. 
Compared to Go, the Python ecosystem and tooling around free-threading remain relatively immature, with ongoing uncertainties around safety and scalability. As a result, progress may take some time. If you're interested in the current state of free-threaded Python, you can read more in this discussion: [PEP 779 – Criteria for Supported Status for Free-Threaded Python](https://discuss.python.org/t/pep-779-criteria-for-supported-status-for-free-threaded-python/84319).\n\n---\n\nHigh-performance in-memory cache inspired by [Caffeine](https://github.com/ben-manes/caffeine).\n\n- High-performance [Rust core](https://github.com/Yiling-J/theine-core)\n- High hit ratio with [W-TinyLFU](https://arxiv.org/pdf/1512.00727.pdf) or [Clock-Pro](https://static.usenix.org/event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html) eviction policies\n- Expired data are removed automatically using a [hierarchical timer wheel](http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf)\n\n  \u003e TTL must be considered in in-memory caching because\nit limits the effective (unexpired) working set size. Efficiently removing expired objects from cache needs to be\nprioritized over cache eviction. 
- [A large scale analysis of hundreds of in-memory\ncache clusters at Twitter](https://www.usenix.org/system/files/osdi20-yang.pdf)\n- Simple API\n- Django cache backend\n\n## Table of Contents\n\n- [Requirements](#requirements)\n- [Installation](#installation)\n- [Cache Eviction Policies](#cache-eviction-policies)\n- [API (V1)](#api-v1)\n- [Decorator](#decorator)\n- [Django Cache Backend](#django-cache-backend)\n- [Metadata Memory Overhead](#metadata-memory-overhead)\n- [Benchmarks](#benchmarks)\n  * [continuous benchmark](#continuous-benchmark)\n  * [10k requests](#10k-requests)\n  * [hit ratios](#hit-ratios)\n- [Support](#support)\n\n## Requirements\nPython 3.7+\n\n## Installation\n```\npip install theine\n```\n\n## Cache Eviction Policies\n\nTheine provides three built-in cache eviction policies:\n\n#### LRU\n\nDiscards the least recently used items first.\n\n#### W-TinyLFU\n\nAn approximate LFU policy designed to boost the effectiveness of caches subject to skewed access distributions.\n\nTheine uses an adaptive version of W-TinyLFU to achieve a better hit ratio across different types of workloads.\n\nReference:\n\nhttps://arxiv.org/pdf/1512.00727.pdf\n\n\n#### Clock-PRO\n\nAn improved CLOCK replacement policy (CLOCK is an approximation of LRU), based on [PyClockPro](https://bitbucket.org/SamiLehtinen/pyclockpro/src/master/).\n\nReference:\n\nhttps://static.usenix.org/event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html\n\n\n## API (V1)\n\nKeys should be **hashable** objects, and values can be any **Python object**. If the key type is not **str/int**, Theine automatically generates a unique key string; this string takes extra memory and increases get/set/remove overhead.\n\nEach Cache instance spawns a thread to evict expired entries proactively, so initializing a cache instance is relatively expensive. **Don't create instances dynamically inside your functions**. 
The Django adapter creates a global cache instance automatically. When using the `Memoize` decorator, make sure your cache instance is created globally instead of creating a new one on each call.\n\nPlease be aware that the Cache class is **not** thread-safe.\n\n```Python\nfrom theine import Cache\nfrom datetime import timedelta\n\n# tlfu is the eviction policy; Theine provides 3 policies: lru/tlfu/clockpro\ncache = Cache(\"tlfu\", 10000)\n# without default, return None on miss\nv = cache.get(\"key\")\n\n# with default, return default on miss\nsentinel = object()\nv = cache.get(\"key\", sentinel)\n\n# set with ttl\ncache.set(\"key\", {\"foo\": \"bar\"}, timedelta(seconds=100))\n\n# delete from cache\ncache.delete(\"key\")\n\n# clear cache\ncache.clear()\n\n# get current cache stats; call stats() again if you need updated stats\nstats = cache.stats()\nprint(stats.request_count, stats.hit_count, stats.hit_rate)\n\n# get cache max size\ncache.max_size\n\n# get cache current size\nlen(cache)\n\n# close cache and stop the timing wheel thread; do this last, once the cache is no longer needed\ncache.close()\n```\n\n## Decorator\nTheine supports hashable keys, so using a decorator requires a function that converts the input arguments into a hashable key. **The recommended way is to specify this function explicitly** (approach 1); Theine also supports generating keys automatically (approach 2). 
As with the Theine API, if the key function's return type is not **str/int**, Theine automatically generates a unique key string; this string takes extra memory and increases get/set/remove overhead.\n\n**- explicit key function**\n\n```python\nimport asyncio\nfrom theine import Cache, Memoize\nfrom datetime import timedelta\n\n@Memoize(Cache(\"tlfu\", 10000), timedelta(seconds=100))\ndef foo(a: int) -\u003e int:\n    return a\n\n@foo.key\ndef _(a: int) -\u003e str:\n    return f\"a:{a}\"\n\nfoo(1)\n\n# asyncio\n@Memoize(Cache(\"tlfu\", 10000), timedelta(seconds=100))\nasync def foo_a(a: int) -\u003e int:\n    return a\n\n@foo_a.key\ndef _(a: int) -\u003e str:\n    return f\"a:{a}\"\n\n# run from synchronous code; inside a coroutine, use `await foo_a(1)` instead\nasyncio.run(foo_a(1))\n```\n\n**Pros**\n- Supports both sync and async functions.\n- Explicit control over how keys are generated. Most remote caches (Redis, Memcached, ...) only allow string keys, so returning a string from the key function makes it easier to switch to a remote cache later.\n- Thundering herd protection (multithreading: set `lock=True` in `Memoize`; asyncio: always enabled).\n- Type checked. Mypy can check that the key function has the same input signature as the original function and returns a hashable value.\n\n**Cons**\n- You have to write two functions.\n- Performance: around 8 ms/10k requests with the Theine API vs. around 12 ms/10k requests with the decorator.\n\n**- auto key function**\n\n```python\nimport asyncio\nfrom theine import Cache, Memoize\nfrom datetime import timedelta\n\n@Memoize(Cache(\"tlfu\", 10000), timedelta(seconds=100), typed=True)\ndef foo(a: int) -\u003e int:\n    return a\n\nfoo(1)\n\n# asyncio\n@Memoize(Cache(\"tlfu\", 10000), timedelta(seconds=100), typed=True)\nasync def foo_a(a: int) -\u003e int:\n    return a\n\n# run from synchronous code; inside a coroutine, use `await foo_a(1)` instead\nasyncio.run(foo_a(1))\n```\n\n**Pros**\n- Same as the explicit key version.\n- No extra key function.\n\n**Cons**\n- Worse performance: around 18 ms/10k requests.\n- Unexpected memory usage. The auto key function uses the same method as Python's `lru_cache`. 
Take a look at [this issue](https://github.com/python/cpython/issues/88476) or [this one](https://github.com/python/cpython/issues/64058).\n\n\n## Django Cache Backend\n\n```Python\n# settings.py\nCACHES = {\n    \"default\": {\n        \"BACKEND\": \"theine.adapters.django.Cache\",\n        \"TIMEOUT\": 300,\n        \"OPTIONS\": {\"MAX_ENTRIES\": 10000, \"POLICY\": \"tlfu\"},\n    },\n}\n```\n\n## Metadata Memory Overhead\nAssuming your key is 24 bytes long, each metadata entry in Rust takes 92 bytes. For 1 million keys, the total metadata overhead is 92 megabytes. Clock-Pro uses **2x** the metadata space, i.e. 184 megabytes.\n\n## Benchmarks\n\nPython version: 3.11\n\nOS: Ubuntu 22.04.2 LTS\n\n### continuous benchmark\nhttps://github.com/Yiling-J/cacheme-benchmark\n\n### 10k requests\nCachetools: https://github.com/tkem/cachetools\n\nCacheout: https://github.com/dgilland/cacheout\n\nSource Code: https://github.com/Yiling-J/theine/blob/main/benchmarks/benchmark_test.py\n\nThe Write and Mix Zipf benchmarks use a max cache size of 1k, which exposes the high cost of the traditional LFU eviction policy.\n\n|                                        | Read     | Write     | Mix Zipf  |\n|----------------------------------------|----------|-----------|-----------|\n| Theine(Clock-Pro) API                  | 3.07 ms  | 9.86 ms   |           |\n| Theine(W-TinyLFU) API                  | 3.42 ms  | 10.14 ms  |           |\n| Theine(W-TinyLFU) Auto-Key Decorator   | 7.17 ms  | 18.41 ms  | 13.18 ms  |\n| Theine(W-TinyLFU) Custom-Key Decorator | 6.45 ms  | 17.67 ms  | 11.50 ms  |\n| Cachetools LFU Decorator               | 15.70 ms | 627.10 ms | 191.04 ms |\n| Cacheout LFU Decorator                 | 50.05 ms | 704.70 ms | 250.95 ms |\n| Theine(LRU) Custom-Key Decorator       | 5.70 ms  | 16.04 ms  | 10.91 ms  |\n| Cachetools LRU Decorator               | 14.05 ms | 61.06 ms  | 36.89 ms  |\n| Cacheout LRU Decorator                 | 47.90 ms | 94.94 ms  | 68.25 ms  |\n\n### hit ratios\n\nAll hit ratio benchmarks 
use small datasets and finish in seconds or minutes, so it is better to try Theine yourself and focus on whether the cache exceeds your performance needs and has the capabilities you want.\n\nSource Code: https://github.com/Yiling-J/theine/blob/main/benchmarks/trace_bench.py\n\n**zipf**\n\n![hit ratios](benchmarks/zipf.png)\n\n**search**\n\nThis trace is described as \"disk read accesses initiated by a large commercial search engine in response to various web search requests.\"\n\n![hit ratios](benchmarks/s3.png)\n\n**database**\n\nThis trace is described as \"a database server running at a commercial site running an ERP application on top of a commercial database.\"\n\n![hit ratios](benchmarks/ds1.png)\n\n**Scarabresearch database trace**\n\nA 1-hour Scarabresearch database trace from this [issue](https://github.com/ben-manes/caffeine/issues/106).\n\n![hit ratios](benchmarks/scarab1h.png)\n\n**Meta anonymized trace**\n\nAn anonymized trace shared by Meta, captured from large-scale production cache services, from [CacheLib](https://cachelib.org/docs/Cache_Library_User_Guides/Cachebench_FB_HW_eval/#running-cachebench-with-the-trace-workload).\n\n![hit ratios](benchmarks/fb.png)\n\n## Support\nOpen an issue, ask a question in Discussions, or join the Discord channel: https://discord.gg/StrgfPaQqE\n\nA Go version of Theine, focused on concurrency performance, is also available: [Theine Go](https://github.com/Yiling-J/theine-go).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyiling-j%2Ftheine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyiling-j%2Ftheine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyiling-j%2Ftheine/lists"}