https://github.com/sslivkoff/toolcache
toolcache makes it simple to create and configure caches in python
- Host: GitHub
- URL: https://github.com/sslivkoff/toolcache
- Owner: sslivkoff
- License: apache-2.0
- Created: 2021-02-13T03:30:26.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-10-04T21:00:10.000Z (about 2 years ago)
- Last Synced: 2024-04-24T05:01:05.738Z (7 months ago)
- Language: Python
- Homepage:
- Size: 62.5 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# toolcache
`toolcache` makes it simple to create and configure caches in python
## Features
- save caches to memory or to disk
- memoize functions, instance methods, `@classmethod`s, and `@staticmethod`s
- control cache size with ttl and eviction policies like lru / fifo / lfu
- use thread safety, process safety, or no safety (default = thread safety)
- use custom hash functions
- track cache usage statistics

## Install
`pip install toolcache`

## Contents
- [Example Usage](#example-usage)
- [Reference](#cache-reference)
- [Cache Types](#cache-types)
- [Cache Creation](#cache-creation)
- [Cache Configuration](#cache-configuration)
- [Cache Decorators](#cache-decorators)
- [Cache Methods](#cache-methods)
- [Frequently Asked Questions](#frequently-asked-questions)

## Example Usage
### Creating Caches
```python
import toolcache

# memoize function with memory cache
@toolcache.cache('memory')
def f(a, b, c):
    return a * b * c

# memoize function with disk cache, stored in a tempdir
@toolcache.cache('disk')
def f(a, b, c):
    return a * b * c

# memoize function with disk cache, stored in a persistent dir
@toolcache.cache('disk', cache_dir='/path/to/cache/dir')
def f(a, b, c):
    return a * b * c
# remove cache entries once they reach a specific age
@toolcache.cache('disk', ttl='24 hours')
def f(a, b, c):
    return a * b * c

# remove cache entries once cache reaches a specific size
@toolcache.cache('disk', max_size=3, max_size_policy='fifo')
def f(a, b, c):
    return a * b * c

# specify which args are used to create unique hash of inputs
@toolcache.cache('disk', hash_args=['a', 'b'])
def f(a, b, c):
    return a * b * c

# create standalone cache
standalone_cache = toolcache.MemoryCache()
```

### Using Caches
```python
# get cache size
print(f.cache.get_cache_size())
> 4

# track cache usage statistics
print(f.cache.stats)
> {'n_checks': 6,
> 'n_deletes': 2,
> 'n_hashes': 8,
> 'n_hits': 2,
> 'n_loads': 1,
> 'n_misses': 4,
> 'n_saves': 3,
> 'n_size_evictions': 0,
> 'n_ttl_evictions': 0}

# clear cache
f.cache.delete_all_entries()
```

### More Examples
- [Configure Hashing](examples/function_hashing_options.py)
- [Configure Disk Caches](examples/disk_cache_options.py)
- [Use Cache Eviction Policies](examples/cache_eviction_policies.py)
- [Monitor Cache Usage](examples/monitor_cache_statistics.py)
- [Create Standalone Caches](examples/standalone_cache.py)
- [Define Custom Cachetypes](examples/define_custom_cache.py)

## Cache Reference
### Cache Types
`toolcache` includes three cache types, each inheriting from the abstract cache class `BaseCache`:
| cachetype | description | use case |
| -- | -- | -- |
| `MemoryCache` | cache that saves each entry as key-value pair in a `dict` | speed |
| `DiskCache` | cache that saves each entry as a file to disk | persistence, or large data that does not fit in memory |
| `NullCache` | cache that does not save any entries | programmatically disabling cache |

### Cache Creation
Caches can be created in two ways:
1. decorating a function with `@toolcache.cache(cachetype)` where `cachetype` is `'memory'`, `'disk'`, `'null'`, or a class inheriting from `BaseCache`
2. creating a standalone cache by instantiating a class that inherits from `BaseCache`

### Cache Configuration
The configuration options listed below can be passed to `toolcache.cache()` or to a standalone cache during initialization.
#### General Config
These configuration options are available for every cache:
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `safety` | `str` name of concurrency safety level, one of `'thread'`, `'process'`, or `None` | `'thread'` | `'thread'` |
| `verbose` | `bool` of whether to print info whenever saving to or loading from cache | `False` | `False` |
| `cache_name` | `str` name of the cache | `'important_cache'` | use decorated function name, or a uuid for a standalone cache |

#### Hash Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `f_hash` | custom function for computing hash | `lambda x: hash(x)` | `toolcache.compute_hash_json()` |
| `normalize_hash_inputs` | `bool` of whether to normalize function calls so that for a function `f` with args `a` and `b`, the calls `f(1, 2)` and `f(a=1, b=2)` are equivalent | `False` | `False` |
| `hash_include_args` | `list` of `str` names of arguments used to compute hash | `['arg1', 'arg2']` | include all args |
| `hash_exclude_args` | `list` of `str` names of arguments excluded from hash | `['arg3', 'arg4']` | exclude no args |

#### Eviction Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `ttl` | [`Timelength`](https://github.com/sslivkoff/tooltime#timelength-representations) of time-to-live maximum age for entries in cache | `'1000s'` | no max age |
| `max_size` | `int` of maximum number of entries in cache | `1000` | no max size |
| `max_size_policy` | `str` name of eviction policy to use when `max_size` is exceeded, one of `'lru'`, `'fifo'`, or `'lfu'` | `'fifo'` | `'lru'` |

#### Statistic Tracking Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `track_basic_stats` | `bool` of whether to track basic usage stats | `False` | `False` |
| `track_detailed_stats` | `bool` of whether to track creations and accesses | `False` | `False` |
| `track_creation_times` | `bool` of whether to track creation times | `False` | track only if `ttl` is not `None` or `max_size_policy == 'fifo'` |
| `track_access_times` | `bool` of whether to track access times | `False` | track only if `max_size_policy == 'lru'` |
| `track_access_counts` | `bool` of whether to track access counts | `False` | track only if `max_size_policy == 'lfu'` |

#### `DiskCache`-specific Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `cache_dir` | `str` of directory path to store cache data | `'/path/to/cache_dir'` | create a `tmpdir` |
| `file_format` | `str` of file format to use for cache data, either `'pickle'` or `'json'` | `'json'` | `'pickle'` |
| `f_disk_save` | custom function for saving data to disk, function should take `entry_path` and `entry_data` as arguments | `f_save` | save as pickle |
| `f_disk_load` | custom function for loading data from disk, function should take `entry_path` as an argument | `f_load` | load as pickle |

### Cache Decorators
When using `toolcache.cache()` to decorate a function, one should consider 1) how function inputs will be hashed, 2) what attributes will be added to the function, and 3) what arguments might be added to the function.
#### Hashing Function Inputs
To save a function input-output pair within a cache, a unique hash must be taken of the inputs.
Under the default hash configuration, each input arg should either be json-serializable or be a hashable object (i.e. it implements a `__hash__()` method). By default `toolcache` uses [`orjson`](https://github.com/ijl/orjson) to create these hashes quickly.
If function inputs do not satisfy these criteria, one or more of the cache config parameters should be used:
| parameter | description | example |
| -- | -- | -- |
| `f_hash` | provide a custom hash function that takes the same args and kwargs as the decorated function | `@toolcache.cache(..., f_hash=f_custom_hash)` |
| `hash_include_args` | specify `list` of arg names that should be used to compute hash | `@toolcache.cache(..., hash_include_args=['arg1', 'arg2'])` |
| `hash_exclude_args` | specify `list` of arg names that should not be used to compute hash | `@toolcache.cache(..., hash_exclude_args=['arg3', 'arg4'])` |

`toolcache.cache()` also works on functions that take `*args` or `**kwargs` as inputs.
#### Decorated Function Args
Every time the decorated function is called, it can use the following keyword args to control cache behavior.
| kwarg | description | default | example |
| -- | -- | -- | -- |
| `cache_save` | `bool` of whether to save output to cache | `True` | `f(..., cache_save=False)` will not save output to cache |
| `cache_load` | `bool` of whether to attempt to load entry from cache | `True` | `f(..., cache_load=False)` will not attempt to load entry from cache |
| `cache_verbose` | `bool` of whether to print info about loading from or saving to cache | `True` | `f(..., cache_verbose=False)` will not print cache info |

You can avoid adding these args to the decorated function by using `@toolcache.cache(..., add_cache_args=False)`.
#### Decorated Function Attributes
The original decorated function can be accessed as `f.__wrapped__`.
The cache instance associated with a decorated function `f()` can be accessed using `f.cache`.
### Cache Methods
These methods are available on every cache instance:
| method | description |
| -- | -- |
| `compute_entry_hash()` | compute hash of entry |
| `save_entry()` | save entry data to cache |
| `exists_in_cache()` | return `bool` of whether entry exists in cache |
| `load_entry()` | load entry data from cache |
| `get_cache_size()` | return `int` number of items in cache |
| `delete_entry()` | remove entry from cache |
| `delete_all_entries()` | delete all entries from cache |

## Frequently Asked Questions
#### How is the performance? What is the overhead for using a cache decorator?
To maximize cache performance, one can disable input name normalization (`normalize_hash_inputs=False`), statistic tracking (`track_basic_stats=False` and `track_detailed_stats=False`), and thread safety (`safety=None`).
On a somewhat modern machine with the above settings, the `toolcache.cache()` decorator adds about 3 μs to each function call, whereas running a simple function with no cache decorator takes about 50 ns per function call. Using a disk cache instead of a memory cache adds about 25 μs per function call. To truly know whether `toolcache` is fast enough for your application you may need to run your own benchmarks.
#### How does `toolcache` relate to other similar projects?
A large motivation for developing `toolcache` was being able to manage memory-based and disk-based caches with a unified interface and feature set. `toolcache` is currently the only python package to offer this functionality.
There exist many other python packages for caching and memoization. [`cacheout`](https://github.com/dgilland/cacheout) and [`python-memoization`](https://github.com/lonelyenvoy/python-memoization) both provide in-memory caches with many features. Compared to `toolcache` these libraries provide a wider variety of cache eviction policies and other interesting features. [`python-diskcache`](https://github.com/grantjenks/python-diskcache/) provides a feature-rich disk-based cache with Django integration and extensive benchmark comparisons to other solutions.