https://github.com/sslivkoff/toolcache
toolcache makes it simple to create and configure caches in python
- Host: GitHub
- URL: https://github.com/sslivkoff/toolcache
- Owner: sslivkoff
- License: apache-2.0
- Created: 2021-02-13T03:30:26.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-10-04T21:00:10.000Z (about 2 years ago)
- Last Synced: 2024-04-24T05:01:05.738Z (7 months ago)
- Language: Python
- Homepage:
- Size: 62.5 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# toolcache
`toolcache` makes it simple to create and configure caches in python
## Features
- save caches to memory or to disk
- memoize functions, instance methods, `@classmethod`s, and `@staticmethod`s
- control cache size with ttl and eviction policies like lru / fifo / lfu
- use thread safety, process safety, or no safety (default = thread safety)
- use custom hash functions
- track cache usage statistics

## Install
`pip install toolcache`

## Contents
- [Example Usage](#example-usage)
- [Reference](#cache-reference)
- [Cache Types](#cache-types)
- [Cache Creation](#cache-creation)
- [Cache Configuration](#cache-configuration)
- [Cache Decorators](#cache-decorators)
- [Cache Methods](#cache-methods)
- [Frequently Asked Questions](#frequently-asked-questions)

## Example Usage
### Creating Caches
```python
import toolcache

# memoize function with memory cache
@toolcache.cache('memory')
def f(a, b, c):
    return a * b * c

# memoize function with disk cache, stored in a tempdir
@toolcache.cache('disk')
def f(a, b, c):
    return a * b * c

# memoize function with disk cache, stored in a persistent dir
@toolcache.cache('disk', cache_dir='/path/to/cache/dir')
def f(a, b, c):
    return a * b * c
# remove cache entries once they reach a specific age
@toolcache.cache('disk', ttl='24 hours')
def f(a, b, c):
    return a * b * c

# remove cache entries once cache reaches a specific size
@toolcache.cache('disk', max_size=3, max_size_policy='fifo')
def f(a, b, c):
    return a * b * c

# specify which args are used to create unique hash of inputs
@toolcache.cache('disk', hash_args=['a', 'b'])
def f(a, b, c):
    return a * b * c

# create standalone cache
standalone_cache = toolcache.MemoryCache()
```

### Using Caches
```python
# get cache size
print(f.cache.get_cache_size())
> 4

# track cache usage statistics
print(f.cache.stats)
> {'n_checks': 6,
> 'n_deletes': 2,
> 'n_hashes': 8,
> 'n_hits': 2,
> 'n_loads': 1,
> 'n_misses': 4,
> 'n_saves': 3,
> 'n_size_evictions': 0,
> 'n_ttl_evictions': 0}

# clear cache
f.cache.delete_all_entries()
```

### More Examples
- [Configure Hashing](examples/function_hashing_options.py)
- [Configure Disk Caches](examples/disk_cache_options.py)
- [Use Cache Eviction Policies](examples/cache_eviction_policies.py)
- [Monitor Cache Usage](examples/monitor_cache_statistics.py)
- [Create Standalone Caches](examples/standalone_cache.py)
- [Define Custom Cachetypes](examples/define_custom_cache.py)

## Cache Reference
### Cache Types
`toolcache` includes three cache types, each inheriting from the abstract cache class `BaseCache`:
| cachetype | description | use case |
| -- | -- | -- |
| `MemoryCache` | cache that saves each entry as key-value pair in a `dict` | speed |
| `DiskCache` | cache that saves each entry as a file to disk | persistence, or large data that does not fit in memory |
| `NullCache` | cache that does not save any entries | programmatically disabling cache |

### Cache Creation
Caches can be created in two ways:
1. decorating a function with `@toolcache.cache(cachetype)` where `cachetype` is `'memory'`, `'disk'`, `'null'`, or a class inheriting from `BaseCache`
2. creating a standalone cache by instantiating a class that inherits from `BaseCache`

### Cache Configuration
The configuration options listed below can be passed to `toolcache.cache()` or to a standalone cache during initialization.
#### General Config
These configuration options are available for every cache:
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `safety` | `str` name of concurrency safety level, one of `'thread'`, `'process'`, or `None` | `'thread'` | `'thread'` |
| `verbose` | `bool` of whether to print info whenever saving to or loading from cache | `False` | `False` |
| `cache_name` | `str` name of the cache | `'important_cache'` | use decorated function name, or a uuid for a standalone cache |

#### Hash Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `f_hash` | custom function for computing hash | `lambda x: hash(x)` | `toolcache.compute_hash_json()` |
| `normalize_hash_inputs` | `bool` of whether to normalize function calls so that for a function `f` with args `a` and `b`, the calls `f(1, 2)` and `f(a=1, b=2)` are equivalent | `False` | `False` |
| `hash_include_args` | `list` of `str` names of arguments used to compute hash | `['arg1', 'arg2']` | include all args |
| `hash_exclude_args` | `list` of `str` names of arguments excluded from hash | `['arg3', 'arg4']` | exclude no args |

#### Eviction Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `ttl` | [`Timelength`](https://github.com/sslivkoff/tooltime#timelength-representations) of time-to-live maximum age for entries in cache | `'1000s'` | no max age |
| `max_size` | `int` of maximum number of entries in cache | `1000` | no max size |
| `max_size_policy` | `str` name of eviction policy to use when `max_size` is exceeded, one of `'lru'`, `'fifo'`, or `'lfu'` | `'fifo'` | `'lru'` |

#### Statistic Tracking Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `track_basic_stats` | `bool` of whether to track basic usage stats | `False` | `False` |
| `track_detailed_stats` | `bool` of whether to track creations and accesses | `False` | `False` |
| `track_creation_times` | `bool` of whether to track creation times | `False` | track only if `ttl` is not `None` or `max_size_policy == 'fifo'` |
| `track_access_times` | `bool` of whether to track access times | `False` | track only if `max_size_policy == 'lru'` |
| `track_access_counts` | `bool` of whether to track access counts | `False` | track only if `max_size_policy == 'lfu'` |

#### `DiskCache`-specific Config
| arg | description | example value | default behavior |
| -- | -- | -- | -- |
| `cache_dir` | `str` of directory path to store cache data | `'/path/to/cache_dir'` | create a `tmpdir` |
| `file_format` | `str` of file format to use for cache data, either `'pickle'` or `'json'` | `'json'` | `'pickle'` |
| `f_disk_save` | custom function for saving data to disk, function should take `entry_path` and `entry_data` as arguments | `f_save` | save as pickle |
| `f_disk_load` | custom function for loading data from disk, function should take `entry_path` as an argument | `f_load` | load as pickle |

### Cache Decorators
When using `toolcache.cache()` to decorate a function, one should consider 1) how function inputs will be hashed, 2) what attributes will be added to the function, and 3) what arguments might be added to the function.
#### Hashing Function Inputs
To save a function input-output pair within a cache, a unique hash must be taken of the inputs.
Under the default hash configuration, each input arg should either be json-serializable or be a hashable object (i.e. it implements a `__hash__()` method). By default `toolcache` uses [`orjson`](https://github.com/ijl/orjson) to create these hashes quickly.
If function inputs do not satisfy these criteria, one or more of the cache config parameters should be used:
| parameter | description | example |
| -- | -- | -- |
| `f_hash` | provide a custom hash function that takes the same args and kwargs as the decorated function | `@toolcache.cache(..., f_hash=f_custom_hash)` |
| `hash_include_args` | specify `list` of arg names that should be used to compute hash | `@toolcache.cache(..., hash_include_args=['arg1', 'arg2'])` |
| `hash_exclude_args` | specify `list` of arg names that should not be used to compute hash | `@toolcache.cache(..., hash_exclude_args=['arg3', 'arg4'])` |

`toolcache.cache()` also works on functions that take `*args` or `**kwargs` as inputs.
#### Decorated Function Args
Every time the decorated function is called, it can use the following keyword args to control cache behavior.
| kwarg | description | default | example |
| -- | -- | -- | -- |
| `cache_save` | `bool` of whether to save output to cache | `True` | `f(..., cache_save=False)` will not save output to cache |
| `cache_load` | `bool` of whether to attempt to load entry from cache | `True` | `f(..., cache_load=False)` will not attempt to load entry from cache |
| `cache_verbose` | `bool` of whether to print info about loading from or saving to cache | `True` | `f(..., cache_verbose=False)` will not print cache info |

You can avoid adding these args to the decorated function by using `@toolcache.cache(..., add_cache_args=False)`.
#### Decorated Function Attributes
The original decorated function can be accessed as `f.__wrapped__`.
The cache instance associated with a decorated function `f()` can be accessed using `f.cache`.
### Cache Methods
These methods are available on every cache instance:
| method | description |
| -- | -- |
| `compute_entry_hash()` | compute hash of entry |
| `save_entry()` | save entry data to cache |
| `exists_in_cache()` | return `bool` of whether entry exists in cache |
| `load_entry()` | load entry data from cache |
| `get_cache_size()` | return `int` number of items in cache |
| `delete_entry()` | remove entry from cache |
| `delete_all_entries()` | delete all entries from cache |

## Frequently Asked Questions
#### How is the performance? What is the overhead for using a cache decorator?
To maximize cache performance, one can disable input name normalization (`normalize_hash_inputs=False`), statistic tracking (`track_basic_stats=False` and `track_detailed_stats=False`), and thread safety (`safety=None`).
On a somewhat modern machine with the above settings, the `toolcache.cache()` decorator adds about 3 μs to each function call, whereas running a simple function with no cache decorator takes about 50 ns per function call. Using a disk cache instead of a memory cache adds about 25 μs per function call. To truly know whether `toolcache` is fast enough for your application you may need to run your own benchmarks.
#### How does `toolcache` relate to other similar projects?
A large motivation for developing `toolcache` was being able to manage memory-based and disk-based caches with a unified interface and feature set. `toolcache` is currently the only python package to offer this functionality.
There exist many other python packages for caching and memoization. [`cacheout`](https://github.com/dgilland/cacheout) and [`python-memoization`](https://github.com/lonelyenvoy/python-memoization) both provide in-memory caches with many features. Compared to `toolcache` these libraries provide a wider variety of cache eviction policies and other interesting features. [`python-diskcache`](https://github.com/grantjenks/python-diskcache/) provides a feature-rich disk-based cache with Django integration and extensive benchmark comparisons to other solutions.