https://github.com/jython1415/notecache

Persistent caching tool with state-dependent reevaluation; designed for a notebook environment

jupyter jupyter-notebook python3


# notecache — Notebook Caching

Created by Joshua Shew

- ([[email protected]]("mailto:[email protected]")).
- [Twitter](https://twitter.com/JShoes1415)

## Introduction

This package provides persistent, state-dependent caching of objects in Jupyter notebooks. Objects (such as large intermediate data frames) are recomputed if and only if the state they are tied to has changed. Otherwise the cell, even when run, loads the object from cache. "Persistent" means that these objects (and their associated state) are stored between notebook sessions, even after the notebook has been closed and reopened.

Without `notecache`:

```python
df = expensive_computation([multiple_large_arguments])
```

With `notecache`:

```python
import notecache
from pandas import DataFrame

state = {"arg1": arg1, "arg2": another_arg}

def generate(state) -> DataFrame:
    return expensive_computation(state["arg1"], state["arg2"])

# `load` returns a named tuple; the cached object is in `.object`.
df = notecache.load(state, generate, unique_id="large-data-frame").object
```

The first time this cell is executed, `expensive_computation` runs to generate the result. Subsequent executions of this cell load the result instead of calling `expensive_computation`, *even if the notebook has been closed and reopened*. The result is recomputed *if and only if* a change to `state` has been detected.
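The behavior described above can be illustrated with a minimal, self-contained sketch. This is not `notecache`'s actual implementation: the function `load_sketch`, the `cache_dir` parameter, the `.pkl` file layout, and the use of `pickle` with a plain equality check on `state` are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def load_sketch(state, generate, unique_id, cache_dir):
    """Return generate(state), reusing a pickled result while state is unchanged."""
    path = Path(cache_dir) / f"{unique_id}.pkl"  # hypothetical file layout
    if path.exists():
        with path.open("rb") as f:
            cached_state, cached_obj = pickle.load(f)
        if cached_state == state:  # unchanged state: skip the computation
            return cached_obj
    obj = generate(state)  # first run, or state changed: recompute
    with path.open("wb") as f:
        pickle.dump((state, obj), f)  # store the state next to the result
    return obj


calls = []


def generate(state):
    calls.append(state["n"])  # record each real computation
    return sum(range(state["n"]))


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_sketch({"n": 10}, generate, "demo", cache_dir)   # computed
    second = load_sketch({"n": 10}, generate, "demo", cache_dir)  # loaded from disk
    third = load_sketch({"n": 20}, generate, "demo", cache_dir)   # state changed, recomputed
```

In this sketch `generate` only runs for the first and third calls; the second call finds a stored state equal to the current one and returns the pickled result.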

## Installation

`notecache` can be found on [PyPI](https://pypi.org/project/notecache/). It can be installed with `pip`.

```bash
pip install notecache
```

## Basic Usage

This package has one public function, `load`. It is used to both store and load any given object. The 3 most important arguments passed into `load` are:

1. `state`

    This argument should contain all the information that is required to compute the object that is to be stored. A change in `state` between two calls to `load` (with the same `unique_id`) will cause the object to be regenerated instead of loaded from cache.

1. `generate`

    This is the function that is used to generate the target object. The return value of `load` contains the return value of `generate(state)`.

1. `unique_id`

    The `sha512` hash value of `unique_id` is used to create a unique file name to store the object. Reusing the same `unique_id` in different calls to `load` may cause cached objects to be overwritten.

`load` returns a named tuple, and the object can be accessed with `load([args]).object`.
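To make the role of `unique_id` more concrete, here is a rough sketch of how a `sha512` hash can be turned into a cache file name. The helper `cache_file_name`, the UTF-8 encoding, and the `.pkl` suffix are assumptions for illustration; the package's real naming scheme may differ.

```python
import hashlib


def cache_file_name(unique_id: str) -> str:
    """Map a unique_id to a fixed-length, filesystem-safe file name."""
    # Assumption: UTF-8 encoding and a ".pkl" suffix; notecache's real choices may differ.
    digest = hashlib.sha512(unique_id.encode("utf-8")).hexdigest()
    return f"{digest}.pkl"


name_a = cache_file_name("large-data-frame")
name_b = cache_file_name("large-data-frame")  # same id, same file name
name_c = cache_file_name("other-data-frame")  # different id, different file name
```

Because the same `unique_id` always maps to the same file name, two unrelated calls that share an id would write to the same file, which is why ids should be distinct per cached object.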

## Usage Examples

- Contact the repository author if you used this package in a public repository or if you know of any place it is used so that it can be featured in this list.

## Developer Instructions

### Installation

1. Fork the repository
1. Clone your fork with `git clone ...`
1. Run the installation script: `./scripts/initialize.sh`
1. Confirm successful installation by running the unit tests:
    1. Activate the virtual environment: `source .venv/bin/activate`
    1. Run the tests: `pytest tests/unit`

### Issues

Submit issues to the GitHub repository with steps to reproduce any bugs. Feature requests and optimization ideas can also be submitted as issues.

### Making Code Contributions

1. Make changes on a branch in your fork
1. Create tests to define behavior and get them passing
1. Create a pull request with a description of the changes