{"id":48725922,"url":"https://github.com/jolovicdev/cashet","last_synced_at":"2026-04-25T20:03:00.599Z","repository":{"id":350738325,"uuid":"1208082932","full_name":"jolovicdev/cashet","owner":"jolovicdev","description":"Cache Python function results like git objects. Content-addressable, pipeline-friendly, and CLI-inspectable. Run once, reuse forever.","archived":false,"fork":false,"pushed_at":"2026-04-19T20:27:22.000Z","size":80,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-19T23:38:38.359Z","etag":null,"topics":["cache","cli-tool","compute-cache","content-addressable","dag","deduplication","function-cache","hashing","memoization","pickle","pythn","sqlite"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/cashet/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jolovicdev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-11T19:48:00.000Z","updated_at":"2026-04-19T21:51:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jolovicdev/cashet","commit_stats":null,"previous_names":["jolovicdev/cashet"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/jolovicdev/cashet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolovicdev%2Fcashet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jo
lovicdev%2Fcashet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolovicdev%2Fcashet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolovicdev%2Fcashet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jolovicdev","download_url":"https://codeload.github.com/jolovicdev/cashet/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolovicdev%2Fcashet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32274987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"ssl_error","status_checked_at":"2026-04-25T18:29:32.149Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cache","cli-tool","compute-cache","content-addressable","dag","deduplication","function-cache","hashing","memoization","pickle","pythn","sqlite"],"created_at":"2026-04-11T22:17:41.212Z","updated_at":"2026-04-25T20:03:00.587Z","avatar_url":"https://github.com/jolovicdev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003ecashet\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eContent-addressable compute cache with git semantics\u003c/strong\u003e\u003cbr\u003e\n  Run a function once. 
Get the same result instantly every time after that.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#install\"\u003eInstall\u003c/a\u003e · \u003ca href=\"#quickstart\"\u003eQuick Start\u003c/a\u003e · \u003ca href=\"#why\"\u003eWhy\u003c/a\u003e · \u003ca href=\"#use-cases\"\u003eUse Cases\u003c/a\u003e · \u003ca href=\"#cli\"\u003eCLI\u003c/a\u003e · \u003ca href=\"#api\"\u003eAPI\u003c/a\u003e · \u003ca href=\"#how-it-works\"\u003eHow It Works\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n## Install\n\n**Global CLI tool** (recommended):\n\n```bash\nuv tool install cashet\n# or\npipx install cashet\n```\n\nThen use the CLI anywhere:\n\n```bash\ncashet --help\n```\n\n**In a project** (library + CLI):\n\n```bash\nuv add cashet\n# or\npip install cashet\n```\n\nThis installs `cashet` as both an importable Python library (`from cashet import Client`) and a project-local CLI (`uv run cashet`).\n\n**Develop / contribute:**\n\n```bash\ngit clone https://github.com/jolovicdev/cashet.git\ncd cashet\nuv sync\nuv run pytest\n```\n\n## Quick Start\n\n```python\nfrom cashet import Client\n\nclient = Client()  # creates .cashet/ in current directory\n\ndef expensive_transform(data, scale=1.0):\n    # imagine this takes 10 minutes\n    return [x * scale for x in data]\n\n# First call: runs the function\nref = client.submit(expensive_transform, [1, 2, 3], scale=2.0)\nprint(ref.load())  # [2.0, 4.0, 6.0]\n\n# Second call with same args: instant — returns cached result\nref2 = client.submit(expensive_transform, [1, 2, 3], scale=2.0)\nprint(ref2.load())  # [2.0, 4.0, 6.0] — no re-computation\n```\n\nYou can also use `Client` as a context manager to ensure the store connection is closed cleanly:\n\n```python\nwith Client() as client:\n    ref = client.submit(expensive_transform, [1, 2, 3], scale=2.0)\n    print(ref.load())\n```\n\nChain tasks into a pipeline where each step's output feeds into the next:\n\n```python\nfrom cashet import Client\n\nclient = 
Client()\n\ndef load_dataset(path):\n    return list(range(100))\n\ndef normalize(data):\n    max_val = max(data)\n    return [x / max_val for x in data]\n\ndef train_model(data, lr=0.01):\n    return {\"loss\": 0.05, \"lr\": lr, \"samples\": len(data)}\n\n# Step 1: load\nraw = client.submit(load_dataset, \"data/train.csv\")\n\n# Step 2: normalize (receives raw output as input)\nnormalized = client.submit(normalize, raw)\n\n# Step 3: train (receives normalized output)\nmodel = client.submit(train_model, normalized, lr=0.001)\n\nprint(model.load())  # {'loss': 0.05, 'lr': 0.001, 'samples': 100}\n```\n\nRe-run the script — everything returns instantly from cache. Change one argument and only that step (and downstream) re-runs.\n\n## Why\n\nYou already have caches (`functools.lru_cache`, `joblib.Memory`). Here's what's different:\n\n| | lru_cache | joblib.Memory | **cashet** |\n|---|---|---|---|\n| AST-normalized hashing | No | No | Yes (comments/formatting don't break cache) |\n| DAG resolution (chain outputs) | No | No | Yes |\n| Content-addressable storage | No | No | Yes (like git blobs) |\n| CLI to inspect history | No | No | Yes |\n| Diff two runs | No | No | Yes |\n| Garbage collection / eviction | No | No | Yes |\n| Pluggable serialization | No | No | Yes |\n| Explicit cache opt-out | No | Partial | Yes |\n| Pluggable store / executor | No | No | Yes |\n| Persists across restarts | No | Yes | Yes |\n\nThe core idea: **hash the function's AST-normalized source + arguments = unique cache key**. Comments, docstrings, and formatting changes don't invalidate the cache — only semantic changes do. Same function + same args = same result, stored immutably on disk. The result is a git-like blob you can inspect, diff, and chain.\n\n## Use Cases\n\n### 1. ML Experiment Tracking Without the Bloat\n\nYou run 200 hyperparameter sweeps overnight. Half crash. You fix a bug and re-run. Without cashet, you re-process the dataset 200 times. 
With cashet:\n\n```python\nfrom cashet import Client, TaskError, TaskRef\n\nclient = Client()\n\ndef preprocess(dataset_path, image_size):\n    # 45 minutes of image resizing\n    ...\n\ndef train(data, learning_rate, dropout):\n    ...\n\n# Batch submit with topological ordering\n# TaskRef(0) refers to the first task's output\nresults = client.submit_many([\n    (preprocess, (\"s3://my-bucket/images\", 224)),\n    (train, (TaskRef(0), 0.01, 0.2)),\n    (train, (TaskRef(0), 0.01, 0.5)),\n    (train, (TaskRef(0), 0.001, 0.2)),\n    (train, (TaskRef(0), 0.001, 0.5)),\n    (train, (TaskRef(0), 0.0001, 0.2)),\n    (train, (TaskRef(0), 0.0001, 0.5)),\n])\n```\n\n`preprocess` runs **once** — all 6 training jobs reuse its cached output. Re-run the script tomorrow and even the training results come from cache (same function + same args = instant).\n\n### 2. Data Pipeline Debugging\n\nYour ETL pipeline fails at step 5. You fix a typo. Now you need to re-run steps 5-7 but steps 1-4 are unchanged and expensive:\n\n```python\nfrom cashet import Client\n\nclient = Client()\n\nraw = client.submit(load_s3, \"s3://logs/2024-05-01/\")\nclean = client.submit(remove_pii, raw)\nenriched = client.submit(join_crm, clean, \"select * from users\")\nreport = client.submit(generate_report, enriched)\n```\n\nFix the `join_crm` function and re-run the script. Steps 1-2 return instantly from cache. Only step 3 onward re-executes. This works because cashet tracks which function produced which output — changing a function's source code changes its hash, invalidating downstream cache entries.\n\n### 3. Reproducible Notebook Results\n\n`cashet` is designed to work in Jupyter notebooks and IPython sessions. 
Share a result with a colleague and they can verify exactly how it was produced:\n\n```python\n# your notebook\nref = client.submit(generate_forecast, date=\"2024-01-01\", model=\"v3\")\nprint(f\"Result hash: {ref.hash}\")\n```\n\n```bash\n# their terminal — inspect provenance\ncashet show \u003chash\u003e\n\n# Output:\n# Hash:     a3b4c5d6...\n# Function: generate_forecast\n# Source:   def generate_forecast(date, model): ...\n# Args:     (('2024-01-01',), {'model': 'v3'})\n# Created:  2024-05-01T10:32:17\n\n# Retrieve the actual result\ncashet get \u003chash\u003e -o forecast.csv\n```\n\n### 4. Incremental Computation\n\nProcess a large dataset in chunks. Already-processed chunks return instantly:\n\n```python\nfrom cashet import Client\n\nclient = Client()\n\ndef process_chunk(chunk_id, source_file):\n    # expensive per-chunk processing\n    ...\n\nresults = []\nfor chunk_id in range(100):\n    ref = client.submit(process_chunk, chunk_id, \"huge_file.parquet\")\n    results.append(ref)\n```\n\nFirst run processes all 100 chunks. Second run (even after restarting Python) returns all 100 results instantly. Add a new chunk? 
Only that one runs.\n\n## CLI\n\n```bash\n# Show commit history\ncashet log\n\n# Filter by function name\ncashet log --func \"preprocess\"\n\n# Filter by tag\ncashet log --tag env=prod --tag experiment=run-1\n\n# Show full commit details (source code, args, error)\ncashet show \u003chash\u003e\n\n# Retrieve a result (pretty-prints strings/dicts/lists)\ncashet get \u003chash\u003e\n\n# Write a result to file\ncashet get \u003chash\u003e -o output.bin\n\n# Compare two commits\ncashet diff \u003chash_a\u003e \u003chash_b\u003e\n\n# Show lineage of a result (same function+args over time)\ncashet history \u003chash\u003e\n\n# Delete a specific commit\ncashet rm \u003chash\u003e\n\n# Evict old cache entries and orphaned blobs\ncashet gc --older-than 30\n\n# Evict oldest entries until under a size limit\ncashet gc --max-size 1GB\n\n# Clear everything (alias for gc --older-than 0)\ncashet clear\n\n# Storage statistics (includes disk size)\ncashet stats\n```\n\n## API\n\n### `Client`\n\n```python\nfrom cashet import Client\n\nclient = Client(\n    store_dir=\".cashet\",       # where to store blobs + metadata (SQLiteStore)\n                               # falls back to $CASHET_DIR env var if set\n    store=None,                # or inject any Store implementation\n    executor=None,             # or inject any Executor implementation\n    serializer=None,           # defaults to PickleSerializer\n    max_workers=1,             # max parallelism for submit_many (default: 1, sequential)\n)\n```\n\n### Pluggable Backends\n\nEverything is protocol-based. 
Swap the store, executor, or serializer without touching your task code:\n\n```python\nfrom pathlib import Path\n\nfrom cashet import Client, Store, Executor, Serializer\nfrom cashet.store import SQLiteStore\nfrom cashet.executor import LocalExecutor\n\n# These are equivalent (the defaults):\nclient = Client(store_dir=\".cashet\")\n\n# Explicit injection:\nclient = Client(\n    store=SQLiteStore(Path(\".cashet\")),\n    executor=LocalExecutor(),\n)\n```\n\n**Store protocol** — implement this to use RocksDB, Redis, S3, or anything else:\n\n```python\nfrom cashet.protocols import Store\n\nclass RedisStore:\n    def put_blob(self, data: bytes) -\u003e ObjectRef: ...\n    def get_blob(self, ref: ObjectRef) -\u003e bytes: ...\n    def put_commit(self, commit: Commit) -\u003e None: ...\n    def get_commit(self, hash: str) -\u003e Commit | None: ...\n    def find_by_fingerprint(self, fingerprint: str) -\u003e Commit | None: ...\n    def find_running_by_fingerprint(self, fingerprint: str) -\u003e Commit | None: ...\n    def list_commits(self, ...) 
-\u003e list[Commit]: ...\n    def get_history(self, hash: str) -\u003e list[Commit]: ...\n    def stats(self) -\u003e dict[str, int]: ...\n    def evict(self, older_than: datetime) -\u003e int: ...\n    def delete_commit(self, hash: str) -\u003e bool: ...\n    def close(self) -\u003e None: ...\n\nclient = Client(store=RedisStore(\"redis://localhost\"))\n# Everything else works identically\n```\n\n**Executor protocol** — implement this for distributed execution (Celery, Kafka, RQ):\n\n```python\nfrom cashet.protocols import Executor\n\nclass CeleryExecutor:\n    def submit(self, func, args, kwargs, task_def, store, serializer):\n        # Push to Celery, poll for result\n        ...\n\nclient = Client(\n    store=RedisStore(\"redis://localhost\"),\n    executor=CeleryExecutor(),\n)\n```\n\n**Serializer protocol** — see Custom Serialization below.\n\n### `client.submit(func, *args, **kwargs) -\u003e ResultRef`\n\nSubmit a function for execution. Returns a `ResultRef` — a lazy handle to the result.\n\n```python\nref = client.submit(my_func, arg1, arg2, key=\"value\")\nref.hash         # content hash of the result blob\nref.commit_hash  # commit hash (use this for show/history/rm/get)\nref.size         # size in bytes\nref.load()       # deserialize and return the result\n```\n\nIf the same function has already been submitted with the same arguments, `submit` returns the cached result **without re-executing**.\n\n### `client.clear()`\n\nRemove all cache entries and orphaned blobs. Equivalent to `client.gc(timedelta(days=0))`.\n\n```python\nclient.clear()\n```\n\n### `client.submit_many(tasks) -\u003e list[ResultRef]`\n\nSubmit a batch of tasks with automatic topological ordering. 
Use `TaskRef(index)` to wire outputs between tasks in the batch.\n\n```python\nfrom cashet import TaskRef\n\nrefs = client.submit_many([\n    step1_func,\n    (step2_func, (TaskRef(0),)),\n    (step3_func, (TaskRef(1), \"extra_arg\")),\n], max_workers=4)  # run independent tasks in parallel\n```\n\nThis enables parallel fan-out and ensures each task only runs after its dependencies.\n\n**Opt out of caching:**\n\n```python\n# Per-call\nref = client.submit(non_deterministic_func, _cache=False)\n\n# Per-function via decorator\n@client.task(cache=False)\ndef random_score():\n    return random.random()\n```\n\n**Force re-execution (skip cache, always run):**\n\n```python\n# Per-call\nref = client.submit(my_func, arg, _force=True)\n\n# Per-function via decorator\n@client.task(force=True)\ndef always_rerun():\n    ...\n```\n\n**Tag commits:**\n\n```python\n# Per-call\nref = client.submit(train, data, lr=0.01, _tags={\"experiment\": \"v1\"})\n\n# Per-function via decorator\n@client.task(tags={\"team\": \"ml\"})\ndef preprocess(raw):\n    ...\n```\n\nTags are not part of the cache key — they are metadata for organization and filtering.\n\n**Retry flaky operations:**\n\n```python\n# Per-call\nref = client.submit(fetch_api, url, _retries=3)\n\n# Per-function via decorator\n@client.task(retries=3)\ndef fetch_api(url):\n    ...\n```\n\nRetries wait briefly between attempts. 
When retries are exhausted, `client.submit` raises `TaskError` with the original traceback included in the message.\n\n**Task timeouts:**\n\n```python\n# Per-call (seconds)\nref = client.submit(slow_func, _timeout=30)\n\n# Per-function via decorator\n@client.task(timeout=30)\ndef slow_func():\n    ...\n```\n\nTimeouts can be combined with retries — a timed-out attempt counts as a failure and will be retried.\n\n### `@client.task`\n\nRegister a function with cashet metadata and make it directly callable:\n\n```python\n@client.task\ndef my_func(x):\n    return x * 2\n\nref = my_func(5)  # Returns ResultRef, same as client.submit(my_func, 5)\nref.load()        # 10\n\n@client.task(cache=False, name=\"custom_task_name\", tags={\"env\": \"prod\"})\ndef other_func(x):\n    return x + 1\n```\n\n`client.submit(my_func, 5)` still works identically.\n\n### `client.log()`, `client.show()`, `client.get()`, `client.diff()`, `client.history()`, `client.rm()`, `client.gc()`\n\n```python\n# List commits\ncommits = client.log(func_name=\"preprocess\", limit=10)\n\n# Filter by status\ncommits = client.log(status=\"failed\")\n\n# Filter by tags\ncommits = client.log(tags={\"experiment\": \"v1\"})\n\n# Get commit details\ncommit = client.show(hash)\ncommit.task_def.func_source  # the source code\ncommit.task_def.args_snapshot  # the serialized args\ncommit.parent_hash  # previous commit for same func+args\ncommit.created_at\n\n# Load a result by commit hash\nresult = client.get(hash)\n\n# Diff two commits\ndiff = client.diff(hash_a, hash_b)\n# {'func_changed': True, 'args_changed': False, 'output_changed': True, ...}\n\n# Get lineage (all runs of same func+args)\nhistory = client.history(hash)\n\n# Evict old entries (default: 30 days)\nevicted = client.gc()\n# Evict entries older than 7 days\nfrom datetime import timedelta\nevicted = client.gc(older_than=timedelta(days=7))\n# Evict oldest entries until under size limit\nevicted = client.gc(max_size_bytes=1024 * 1024 * 1024)  # 
1GB\n\n# Storage stats\nstats = client.stats()\n# {\n#     'total_commits': 42,\n#     'completed_commits': 40,\n#     'stored_objects': 38,      # blob_objects + inline_objects\n#     'disk_bytes': 10485760,    # blob_bytes + inline_bytes\n#     'blob_objects': 35,\n#     'blob_bytes': 9437184,\n#     'inline_objects': 3,\n#     'inline_bytes': 1048576,\n# }\n```\n\n### Jupyter \u0026 Notebook Support\n\n`cashet` works seamlessly in Jupyter notebooks, IPython, and the Python REPL. It uses a tiered source-resolution strategy:\n\n1. **`inspect.getsource()`** — for normal `.py` files\n2. **`dill.source.getsource()`** — for interactive sessions with live history\n3. **`dis.Bytecode` fallback** — for any live function, even after a kernel restart\n\nThis means you can define functions in a notebook cell, rerun the cell with changes, and `cashet` will correctly invalidate the cache based on the new code.\n\n```python\n# In a notebook cell\nclient = Client()\n\ndef preprocess(data):\n    return [x * 2 for x in data]\n\nref = client.submit(preprocess, [1, 2, 3])\n```\n\nChange the cell body and rerun — the cache invalidates automatically.\n\n### Thread Safety\n\n`cashet` is safe to use from multiple threads and processes sharing the same store directory. Concurrent submissions of the same uncached task are deduplicated: the function executes **exactly once** and all callers receive the same cached result. This works across `multiprocessing.Process`, `ProcessPoolExecutor`, and multiple independent Python interpreters.\n\n\u003e **Note:** Cross-process dedup uses a 5-minute timeout by default. If a process dies while running a task, its claim is automatically reclaimed after that timeout so other workers are not blocked forever. 
You can adjust this via `LocalExecutor(running_ttl=...)`:\n\u003e\n\u003e ```python\n\u003e from datetime import timedelta\n\u003e from cashet.executor import LocalExecutor\n\u003e\n\u003e client = Client(executor=LocalExecutor(running_ttl=timedelta(minutes=10)))\n\u003e ```\n\n```python\nimport threading\n\ndef worker():\n    c = Client()  # separate Client instance, same store\n    c.submit(expensive_func, arg)\n\nthreads = [threading.Thread(target=worker) for _ in range(10)]\nfor t in threads:\n    t.start()\nfor t in threads:\n    t.join()\n# expensive_func ran only once\n```\n\n### `ResultRef`\n\nA lazy reference to a stored result. Pass it as an argument to chain tasks:\n\n```python\nstep1 = client.submit(func_a, input_data)\nstep2 = client.submit(func_b, step1)  # step1 auto-resolves to its output\n```\n\n### Custom Serialization\n\n```python\nfrom cashet import Client, PickleSerializer, SafePickleSerializer, JsonSerializer\n\n# Default: pickle (handles arbitrary Python objects)\nclient = Client(serializer=PickleSerializer())\n\n# Safe pickle: restricts deserialization to an allowlist of known types\nclient = Client(serializer=SafePickleSerializer())\n\n# Allow custom classes through the allowlist\nclient = Client(serializer=SafePickleSerializer(extra_classes=[MyClass]))\n\n# For JSON-safe data (dicts, lists, primitives)\nclient = Client(serializer=JsonSerializer())\n\n# Or implement the Serializer protocol\nfrom cashet.hashing import Serializer\n\nclass MySerializer:\n    def dumps(self, obj) -\u003e bytes:\n        ...\n    def loads(self, data: bytes):\n        ...\n```\n\n## How It Works\n\n```\nclient.submit(func, arg1, arg2)\n         │\n         ▼\n  ┌─────────────────┐\n  │  Hash function   │  SHA256(AST-normalized source + dep versions + referenced user helpers)\n  │  Hash arguments  │  SHA256(canonical repr of args/kwargs)\n  └────────┬────────┘\n           │\n           ▼\n  ┌─────────────────┐\n  │  Fingerprint     │  func_hash:args_hash\n  │  
cache lookup    │  ← Store protocol (SQLiteStore, RedisStore, ...)\n  └────────┬────────┘\n           │\n     ┌─────┴─────┐\n     │            │\n  CACHED       MISS\n     │            │\n     ▼            ▼\n  Return ref   ← Executor protocol (LocalExecutor, CeleryExecutor, ...)\n               Execute function\n               Store result as blob → Store protocol\n               Record commit with parent lineage\n               Return ref\n```\n\n**Architecture (protocol-based):**\n\n| Protocol | Default | Implement for |\n|---|---|---|\n| `Store` | `SQLiteStore` | RocksDB, Redis, S3, Postgres |\n| `Executor` | `LocalExecutor` | Celery, Kafka, RQ, subprocess |\n| `Serializer` | `PickleSerializer` | JSON, MessagePack, custom formats |\n\n**Storage layout** (in `.cashet/`):\n\n```\n.cashet/\n├── objects/          # content-addressable blobs (like git objects)\n│   ├── a3/\n│   │   └── b4c5d6... # compressed result blob\n│   └── e7/\n│       └── f8g9h0...\n└── meta.db           # SQLite: commits, fingerprints, provenance, inline_objects\n```\n\n**Small objects** (\u003c1KB) are stored inline in `meta.db` instead of the filesystem. This reduces inode overhead for caches with many tiny results. Larger objects are stored as compressed blobs in `objects/` as usual.\n\n**Key design decisions:**\n\n- **Closure variables are not hashed** and emit a `ClosureWarning` if present. Function identity is source code, not runtime state. If you need cache invalidation based on a value, pass it as an explicit argument.\n- **Referenced user-defined helper functions are hashed recursively.** Change an imported helper in your own code and the caller's cache invalidates correctly. 
Builtin and third-party library functions are skipped.\n- **Blobs are deduplicated by content hash.** Identical results share one blob on disk.\n- **Source is hashed as an AST.** Comments, docstrings, and whitespace changes don't invalidate the cache.\n- **Non-cached tasks get unique commit hashes** (timestamp salt) so they always re-execute but still record lineage.\n- **Parent tracking:** Each commit records the hash of the previous commit for the same function+args, forming a history chain you can traverse.\n\n## Project Status\n\n**Beta.** The core (hashing, DAG resolution, fingerprint dedup) is stable. The defaults work reliably for single-machine and multiprocess workflows. The protocol layer (`Store`, `Executor`, `Serializer`) is ready for alternative backends — implementing a Redis store or Celery executor is a single-file job.\n\nBuilt-in: `SQLiteStore` + `LocalExecutor` + `PickleSerializer`.\nNot yet built: Redis, RocksDB, S3 stores; Celery/Kafka executors. PRs welcome.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjolovicdev%2Fcashet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjolovicdev%2Fcashet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjolovicdev%2Fcashet/lists"}