https://github.com/thombashi/df-diskcache
df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.
- Host: GitHub
- URL: https://github.com/thombashi/df-diskcache
- Owner: thombashi
- License: mit
- Created: 2023-11-26T11:30:01.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-12-11T14:35:29.000Z (about 2 years ago)
- Last Synced: 2025-09-27T16:29:06.123Z (5 months ago)
- Topics: disk-cache, pandas-dataframe, python-library
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE
.. contents:: **df-diskcache**
   :backlinks: top
   :depth: 2

Summary
============================================
``df-diskcache`` is a Python library for caching ``pandas.DataFrame`` objects to local disk.

.. image:: https://badge.fury.io/py/df-diskcache.svg
    :target: https://badge.fury.io/py/df-diskcache
    :alt: PyPI package version

.. image:: https://img.shields.io/pypi/pyversions/df-diskcache.svg
    :target: https://pypi.org/project/df-diskcache
    :alt: Supported Python versions

.. image:: https://github.com/thombashi/df-diskcache/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/thombashi/df-diskcache/actions/workflows/ci.yml
    :alt: CI status of Linux/macOS/Windows

.. image:: https://coveralls.io/repos/github/thombashi/df-diskcache/badge.svg?branch=master
    :target: https://coveralls.io/github/thombashi/df-diskcache?branch=master
    :alt: Test coverage: coveralls

.. image:: https://github.com/thombashi/df-diskcache/actions/workflows/github-code-scanning/codeql/badge.svg
    :target: https://github.com/thombashi/df-diskcache/actions/workflows/github-code-scanning/codeql
    :alt: CodeQL

Installation
============================================
::

    pip install df-diskcache

Features
============================================
``DataFrameDiskCache`` supports the following methods:

- ``get``: Get the cache entry (``pandas.DataFrame``) for the key. Returns ``None`` if the key is not found.
- ``set``: Create a cache entry for the key-value pair, with an optional time-to-live (TTL).
- ``update``
- ``touch``: Update the last accessed time of a cache entry to extend its TTL.
- ``delete``: Delete the cache entry for the key.
- ``prune``: Delete expired cache entries.
- Dictionary-like operations:

  - ``__getitem__`` (``cache[key]``)
  - ``__setitem__`` (``cache[key] = df``)
  - ``__contains__`` (``key in cache``)
  - ``__delitem__`` (``del cache[key]``)

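The following is a minimal, hypothetical sketch of how the methods listed above fit together, written as a small in-memory class so it runs without df-diskcache. ``ToyTTLCache`` is an illustrative name, not the library's implementation (which stores entries on disk). It assumes that an entry expires ``ttl`` seconds after its last access, as the ``touch`` description suggests, and that only ``touch`` refreshes that time; it also accepts an injectable clock so expiry can be exercised without sleeping. ``update`` is omitted because its exact semantics are not described here.

.. code-block:: python

    import time
    from typing import Any, Callable, Dict, Optional, Tuple


    class ToyTTLCache:
        """Hypothetical in-memory stand-in; NOT df-diskcache's implementation.

        Assumption: an entry expires ``ttl`` seconds after its last access.
        """

        DEFAULT_TTL = 3600  # seconds, matching the default quoted in this README

        def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
            self._clock = clock
            # key -> (value, ttl in seconds, last-accessed timestamp)
            self._entries: Dict[str, Tuple[Any, float, float]] = {}

        def _expired(self, key: str) -> bool:
            _, ttl, last_accessed = self._entries[key]
            return self._clock() >= last_accessed + ttl

        def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
            # Fall back to the class-wide default when no per-entry TTL is given.
            effective_ttl = self.DEFAULT_TTL if ttl is None else ttl
            self._entries[key] = (value, effective_ttl, self._clock())

        def get(self, key: str) -> Optional[Any]:
            # Missing and expired keys are both reported as None.
            if key not in self._entries or self._expired(key):
                return None
            return self._entries[key][0]

        def touch(self, key: str) -> None:
            # Reset the last-accessed time of a live entry so it gets another `ttl` seconds.
            if key in self._entries and not self._expired(key):
                value, ttl, _ = self._entries[key]
                self._entries[key] = (value, ttl, self._clock())

        def delete(self, key: str) -> None:
            self._entries.pop(key, None)

        def prune(self) -> None:
            # Remove every expired entry in a single pass.
            for key in [k for k in self._entries if self._expired(k)]:
                del self._entries[key]

        # The dictionary-like protocol simply delegates to the methods above.
        def __getitem__(self, key: str) -> Optional[Any]:
            return self.get(key)

        def __setitem__(self, key: str, value: Any) -> None:
            self.set(key, value)

        def __contains__(self, key: object) -> bool:
            return isinstance(key, str) and self.get(key) is not None

        def __delitem__(self, key: str) -> None:
            self.delete(key)

Keeping the dunder methods as thin wrappers means ``cache[key]`` and ``cache.get(key)`` cannot drift apart in behavior, which is the property the dictionary-like interface relies on.
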
Usage
============================================
:Sample Code:
    .. code-block:: python

        import pandas as pd

        from dfdiskcache import DataFrameDiskCache

        cache = DataFrameDiskCache()
        url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

        # get() returns None if the key is not found in the cache
        df = cache.get(url)
        if df is None:
            print("cache miss")
            df = pd.read_csv(url)  # fetch the data only on a miss
            cache.set(url, df)  # store it for subsequent runs
        else:
            print("cache hit")

        print(df)

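The check-then-populate logic above is the classic cache-aside pattern, and it can be factored into a small helper. ``get_or_load`` is a hypothetical name, not part of df-diskcache; the sketch assumes only the ``get(key)`` and ``set(key, value, ttl=...)`` calls shown in this README, so any object providing those two methods can be passed in.

.. code-block:: python

    from typing import Any, Callable, Optional


    def get_or_load(cache: Any, key: str, loader: Callable[[], Any],
                    ttl: Optional[int] = None) -> Any:
        """Hypothetical helper: return the cached value, loading and storing it on a miss."""
        value = cache.get(key)
        if value is not None:
            return value  # cache hit

        # Cache miss: compute the value, then store it for next time.
        value = loader()
        if ttl is None:
            cache.set(key, value)  # rely on the cache's default TTL
        else:
            cache.set(key, value, ttl=ttl)
        return value

With df-diskcache this would be called as ``get_or_load(cache, url, lambda: pd.read_csv(url))``. The ``ttl`` argument is forwarded only when given, so the helper never passes a value the underlying ``set`` might not accept.
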
You can also use dictionary-like operations:

:Sample Code:
    .. code-block:: python

        import pandas as pd

        from dfdiskcache import DataFrameDiskCache

        cache = DataFrameDiskCache()
        url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

        # As in the get() example, a missing key yields None
        df = cache[url]
        if df is None:
            print("cache miss")
            df = pd.read_csv(url)
            cache[url] = df
        else:
            print("cache hit")

        print(df)

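The example above relies on ``cache[url]`` yielding ``None`` for a missing key, whereas a plain ``dict`` raises ``KeyError``. If you want the same code to work with a plain ``dict`` as well (for example, an in-memory stand-in in tests), one option is to test membership with ``in`` (``__contains__``) first. ``get_or_load_mapping`` is a hypothetical helper, not part of df-diskcache.

.. code-block:: python

    from typing import Any, Callable


    def get_or_load_mapping(cache: Any, key: str, loader: Callable[[], Any]) -> Any:
        """Hypothetical helper that uses only ``in``, ``[]`` and ``[] =``."""
        if key in cache:  # __contains__
            value = cache[key]  # __getitem__
            # With TTLs the entry may expire between the two calls, so re-check.
            if value is not None:
                return value

        value = loader()
        cache[key] = value  # __setitem__
        return value

The ``None`` re-check is deliberate: with a TTL-based cache, an entry that passes the membership test can still expire before it is read.
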
Set TTL for cache entries
--------------------------------------------
:Sample Code:
    .. code-block:: python

        import pandas as pd

        from dfdiskcache import DataFrameDiskCache

        # You can override the class-wide default TTL (default: 3600 seconds)
        DataFrameDiskCache.DEFAULT_TTL = 10

        cache = DataFrameDiskCache()
        url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

        df = cache.get(url)
        if df is None:
            df = pd.read_csv(url)
            cache.set(url, df, ttl=60)  # you can also set a TTL per key-value pair

        print(df)

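The sample combines a class-wide default (``DEFAULT_TTL``) with a per-entry ``ttl``. A minimal sketch of the resulting expiry arithmetic follows, assuming that a per-entry ``ttl`` takes precedence over the default and that expiry is measured from the entry's last access time (as the ``touch`` description suggests). ``expiry_time`` is a hypothetical function, not part of the library.

.. code-block:: python

    from typing import Optional

    DEFAULT_TTL = 10  # seconds; the sample overrides the library default of 3600


    def expiry_time(last_accessed: float, ttl: Optional[float] = None,
                    default_ttl: float = DEFAULT_TTL) -> float:
        """Hypothetical: the time at which an entry last accessed at `last_accessed` expires."""
        effective_ttl = default_ttl if ttl is None else ttl
        return last_accessed + effective_ttl


    print(expiry_time(100.0, ttl=60))  # 160.0: the per-entry TTL wins
    print(expiry_time(100.0))  # 110.0: falls back to DEFAULT_TTL
    print(expiry_time(150.0, ttl=60))  # 210.0: a touch at t=150 pushes expiry out

Under these assumptions, the entry stored with ``ttl=60`` in the sample is unaffected by ``DEFAULT_TTL = 10``; the override only applies to entries set without an explicit ``ttl``.
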
Dependencies
============================================
- Python 3.7+
- Python package dependencies (automatically installed)