{"id":23092038,"url":"https://github.com/thiswillbeyourgithub/persistdict","last_synced_at":"2025-08-16T09:30:54.019Z","repository":{"id":257815609,"uuid":"869756143","full_name":"thiswillbeyourgithub/PersistDict","owner":"thiswillbeyourgithub","description":"Looks like a dict and acts like a dict but is persistent via an sqlite3 db, like sqldict","archived":false,"fork":false,"pushed_at":"2024-12-05T15:28:32.000Z","size":63,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-05T16:30:27.235Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thiswillbeyourgithub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-08T20:41:04.000Z","updated_at":"2024-12-05T16:20:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"9fef0df8-eef2-413a-a256-b56763baebf6","html_url":"https://github.com/thiswillbeyourgithub/PersistDict","commit_stats":null,"previous_names":["thiswillbeyourgithub/persistdict"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thiswillbeyourgithub%2FPersistDict","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thiswillbeyourgithub%2FPersistDict/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thiswillbeyourgithub%2FPersistDict/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thiswillbeyourgithub%2FPersistDict/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thiswillbeyourgithub","download_url":"https://codeload.github.com/thiswillbeyourgithub/PersistDict/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230027916,"owners_count":18161837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-16T21:26:53.672Z","updated_at":"2025-08-16T09:30:54.008Z","avatar_url":"https://github.com/thiswillbeyourgithub.png","language":"Python","readme":"# PersistDict\n\nA persistent dictionary implementation backed by an [LMDB database](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database). PersistDict looks and acts like a Python dictionary but persists data to disk, making it ideal for caching and persistent storage needs.\n\n## Overview\n\nPersistDict provides a dictionary-like interface that stores data on disk using the high-performance LMDB (Lightning Memory-Mapped Database). It builds upon [lmdb-dict](https://github.com/uchicago-dsi/lmdb-dict) to provide a robust, thread-safe persistent dictionary with additional features like automatic expiration, metadata tracking, and customizable serialization.\n\n## Why PersistDict?\n\nI created PersistDict while developing [wdoc](https://github.com/thiswillbeyourgithub/WDoc), my RAG library, after encountering issues with langchain's caching mechanisms. Instead of relying on existing implementations that didn't handle concurrency well, I built PersistDict to be thread-safe and robust from the ground up.\n\nPersistDict makes it simple to add persistent caching to any Python application. While earlier versions (before 2.0.0) used SQLite, the current version leverages LMDB for better performance and reliability in concurrent environments.\n\n## Key Features\n\n- **Thread-safe**: All operations are protected by a reentrant lock, allowing multiple threads to safely access the same database without corruption.\n- **Background Processing**: Integrity checks and expiration run in a background thread by default, avoiding blocking the main thread during initialization.\n- **Automatic Expiration**: Old entries are automatically removed after a configurable number of days to prevent unbounded growth.\n- **Metadata Tracking**: Each entry includes creation time (ctime) and last access time (atime) for advanced data management.\n- **Performance Optimized**: Uses `LRUCache128` from [cachetools](https://github.com/tkem/cachetools/) for better performance with frequently accessed items.\n- **Customizable Serialization**: Supports custom serializers for both keys and values, enabling encryption, compression, or any custom data transformation.\n- **Key Hashing**: Keys are hashed and cropped to handle the LMDB key size limitation (default 511 bytes).\n- **Robust Error Handling**: Gracefully handles serialization errors and database corruption with detailed logging.\n- **Collision Management**: Properly handles key hash collisions to ensure data integrity.\n- **Minimal Dependencies**: Only requires `lmdb-dict-full`. Optionally uses [beartype](https://github.com/beartype/beartype/) for type checking and [loguru](https://loguru.readthedocs.io/) for logging if available.\n\n## Installation\n\n### From PyPI\n```bash\npip install PersistDict\n```\n\n### From GitHub\n```bash\ngit clone https://github.com/thiswillbeyourgithub/PersistDict\ncd PersistDict\npip install -e .\n```\n\n### Running Tests\n```bash\ncd PersistDict\npython -m pytest tests/test_persistdict.py -v\n```\n\n## Basic Usage\n\n```python\nfrom PersistDict import PersistDict\n\n# Create a persistent dictionary\nd = PersistDict(\n    database_path=\"/path/to/db\",  # Path to the database directory\n    expiration_days=30,           # Optional: entries older than this will be removed\n    verbose=False,                # Optional: enable debug logging\n    background_thread=True,       # Optional: run initialization tasks in background\n)\n\n# Use it like a regular dictionary\nd[\"key\"] = \"value\"\nprint(d[\"key\"])         # \"value\"\nprint(\"key\" in d)       # True\nprint(len(d))           # 1\n\n# Dictionary-style initialization (only available once)\nd = d(a=1, b=\"string\", c=[1, 2, 3])\n\n# Supports standard dictionary methods\nfor key in d.keys():\n    print(key)\n    \nfor value in d.values():\n    print(value)\n    \nfor key, value in d.items():\n    print(f\"{key}: {value}\")\n\n# Delete items\ndel d[\"a\"]\n\n# Clear the entire dictionary\nd.clear()\n```\n\n## Advanced Usage\n\n### Custom Serialization\n\n```python\nimport json\nimport pickle\nimport dill\n\n# Custom serializers for encryption, compression, etc.\nd = PersistDict(\n    database_path=\"/path/to/db\",\n    key_serializer=json.dumps,       # Custom key serializer\n    key_unserializer=json.loads,     # Custom key deserializer\n    value_serializer=dill.dumps,     # Custom value serializer\n    value_unserializer=dill.loads,   # Custom value deserializer\n    key_size_limit=511,              # Maximum key size before hashing\n    caching=True,                    # Enable/disable LRU caching\n    background_timeout=30,           # Maximum time for background operations\n)\n```\n\n### Shared Database Access\n\nMultiple instances can safely access the same database:\n\n```python\n# Create two instances pointing to the same database\nd1 = PersistDict(database_path=\"/path/to/db\")\nd2 = PersistDict(database_path=\"/path/to/db\")\n\n# Changes in one instance are visible in the other\nd1[\"shared_key\"] = \"shared_value\"\nassert d2[\"shared_key\"] == \"shared_value\"\nassert list(d1.keys()) == list(d2.keys())\n```\n\n### Background Thread Control\n\nControl how initialization tasks run:\n\n```python\n# Run in background (default)\nd1 = PersistDict(database_path=\"/path/to/db\", background_thread=True)\n\n# Run in foreground (blocking)\nd2 = PersistDict(database_path=\"/path/to/db\", background_thread=False)\n\n# Skip initialization tasks entirely\nd3 = PersistDict(database_path=\"/path/to/db\", background_thread=\"disabled\")\n```\n\n### Named Instances\n\nCreate named instances for better logging:\n\n```python\nd = PersistDict(\n    database_path=\"/path/to/db\",\n    name=\"cache_db\",    # Name for identifying this instance in logs\n    verbose=True        # Enable logging\n)\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthiswillbeyourgithub%2Fpersistdict","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthiswillbeyourgithub%2Fpersistdict","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthiswillbeyourgithub%2Fpersistdict/lists"}