{"id":13642218,"url":"https://github.com/mkrd/DictDataBase","last_synced_at":"2025-04-20T16:30:54.824Z","repository":{"id":57140552,"uuid":"334609134","full_name":"mkrd/DictDataBase","owner":"mkrd","description":"A python NoSQL dictionary database, with concurrent access and ACID compliance","archived":false,"fork":false,"pushed_at":"2024-09-06T04:21:11.000Z","size":6260,"stargazers_count":234,"open_issues_count":16,"forks_count":11,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-11-07T09:07:37.126Z","etag":null,"topics":["acid","compression","database","dict","dictionary","documentdb","json","multiprocessing","multithreading","nosql","python","storage"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkrd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-31T08:36:28.000Z","updated_at":"2024-10-29T11:50:18.000Z","dependencies_parsed_at":"2023-02-09T03:46:24.547Z","dependency_job_id":"8261f20c-661f-4ab8-a047-de6c573b7656","html_url":"https://github.com/mkrd/DictDataBase","commit_stats":{"total_commits":350,"total_committers":5,"mean_commits":70.0,"dds":"0.11428571428571432","last_synced_commit":"e86b3810ea55e5b410c3b1d6e4502731e525318e"},"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkrd%2FDictDataBase","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkrd%2FDictDataBase/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/mkrd%2FDictDataBase/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkrd%2FDictDataBase/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkrd","download_url":"https://codeload.github.com/mkrd/DictDataBase/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223832874,"owners_count":17210735,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acid","compression","database","dict","dictionary","documentdb","json","multiprocessing","multithreading","nosql","python","storage"],"created_at":"2024-08-02T01:01:28.669Z","updated_at":"2025-04-20T16:30:54.812Z","avatar_url":"https://github.com/mkrd.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"![Logo](https://github.com/mkrd/DictDataBase/blob/main/assets/logo.png?raw=true)\n\n[![Downloads](https://static.pepy.tech/badge/dictdatabase)](https://pepy.tech/project/dictdatabase)\n![Tests](https://github.com/mkrd/DictDataBase/actions/workflows/test.yml/badge.svg)\n![Coverage](https://github.com/mkrd/DictDataBase/blob/main/assets/coverage.svg?raw=1)\n\nDictDataBase is a fast document-based database that uses json files or compressed json files for storage.\n- **Multi threading and multi processing safe**. Multiple processes on the same machine\ncan simultaneously read and write to dicts without losing data.\n- **ACID** compliant. Unlike TinyDB, it is suited for concurrent environments.\n- **No Conflict resolution** required. 
Unlike with ZODB, lock-based access control is used, such that conflicts never occur.\n- **No database server** required. Simply import DictDataBase in your project and use\nit.\n- **Compression**. Configure whether the files are stored as raw json or as json\ncompressed with zlib.\n- **Fast**. Key-value pairs inside a json file can be accessed quickly and efficiently because the keys are indexed.\n- **Tested** with 98%+ coverage on Python 3.8 to 3.13.\n\n### Why use DictDataBase\n- Your application concurrently reads and writes data from multiple processes or threads.\n- Using a database server would be overkill for your application.\n    - But you need [ACID](https://en.wikipedia.org/wiki/ACID) guarantees.\n- Your use case requires reading key-value pairs from very large json files repeatedly. (For example, DictDataBase can handle about 2000 reads per second when reading single key-value pairs from a 2.5GB json file with 20000 key-value pairs.)\n- You need to repeatedly read and write many smaller json files.\n- Your use case is suited for working with json data, or you have to work with a lot of\njson data.\n\n### Why not DictDataBase\n- If your storage is slow.\n- If your use case requires repeatedly modifying or writing data in a single very large json file.\n- If a relational database is better suited for your use case.\n- If you need to read files that are larger than your system's RAM.\n\nInstall\n========================================================================================\n\n```sh\npip install dictdatabase\n```\n\nConfiguration\n========================================================================================\nThe following configuration parameters can be modified using `DDB.config`:\n\n### Storage directory\nSet `storage_directory` to the path of the directory that will contain your json files:\n```python\nDDB.config.storage_directory = \"./ddb_storage\" # Default value\n```\n\n### Compression\nIf you want to use compressed files, set 
`use_compression` to `True`.\nThis will make the db files significantly smaller and might improve performance if your\ndisk is slow. However, the files will not be human readable.\n```python\nDDB.config.use_compression = False # Default value\n```\n\n### Indentation\nSet how written json files are indented. Behaves exactly like\n`json.dumps(indent=...)`. It can be an `int` for the number of spaces, the tab\ncharacter, or `None` if you don't want the files to be indented.\n```python\nDDB.config.indent = \"\\t\" # Default value\n```\nNote: If `DDB.config.use_orjson = True`, then the value can only be 2 (spaces) or\n0/None for no indentation.\n\n### Use orjson\nYou can use the orjson encoder and decoder if you need to.\nThe standard library json module is sufficient most of the time.\nHowever, orjson is a lot more performant in virtually all cases.\n```python\nDDB.config.use_orjson = True # Default value\n```\n\nUsage\n========================================================================================\n\nImport\n----------------------------------------------------------------------------------------\n\n```python\nimport dictdatabase as DDB\n```\n\nCreate a file\n----------------------------------------------------------------------------------------\nThis library is called DictDataBase, but you can actually use any json serializable object.\n```python\nusers_dict = {\n   \"u1\": { \"name\" : \"Ben\", \"age\": 30, \"job\": \"Software Engineer\" },\n   \"u2\": { \"name\" : \"Sue\", \"age\": 21, \"job\": \"Architect\" },\n   \"u3\": { \"name\" : \"Joe\", \"age\": 50, \"job\": \"Manager\" },\n}\nDDB.at(\"users\").create(users_dict)\n```\nThere is now a file called `users.json` or `users.ddb` in your specified storage\ndirectory, depending on whether you use compression.\n\n\nCheck if file or sub-key exists\n----------------------------------------------------------------------------------------\n```python\nDDB.at(\"users\").exists()\n\u003e\u003e\u003e True  
# File exists\nDDB.at(\"users\", key=\"u10\").exists()\n\u003e\u003e\u003e False # Key \"u10\" not in users\nDDB.at(\"users\", key=\"u2\").exists()\n\u003e\u003e\u003e True\n```\n\nRead dicts\n----------------------------------------------------------------------------------------\n\n```python\nd = DDB.at(\"users\").read()\nd == users_dict # True\n\n# Only partially read Joe\njoe = DDB.at(\"users\", key=\"u3\").read()\njoe == users_dict[\"u3\"] # True\n```\n\n\u003e Note: A partial read such as `DDB.at(\"users\", key=\"u3\").read()` only\n\u003e returns the value of the key if the key is at the root level of the json object.\n\u003e Example: You can get \"a\" from {\"a\" : 3}, but not from {\"b\": {\"a\": 3}}.\n\nIt is also possible to only read a subset of keys based on a filter callback:\n\n```python\nDDB.at(\"numbers\").create({\"a\": 1, \"b\": 2, \"c\": 3})\n\nabove_1 = DDB.at(\"numbers\", where=lambda k, v: v \u003e 1).read()\nabove_1 == {\"b\": 2, \"c\": 3} # True\n```\n\u003e The `where` callback is a function that takes two parameters, the key and the value.\n\n\nWrite dicts\n----------------------------------------------------------------------------------------\n\n```python\nwith DDB.at(\"users\").session() as (session, users):\n   users[\"u3\"][\"age\"] = 99\n   session.write()\nprint(DDB.at(\"users\", key=\"u3\").read()[\"age\"])\n\u003e\u003e\u003e 99\n```\n\u003e If you do not call session.write(), changes will not be written to disk!\n\nPartial writing\n----------------------------------------------------------------------------------------\nImagine you have a huge json file with many purchases.\nThe json file looks like this: `{\u003cid\u003e: \u003cpurchase\u003e, \u003cid\u003e: \u003cpurchase\u003e, ...}`.\nNormally, you would have to read and parse the entire file to get a specific key.\nAfter modifying the purchase, you would also have to serialize and write the\nentire file again. 
With DDB, you can do it more efficiently:\n```python\nwith DDB.at(\"purchases\", key=\"3244\").session() as (session, purchase):\n    purchase[\"status\"] = \"cancelled\"\n    session.write()\n```\nAfterwards, the status is updated in the json file.\nHowever, DDB only located the one purchase with id 3244, parsed\nits value, and serialized that value alone before writing it back. This is several\norders of magnitude faster than the naive approach when working with big files.\n\n\nFolders\n----------------------------------------------------------------------------------------\n\nYou can also read and write to folders of files. Consider the same example as\nbefore, but now we have a folder called `purchases` that contains many files\n`\u003cid\u003e.json`. If you want to open a session or read a specific one, you can do:\n\n```python\nDDB.at(\"purchases/\u003cid\u003e\").read()\n# Or equivalently:\nDDB.at(\"purchases\", \"\u003cid\u003e\").read()\n```\n\nTo open a session on all files in the folder, or to read all of them, do the following:\n```python\nDDB.at(\"purchases/*\").read()\n# Or equivalently:\nDDB.at(\"purchases\", \"*\").read()\n```\n\n### Select from folder\n\nIf you have a folder containing many json files, you can read them selectively\nbased on a function. 
The file is included if the provided function returns true\nwhen it is called with the file name (key) and the file dict (value):\n\n```python\nfor i in range(10):\n    DDB.at(\"folder\", i).create({\"a\": i})\n# Now in the directory \"folder\", 10 files exist\nres = DDB.at(\"folder/*\", where=lambda k, v: v[\"a\"] \u003e 7).read() # .session() also possible\nassert res == {\"8\": {\"a\": 8}, \"9\": {\"a\": 9}} # True\n```\n\n\n\nPerformance\n========================================================================================\n\nIn preliminary testing, DictDataBase showed promising performance.\n\n### SQLite vs DictDataBase\nIn each case, `16` parallel processes were spawned to perform `128` increments\nof a counter in `4` tables/files. SQLite achieved `2435 operations/s`, while\nDictDataBase achieved `3143 operations/s`.\n\n### More tests\nIt remains to be tested how DictDataBase performs in different scenarios, for\nexample when multiple processes want to perform full writes to one big file.\n\n\nAdvanced\n========================================================================================\n\nSleep Timeout\n----------------------------------------------------------------------------------------\nDictDataBase uses a file locking protocol to coordinate concurrent file accesses.\nWhile waiting for a file where another thread or process currently has exclusive\naccess rights, the status of the file lock is periodically checked. 
You can set\nthe timeout between the checks:\n\n```python\nDDB.locking.SLEEP_TIMEOUT = 0.001 # 1ms, default value\n```\n\nThe default of 1 millisecond works well and changing it is generally not recommended,\nbut you can tune it to optimize performance for your use case.\n\n\nLock acquisition timeout\n----------------------------------------------------------------------------------------\n`AQUIRE_LOCK_TIMEOUT` specifies the maximum duration to wait for acquiring a lock before\ngiving up and raising a timeout error.\n\n```python\nDDB.locking.AQUIRE_LOCK_TIMEOUT = 60.0 # 60s, default value\n```\n\n\nAPI Reference\n========================================================================================\n\n### `at(path) -\u003e DDBMethodChooser:`\nSelect a file or folder to perform an operation on.\nIf you want to select a specific key in a file, use the `key` parameter,\ne.g. `DDB.at(\"file\", key=\"subkey\")`. The key value is only returned if the key\nis at the root level of the json object.\n\nIf you want to select an entire folder, use the `*` wildcard,\ne.g. `DDB.at(\"folder\", \"*\")`, or `DDB.at(\"folder/*\")`. You can also use\nthe `where` callback to select a subset of the file or folder.\n\nIf the callback returns `True`, the item will be selected. The callback\nneeds to accept a key and value as arguments.\n\nArgs:\n- `path`: The path to the file or folder. Can be a string, a\ncomma-separated list of strings, or a list.\n- `key`: The key to select from the file.\n- `where`: A function that takes a key and value and returns `True` if the\nkey should be selected.\n\nBeware: If you select a folder with the `*` wildcard, you can't use the `key`\nparameter.\nAlso, you cannot use the `key` and `where` parameters at the same time.\n\nDDBMethodChooser\n----------------------------------------------------------------------------------------\n\n### `exists() -\u003e bool:`\nCheck whether the selected file exists. 
If a `key` was selected with `.at(...)`,\ncheck whether that key exists in the file instead.\n\n\n### `create(data=None, force_overwrite: bool = False):`\nCreate a new file with the given data as the content. If the file\nalready exists, a FileExistsError will be raised unless\n`force_overwrite` is set to True.\n\nArgs:\n- `data`: The data to write to the file. If not specified, an empty dict\nwill be written.\n- `force_overwrite`: If `True`, will overwrite the file if it already\nexists, defaults to False (optional).\n\n### `delete()`\nDelete the file at the selected path.\n\n### `read(self, as_type: T = None) -\u003e dict | T | None:`\nReads a file or folder depending on previous `.at(...)` selection.\n\nArgs:\n- `as_type`: If provided, return the value as the given type.\nE.g. as_type=str will return str(value).\n\n### `session(self, as_type: T = None) -\u003e DDBSession[T]:`\nOpens a session to the selected file(s) or folder, depending on previous\n`.at(...)` selection. Inside the with block, you have exclusive access\nto the file(s) or folder.\nCall `session.write()` to write the data to the file(s) or folder.\n\nArgs:\n- `as_type`: If provided, cast the value to the given type.\nE.g. as_type=str will return str(value).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkrd%2FDictDataBase","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkrd%2FDictDataBase","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkrd%2FDictDataBase/lists"}