{"id":25703448,"url":"https://github.com/dashpay/grovedb","last_synced_at":"2025-04-05T08:06:28.553Z","repository":{"id":37029980,"uuid":"400487124","full_name":"dashpay/grovedb","owner":"dashpay","description":"Storage solution with proofs and secondary indices.","archived":false,"fork":false,"pushed_at":"2024-10-29T10:17:34.000Z","size":4021,"stargazers_count":36,"open_issues_count":12,"forks_count":17,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-10-29T12:18:44.280Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dashpay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-27T11:31:35.000Z","updated_at":"2024-10-11T16:49:35.000Z","dependencies_parsed_at":"2023-07-17T01:31:03.114Z","dependency_job_id":"b9f67d87-5d20-4934-b7a8-66ad7b85a9b5","html_url":"https://github.com/dashpay/grovedb","commit_stats":null,"previous_names":["dashevo/grovedb"],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dashpay%2Fgrovedb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dashpay%2Fgrovedb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dashpay%2Fgrovedb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dashpay%2Fgrovedb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dashpay","download_url":"https://codeload.github.com/d
ashpay/grovedb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247305933,"owners_count":20917208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-25T05:29:41.792Z","updated_at":"2025-04-05T08:06:28.522Z","avatar_url":"https://github.com/dashpay.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GroveDB\n| Branch | Tests                                                                                                                           | Coverage |\n|--------|---------------------------------------------------------------------------------------------------------------------------------|------|\n| master | [![Tests](https://github.com/dashevo/grovedb/workflows/CI/badge.svg?branch=master)](https://github.com/dashevo/grovedb/actions) | [![codecov](https://codecov.io/gh/dashpay/grovedb/branch/master/graph/badge.svg?token=6Z6A6FT5HV)](https://codecov.io/gh/dashpay/grovedb) |\n\n\n*Hierarchical Authenticated Data Structure with Efficient Secondary Index Queries*\n\nGroveDB is a database system designed specifically for efficient secondary index queries, proofs, speed, and reliability. It was built for use within [Dash Platform](https://dashplatform.readme.io/docs/introduction-what-is-dash-platform), but can be easily integrated into other applications for similar use.  \n   \n## Motivation\n\nSecondary indices are crucial to any database management system. 
Previous solutions each made tradeoffs depending on the problem they were designed to solve. \n\nConsider an authenticated data structure, for example a Merkle tree built over a database of restaurants. Each restaurant has certain attributes, such as its name, its type, and whether it is vegan:\n\n```rust\nstruct Restaurant {\n    id: u32,        // primary key (ID)\n    name: String,\n    kind: String,   // `type` is a reserved word in Rust\n    is_vegan: bool, // secondary attribute (isVegan)\n}\n```\n\nIf we have, say, four restaurants, we might normally commit them to a Merkle tree as follows:\n\n```mermaid\ngraph TD;\nroot--\u003eA[\" \"];\nroot--\u003eB[\" \"];\nA--\u003eAA[\"id:0\"];\nA--\u003eAB[\"id:1\"];\nB--\u003eAC[\"id:2\"];\nB--\u003eAD[\"id:3\"];\n```\n\n\nQuerying by primary key is easy and efficient. For a query such as ```SELECT * WHERE ID \u003c= 2;```, we can return the matching elements and construct an efficient range proof. Querying by a secondary index, however, is not efficient at all: you will likely have to iterate over the entire structure. Consider the query ```SELECT * WHERE isVegan=true;```. When the leaves are sorted by primary key, the vegan restaurants are generally not contiguous, so both the proof and the time required to find these elements become nontrivial. \n\nGroveDB makes a classic time-space tradeoff. It enables efficient querying on secondary indices by precomputing and committing them: a subtree for each queryable secondary index (up to a cap) is built and committed to our authenticated data structure. A tree of subtrees: a grove. 
For the same data, part of the analogous GroveDB structure might look like this:\n\n```mermaid\ngraph TD;\nroot--\u003eA[\"'Restaurant'\"];\nroot--\u003eB[\"...\"];\nA--\u003eQ[\"ID\"];\nA--\u003eW[\"name\"];\nA--\u003eE[\"kind\"];\nA--\u003eR[\"isVegan\"];\nQ--\u003eZ[\"...\"];\nW--\u003eX[\"...\"];\nE--\u003eC[\"...\"];\nR--\u003eY[\"id:2\"];\nR--\u003eU[\"id:1\"];\nR--\u003eI[\"id:0\"];\nR--\u003eO[\"id:3\"];\n```\n\nFrom here, a query on the secondary index ```isVegan``` traverses to the subtree built for that index. Items are not necessarily replicated there; the index subtree can hold references to them instead.\n\n## Features\n- **Efficient secondary index queries** - Built specifically for and tailored to secondary index queries.\n- **Proofs** - Supports proofs of membership, proofs of non-membership, and range proofs.\n- **Run anywhere** - Because it is written in Rust, it can be compiled for a wide range of targets, including x86, Raspberry Pi (AArch64), and Wasm. There are Node.js bindings as well.\n\n## Architecture\nInsertion and deletion work as you might expect: they update the affected subtrees and return the appropriate proofs of membership or non-membership.\n\n### Tree structure(s)\nInstead of disjoint authenticated data structures, we opt for a unified one: a hierarchical authenticated data structure based on [Database Outsourcing with Hierarchical Authenticated Data Structures](https://ia.cr/2015/351). Elements are the most atomic pieces, and each element contains an item, a reference to an object, or a subtree. In practice they can be items, item references, trees, trees with items, or even trees with item references.\n\nThe trees are based on our fork of Merk, with custom patches applied for better use with GroveDB. Merk is distinctive in that it is an AVL tree, so intermediate nodes also contain a key/value pair. Each node therefore stores a third hash, the ```kv_hash```, in addition to the hashes of its left and right children. 
The ```kv_hash``` is computed as ```kv_hash=H(key,value)```, and the node hash as ```H(kv_hash,left_child_hash,right_child_hash)```. Merk uses Blake2B, and rs-merkle uses SHA256. \n\n### Storage\nRocksDB is a key-value store, forked from LevelDB and developed by Facebook. We chose it for its high performance, maturity, and compatibility with our stack. Merk itself is built on top of RocksDB.\n\nWe have three types of storage: auxiliary, metadata, and tree root storage. Auxiliary storage holds plain key-value data that is not used in consensus. Metadata storage holds data outside the GroveDB usage scope; it has no prefixes and therefore no relation to subtrees, living at a higher level. Tree root storage holds the subtrees.\n\nA database transaction in GroveDB is a wrapper around RocksDB's ```OptimisticTransactionDB``` primitive. An optimistic transaction assumes conflicts will be rare: it takes no locks while running, and conflicts are detected at commit time. The pessimistic model, by contrast, locks keys as they are written. \n\n## Querying\nTo query GroveDB, a path and a query item must be supplied.\nThe path specifies the subtree, and the query item determines which nodes are selected from that subtree.\n\nGroveDB currently supports 10 query item types:\n- Key(key_name)\n- Range(start..end)\n- RangeInclusive(start..=end)\n- RangeFull(..)\n- RangeFrom(start..)\n- RangeTo(..end)\n- RangeToInclusive(..=end)\n- RangeAfter(prev..)\n- RangeAfterTo(prev..end)\n- RangeAfterToInclusive(prev..=end)\n\nThe ```RangeAfter*``` variants exclude their starting key ```prev```.\n\nThis describes a basic query system: select a subtree, then select nodes from that subtree. When you need more complex queries or restrictions on the result set, use a **PathQuery**.\n\n### PathQuery\nThe ```PathQuery``` allows for more complex queries with optional restrictions on the result set, i.e. limits and offsets. 
\n\n```\n    PathQuery\n        path: [k1, k2, ..]\n        sized_query: SizedQuery\n            limit: Optional\u003cnumber\u003e\n            offset: Optional\u003cnumber\u003e\n            query: Query\n                items: [query_item_1, query_item_2, ...]\n                default_subquery_branch: SubqueryBranch\n                    subquery_path: Optional\u003cKey\u003e\n                    subquery_value: Optional\u003cQuery\u003e\n                conditional_subquery_branches: Map\u003cQueryItem, SubqueryBranch\u003e\n```\n\nA path is needed to define the starting context for the query.\n\n#### SizedQuery\nThe ```sized_query``` determines how the result set is restricted. It holds optional limit and offset values: the ```limit``` sets the maximum size of the result set, and the ```offset``` specifies the number of elements to skip before elements are added to the result set. \n\n#### Query\nThe ```query``` object is a recursive structure: it specifies how to select nodes from the current subtree, and can optionally apply another query recursively to the result set obtained from the previous query. \n\n#### Items\nThe ```items``` are a collection of query items that decide which nodes to select from the current context, building a result set.  
\n\nBefore describing ```default_subquery_branch``` and ```conditional_subquery_branches```, we need to define their building block, the subquery branch:\n\n#### Subquery Branches\n```\n    subquery_path: Optional\u003cKey\u003e\n    subquery_value: Optional\u003cQuery\u003e\n```\n**Cases** (*set* means the optional field holds a value)  \n- ```subquery_path``` set, ```subquery_value``` unset  \nThe node at the subquery path is selected and returned as the result set.\n\n- ```subquery_path``` unset, ```subquery_value``` set  \nThe query held in ```subquery_value``` is applied directly to the subtree, and its result is returned as the result set.\n\n- ```subquery_path``` set, ```subquery_value``` set  \nFirst, the node at the subquery path is selected and set as the new context.  \nThen, the subquery value is applied to this new context, and its result is returned as the result set.\n\nA subquery branch operates on a single node, but it can be applied to the result set of a previous query through **default_subquery_branch** and **conditional_subquery_branches**:\n\n#### default_subquery_branch\nIf this is set, the specified subquery branch is applied to every node in the result set of the previous query.\n\n#### conditional_subquery_branches\nRather than applying a subquery branch to every node in the result set, you might want to apply it only to a subset of the result set. In such cases, we use a conditional subquery.\n\nThe conditional subquery holds a map from QueryItem to SubqueryBranch:\n```\n    Map\u003cQueryItem, SubqueryBranch\u003e\n```\nFor every node in the result set, we check whether a query item in the map matches it. If one does, the associated subquery branch is applied to that node.  
Note that once a conditional subquery has been applied to a node, the default subquery does not run on that node.\n\n## Merging Path Queries\nThis section describes how GroveDB merges path queries.\n\nMergeable path queries allow separate path queries that do different things to be combined into a single, equivalent path query.\n\nA path query can be represented as a sequence of keys (the path to a subtree) followed by a query to apply to that subtree (the query can have arbitrary depth):\n\np\u003csub\u003ei\u003c/sub\u003e = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, .., k\u003csub\u003en\u003c/sub\u003e, Query]\n\nAn important property is that a path query chain can be compressed at any point, i.e. a sequence of keys can be turned into a single query.\n\nConsider p\u003csub\u003e1\u003c/sub\u003e = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, k\u003csub\u003e3\u003c/sub\u003e]. This reads as:\n- From the root tree, select the node with key k\u003csub\u003e1\u003c/sub\u003e\n- Change the context to k\u003csub\u003e1\u003c/sub\u003e, then select the node with key k\u003csub\u003e2\u003c/sub\u003e\n- Change the context to k\u003csub\u003e2\u003c/sub\u003e and finally select the node with key k\u003csub\u003e3\u003c/sub\u003e\n\nAn equivalent query representing this could look like:\n```\n    Query\n        query k1\n        cond on k1\n            query k2\n            cond on k2\n                query k3\n```\n[k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, k\u003csub\u003e3\u003c/sub\u003e] =\u003e [Q\u003csub\u003e1\u003c/sub\u003e], where Q\u003csub\u003e1\u003c/sub\u003e is equivalent to the path array.  
\n\nThis can also be done at any point in the path array, so we can have:  \n\n[k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, k\u003csub\u003e3\u003c/sub\u003e] =\u003e [k\u003csub\u003e1\u003c/sub\u003e, Q\u003csub\u003e2\u003c/sub\u003e]  \n[k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, k\u003csub\u003e3\u003c/sub\u003e] =\u003e [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, Q\u003csub\u003e3\u003c/sub\u003e]\n\nThe path merge algorithm then becomes:\n- Find the common path across the path queries\n- Compress each path array into a query after the common path index\n- Merge the compressed queries into a single query\n- Return a new path query with the common path as its path and the combined query as its query\n\n**Example:**  \np\u003csub\u003e1\u003c/sub\u003e = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, k\u003csub\u003e3\u003c/sub\u003e, Q\u003csub\u003ea\u003c/sub\u003e]  \np\u003csub\u003e2\u003c/sub\u003e = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, k\u003csub\u003e4\u003c/sub\u003e, Q\u003csub\u003eb\u003c/sub\u003e]\n\nCommon path = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e]  \n\nCompress each path array after the common path, where Q\u003csub\u003ec\u003c/sub\u003e is equivalent to [k\u003csub\u003e3\u003c/sub\u003e, Q\u003csub\u003ea\u003c/sub\u003e] and Q\u003csub\u003ed\u003c/sub\u003e is equivalent to [k\u003csub\u003e4\u003c/sub\u003e, Q\u003csub\u003eb\u003c/sub\u003e]:  \np\u003csub\u003e1\u003c/sub\u003e = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, Q\u003csub\u003ec\u003c/sub\u003e]  \np\u003csub\u003e2\u003c/sub\u003e = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, Q\u003csub\u003ed\u003c/sub\u003e]  \n\nMerge the compressed queries:  \nQ\u003csub\u003ep\u003c/sub\u003e = Q\u003csub\u003ec\u003c/sub\u003e + Q\u003csub\u003ed\u003c/sub\u003e \n\nReturn the final PathQuery:  \np\u003csub\u003ef\u003c/sub\u003e = [k\u003csub\u003e1\u003c/sub\u003e, k\u003csub\u003e2\u003c/sub\u003e, Q\u003csub\u003ep\u003c/sub\u003e]\n\n\n## Usage\nGroveDB is built for use with Dash Platform, but can be easily integrated into other applications for similar use. 
See its use in [rs-drive](https://github.com/dashevo/rs-drive) ([example](https://github.com/dashevo/rs-drive-example)). \n\nWe also have bindings for Node.js; see [node-grove](https://github.com/dashevo/grovedb/tree/master/node-grove). \n\n## Building\nFirst, install [rustup](https://www.rust-lang.org/tools/install) using your preferred method. \n\nRust nightly is required to build, so install the nightly toolchain:\n\n```rustup install nightly```\n\nClone the repo and navigate to the main directory:\n\n```git clone https://github.com/dashevo/grovedb.git \u0026\u0026 cd grovedb```\n\nFrom here we can build: \n\n```cargo build```\n\n## grovedbg\n\nThere is a work-in-progress implementation of a debugger layer for GroveDB. To use it, depend on GroveDB with the `grovedbg` feature enabled.\n\nThen, to launch the visualizer and observe the database structure in your browser on a port (say, 10000), the following snippet should do:\n\n```rust\n    let db = Arc::new(GroveDb::open(\"db\").unwrap());\n    db.start_visualizer(10000);\n```\n\nAn ```Arc``` is required because the HTTP server may outlive the local ```GroveDb``` handle.\n\n## Performance\n\nTimings for running ```cargo test```:\n\n| CPU | Time |\n|-----|------|\n| Raspberry Pi 4 | 2m58.491s |\n| Ryzen 5 1600AF | 33.958s |\n| Ryzen 5 3600 | 25.658s |\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdashpay%2Fgrovedb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdashpay%2Fgrovedb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdashpay%2Fgrovedb/lists"}