{"id":19458464,"url":"https://github.com/postgrespro/pg_credereum","last_synced_at":"2025-04-25T06:30:27.045Z","repository":{"id":149601050,"uuid":"133019760","full_name":"postgrespro/pg_credereum","owner":"postgrespro","description":"Prototype of PostgreSQL extension bringing some properties of blockchain to the relational DBMS","archived":false,"fork":false,"pushed_at":"2018-05-24T12:59:36.000Z","size":69,"stargazers_count":63,"open_issues_count":1,"forks_count":7,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-24T10:48:42.422Z","etag":null,"topics":["audit","blockchain","crypto","ethereum","postgres","postgresql","trusted"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/postgrespro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-05-11T09:30:23.000Z","updated_at":"2024-09-03T06:16:52.000Z","dependencies_parsed_at":"2024-01-18T09:04:37.398Z","dependency_job_id":"1826b6ab-3a0f-4a6d-ac88-701968ffec8b","html_url":"https://github.com/postgrespro/pg_credereum","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fpg_credereum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fpg_credereum/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fpg_credereum/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fpg_credereum/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/postgrespro","download_url":"https://codeload.github.com/postgrespro/pg_credereum/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250766955,"owners_count":21483894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audit","blockchain","crypto","ethereum","postgres","postgresql","trusted"],"created_at":"2024-11-10T17:27:11.526Z","updated_at":"2025-04-25T06:30:26.753Z","avatar_url":"https://github.com/postgrespro.png","language":"C","funding_links":[],"categories":["C","Compiled list"],"sub_categories":["plv8:"],"readme":"pg\\_credereum\n=============\n\n[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://raw.githubusercontent.com/postgrespro/pg_credereum/master/LICENSE)\n\nOverview\n--------\n\npg\\_credereum is a PostgreSQL extension that provides a cryptographically\nverifiable audit capability for a PostgreSQL database, bringing some properties\nof blockchain to a relational DBMS. pg\\_credereum is not a production-ready\nsolution yet; it is a prototype for the upcoming Credereum platform.\n\nIn a classic client-server DBMS, a client relies on the server to guarantee\ndata integrity and authenticity. A client has to trust that the data\nprovided by the server is correct, without any proof. 
Even if the server supports\naudit features, the audit data could be forged by the server administrator or\ncompromised by an intruder.\n\nBlockchain systems make it possible to trace their state back to the individual\nactions of their clients, which are authenticated by their respective digital\nsignatures. Credereum aims to bring this blockchain feature to a client-server\nrelational DBMS. In Credereum, each modification of the database contents\nis digitally signed by the client that performs this modification.\nBased on these signatures, the server can build a proof that the current\ndatabase state results from the previous actions of the clients.\nThe clients can check this proof to verify that the data was not tampered with.\n\nThe well-known \"double spending\" problem that blockchain implementations have to\nsolve also applies to Credereum. A malicious database administrator can\nmaintain multiple forks of the database and return query results to different\nusers from different forks. To prevent this, Credereum\ncreates a cryptographic digest of the entire database. This digest is\nperiodically uploaded to trusted storage that clients can read. This storage\nmust be immutable: once data is recorded in the storage, it must\nbe impossible to change it retroactively. This way, a client can be\nsure that once a database digest reaches the trusted storage, it can no longer\nbe removed or changed. Thus, any retroactive modification of the database\ncontents can be easily detected.\n\nIn principle, various systems can serve as trusted storage. For example, you\ncan use a public blockchain like Ethereum or Bitcoin with a smart contract to\nstore the hashes. Another approach is to use a third-party server that is\ntrusted to forbid retroactive changes. Yet another example is a cluster of\nservers, where each server signs the hash it accepts, and the hash is considered\naccepted once it is signed by a majority of the servers. 
pg\\_credereum uses\nan Ethereum smart contract as its trusted storage.\n\n\nImplementation\n--------------\n\nIn Credereum, each modification of the database contents is signed by the\nclient that made it. The client must sign both the original digest (before the\nchanges) and the final digest (with the modified data).\n\nThe digest must have the following properties:\n* It is compact compared to the size of the entire database.\n* It makes forgery detectable (i.e., it is hard to change the database\n   contents without changing the digest).\n* The client can verify that the digest corresponds to the correct values of\n   the modified rows.\n* The digest does not divulge the values of the unmodified rows.\n\nA well-known data structure that meets these requirements is a Merkle prefix\ntree (\"merklix tree\"). pg\\_credereum builds a single merklix tree for all the\ntables it manages. The tree is built over a set of key-value pairs, each pair\nrepresenting a single row. The values are the JSON-encoded rows, and each key\nis a string consisting of the schema-qualified table name followed by a string\nthat encodes the value of the primary key as a 64-bit binary number. Hashes in\nthe merklix tree are calculated as follows:\n\n* For a leaf node, the hash is the sha256 hash of the `value` field.\n* For a non-leaf node, the hash is the sha256 hash of the concatenation\n  of child1 key, child1 hash, child2 key, child2 hash, and so on for all the\n  children.\n\nThe hash of the root node of the merklix tree summarizes the contents of the\nentire database. We also refer to this hash as the \"root database hash\". A\n\"Merkle proof\" is used to prove that a particular leaf node of the tree contains\na particular value without revealing the whole tree. A Merkle proof is a subtree\nof the merklix tree that contains the root node, the leaf nodes to be proven, and\nthe paths from the root to these leaves. 
The Merkle proof also contains the nodes\ndirectly referenced by the nodes on the path from the root to the leaves to be\nproven. The full contents of these nodes are not needed, so they are\nrepresented only by their hash values.\n\n### Signing a Transaction\n\nSince the tables are represented as a merklix tree, a modification of these\ntables can be encoded as a pair of Merkle proofs. Both proofs encode only the\nrows that are being changed: the \"original\" proof encodes the original row\nvalues, and the \"updated\" proof encodes the updated values.\n\nTo sign a transaction, the client has to complete the following steps:\n\n1. Check the authenticity of the original state of the database as described\n   below.\n2. Verify that the performed modification transforms the original proof into\n   the updated proof.\n3. Sign the concatenation of the hashes of these proofs to authorize the\n   transaction.\n\n### Using Transaction Blocks\n\nThe simple scheme described above, with each transaction depending on the\nprevious one, effectively serializes database access, thereby limiting\ntransaction throughput. As an optimization, pg\\_credereum uses \"blocks of\ntransactions\", that is, sequential sets of transactions. These blocks are\ncreated periodically by a background worker. The database contents at\nthe end of a block are represented by a Merkle proof for all the rows that\nwere changed within the block. Subsequent transactions use this proof as\nthe digest of the original database state. This way, multiple transactions can\nrun concurrently. A limitation of this approach is that a given row can\nbe modified only once within a block. Transactions that violate this\nrule are rolled back.\n\nThe end of a block is a suitable point to publish the database digest to the\ntrusted storage. 
pg\\_credereum can be configured to send the hash of a block to\nan Ethereum smart contract each time a transaction block is completed.\n\n### Verifying Data Authenticity\n\nGiven particular row values, a client should be able to check that these values\nare authentic, that is, that they result from a series of modifications\nauthorized by other clients. With pg\\_credereum, a typical verification workflow\nis as follows:\n\n1. Find an initial database state that is trusted to be valid.\n   It may be an empty database or, more practically,\n   a block published to the trusted storage.\n2. Request the history of transactions that have modified the given\n   rows since the initial state.\n3. Check that each of these transactions was authorized by some other\n   client, and that the composition of these transactions transforms the\n   initial state into the final state. If this is the case, the final\n   state is authentic.\n\nIn particular, this procedure is used to verify the authenticity of a given\nblock, starting from some previous block that is known to be authentic.\n\n\nInstallation\n------------\n\npg\\_credereum is a PostgreSQL extension that requires PostgreSQL 10. Before you\nbuild and install the extension, make sure that the following conditions are\nmet:\n\n * PostgreSQL major version is 10.\n * You have installed the development package of PostgreSQL, or built\n   PostgreSQL from source.\n * go-ethereum, curl and jansson are installed.\n * The `pg_config` command can be found in your `PATH`, or the `PG_CONFIG`\n   variable points to it.\n\nA typical installation procedure is as follows:\n\n1. Download and install pg\\_credereum:\n\n```shell\ngit clone https://github.com/postgrespro/pg_credereum.git\ncd pg_credereum\nmake\nsudo make install\n```\n2. Add pg\\_credereum to `shared_preload_libraries` to\n   register the block collector background worker.\n3. 
Set the `pg_credereum.database` variable to the database\n   you want pg\\_credereum to manage. Optionally,\n   you can also customize the other GUC variables listed in the\n   [Setup](#setup) section.\n4. Create the pg\\_credereum extension in the database:\n\n```shell\npsql DB -c \"CREATE EXTENSION pg_credereum;\"\n```\n\nUsage\n-----\n\n### Setup\n\nFor now, pg\\_credereum usage is limited to a single database of a PostgreSQL\ninstance. This limitation might be addressed in future versions. The name\nof the database managed by pg\\_credereum is specified by the\n`pg_credereum.database` GUC variable.\n\npg\\_credereum includes a \"block collector\" background worker, which\nperiodically creates transaction blocks. The interval is defined by the\n`pg_credereum.block_period` GUC variable. The block collector has to know in\nwhich schema pg\\_credereum is installed, so you must specify this schema in the\n`pg_credereum.schema` GUC variable.\n\nThe block collector can also store block hashes in trusted storage. When the\n`pg_credereum.eth_end_point`, `pg_credereum.eth_source_addr` and\n`pg_credereum.eth_contract_addr` GUCs are defined, the block collector tries to\nupload each block hash to the smart contract at the\n`pg_credereum.eth_contract_addr` address using the Ethereum RPC endpoint at\n`pg_credereum.eth_end_point`, sending the transactions from the Ethereum\naddress specified by `pg_credereum.eth_source_addr`. Note that the\ncorresponding Ethereum account must be unlocked on the Ethereum node. When the\nblock collector fails to store a block hash, it skips that block and tries to\nstore the hash of the next one. 
The hashes are stored using the\n`saveHash(uint256)` method of the smart contract.\n\nAll pg\\_credereum GUC variables are listed below:\n\n GUC                              | Type      | Default    | Description\n--------------------------------- | --------- | ---------- | --------------------------------------------------------------\n`pg_credereum.block_period`       | `integer` | 1000       | Interval at which transactions are packed into blocks, in milliseconds.\n`pg_credereum.block_retry_period` | `integer` | 5000       | Interval before retrying block packing after a failure, in milliseconds.\n`pg_credereum.database`           | `string`  | `postgres` | Name of the database pg\\_credereum works with.\n`pg_credereum.schema`             | `string`  | `public`   | Schema where the pg\\_credereum extension is installed.\n`pg_credereum.eth_end_point`      | `string`  | `NULL`     | RPC endpoint of the Ethereum trusted storage, in the `host[:port]` format.\n`pg_credereum.eth_source_addr`    | `string`  | `NULL`     | Ethereum address to send transactions (and spend ether) from.\n`pg_credereum.eth_contract_addr`  | `string`  | `NULL`     | Address of the smart contract that stores the block hashes.\n\nBy default, pg\\_credereum does not track changes to any tables. To track a\ntable, create an `AFTER INSERT OR UPDATE OR DELETE` row-level trigger on it that\nexecutes `credereum_acc_trigger()`. You also have to revoke the `TRUNCATE`\nprivilege on that table from non-superusers, because pg\\_credereum can't handle\ntruncation. 
Also note that the primary key of\nthis table must be a single column named `id` of type `bigint`.\n\nThe following SQL snippet demonstrates how to set up such a table:\n```sql\n-- The primary key must be a single bigint column named id\nCREATE TABLE t (id bigserial PRIMARY KEY, value int NOT NULL);\n-- Track every row-level change with pg_credereum\nCREATE TRIGGER t_after AFTER INSERT OR UPDATE OR DELETE ON t\nFOR EACH ROW EXECUTE PROCEDURE credereum_acc_trigger();\n-- pg_credereum can't handle truncates\nREVOKE TRUNCATE ON t FROM public;\n```\n\n### API\n\nThis section describes how to use the functions and tables provided by\npg\\_credereum to sign transactions and verify data authenticity in the database.\n\nThe procedure to sign a transaction is as follows:\n1. The client begins a new transaction.\n2. The client performs DML operations on the tables.\n3. The client runs the `credereum_get_changeset()` function to get the changeset\n   for the performed DML operations in the form of Merkle proofs.\n4. The client checks that the received changeset really corresponds to the\n   changes it made at step 2.\n5. The client signs the transaction (a transition from one root database hash\n   to another) using the `credereum_sign_transaction(pubkey text, sign bytea)`\n   function.\n6. The client commits the transaction.\n\nThe procedure to acquire and validate the history of given database rows is as\nfollows:\n1. The user acquires the history of the given database rows with the appropriate\n   Merkle proofs using the `credereum_merkle_proof(keys varbit[])` function, and\n   fetches the information about the transaction history and blocks from the\n   `credereum_tx_log` and `credereum_block` tables. When the required\n   information is received, the user checks its consistency.\n2. The user fetches the hashes stored in the trusted storage and checks that they\n   match the block hashes received from the database.\n\n#### `credereum_get_changeset()`\n\nReturns the changes made by the current transaction. The changes are returned\nas a set of rows, each corresponding to a merklix node. 
The following\ncolumns are returned:\n\n Column name | Type       | Description\n------------ | ---------- | ---------------------------------------------------------------------------------------------------------------\n`key`        | `varbit`   | Key of the merklix node\n`children`   | `varbit[]` | Array of child keys (for non-leaf nodes)\n`leaf`       | `bool`     | Is this a leaf node?\n`hash`       | `bytea`    | Hash sum validating this node and its descendants\n`value`      | `json`     | Value stored in the leaf node\n`next`       | `bool`     | `false` for the original values of the rows modified by the current transaction, and `true` for the new values\n\nLogically, the result of the `credereum_get_changeset()` function consists of\nthe two Merkle proofs described in the [Signing a Transaction](#signing-a-transaction)\nsection:\n\n* the Merkle proof of the original values of the rows modified by the current\n  transaction (`next = false`);\n* the Merkle proof of the current values of the rows modified by the current\n  transaction (`next = true`).\n\nEach Merkle proof is a subtree of the merklix tree. The `value` field of leaf\nnodes is a JSON representation of the corresponding row. Some non-leaf nodes can\nhave `children` set to `NULL`, and some leaf nodes can have the `value` field\nset to `NULL`. This means that the entire subtree or the leaf, respectively, was\nnot modified by this transaction.\n\nNote that the same row can't be modified twice within the same block. Any\nattempt to do so raises a unique constraint violation error. If\nsuch errors happen too frequently, consider decreasing the value of the\n`pg_credereum.block_period` GUC variable.\n\n#### `credereum_merkle_proof(keys varbit[])`\n\nReturns the history of a particular set of rows. `keys` are the merklix tree\nkeys of the nodes, as described in the [Implementation](#implementation)\nsection. The return value is a set of rows, with each row representing a\nmerklix node. 
The columns are as follows:\n\nColumn name      | Type       | Description\n---------------- | ---------- | -----------\n`block_num`      | `bigint`   | The number of the block this node belongs to\n`transaction_id` | `bigint`   | Transaction ID this node belongs to (NULL if it's a block node)\n`key`            | `varbit`   | Key of the merklix node\n`children`       | `varbit[]` | Array of child keys (for non-leaf nodes)\n`leaf`           | `bool`     | Is this a leaf node?\n`hash`           | `bytea`    | Hash sum validating this node and its descendants\n`value`          | `json`     | Value stored in the leaf node\n\nLogically, the result of the `credereum_merkle_proof(keys varbit[])` function is\na forest of Merkle proofs. Since a transaction, or even a block, typically\nmodifies only a relatively small subset of the database, the trees in this\nforest share common branches.\n\nEach tree in the forest is identified by the (`block_num`, `transaction_id`)\npair. Each block and each transaction has its own tree root. However, the\n`children` array of a tree node may contain links to nodes of the previous\nblock's tree. The general rule is as follows: if a key is referenced by the\n`children` array and there is no node with this key and the same `block_num`\nand `transaction_id` values, then the child should be found in the most recent\nblock (that is, among the nodes whose `transaction_id` is NULL). The tree\nhashing rules are the same as described in the\n[Implementation](#implementation) section.\n\nThe tree of a particular block must contain the merged changes of every\ntransaction with the same `block_num`.\n\n#### `credereum_tx_log` table\nThis table contains the list of transactions. 
It has the following columns:\n\n Column name     | Type     | Description\n---------------- | -------- | ------------------------------------------------------------\n`block_num`      | `bigint` | The number of the block that contains this transaction\n`transaction_id` | `bigint` | ID of this transaction\n`tx_hash`        | `bytea`  | Hash sum of the transaction\n`root_hash`      | `bytea`  | Root database hash after this transaction\n`prev_root_hash` | `bytea`  | Root database hash before this transaction\n`pubkey`         | `text`   | Public key of the user who signed this transaction\n`sign`           | `bytea`  | Digital signature of the transaction\n\nThe transaction hash (`credereum_tx_log.tx_hash`) is calculated as the sha256\nhash of the concatenation of `root_hash`, `prev_root_hash`, `pubkey`, and `sign`.\n\n#### `credereum_block` table\nThis table contains the list of blocks. It has the following columns:\n\n Column name | Type        | Description\n------------ | ----------- | -------------------------------------------------\n`block_num`  | `bigint`    | The serial number of this block\n`hash`       | `bytea`     | Hash sum of this block\n`prev_hash`  | `bytea`     | Hash of the previous block\n`root_hash`  | `bytea`     | Root database hash after completion of this block\n\nThe block hash (`credereum_block.hash`) is calculated as the sha256 hash of the\nconcatenation of `prev_hash`, the hashes of the transactions in this block\nordered by `transaction_id`, and `root_hash`.\n\nIf block hashes are also uploaded to trusted storage in Ethereum, then the\nhashes stored in `credereum_block.hash` need to be compared with the hashes\nstored in Ethereum. Note that some hashes might be missing in Ethereum,\nbecause the block collector skips blocks whose hashes it fails to store. 
However, the\nsequence of the blocks can't be altered.\n\nSample Application\n------------------\n\nThe `sample` folder of this repository contains an example of how to use\npg\\_credereum from a Python 3 application, as well as how to interface it with\nEthereum. The required Python packages can be installed with\n`pip3 install -r requirements.txt`. The bash script `run` demonstrates the\noverall usage of pg\\_credereum. If you run this script directly, note that it\nwill kill your `geth` process. It also requires the `geth`, `solc` and\nPostgreSQL binaries to be available in your `PATH`. The script starts a\nPostgreSQL instance with a table managed by pg\\_credereum. It also starts a\nprivate Ethereum network with a smart contract that stores the block hashes.\nAfter updating the table with the `sample.py` script, it runs the\n`history_proof.py` script, which checks that the state of the database is\nconsistent with what is stored in the Ethereum smart contract.\n\nBesides the scripts mentioned above, there are some other files:\n * `hts_eth/HashStorage.sol` -- a reference implementation of the Ethereum hash\n   storage contract.\n * `credereum.py` -- Python helper functions for dealing with pg\\_credereum,\n   which are used by `sample.py` and `history_proof.py`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Fpg_credereum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpostgrespro%2Fpg_credereum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Fpg_credereum/lists"}