{"id":19271730,"url":"https://github.com/tzolov/calcite-sql-rewriter","last_synced_at":"2025-04-21T22:30:30.336Z","repository":{"id":145795696,"uuid":"84372863","full_name":"tzolov/calcite-sql-rewriter","owner":"tzolov","description":"JDBC driver that converts any INSERT, UPDATE and DELETE statements into append-only INSERTs. Instead of updating rows in-place it inserts the new version of the row along with version metadata","archived":false,"fork":false,"pushed_at":"2017-03-27T10:49:20.000Z","size":297,"stargazers_count":80,"open_issues_count":2,"forks_count":22,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-01T16:12:20.624Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tzolov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-08T22:39:07.000Z","updated_at":"2024-08-02T07:57:03.000Z","dependencies_parsed_at":"2023-05-18T18:45:18.444Z","dependency_job_id":null,"html_url":"https://github.com/tzolov/calcite-sql-rewriter","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzolov%2Fcalcite-sql-rewriter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzolov%2Fcalcite-sql-rewriter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzolov%2Fcalcite-sql-rewriter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/tzolov%2Fcalcite-sql-rewriter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tzolov","download_url":"https://codeload.github.com/tzolov/calcite-sql-rewriter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250143547,"owners_count":21382031,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T20:33:52.049Z","updated_at":"2025-04-21T22:30:30.325Z","avatar_url":"https://github.com/tzolov.png","language":"Java","funding_links":[],"categories":["\u003ca name=\"Java\"\u003e\u003c/a\u003eJava"],"sub_categories":[],"readme":"# SQL Rewriter\nJDBC driver that converts any `INSERT`, `UPDATE` and `DELETE` statements into append-only `INSERT`s. Instead of\nupdating rows in-place it inserts the new version of the row along with version metadata.\n[ ![Download](https://api.bintray.com/packages/big-data/maven/calcite-sql-rewriter/images/download.svg) ](https://bintray.com/big-data/maven/calcite-sql-rewriter/_latestVersion)\n\n*SQL-on-Hadoop* data management systems such as [Apache HAWQ](http://hawq.incubator.apache.org/) do not offer the same\nstyle of INSERT, UPDATE, and DELETE that users expect of traditional RDBMS. Unlike transactional systems, big-data\nanalytical queries are dominated by SELECT over millions or billions of rows. Analytical databases are optimized for\nthis kind of workload. The storage systems are optimized for high throughput scans, and commonly implemented as\nimmutable (append-only) persistence stores. 
No in-place updates are allowed.\n\nThe *SQL-on-Hadoop* systems naturally support append-only operations such as `INSERT`. `UPDATE` and `DELETE` demand\nan alternative approach:\n[HAWQ-304](https://issues.apache.org/jira/browse/HAWQ-304), [HIVE-5317](https://issues.apache.org/jira/browse/HIVE-5317)\n\nThis project _emulates_ `INSERT`, `UPDATE` and `DELETE` by turning them into append-only `INSERT`s. Instead of updating\nrows in-place it inserts the new version of the row using two additional metadata columns: `version_number` and\n`subsequent_version_number` of either `TIMESTAMP` or `BIGINT` type.\n\n---\n_Note that this project can be used as a workaround. A complete solution will be provided by\n[HAWQ-304](https://issues.apache.org/jira/browse/HAWQ-304)._\n\n---\n\n### How to Use\n\n##### Code\n\nUse as a standard JDBC `Connection`:\n\n```java\nimport java.sql.Connection;\nimport java.sql.DriverManager;\nimport java.util.Properties;\n\npublic class Main {\n\tpublic static void main(String[] argv) throws Exception {\n\t\tClass.forName(org.apache.calcite.jdbc.Driver.class.getName());\n\t\tProperties info = new Properties();\n\t\tinfo.setProperty(\"lex\", \"JAVA\"); // Enables case sensitivity\n\t\tinfo.setProperty(\"model\", \"path/to/myModel.json\"); // See section below\n\t\tConnection connection = DriverManager.getConnection(\"jdbc:calcite:\", info);\n\n\t\t// use connection as usual\n\t}\n}\n```\nConsult [sql-rewriter-springboot-example](./sql-rewriter-springboot-example) and [journalled-sql-rewriter-example](./journalled-sql-rewriter-example) \nfor more elaborate examples. \n\n[SqlLine](#how-to-use-sqlline) offers a handy command-line tool for testing `sql-rewriter`.\n\n\n##### Model\n\nTo connect to the SQL-Rewrite JDBC driver you need to provide a [model](https://calcite.apache.org/docs/model.html).\nModels can be JSON files, or built programmatically. A model comprises two groups of attributes:\n\n1. Calcite generic attributes, as explained here [model attributes](https://calcite.apache.org/docs/model.html). 
Note\n   that to use journalling on a schema, the `type` must be `custom` and the `factory` must be\n   `org.apache.calcite.adapter.jdbc.JournalledJdbcSchema$Factory` (see example below).\n1. `sql-rewrite` specific attributes set via the `operand` properties. The table below explains the specific properties.\n\n| Property                           | Description | Default |\n| ---------------------------------- |:------------|:--------|\n| `dataSource` \u003csup\u003e\u0026dagger;\u003c/sup\u003e   | Class name to use as the underlying `DataSource` | *none* |\n| `connection` \u003csup\u003e\u0026dagger;\u003c/sup\u003e   | Path to the backend jdbc connection configuration file | *none* |\n| `jdbcDriver` \u003csup\u003e\u0026dagger;\u003c/sup\u003e   | See section below | *none* |\n| `jdbcUrl` \u003csup\u003e\u0026dagger;\u003c/sup\u003e      | See section below | *none* |\n| `jdbcUser` \u003csup\u003e\u0026dagger;\u003c/sup\u003e     | See section below | *none* |\n| `jdbcPassword` \u003csup\u003e\u0026dagger;\u003c/sup\u003e | See section below | *none* |\n| `jdbcSchema`                       | The schema name in the database. Note that due to [CALCITE-1692](https://issues.apache.org/jira/browse/CALCITE-1692) this *must* match the `name` | *none* |\n| `journalSuffix`                    | Journal table suffix | `_journal` |\n| `journalVersionField`              | Journal table version number column name | `version_number` |\n| `journalSubsequentVersionField`    | Journal table delete flag column name | `subsequent_version_number` |\n| `journalVersionType`               | The type of the version columns. Either `TIMESTAMP` or `BIGINT` | `TIMESTAMP` |\n| `journalDefaultKey`                | List of columns to use as primary keys by default (applies when tables do not have an explicit list given in `journalTables`) | *none* |\n| `journalTables`                    | List of journalled tables to be managed. 
Expressions involving other tables will pass-through unchanged.\u003cbr /\u003eThis can be a list of table names, or a map of table names to primary key columns. | *none* |\n\n\u003csup\u003e\u0026dagger;\u003c/sup\u003e: Provide *one* of: `dataSource` *or* `connection` *or* `jdbcDriver` \u0026amp; `jdbcUrl`.\n\nFor example:\n\n```json\n{\n  \"version\": \"1.0\",\n  \"defaultSchema\": \"doesntmatter\",\n  \"schemas\": [\n    {\n      \"name\": \"hr\",\n      \"type\": \"custom\",\n      \"factory\": \"org.apache.calcite.adapter.jdbc.JournalledJdbcSchema$Factory\",\n      \"operand\": {\n        \"connection\": \"myTestConnection.json\",\n        \"jdbcSchema\": \"hr\",\n        \"journalSuffix\": \"_journal\",\n        \"journalVersionField\": \"version_number\",\n        \"journalSubsequentVersionField\": \"subsequent_version_number\",\n        \"journalDefaultKey\": [\"id\"],\n        \"journalTables\": {\n          \"emps\": [\"empid\"],\n          \"depts\": [\"deptno\"]\n        }\n      }\n    }\n  ]\n}\n```\n\n##### Backend DB Connection\n\nBackend DB connection configuration can be provided inside `model.json`, or in a separate file referenced by\n`model.json` (via the `connection` operand).\n\nThe connection configuration contains the common JDBC connection properties like driver, jdbc URL, and credentials.\n\n| Property        | Description                                                        | Default |\n| --------------- |:-------------------------------------------------------------------|:--------|\n| `jdbcDriver`    | JDBC driver Class name. For example: `org.postgresql.Driver`       | *none*  |\n| `jdbcUrl`       | JDBC URL. For example: `jdbc:postgresql://localhost:5432/postgres` | *none*  |\n| `jdbcUser`      | The database user on whose behalf the connection is being made.    | *blank* |\n| `jdbcPassword`  | The database user\u0026rsquo;s password.                                
| *blank* |\n\nFor example:\n\n```json\n{\n  \"jdbcDriver\": \"org.postgresql.Driver\",\n  \"jdbcUrl\": \"jdbc:postgresql://localhost:5432/postgres\",\n  \"jdbcUser\": \"myDatabaseUser\",\n  \"jdbcPassword\": \"myDatabasePassword\"\n}\n```\n\n### How it Works\n\n`sql-rewrite` leverages [Apache Calcite](https://calcite.apache.org/) to implement a JDBC adapter between the end-users\nand the backend *SQL-on-Hadoop* system. It exposes a fully-fledged JDBC interface to the end-users while internally\nconverting the incoming `INSERT`, `UPDATE` and `DELETE` into append-only `INSERT`s and forwarding them to the backend DB\n(e.g. [Apache HAWQ](http://hawq.incubator.apache.org/)).\n\nConsider a department table called `depts`, with `deptno` (key) and `department_name` columns:\n```sql\nCREATE TABLE hr.depts (\n  deptno                    SERIAL                   NOT NULL,\n  department_name           TEXT                     NOT NULL,\n  PRIMARY KEY (deptno)\n);\n```\nThe `sql-rewrite` convention requires you to create a corresponding journal table named `\u003cyour-table-name\u003e_journal`,\nwith the same schema as the original table plus two metadata columns: `version_number` and `subsequent_version_number`\nof `TIMESTAMP` or `BIGINT` type.  The column order does not matter.\n```sql\nCREATE TABLE hr.depts_journal (\n  deptno                    SERIAL                   NOT NULL,\n  version_number            TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,\n  subsequent_version_number TIMESTAMP WITH TIME ZONE NULL     DEFAULT NULL,\n  department_name           TEXT                     NOT NULL,\n  PRIMARY KEY (deptno, version_number)\n);\n```\n* `version_number` \u0026mdash; the version at which the row was inserted. This is an increasing value; the highest value represents\n  the current row state.\n* `subsequent_version_number` \u0026mdash; denotes the next version of the record. 
Since existing records cannot be updated,\n  this will usually be NULL (the exception being deleted records, where this is set to match the `version_number`).\n  Background archival tasks can also populate this for older records as an optimisation.\n\nNote that the new key is composed of the original key(s) (in this example `deptno`) and the `version_number`!\n\nBelow are a few sample `INSERT`, `UPDATE`, `DELETE` and `SELECT` statements and their internal representations.\n\n1. Issuing an `INSERT` against the Calcite JDBC driver\n```sql\nINSERT INTO hr.depts (deptno, department_name) VALUES (666, 'Pivotal');\n```\nis translated into the following SQL statement\n```sql\nINSERT INTO hr.depts_journal (deptno, department_name) VALUES (666, 'Pivotal');\n```\nNote that the table name is changed from `depts` to `depts_journal`. In fact, the `depts` table need not even exist.\nData is always stored in the `depts_journal` table!\n\n2. `UPDATE` issued against the Calcite JDBC driver\n```sql\nUPDATE hr.depts SET department_name='New Name' WHERE deptno = 666;\n```\nis expanded into an `INSERT` / `SELECT` statement like this\n```sql\nINSERT INTO hr.depts_journal (deptno, department_name)\n  SELECT\n    deptno,\n    'New Name' as department_name\n  FROM (\n    SELECT *, MAX(version_number) OVER (PARTITION BY deptno) AS last_version_number\n    FROM hr.depts_journal\n  ) AS last_link\n  WHERE subsequent_version_number IS NULL\n        AND version_number = last_version_number\n        AND deptno = 666;\n```\n3. 
`DELETE` issued against the Calcite JDBC driver\n```sql\nDELETE FROM hr.depts WHERE deptno=666;\n```\nis expanded into an `INSERT` / `SELECT` statement like this\n```sql\nINSERT INTO hr.depts_journal (deptno, department_name, version_number, subsequent_version_number)\n  SELECT\n    deptno,\n    department_name,\n    CURRENT_TIMESTAMP AS version_number,\n    CURRENT_TIMESTAMP AS subsequent_version_number\n  FROM (\n    SELECT *, MAX(version_number) OVER (PARTITION BY deptno) AS last_version_number\n    FROM hr.depts_journal\n  ) AS last_link\n  WHERE subsequent_version_number IS NULL\n        AND version_number = last_version_number\n        AND deptno = 666;\n```\n\n4. `SELECT` query against the Calcite JDBC driver\n```sql\nSELECT * FROM hr.depts;\n```\nis converted into a `SELECT` such as\n```sql\nSELECT\n  deptno,\n  department_name\nFROM (\n  SELECT *, MAX(version_number) OVER (PARTITION BY deptno) AS last_version_number\n  FROM hr.depts_journal\n) AS link_last\nWHERE subsequent_version_number IS NULL AND version_number = last_version_number;\n```\nFor every `deptno`, only the row with the highest `version_number` is returned.\n\nThe `MAX(version_number) OVER (PARTITION BY deptno)` [window function](https://www.postgresql.org/docs/9.6/static/tutorial-window.html)\ncomputes the maximum `version_number` per `deptno`.\n\n### Limitations\n\nWhen using this project, it is important to be aware of the following limitations:\n\n* When using `TIMESTAMP` versioning, concurrent updates to the same record can lead to data loss. If users A and B both\n  send an update to the same record simultaneously, one user's changes will be lost, even if they were updating\n  different columns. Similarly, if one user deletes a record while another is updating it, the update may\n  \u0026ldquo;win\u0026rdquo;, causing the record to not be deleted. For `BIGINT` versioning, one of the users will get a\n  duplicate key error.\n* Unique indexes cannot be defined. 
Similarly, \u0026ldquo;UPSERT\u0026rdquo; (`ON CONFLICT UPDATE`) is not supported.\n* Table manipulations (DDL) are not supported.\n* Only ANSI SQL syntax can be used. For example, `INSERT`\u0026hellip;`RETURNING` is not supported.\n* Performing `INSERT`s with explicit key values causes unexpected behaviour if the key currently or previously existed\n  (for `BIGINT` versioning it will be rejected if the key ever existed, and for `TIMESTAMP` it will be accepted even\n  if an existing non-deleted record has the same key).\n\n### How to use SqlLine\n\nOn the target Postgres/Greenplum or HAWQ instance, create a test schema `hr` and a `depts_journal` table:\n```sql\nDROP SCHEMA IF EXISTS hr CASCADE;\nCREATE SCHEMA hr;\n\nCREATE TABLE hr.depts_journal (\n  deptno                    SERIAL                   NOT NULL,\n  version_number            TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,\n  subsequent_version_number TIMESTAMP WITH TIME ZONE NULL     DEFAULT NULL,\n  department_name           TEXT                     NOT NULL,\n  PRIMARY KEY (deptno, version_number)\n);\n```\n\nStart `sqlline` from the root folder:\n```\n./sqlline\n```\nConnect to the [journalled-sql-rewriter-example model](journalled-sql-rewriter-example/src/main/resources/myTestModel.json).\n```\nsqlline\u003e !connect jdbc:calcite:lex=JAVA;model=journalled-sql-rewriter-example/src/main/resources/myTestModel.json\n```\nHit Enter at the username and password prompts.\n\nInsert new rows:\n```sql\n0: jdbc:calcite:lex=JAVA\u003e INSERT INTO hr.depts (deptno, department_name) VALUES (666, 'TEST1');\n0: jdbc:calcite:lex=JAVA\u003e INSERT INTO hr.depts (deptno, department_name) VALUES (999, 'TEST2');\n```\nCheck the table content:\n```sql\n0: jdbc:calcite:lex=JAVA\u003e select * from hr.depts;\n+------------+-----------------+\n|   deptno   | department_name |\n+------------+-----------------+\n| 666        | TEST1           |\n| 999        | TEST2           |\n+------------+-----------------+\n2 rows selected (0.035 
seconds)\n```\nUpdate the row with `deptno=666`:\n\n```sql\n0: jdbc:calcite:lex=JAVA\u003e UPDATE hr.depts SET department_name='NEW VALUE' WHERE deptno=666;\n```\n\nDelete the row with `deptno=999`:\n\n```sql\n0: jdbc:calcite:lex=JAVA\u003e DELETE FROM hr.depts WHERE deptno=999;\n```\nCheck the table content:\n\n```sql\n0: jdbc:calcite:lex=JAVA\u003e select * from hr.depts;\n+------------+-----------------+\n|   deptno   | department_name |\n+------------+-----------------+\n| 666        | NEW VALUE       |\n+------------+-----------------+\n1 row selected (0.02 seconds)\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftzolov%2Fcalcite-sql-rewriter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftzolov%2Fcalcite-sql-rewriter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftzolov%2Fcalcite-sql-rewriter/lists"}