Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/starlake-ai/jsqltranspiler

Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.
https://github.com/starlake-ai/jsqltranspiler

bigquery databricks duckdb java query redshift rewrite snowflake transpiler

Last synced: 2 months ago
JSON representation

Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.

Awesome Lists containing this project

README

        

# [JSQLTranspiler](https://starlake.ai/starlake/index.html#sql-transpiler) - Transpile Dialect, Resolve Columns, Show Lineage

[![Sonatype Nexus (Snapshots)](https://img.shields.io/nexus/s/ai.starlake.jsqltranspiler/jsqltranspiler?server=https%3A%2F%2Fs01.oss.sonatype.org)](https://s01.oss.sonatype.org/#nexus-search;quick~ai.starlake.jsqltranspiler/jsqltranspiler)
[![JavaDoc](https://javadoc.io/badge2/ai.starlake.jsqltranspiler/jsqltranspiler/javadoc.svg)](https://javadoc.io/doc/ai.starlake.jsqltranspiler/jsqltranspiler)
[![Gradle CI](https://github.com/starlake-ai/jsqltranspiler/actions/workflows/snapshot.yml/badge.svg)](https://github.com/starlake-ai/jsqltranspiler/actions/workflows/snapshot.yml)
[![Code Quality](https://app.codacy.com/project/badge/Grade/80374649d914462ebd6e5b160a1ebdbb)](https://app.codacy.com/gh/starlake-ai/jsqltranspiler/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![Coverage](https://coveralls.io/repos/github/starlake-ai/jsqltranspiler/badge.svg)](https://coveralls.io/github/starlake-ai/jsqltranspiler)
[![License](https://img.shields.io/badge/License-Apache-blue)](#LICENSE)
[![Issues](https://img.shields.io/github/issues/starlake-ai/jsqltranspiler)](https://github.com/starlake-ai/jsqltranspiler/issues)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github?af=5236ad)

A pure Java stand-alone SQL Transpiler, Column- and Lineage Resolver for translating various large RDBMS SQL Dialects into a few smaller RDBMS Dialects for Unit Testing. Based on JSQLParser.

Supports `SELECT` queries as well as `INSERT`, `UPDATE`, `DELETE` and `MERGE` statements.

Internal Functions will be rewritten based on the actual meaning and purpose of the function (since the DuckDB `Any()` function does not necessarily behave like the RDBMS specific `Any()`). Respecting different function arguments count, order and type.

Rewrite of Window- and Aggregate-Functions with full coverage of the RDBMS specific published samples.
The [matrix of supported features and functions](https://docs.google.com/spreadsheets/d/1jK6E1s2c0CWcw9rFeDvALdZ5wCshztdtlAHuNDaKQt4/edit?usp=sharing) is shared on Google Sheets.

## Dialects

**Input**: Google BigQuery, Databricks, Snowflake, Amazon Redshift

**Output**: DuckDB

## Transpile Example

Google BigQuery specific SQL

```sql
-- BigQuery specific DATE() function
SELECT
DATE(2016, 12, 25) AS date_ymd,
DATE(DATETIME '2016-12-25 23:59:59') AS date_dt,
DATE(TIMESTAMP '2016-12-25 05:30:00+07', 'America/Los_Angeles') AS date_tstz;

/* Output
"date_ymd","date_dt","date_tstz"
"2016-12-15","2016-12-15","2016-12-15"
*/
```

will become DuckDB compatible SQL

```sql
-- DuckDB compliant rewrite producing the same result
SELECT
MAKE_DATE(2016, 12, 25) AS date_ymd,
CAST(DATETIME '2016-12-25 23:59:59' AS DATE) AS date_dt,
CAST(TIMESTAMP '2016-12-25 05:30:00+07' AS DATE) AS date_tstz;

/* Output
"date_ymd","date_dt","date_tstz"
"2016-12-15","2016-12-15","2016-12-15"
*/
```

## Column Lineage Example
For the simplified schema definition and the given query
```java
String[][] schemaDefinition = {
// Table A with Columns col1, col2, col3, colAA, colAB
{"a", "col1", "col2", "col3", "colAA", "colAB"},

// Table B with Columns col1, col2, col3, colBA, colBB
{"b", "col1", "col2", "col3", "colBA", "colBB"}
};

String sqlStr =
"SELECT Case when Sum(colBA + colBB)=0 then c.col1 else a.col2 end AS total FROM a INNER JOIN (SELECT * FROM b) c ON a.col1 = c.col1";

JdbcResultSetMetaData resultSetMetaData = new JSQLColumResolver(databaseMetaData).getResultSetMetaData(sqlStr);
```

the ResultSetMetaData return a list of JdbcColumns, each traversable using the `TreeNode` interface. The resulting Column Lineage can be illustrated as:
```
SELECT
└─total AS CaseExpression: CASE WHEN Sum(colBA + colBB) = 0 THEN c.col1 ELSE a.col2 END
├─WhenClause: WHEN Sum(colBA + colBB) = 0 THEN c.col1
│ ├─EqualsTo: Sum(colBA + colBB) = 0
│ │ └─Function: Sum(colBA + colBB)
│ │ └─Addition: colBA + colBB
│ │ ├─c.colBA → b.colBA : Other
│ │ └─c.colBB → b.colBB : Other
│ └─c.col1 → b.col1 : Other
└─a.col2 : Other
```

## Resolve ``*`` Star Operator Example

For the simplified schema definition and the given query with Star Operators
```java
String[][] schemaDefinition = {
// Table A with Columns col1, col2, col3, colAA, colAB
{"a", "col1", "col2", "col3", "colAA", "colAB"},

// Table B with Columns col1, col2, col3, colBA, colBB
{"b", "col1", "col2", "col3", "colBA", "colBB"}
};

String sqlStr = "SELECT * FROM ( (SELECT * FROM b) c inner join a on c.col1 = a.col1 ) d;";
String resolved = new JSQLColumResolver(schemaDefinition).getResolvedStatementText(sqlStr);
```

the query will be resolved and (optionally rewritten into):
```sql
SELECT d.col1 /* Resolved Column*/
, d.col2 /* Resolved Column*/
, d.col3 /* Resolved Column*/
, d.colBA /* Resolved Column*/
, d.colBB /* Resolved Column*/
, d.col1_1 /* Resolved Column*/
, d.col2_1 /* Resolved Column*/
, d.col3_1 /* Resolved Column*/
, d.colAA /* Resolved Column*/
, d.colAB /* Resolved Column*/
FROM ( ( SELECT b.col1 /* Resolved Column*/
, b.col2 /* Resolved Column*/
, b.col3 /* Resolved Column*/
, b.colba /* Resolved Column*/
, b.colbb /* Resolved Column*/
FROM b ) c
INNER JOIN a
ON c.col1 = a.col1 ) d
;
```

Alternatively, the information about returned columns can be fetched as JDBC `ResultsetMetaData` (without actually executing this query):

```java
import java.sql.DatabaseMetaData;

String sqlStr = "SELECT * FROM ( ( SELECT * FROM sales ) c INNER JOIN listing a ON c.listid = a.listid ) d;";
// the meta data of catalgogs, schemas, tables, columns, either virtually and physically
DatabaseMetaData databaseMetaData = ...;
ResultSetMetaData resultSetMetaData = new JSQLColumResolver(databaseMetaData).getResultSetMetaData(sqlStr);
System.out.println(resultSetMetaData.toString());

/*
"#","label","name","table","schema","catalog","type","type name","precision","scale","display size"
"1","salesid","salesid","d",,"JSQLTranspilerTest","INTEGER","INTEGER","0","32","0"
"2","listid","listid","d",,"JSQLTranspilerTest","INTEGER","INTEGER","0","32","0"
... (shortened) ...
"17","totalprice","totalprice","d",,"JSQLTranspilerTest","DECIMAL","DECIMAL(8,2)","0","8","0"
"18","listtime","listtime","d",,"JSQLTranspilerTest","TIMESTAMP","TIMESTAMP","0","0","0"
*/

```

## How to use

### Java Library

Maven Artifact with Snapshot support:
```xml


jsqltranspiler-snapshots

true

https://s01.oss.sonatype.org/content/repositories/snapshots/

ai.starlake.jsqltranspiler
jsqltranspiler
0.7-SNAPSHOT

```

Calling the Java class:
```java
import ai.starlake.transpiler.JSQLTranspiler;

String providedSQL="SELECT Nvl(null, 1) a";
String expectedSQL="SELECT Coalesce(null, 1) a";

String result = JSQLTranspiler.transpile(providedSQL, Dialect.AMAZON_REDSHIFT);
assertEquals(expectedSQL, result);
```

### Web API
```shell
curl -X 'POST' \
'https://starlake.ai/api/v1/transpiler/transpile?dialect=SNOWFLAKE' \
-H 'accept: text/plain' \
-H 'Content-Type: text/plain' \
-d 'SELECT Nvl(null, 1) a'
```

### Java Command Line Interface
```text
usage: java -jar JSQLTranspilerCLI.jar [-d | --any | --bigquery |
--databricks | --snowflake | --redshift] [-D | --duckdb]
[-i ] [-o ] [-h]

-d,--input-dialect The SQL dialect to parse.
[ANY*, GOOGLE_BIG_QUERY, DATABRICKS,
SNOWFLAKE, AMAZON_REDSHIFT]
--any Interpret the SQL as Generic Dialect
[DEFAULT].
--bigquery Interpret the SQL as Google BigQuery Dialect.
--databricks Interpret the SQL as DataBricks Dialect.
--snowflake Interpret the SQL as Snowflake Dialect.
--redshift Interpret the SQL as Amazon Snowflake
Dialect.
-D,--output-dialect The SQL dialect to write.
[DUCKDB*]
--duckdb Write the SQL in the Duck DB Dialect
[DEFAULT].
-i,--inputFile The input SQL file or folder.
- Read from STDIN when no input file
provided.
-o,--outputFile The out SQL file for the formatted
statements.
- Create new SQL file when folder provided.
- Append when existing file provided.
- Write to STDOUT when no output file
provided.
-h,--help Print the help synopsis.
```

### TimeKey substitution

The transpiler can substitute time key expressions such as `CURRENT_DATE` or `CURRENT_TIMESTAMP` with System's properties like

```java
System.setProperty("CURRENT_TIMESTAMP", "2024-06-09 16:24:23.123");
String expected = "SELECT TIMESTAMP WITHOUT TIME ZONE '2024-06-09T16:24:23.123'";
String actual = JSQLTranspiler.transpileQuery("SELECT CURRENT_TIMESTAMP", JSQLTranspiler.Dialect.ANY);

Assertions.assertThat(actual).isEqualTo(expected);
```

Alternatively parameters can be provided as `Map` (which would take precedence over any System's properties):

```java
String expected = "SELECT TIME WITHOUT TIME ZONE '17:24:23.123'";
String actual =
JSQLTranspiler.transpileQuery(
"SELECT CURRENT_TIME"
, JSQLTranspiler.Dialect.ANY
, Map.of("CURRENT_TIME", "17:24:23.123")
);

Assertions.assertThat(actual).isEqualTo(expected);
```
### Error Handling

In case the query refers to objects not existing in the provided database schema, the `JSQLColumnResolver` offers three modes:

- `STRICT` will let the resolution and lineage fail with an error message, which (first) object were not resolved
- `IGNORE` will simply ignore the node of the unresolvable object
- `LENIENT` will insert a "virtual" column node pointing on the unresolvable column of an unknown type

`STRICT` is the default error mode. It can be changed for the `JdbcMetaData` before passing it to the `JSQLColumnResolver` as shown in the code example below:

```java

String sqlStr =
"with \"mycte\" as (\n"
+ " select invalidColumn, \"c\".\"id\", CURRENT_TIMESTAMP() as \"timestamp\"\n"
+ " from nonExistingTable \"o\", \"sales\".\"customers\" \"c\"\n"
+ " where \"o\".\"customer_id\" = \"c\".\"id\"\n"
+ ")\n"
+ "select \"id\", sum(\"amount\") as sum, \"timestamp\"\n"
+ "from \"mycte\"\n"
+ "group by \"mycte\".\"id\", \"mycte\".\"timestamp\"";

// STRICT MODE will throw an Exception
ResultSetMetaData res =
JSQLColumResolver.getResultSetMetaData(sqlStr, JdbcMetaData.copyOf(metaData.setErrorMode(JdbcMetaData.ErrorMode.STRICT)));

// LENIENT MODE will show an unresolvable node
ResultSetMetaData res =
JSQLColumResolver.getResultSetMetaData(sqlStr, JdbcMetaData.copyOf(metaData.setErrorMode(JdbcMetaData.ErrorMode.LENIENT)));
String lineage =
"SELECT\n"
+ " ├─mycte.id → sales.customers.id : Other\n"
+ " ├─sum AS Function sum\n"
+ " │ └─unresolvable\n"
+ " └─mycte.timestamp → timestamp : Other\n";

// IGNORE will skip and supress the unresolvable node
ResultSetMetaData res =
JSQLColumResolver.getResultSetMetaData(sqlStr, JdbcMetaData.copyOf(metaData.setErrorMode(JdbcMetaData.ErrorMode.IGNORE)));
String lineage =
"SELECT\n"
+ " ├─mycte.id → sales.customers.id : Other\n"
+ " ├─sum AS Function sum\n"
+ " └─mycte.timestamp → timestamp : Other\n";
```
[More Details at JSQLColumnResolverTest](https://github.com/starlake-ai/jsqltranspiler/blob/f964a3e69e583abb637baa569cf96dd4b0350043/src/test/java/ai/starlake/transpiler/JSQLColumnResolverTest.java#L590)

### Unsupported features

Please refer to the [Feature Matrix](https://docs.google.com/spreadsheets/d/1jK6E1s2c0CWcw9rFeDvALdZ5wCshztdtlAHuNDaKQt4/edit?usp=sharing):

- DuckDB's Number and Currency formatting is very limited right now
- `Geography`, `JSon` and `XML` functions have not been implemented yet, but are planned
- `SELECT * REPLACE(...)` on DuckDB works very differently (replaces value instead of label)

## License

**JSQLTranspiler** is licensed under [**Apache License, Version 2.0**](https://www.apache.org/licenses/LICENSE-2.0).