https://github.com/51nb/marble

A high performance in-memory hive sql engine based on Apache Calcite

Host: GitHub
URL: https://github.com/51nb/marble
Owner: 51nb
License: apache-2.0
Created: 2019-03-29T02:01:34.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2022-06-25T03:41:49.000Z (over 3 years ago)
Last Synced: 2025-11-20T21:05:20.466Z (4 months ago)
Language: Java
Homepage:
Size: 1.69 MB
Stars: 192
Watchers: 8
Forks: 45
Open Issues: 8
Metadata Files:
- Readme: README.MD
- License: LICENSE

Awesome Lists containing this project

awesome-java - Marble

README

**Marble is a high-performance in-memory Hive SQL engine based on [Apache Calcite](https://calcite.apache.org/).**

**It can help you migrate Hive SQL scripts to a real-time computing system.**

**It also provides a convenient Table API to help you build custom SQL engines.**

You may also be interested in a similar project: [direct-spark-sql](https://github.com/direct-spark-sql/direct-spark-sql)

## Build and run tests

**Requirements**

* Java 1.8 as a build JDK

* Maven

1. Build marble:

```shell
cd marble
mvn clean install -DskipTests
```

**(Optional)**

If you need to modify the Calcite patches, build the [calcite-patch](https://github.com/51nb/calcite-patch) project first:

```shell
git clone https://github.com/51nb/calcite-patch.git
cd calcite-patch
mvn clean install -DskipTests
```

In the long term, we hope to merge these patches into Calcite.

2. Import the `marble` project into your IDE, but **please don't import `calcite-patch` as a submodule of the marble project**.

3. Run the tests `TableEnvTest` and `HiveTableEnvTest`.

## Usage

**Maven dependency**

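```xml
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.11</version>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.11</version>
</dependency>
<dependency>
    <groupId>com.u51.marble</groupId>
    <artifactId>marble-table-hive</artifactId>
    <version>1.0.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-linq4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>janino</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```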

**API Overview**

```java
// Enable the SQL plan cache with a size of 200.
TableEnv.enableSqlPlanCacheSize(200);
TableEnv tableEnv = HiveTableEnv.getTableEnv();

// Build DataTables from different sources.
DataTable t1 = tableEnv.fromJavaPojoList(pojoList);
DataTable t2 = tableEnv.fromJdbcResultSet(resultSet);
DataTable t3 = tableEnv.fromRowListWithSqlTypeMap(rowList, sqlTypeMap);

// Register the tables under the sub-schema "test".
tableEnv.addSubSchema("test");
tableEnv.registerTable("test", "t1", t1);
tableEnv.registerTable("test", "t2", t2);

// Run a query and read the result rows as maps.
DataTable queryResult = tableEnv.sqlQuery("select * from test.t1 join test.t2 on t1.id=t2.id");
List<Map<String, Object>> resultRows = queryResult.toMapList();
```

If the same SQL query is executed repeatedly, it is recommended to enable the plan cache:

```java
TableEnv.enableSqlPlanCacheSize(200);
```
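To illustrate what a size-bounded plan cache does in general, here is a minimal sketch of a least-recently-used cache keyed by SQL text. It is not Marble's implementation: `PlanCacheSketch`, `getOrPlan`, and the `planner` function are hypothetical names, and the eviction policy (LRU) is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded plan cache keyed by SQL text; not Marble's actual code. */
final class PlanCacheSketch<P> {
  private final int maxSize;
  private final Map<String, P> plans;
  private int misses = 0;

  PlanCacheSketch(int maxSize) {
    this.maxSize = maxSize;
    // accessOrder=true keeps entries ordered from least to most recently used.
    this.plans = new LinkedHashMap<String, P>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, P> eldest) {
        // Evict the least recently used plan once the bound is exceeded.
        return size() > PlanCacheSketch.this.maxSize;
      }
    };
  }

  /** Returns the cached plan for the SQL text, or builds and caches a new one. */
  synchronized P getOrPlan(String sql, Function<String, P> planner) {
    P plan = plans.get(sql);
    if (plan == null) {
      misses++;
      plan = planner.apply(sql);
      plans.put(sql, plan);
    }
    return plan;
  }

  synchronized int misses() {
    return misses;
  }

  synchronized int size() {
    return plans.size();
  }
}
```

With such a cache, only the first execution of a given SQL string pays the planning cost, which is why caching helps most when the same query text is submitted many times.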

`TableEnv` is the main Table API for executing SQL queries on a data set.

It can be used to:

* convert a Java POJO `List` or a JDBC `ResultSet` to a `DataTable`

* register a `DataTable` in the TableEnv's catalog

* add sub-schemas and customized functions to the TableEnv's catalog

* execute a SQL query to get the result `DataTable`
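As a rough illustration of the first point, the sketch below turns a list of POJOs into rows represented as column-name-to-value maps, using only the JDK's bean introspection. It does not use or reproduce Marble's `DataTable`; `PojoRowsSketch`, `toRows`, and the `User` class are hypothetical names introduced for this example.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: converts POJOs into map rows via their getters. */
final class PojoRowsSketch {

  /** Example POJO with two readable properties, "id" and "name". */
  public static final class User {
    private final int id;
    private final String name;

    public User(int id, String name) {
      this.id = id;
      this.name = name;
    }

    public int getId() {
      return id;
    }

    public String getName() {
      return name;
    }
  }

  /** Builds one map per POJO, keyed by bean property name. */
  static List<Map<String, Object>> toRows(List<?> pojos) {
    List<Map<String, Object>> rows = new ArrayList<>();
    try {
      for (Object pojo : pojos) {
        // Stop at Object.class so the built-in "class" property is skipped.
        BeanInfo info = Introspector.getBeanInfo(pojo.getClass(), Object.class);
        Map<String, Object> row = new LinkedHashMap<>();
        for (PropertyDescriptor property : info.getPropertyDescriptors()) {
          Method getter = property.getReadMethod();
          if (getter != null) {
            row.put(property.getName(), getter.invoke(pojo));
          }
        }
        rows.add(row);
      }
    } catch (Exception e) {
      throw new IllegalStateException("Could not read POJO properties", e);
    }
    return rows;
  }
}
```

Each property becomes a column and each object becomes a row, which is the same shape as the map list returned by `toMapList()` in the overview above.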

`TableEnv` supports Calcite's SQL dialect by default; see its [SQL reference](https://calcite.apache.org/docs/reference.html).

The goal of `HiveTableEnv` is to support Hive SQL as far as possible. Developers can also use a `TableConfig` to create a new TableEnv that supports other SQL dialects (MysqlTableEnv, PostgreTableEnv, etc.).

**Supported Hive SQL features**

* Hive-specific keywords and operators

* all UDFs and UDAFs

* some UDTFs

* implicit type casting

* loading customized UDFs and UDAFs by package name

  ```java
  HiveTableEnv.registerHiveFunctionPackages("com.u51.data.hive.udf");
  ```


## Benchmark

There are some benchmark tests in the `benchmark` module. They compare Flink, Spark, and Marble on some simple SQL queries.

## Design

The following diagram shows how Marble customizes Calcite in the SQL processing flow:

![how_marble_customized_calcite](./how_marble_customized_calcite.jpg)  

You can find more details in calcite-patch's commit history. Marble currently uses Calcite `1.18.0`.

The main type mapping between Calcite and Hive is:

| CalciteSqlType | JavaStorageType | HiveObjectInspector |
| :---: | :---: | :---: |
| BIGINT | Long | LongObjectInspector |
| INTEGER | Int | IntObjectInspector |
| DOUBLE | Double | DoubleObjectInspector |
| DECIMAL | BigDecimal | HiveDecimalObjectInspector |
| VARCHAR | String | StringObjectInspector |
| DATE | Int | DateObjectInspector |
| TIMESTAMP | Long | TimestampObjectInspector |
| ARRAY | List | StandardListObjectInspector |
| ...... | ...... | ...... |
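
The `Int` and `Long` storage types for DATE and TIMESTAMP can look surprising. A minimal sketch of how they can be read, assuming Marble follows Calcite's usual internal convention (DATE as days since 1970-01-01, TIMESTAMP as milliseconds since the epoch, interpreted in UTC); `StorageTypeSketch` and its methods are hypothetical names, not part of Marble's API:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Hypothetical conversions between java.time values and int/long storage values. */
final class StorageTypeSketch {

  /** DATE: number of days since 1970-01-01, stored as an int. */
  static int dateToStorage(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  static LocalDate dateFromStorage(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch);
  }

  /** TIMESTAMP: milliseconds since 1970-01-01T00:00:00 (UTC assumed), stored as a long. */
  static long timestampToStorage(LocalDateTime timestamp) {
    return timestamp.toInstant(ZoneOffset.UTC).toEpochMilli();
  }

  static LocalDateTime timestampFromStorage(long millisSinceEpoch) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(millisSinceEpoch), ZoneOffset.UTC);
  }
}
```

Under this assumption, `dateToStorage(LocalDate.of(1970, 1, 2))` yields `1`, and converting a value to storage and back returns the original date or timestamp (to millisecond precision).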

## Roadmap

* Improve compatibility with Hive SQL. (high priority)

* Submit patches to Calcite to make it easier to upgrade calcite-core. Some related issues: [CALCITE-2282](https://issues.apache.org/jira/browse/CALCITE-2282), [CALCITE-2973](https://issues.apache.org/jira/browse/CALCITE-2973), [CALCITE-2992](https://issues.apache.org/jira/browse/CALCITE-2992). (high priority)

* Implement UDTFs in a generic way. (high priority)

* Constant folding for Hive UDFs. (low priority)

* Use a customized SQL planner to replace the default `PlannerImpl`. (low priority)

* Run TPC-DS queries at a customized scale. (low priority)

* Vectorized UDF execution. (experimental)

* Distributed broadcast join. (experimental)

* Cost-based optimizer. (experimental)

For more, see the [issues](https://github.com/51nb/marble/milestone/1).

## Contributing

Contributions are welcome.

Please use the `Calcite-idea-code-style.xml` file under the marble directory to reformat your code, and make sure the Maven Checkstyle plugin validation passes when you build the source.

## License

This library is distributed under the terms of the Apache License 2.0.
