https://github.com/51nb/marble
A high performance in-memory hive sql engine based on Apache Calcite
- Host: GitHub
- URL: https://github.com/51nb/marble
- Owner: 51nb
- License: apache-2.0
- Created: 2019-03-29T02:01:34.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-06-25T03:41:49.000Z (over 3 years ago)
- Last Synced: 2025-08-04T14:57:14.862Z (5 months ago)
- Language: Java
- Homepage:
- Size: 1.69 MB
- Stars: 188
- Watchers: 8
- Forks: 45
- Open Issues: 8
Metadata Files:
- Readme: README.MD
- License: LICENSE
Awesome Lists containing this project
- awesome-java - Marble
README
**Marble is a high-performance in-memory Hive SQL engine based on [Apache Calcite](https://calcite.apache.org/).
It can help you migrate Hive SQL scripts to a real-time computing system.
It also provides a convenient Table API to help you build custom SQL engines.**
You may want another similar project: [direct-spark-sql](https://github.com/direct-spark-sql/direct-spark-sql)
## Build and run tests
**Requirements**
* Java 1.8 as a build JDK
* Maven
1. Build marble
```shell
cd marble
mvn clean install -DskipTests
```
**(Optional)**
If you need to modify the Calcite patches, build the [calcite-patch](https://github.com/51nb/calcite-patch) project first:
```shell
git clone https://github.com/51nb/calcite-patch.git
cd calcite-patch
mvn clean install -DskipTests
```
In the long term, we hope to merge these patches into Calcite.
2. Import the `marble` project into your IDE, but **please don't import `calcite-patch` as a submodule of the marble project**
3. Run the tests `TableEnvTest` and `HiveTableEnvTest`
## Usage
**Maven dependency**
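```xml
<dependency>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>janino</artifactId>
  <version>3.0.11</version>
</dependency>
<dependency>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>commons-compiler</artifactId>
  <version>3.0.11</version>
</dependency>
<dependency>
  <groupId>com.u51.marble</groupId>
  <artifactId>marble-table-hive</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.calcite</groupId>
      <artifactId>calcite-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.calcite</groupId>
      <artifactId>calcite-linq4j</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>janino</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```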
**API Overview**
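```java
// Optional: cache up to 200 SQL plans so repeated queries can skip planning
TableEnv.enableSqlPlanCacheSize(200);
TableEnv tableEnv = HiveTableEnv.getTableEnv();

// Build DataTables from POJOs, a JDBC ResultSet, or raw rows plus a SQL type map
DataTable t1 = tableEnv.fromJavaPojoList(pojoList);
DataTable t2 = tableEnv.fromJdbcResultSet(resultSet);
DataTable t3 = tableEnv.fromRowListWithSqlTypeMap(rowList, sqlTypeMap);

// Register the tables under a sub-schema named "test"
tableEnv.addSubSchema("test");
tableEnv.registerTable("test", "t1", t1);
tableEnv.registerTable("test", "t2", t2);

// Run a query and read the result back as one map per row
DataTable queryResult = tableEnv.sqlQuery("select * from test.t1 join test.t2 on t1.id=t2.id");
List<Map<String, Object>> resultRows = queryResult.toMapList();
```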
Enabling the plan cache is recommended when the same SQL query is executed repeatedly:
```
TableEnv.enableSqlPlanCacheSize(200);
```
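To illustrate why a plan cache helps with repeated queries, here is a minimal sketch of a size-bounded cache keyed by SQL text. It is not Marble's implementation: the class `PlanCacheSketch`, its methods, and the least-recently-used eviction policy are all assumptions made for this example, built only on the JDK's `LinkedHashMap`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of a bounded plan cache; not Marble's actual code. */
class PlanCacheSketch<P> {
  private final Map<String, P> cache;
  private int misses = 0;

  PlanCacheSketch(final int maxSize) {
    // accessOrder = true keeps entries ordered from least to most recently used
    this.cache = new LinkedHashMap<String, P>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, P> eldest) {
        // Evict the least recently used plan once the cache exceeds maxSize
        return size() > maxSize;
      }
    };
  }

  /** Returns the cached plan for this SQL text, planning it only on a miss. */
  synchronized P getOrPlan(String sql, Function<String, P> planner) {
    P plan = cache.get(sql);
    if (plan == null) {
      misses++;
      plan = planner.apply(sql); // the expensive parse/validate/optimize step
      cache.put(sql, plan);
    }
    return plan;
  }

  synchronized int misses() {
    return misses;
  }

  synchronized int size() {
    return cache.size();
  }
}
```

With such a cache, running the same SQL string twice invokes the planner only once; a different SQL string is planned separately and may evict an older entry.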
`TableEnv` is the main Table API for executing SQL queries on a data set.
It can be used to:
* convert a Java POJO List or JDBC ResultSet to a `DataTable`
* register a `DataTable` in TableEnv's catalog
* add sub-schemas and customized functions to TableEnv's catalog
* execute a SQL query to get the result `DataTable`
`TableEnv` supports Calcite's SQL dialect by default; see its [SQL reference](https://calcite.apache.org/docs/reference.html).
The goal of `HiveTableEnv` is to support Hive SQL as far as possible. Developers can also use
a `TableConfig` to create a new TableEnv that supports other SQL dialects (MysqlTableEnv, PostgreTableEnv, etc.).
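To make the POJO-to-table conversion listed above more concrete, the following sketch turns a POJO into a row of column name to value, the same map-per-row shape that `toMapList()` returns. It is an illustration only: `PojoRowSketch`, `User`, and the rule that JavaBean getters name the columns are assumptions, and Marble's own mapping rules may differ.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of viewing a POJO as a table row; not Marble's code. */
class PojoRowSketch {

  /** Example POJO with JavaBean-style getters (hypothetical). */
  static class User {
    private final long id;
    private final String name;

    User(long id, String name) {
      this.id = id;
      this.name = name;
    }

    public long getId() {
      return id;
    }

    public String getName() {
      return name;
    }
  }

  /** Maps each zero-argument getXxx() method to a column named xxx. */
  static Map<String, Object> toRow(Object pojo) {
    Map<String, Object> row = new TreeMap<String, Object>(); // sorted for stable output
    for (Method m : pojo.getClass().getMethods()) {
      String n = m.getName();
      boolean isGetter = n.startsWith("get") && n.length() > 3
          && m.getParameterCount() == 0 && !n.equals("getClass");
      if (!isGetter) {
        continue;
      }
      String column = Character.toLowerCase(n.charAt(3)) + n.substring(4);
      try {
        row.put(column, m.invoke(pojo));
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot read column " + column, e);
      }
    }
    return row;
  }
}
```

Here `PojoRowSketch.toRow(new PojoRowSketch.User(7L, "ann"))` produces a row with the columns `id` and `name`.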
**Supported Hive SQL features**
* specific keywords and operators
* all UDFs and UDAFs
* some UDTFs
* implicit type casting
* loading customized UDFs and UDAFs by package name
```
HiveTableEnv.registerHiveFunctionPackages("com.u51.data.hive.udf");
```
## Benchmark
The `benchmark` module contains some benchmark tests that compare Flink, Spark, and Marble on some simple
SQL queries.
## Design
[Figure: how Marble customizes Calcite in the SQL processing flow]

You can find more details in calcite-patch's commit history. Marble currently uses Calcite `1.18.0`.
The main type mappings between Calcite and Hive are:
| CalciteSqlType | JavaStorageType | HiveObjectInspector |
| :---: | :---: | :---: |
| BIGINT | Long | LongObjectInspector |
| INTEGER | Integer | IntObjectInspector |
| DOUBLE | Double | DoubleObjectInspector |
| DECIMAL | BigDecimal | HiveDecimalObjectInspector |
| VARCHAR | String | StringObjectInspector |
| DATE | Integer | DateObjectInspector |
| TIMESTAMP | Long | TimestampObjectInspector |
| ARRAY | List | StandardListObjectInspector |
| ...... | ...... | ...... |
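The DATE and TIMESTAMP rows are the least obvious ones, since they are stored as plain numbers. The sketch below assumes the stored values follow Calcite's internal convention: days since 1970-01-01 for DATE and milliseconds since the epoch for TIMESTAMP. The class and method names are hypothetical, and time-zone handling is deliberately left out.

```java
import java.time.Instant;
import java.time.LocalDate;

/** Hypothetical helpers for the numeric DATE/TIMESTAMP storage; not Marble's code. */
class CalciteTemporalSketch {

  /** DATE as an int: number of days since 1970-01-01 (assumed convention). */
  static int dateToInt(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  static LocalDate intToDate(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch);
  }

  /** TIMESTAMP as a long: milliseconds since 1970-01-01T00:00:00Z (assumed convention). */
  static long timestampToLong(Instant timestamp) {
    return timestamp.toEpochMilli();
  }

  static Instant longToTimestamp(long millisSinceEpoch) {
    return Instant.ofEpochMilli(millisSinceEpoch);
  }
}
```

For example, `dateToInt(LocalDate.of(1970, 1, 2))` yields `1`, and converting that value back with `intToDate` returns the same date.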
## Roadmap
* improve compatibility with Hive SQL (high priority)
* submit patches to Calcite to make it easy to upgrade calcite-core;
some related issues: [CALCITE-2282](https://issues.apache.org/jira/browse/CALCITE-2282), [CALCITE-2973](https://issues.apache.org/jira/browse/CALCITE-2973), [CALCITE-2992](https://issues.apache.org/jira/browse/CALCITE-2992) (high priority)
* implement UDTFs in a generic way (high priority)
* constant folding for Hive UDFs (low priority)
* use a customized SQL planner to replace the default PlannerImpl (low priority)
* TPC-DS queries with a customized scale (low priority)
* vectorized UDF execution (experimental)
* distributed broadcast join (experimental)
* cost-based optimizer (experimental)
For more issues, see [issues](https://github.com/51nb/marble/milestone/1).
## Contributing
Contributions are welcome.
Please use the Calcite-idea-code-style.xml under the marble directory to reformat code,
and make sure the Maven Checkstyle plugin validation passes after building the source code.
## License
This library is distributed under the terms of the Apache 2 License.