https://github.com/alibaba/SparkCube

SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.

Host: GitHub
URL: https://github.com/alibaba/SparkCube
Owner: alibaba
License: apache-2.0
Created: 2020-03-16T08:51:09.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-03-06T14:16:24.000Z (over 3 years ago)
Last Synced: 2024-11-05T04:34:03.478Z (over 1 year ago)
Language: Scala
Homepage:
Size: 154 KB
Stars: 130
Watchers: 12
Forks: 52
Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

Awesome Lists containing this project

awesome-java - SparkCube

README

# SparkCube ![](https://api.travis-ci.org/alibaba/SparkCube.svg?branch=master)

SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of [Apache Spark](http://spark.apache.org).

## Build from source

```
mvn -DskipTests package
```

The default Spark version used is 2.4.4.

## Run tests

```
mvn test
```

## Use with Apache Spark

Add the following settings to your Spark configuration.

| config | value | comment | required |
| ---- | ---- | ---- | ---- |
| spark.sql.extensions | com.alibaba.sparkcube.SparkCube | Adds the SparkCube extension. | Required |
| spark.sql.cache.tab.display | true | Shows the web UI in the given application, typically Spark Thriftserver. | Required |
| spark.sql.cache.useDatabase | db1,db2,dbn | Comma-separated list of database names. Only tables and views from these databases are considered for cube building. | Required |
| spark.sql.cache.cacheByPartition | true/false | Whether to store the cache by partition. | Optional |
| spark.driver.extraClassPath | /path/to/this/jar | Needed for web UI resources. | Required |
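
As a rough illustration of the table above, the settings could be collected in a Spark properties file. This is only a sketch: the file name `sparkcube-defaults.conf`, the `CONF_FILE` and `SPARKCUBE_JAR` variables, and the database names are placeholders chosen for this example, not names defined by SparkCube.

```shell
# Sketch: write the SparkCube settings from the table above to a properties file.
# CONF_FILE, SPARKCUBE_JAR, and the database names are placeholders (assumptions).
CONF_FILE="${CONF_FILE:-./sparkcube-defaults.conf}"
SPARKCUBE_JAR="${SPARKCUBE_JAR:-/path/to/this/jar}"

# Unquoted heredoc delimiter so that ${SPARKCUBE_JAR} is expanded.
cat > "$CONF_FILE" <<EOF
spark.sql.extensions              com.alibaba.sparkcube.SparkCube
spark.sql.cache.tab.display       true
spark.sql.cache.useDatabase       db1,db2
spark.sql.cache.cacheByPartition  true
spark.driver.extraClassPath       ${SPARKCUBE_JAR}
EOF

echo "Wrote SparkCube settings to $CONF_FILE"
```

Assuming a standard Spark installation, a file like this can be passed to the Thriftserver start script with the `--properties-file` option, or the same keys can be added to an existing `spark-defaults.conf`.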

With the configurations above set in your Spark Thriftserver, the "Cube Management" tab should appear in the Spark Thriftserver UI after any `SELECT` command is run. You can then create, delete, and build cubes from this web page.

After you have created an appropriate cube, you can query it from any spark-sql client using Spark SQL. Note that a cube can be created against a table or a view, so you can join tables in a view to create a complex cube.
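
To make the view-based approach concrete, here is a minimal sketch of a view that joins two tables and could then be picked as the source of a cube. The tables `db1.orders` and `db1.customers`, the view name, all column names, and the `SQL_FILE` variable are hypothetical. The view lives in `db1` on the assumption that `db1` is listed in `spark.sql.cache.useDatabase`.

```shell
# Sketch: define a view that joins two tables, as a candidate source for a cube.
# db1.orders, db1.customers, the view name, and all columns are hypothetical.
SQL_FILE="${SQL_FILE:-./orders_with_customers_view.sql}"

# Quoted heredoc delimiter so the SQL text is written verbatim.
cat > "$SQL_FILE" <<'EOF'
CREATE VIEW IF NOT EXISTS db1.orders_with_customers AS
SELECT o.order_id, o.order_date, o.amount, c.customer_id, c.region
FROM db1.orders o
JOIN db1.customers c ON o.customer_id = c.customer_id;
EOF

# With Spark installed, the file could be run through the spark-sql CLI:
#   spark-sql -f "$SQL_FILE"
echo "Wrote view definition to $SQL_FILE"
```

Once the view exists, it should be available for cube creation in the same way as a plain table.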

For a more detailed tutorial on creating, building, and dropping cubes, see
https://help.aliyun.com/document_detail/149293.html

## Learning materials

(Slides)

https://www.slidestalk.com/AliSpark/SparkRelationalCache78971

https://www.slidestalk.com/AliSpark/SparkRelationalCache2019_57927

(Blogs)

https://yq.aliyun.com/articles/703046

https://yq.aliyun.com/articles/703154

https://yq.aliyun.com/articles/713746

https://yq.aliyun.com/articles/725413

(Blogs in English)

https://community.alibabacloud.com/blog/rewriting-the-execution-plan-in-the-emr-spark-relational-cache_595267

https://www.alibabacloud.com/blog/use-emr-spark-relational-cache-to-synchronize-data-across-clusters_595301

https://www.alibabacloud.com/blog/using-data-preorganization-for-faster-queries-in-spark-on-emr_595599
