https://github.com/netease/lakehouse-benchmark
A benchmark tool for lakehouses.
- Host: GitHub
- URL: https://github.com/netease/lakehouse-benchmark
- Owner: NetEase
- License: other
- Created: 2022-10-08T09:08:45.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-03-12T10:01:27.000Z (over 3 years ago)
- Last Synced: 2025-04-14T17:07:42.779Z (about 1 year ago)
- Language: Java
- Homepage:
- Size: 114 MB
- Stars: 11
- Watchers: 13
- Forks: 4
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# Ch-Benchmark for Data-Lake
Based on https://github.com/timveil-cockroach/oltpbench, with a focus on chbenchmark for data lakes. Supports Trino and Presto.
## Data Lake Ch-Benchmarks

- Generate the initial data set in MySQL. The MySQL config is `config/mysql/sample_chbenchmark_config.xml`; modify it to match your environment. The `scalefactor` parameter is the number of warehouses, which determines the size of the data. The command to generate the data is:
```
java -jar lakehouse-benchmark.jar -b tpcc,chbenchmark -c config/mysql/sample_chbenchmark_config.xml --create=true --load=true
```
- Synchronize the static data from MySQL to the data lake using the Flink CDC tool [cdc-project]()
- Start the TPC-C workload to generate incremental data in MySQL. The command is:
```
java -jar lakehouse-benchmark.jar -b tpcc,chbenchmark -c config/mysql/sample_chbenchmark_config.xml --execute=true -s 5
```
- Run TPC-H queries through Trino/Presto. The Trino/Presto config is `config/trino/trino_chbenchmark_config.xml`. The `terminals` parameter sets the query parallelism, and `works.work.time` is the duration of the TPC-H query run. The command is:
```
java -jar lakehouse-benchmark.jar -b chbenchmarkForTrino -c config/trino/trino_chbenchmark_config.xml --create=false --load=false --execute=true
```
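The three `java -jar` steps above can be collected into one script. The following is a minimal sketch, not part of the repository: the function names and the `DRY_RUN` switch are hypothetical, while the commands are copied from the steps above. The Flink CDC synchronization (step 2) happens outside this tool, so it is not covered.

```bash
#!/usr/bin/env bash
# Sketch only: function names and DRY_RUN are illustrative assumptions.
JAR="lakehouse-benchmark.jar"
MYSQL_CFG="config/mysql/sample_chbenchmark_config.xml"
TRINO_CFG="config/trino/trino_chbenchmark_config.xml"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: create the schema and load the initial data set into MySQL.
load_initial_data() {
  run java -jar "$JAR" -b tpcc,chbenchmark -c "$MYSQL_CFG" --create=true --load=true
}

# Step 3: run TPC-C to generate incremental data in MySQL.
generate_incremental_data() {
  run java -jar "$JAR" -b tpcc,chbenchmark -c "$MYSQL_CFG" --execute=true -s 5
}

# Step 4: run the TPC-H queries through Trino.
run_tpch_queries() {
  run java -jar "$JAR" -b chbenchmarkForTrino -c "$TRINO_CFG" --create=false --load=false --execute=true
}
```

Setting `DRY_RUN=1` before calling a function, for example `DRY_RUN=1 load_initial_data`, prints the command instead of running it, which is a cheap way to check paths before a long load.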
Notices:
1. Use Trino for Arctic and Delta Lake, and Presto for Hudi.
2. Java 17 is required.
3. Many tables carry a suffix, for example `oorder_rt`, `oorder_ro`, or `oorder#base`. Set `export tpcc_name_suffix=_rt` to configure the suffix.
4. The Presto JDBC client needs two PRs: [Allow committing empty transaction](https://github.com/prestodb/presto/pull/18136) and [Allow AutoCommit](https://github.com/prestodb/presto/pull/18135).
A ready-to-use client is supplied in the `presto-client/` directory. To use another version, you need to modify and compile the code yourself.
5. The config `trino/trino_chbenchmark_config.xml` is for Trino. If you use Presto, use `trino/presto_chbenchmark_config.xml` instead:
```
java -jar lakehouse-benchmark.jar -b chbenchmarkForTrino -c config/trino/presto_chbenchmark_config.xml --create=false --load=false --execute=true
```
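Notices 3 and 5 can be combined in a small wrapper that picks the config file by engine and exports the table suffix before the run. This is a sketch under assumptions: `config_for_engine` and `run_queries` are hypothetical names that do not exist in the repository. Only the config paths, the `tpcc_name_suffix` variable, and the `java -jar` command come from the text above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an engine name to the config file from notice 5.
config_for_engine() {
  case "$1" in
    trino)  echo "config/trino/trino_chbenchmark_config.xml" ;;
    presto) echo "config/trino/presto_chbenchmark_config.xml" ;;
    *)      echo "unknown engine: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: export the table suffix (notice 3), then start the query run.
run_queries() {
  local engine="$1" suffix="$2" cfg
  cfg="$(config_for_engine "$engine")" || return 1
  export tpcc_name_suffix="$suffix"
  java -jar lakehouse-benchmark.jar -b chbenchmarkForTrino -c "$cfg" \
    --create=false --load=false --execute=true
}
```

For example, `run_queries presto _rt` would run the Presto config against tables with the `_rt` suffix.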
## How to Build
Run the following command to build the distribution:
```bash
./mvnw clean package
```
The following files will be placed in the `./target` folder: `lakehouse-benchmark-x.y.z.tar` and `lakehouse-benchmark-x.y.z.zip`. Pick your poison.
The resulting `.zip` or `.tar` file will have the following contents:
```text
├── CONTRIBUTORS.md
├── LICENSE
├── README.md
├── config
│   ├── cockroachdb
│   │   ├── sample_auctionmark_config.xml
│   │   ├── sample_chbenchmark_config.xml
│   │   ├── sample_epinions_config.xml
│   │   ├── sample_noop_config.xml
│   │   ├── sample_resourcestresser_config.xml
│   │   ├── sample_seats_config.xml
│   │   ├── sample_sibench_config.xml
│   │   ├── sample_smallbank_config.xml
│   │   ├── sample_tatp_config.xml
│   │   ├── sample_tpcc_config.xml
│   │   ├── sample_tpcds_config.xml
│   │   ├── sample_tpch_config.xml
│   │   ├── sample_twitter_config.xml
│   │   ├── sample_voter_config.xml
│   │   ├── sample_wikipedia_config.xml
│   │   └── sample_ycsb_config.xml
│   ├── plugin.xml
│   └── postgres
│       └── ...
├── data
│   ├── tpch
│   │   ├── customer.tbl
│   │   ├── lineitem.tbl
│   │   ├── nation.tbl
│   │   ├── orders.tbl
│   │   ├── part.tbl
│   │   ├── partsupp.tbl
│   │   ├── region.tbl
│   │   └── supplier.tbl
│   └── twitter
│       ├── twitter_tweetids.txt
│       └── twitter_user_ids.txt
├── lib
│   └── ...
└── lakehouse-benchmark.jar
```
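Because Java 17 is required (see the notices above), it can help to check the JDK before building or running. The sketch below is an assumption-laden helper, not something shipped with the project: `java_major_version` is a hypothetical function that extracts the major version from the first line of `java -version` output, handling both the legacy `1.8.0_292` scheme and the current `17.0.2` scheme.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the major Java version found in a version line
# such as: openjdk version "17.0.2" 2022-01-18
java_major_version() {
  local ver
  # Extract the quoted version string that follows the word "version".
  ver="$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')"
  [ -n "$ver" ] || return 1
  case "$ver" in
    1.*) ver="${ver#1.}" ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
  esac
  # Keep only the leading digits.
  printf '%s\n' "${ver%%[!0-9]*}"
}

# Hypothetical check: succeed only if the given version line reports Java 17 or newer.
require_java_17() {
  local major
  major="$(java_major_version "$1")" || return 1
  [ "$major" -ge 17 ]
}
```

A possible call is `require_java_17 "$(java -version 2>&1 | head -n 1)" || echo "Java 17 or newer is required"`. The `2>&1` is needed because `java -version` writes to standard error.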
## How to Run
Once you build and unpack the distribution, you can run `lakehouse-benchmark` just like any other executable jar. The following examples assume you are running from the root of the expanded `.zip` or `.tar` distribution. If you attempt to run `lakehouse-benchmark` outside of the distribution structure, you may encounter a variety of errors, including `java.lang.NoClassDefFoundError`.
To bring up help contents:
```bash
java -jar lakehouse-benchmark.jar -h
```
To execute the `tpcc` benchmark:
```bash
java -jar lakehouse-benchmark.jar -b tpcc -c config/cockroachdb/sample_tpcc_config.xml --create=true --load=true --execute=true -s 5
```
For composite benchmarks like `chbenchmark`, which require multiple schemas to be created and loaded, you can provide a comma-separated list:
```bash
java -jar lakehouse-benchmark.jar -b tpcc,chbenchmark -c config/cockroachdb/sample_chbenchmark_config.xml --create=true --load=true --execute=true -s 5
```
The following options are provided:
```text
usage: lakehouse-benchmark
 -b,--bench               [required] Benchmark class. Currently
                          supported: [tpcc, tpch, tatp, wikipedia,
                          resourcestresser, twitter, epinions, ycsb,
                          seats, auctionmark, chbenchmark, voter,
                          sibench, noop, smallbank, hyadapt]
 -c,--config              [required] Workload configuration file
    --clear               Clear all records in the database for this
                          benchmark
    --create              Initialize the database for this benchmark
 -d,--directory           Base directory for the result files,
                          default is current directory
    --dialects-export     Export benchmark SQL to a dialects file
    --execute             Execute the benchmark workload
 -h,--help                Print this help
 -im,--interval-monitor   Throughput Monitoring Interval in
                          milliseconds
    --load                Load data using the benchmark's data
                          loader
 -s,--sample              Sampling window
```
## How to see Postgres Driver logging
To enable logging for the PostgreSQL JDBC driver, add the following JVM property when starting...
```
-Djava.util.logging.config.file=src/main/resources/logging.properties
```
To modify the logging level, update `logging.properties`.
## How to Release
```
./mvnw -B release:prepare
./mvnw -B release:perform
```