{"id":20572016,"url":"https://github.com/netease/lakehouse-benchmark","last_synced_at":"2026-03-08T16:01:40.470Z","repository":{"id":146657174,"uuid":"547774445","full_name":"NetEase/lakehouse-benchmark","owner":"NetEase","description":"A benchmark tool for lakehouses.","archived":false,"fork":false,"pushed_at":"2023-03-12T10:01:27.000Z","size":119181,"stargazers_count":11,"open_issues_count":7,"forks_count":4,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-14T17:07:42.779Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NetEase.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-08T09:08:45.000Z","updated_at":"2024-03-31T14:26:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"c8cf8482-b79a-455d-a52d-5b00147800c6","html_url":"https://github.com/NetEase/lakehouse-benchmark","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase%2Flakehouse-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase%2Flakehouse-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase%2Flakehouse-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase%2Flakehouse-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NetEase","download_url":"https://code
load.github.com/NetEase/lakehouse-benchmark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248923765,"owners_count":21183953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T05:18:03.215Z","updated_at":"2026-03-08T16:01:40.422Z","avatar_url":"https://github.com/NetEase.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ch-Benchmark for Data-Lake\nBased on https://github.com/timveil-cockroach/oltpbench, with a focus on chbenchmark for data lakes. Supports Trino and Presto.\n## Data Lake Ch-Benchmarks\n![design](benchmark-design.png)\n\n- Generate the initial data set in MySQL. The MySQL config file is config/mysql/sample_chbenchmark_config.xml; you need to\n  modify it for your environment. The \"scalefactor\" parameter is the number of warehouses, which determines the size of the data. The command to generate\n  the data is \n  ```\n  java -jar lakehouse-benchmark.jar -b tpcc,chbenchmark -c config/mysql/sample_chbenchmark_config.xml --create=true --load=true\n  ```\n- Synchronize the static data from MySQL to the data lake with the Flink CDC tool [cdc-porject]()\n- Start the TPC-C workload to generate incremental data in MySQL. The command is \n  ```\n  java -jar lakehouse-benchmark.jar -b tpcc,chbenchmark -c config/mysql/sample_chbenchmark_config.xml --execute=true -s 5\n  ```\n- Run TPC-H queries through Trino/Presto. The Trino/Presto config file is config/trino/sample_chbenchmark_config.xml. \n  The \"terminals\" parameter is the query parallelism. \"works.work.time\" is the \n  duration of the TPC-H query run. The command is\n  ```\n  java -jar lakehouse-benchmark.jar -b chbenchmarkForTrino -c config/trino/trino_chbenchmark_config.xml --create=false --load=false --execute=true\n  ```\n\nNotices:\n1. Use Trino for Arctic and Delta-Lake, and Presto for Hudi.\n2. Java 17 is required.\n3. Many tables have a suffix, such as \"oorder_rt, oorder_ro, oorder#base\". You can set \"export tpcc_name_suffix=_rt\" to configure the suffix. \n4. The Presto JDBC client needs two PRs: [Allow committing empty transaction](https://github.com/prestodb/presto/pull/18136) and [Allow AutoCommit](https://github.com/prestodb/presto/pull/18135).\n   A usable client is supplied in the presto-client/ directory. To use another version, you need to modify and compile the code yourself.\n5. The config trino/trino_chbenchmark_config.xml is for Trino. If you use Presto, use trino/presto_chbenchmark_config.xml instead:\n   ```\n   java -jar lakehouse-benchmark.jar -b chbenchmarkForTrino -c config/trino/presto_chbenchmark_config.xml --create=false --load=false --execute=true\n   ```\n\n\n## How to Build\nRun the following command to build the distribution:\n```bash\n./mvnw clean package\n```\n\nThe following files will be placed in the `./target` folder: `lakehouse-benchmark-x.y.z.tar` and `lakehouse-benchmark-x.y.z.zip`.  
Pick your poison.\n\nThe resulting `.zip` or `.tar` file will have the following contents: \n\n```text\n├── CONTRIBUTORS.md\n├── LICENSE\n├── README.md\n├── config\n│   ├── cockroachdb\n│   │   ├── sample_auctionmark_config.xml\n│   │   ├── sample_chbenchmark_config.xml\n│   │   ├── sample_epinions_config.xml\n│   │   ├── sample_noop_config.xml\n│   │   ├── sample_resourcestresser_config.xml\n│   │   ├── sample_seats_config.xml\n│   │   ├── sample_sibench_config.xml\n│   │   ├── sample_smallbank_config.xml\n│   │   ├── sample_tatp_config.xml\n│   │   ├── sample_tpcc_config.xml\n│   │   ├── sample_tpcds_config.xml\n│   │   ├── sample_tpch_config.xml\n│   │   ├── sample_twitter_config.xml\n│   │   ├── sample_voter_config.xml\n│   │   ├── sample_wikipedia_config.xml\n│   │   └── sample_ycsb_config.xml\n│   ├── plugin.xml\n│   └── postgres\n│       └── ...\n├── data\n│   ├── tpch\n│   │   ├── customer.tbl\n│   │   ├── lineitem.tbl\n│   │   ├── nation.tbl\n│   │   ├── orders.tbl\n│   │   ├── part.tbl\n│   │   ├── partsupp.tbl\n│   │   ├── region.tbl\n│   │   └── supplier.tbl\n│   └── twitter\n│       ├── twitter_tweetids.txt\n│       └── twitter_user_ids.txt\n├── lib\n│   └── ...\n└── lakehouse-benchmark.jar\n```\n\n## How to Run\nOnce you build and unpack the distribution, you can run `lakehouse-benchmark` just like any other executable jar.  The following examples assume you are running from the root of the expanded `.zip` or `.tgz` distribution.  
If you attempt to run `oltpbench2` outside of the distribution structure you may encounter a variety of errors including `java.lang.NoClassDefFoundError`.\n\nTo bring up help contents:\n```bash\njava -jar lakehouse-benchmark.jar -h\n```\n\nTo execute the `tpcc` benchmark:\n```bash\njava -jar lakehouse-benchmark.jar -b tpcc -c config/cockroachdb/sample_tpcc_config.xml --create=true --load=true --execute=true -s 5\n```\n\nFor composite benchmarks like `chbenchmark`, which require multiple schemas to be created and loaded, you can provide a comma separated list: `\n```bash\njava -jar lakehouse-benchmark.jar -b tpcc,chbenchmark -c config/cockroachdb/sample_chbenchmark_config.xml --create=true --load=true --execute=true -s 5\n```\n\nThe following options are provided:\n\n```text\nusage: lakehouse-benchmark\n -b,--bench \u003carg\u003e               [required] Benchmark class. Currently\n                                supported: [tpcc, tpch, tatp, wikipedia,\n                                resourcestresser, twitter, epinions, ycsb,\n                                seats, auctionmark, chbenchmark, voter,\n                                sibench, noop, smallbank, hyadapt]\n -c,--config \u003carg\u003e              [required] Workload configuration file\n    --clear \u003carg\u003e               Clear all records in the database for this\n                                benchmark\n    --create \u003carg\u003e              Initialize the database for this benchmark\n -d,--directory \u003carg\u003e           Base directory for the result files,\n                                default is current directory\n    --dialects-export \u003carg\u003e     Export benchmark SQL to a dialects file\n    --execute \u003carg\u003e             Execute the benchmark workload\n -h,--help                      Print this help\n -im,--interval-monitor \u003carg\u003e   Throughput Monitoring Interval in\n                                milliseconds\n    --load \u003carg\u003e                
Load data using the benchmark's data\n                                loader\n -s,--sample \u003carg\u003e              Sampling window\n```\n\n## How to see Postgres Driver logging\nTo enable logging for the PostgreSQL JDBC driver, add the following JVM property when starting...\n```\n-Djava.util.logging.config.file=src/main/resources/logging.properties\n```\nTo modify the logging level you can update `logging.properties`\n\n## How to Release\n```\n./mvnw -B release:prepare\n./mvnw -B release:perform\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnetease%2Flakehouse-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnetease%2Flakehouse-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnetease%2Flakehouse-benchmark/lists"}