https://github.com/starrocks/benchmarktool
Benchmark tool to test StarRocks using several benchmarks.
- Host: GitHub
- URL: https://github.com/starrocks/benchmarktool
- Owner: StarRocks
- Created: 2022-01-26T12:30:28.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-02-15T03:32:33.000Z (over 4 years ago)
- Last Synced: 2025-04-03T09:03:00.120Z (about 1 year ago)
- Language: Python
- Size: 388 KB
- Stars: 12
- Watchers: 20
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Benchmark tool
Benchmark tool to test StarRocks using several benchmarks.
## Tool description
### Requirements
* **python3**
* python libraries: **pymysql**
> Use command `pip3 install pymysql` to install.
>
> Use command `yum install python-pip` to install pip3 if the machine does not have **pip3**.
* **mysqlslap**: This benchmark tool uses mysqlslap to test StarRocks performance.
> Use command `yum install mysql` to install mysqlslap.
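
The requirements above can be verified before running any script. The following is a minimal sketch, not part of the tool itself; the function name `check_requirements` is hypothetical, and it only checks that `pymysql` is importable and that `mysqlslap` is on `PATH`.

```python
import importlib.util
import shutil


def check_requirements():
    """Return a mapping of requirement name -> True if it was found."""
    return {
        # find_spec only locates the package; it does not import it.
        "pymysql": importlib.util.find_spec("pymysql") is not None,
        # mysqlslap must be on PATH so the benchmark scripts can call it.
        "mysqlslap": shutil.which("mysqlslap") is not None,
    }


for name, found in check_requirements().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

If either entry is reported as missing, install it with the commands listed above.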
### Project directories
* `bin`: directory for some scripts
* `conf`: directory for conf files
* `result`: directory to store query results
* `sql`: directory for all SQL files, with one sub-directory per benchmark
  * `tpch`: tpch benchmark SQL files, including `create`, `load` and `query`
  * `ssb`: ssb benchmark SQL files, including `create`, `load` and `query`
* `src`: directory for tool codes
* `thirdparty`: directory to store third party modules, such as dbgen for tpch, ssb
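
As a quick sanity check after cloning, the layout described above can be compared against the working tree. This is an illustrative sketch; the helper name `missing_dirs` is hypothetical, and the directory names are taken from the list above.

```python
from pathlib import Path

# Top-level directories named in the "Project directories" list.
EXPECTED_DIRS = ("bin", "conf", "result", "sql", "src", "thirdparty")


def missing_dirs(repo_root="."):
    """Return the expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Run from the repository root; an empty list means the layout is complete.
print("missing directories:", missing_dirs("."))
```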
### Scripts
All the scripts under `bin` directory:
* `gen_data`: tools to generate benchmark data (tpch, ssb, ...)
  * **gen-tpch.sh**: script to generate tpch data
  * **gen-ssb.sh**: script to generate ssb data
* **create_db_table.sh**: script to create tables
* **stream_load.sh**: script to load data into StarRocks using `stream load`
* **broker_load.sh**: script to load data into StarRocks using `broker load` (not finished yet)
* **flat_insert.sh**: script to load data into StarRocks using `insert into` (not finished yet)
* **benchmark.sh**: script to test the performance or check the result correctness
### Test steps
1. Make sure everything listed under `Requirements` is installed.
2. Compile the dbgen tool for the benchmark you want under the `thirdparty` directory.
* The tpch `dbgen` binary is provided directly; a `Makefile` will be added later.
3. Make sure a StarRocks cluster is ready,
and that you know the settings to put in the `conf/starrocks.conf` file.
4. Choose the benchmark you want and follow its steps below.
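
For step 3, one simple way to confirm that the cluster is reachable is to open a TCP connection to the frontend's MySQL-protocol port. The sketch below is an assumption-laden illustration: the helper name `fe_reachable` is hypothetical, and the host `127.0.0.1` and port `9030` are placeholders that should be replaced with the values you intend to put in `conf/starrocks.conf`.

```python
import socket


def fe_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


# Placeholder address: replace with your own frontend host and query port.
print("reachable:", fe_reachable("127.0.0.1", 9030))
```

A successful connection only shows that the port is open; it does not verify credentials or the database name.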
## SSB (Star Schema benchmark)
> not finished yet
## TPC-H benchmark
1. Configure the StarRocks cluster info in file `conf/starrocks.conf`
Check the IP, port and database settings, and modify them if needed.
You can change other parameters if you know them well.
2. Create tables
```bash
# create tables for 100GB data
./bin/create_db_table.sh ddl_100
```
You can specify another directory name (under the `sql/tpch` directory)
that contains `create table` SQL files.
The SQL files for the same table differ slightly between directories,
for example in bucket size or column order; these differences affect performance only.
You can use the `create table` SQL files under `ddl_100` directly for smaller data, such as 1GB.
3. Generate data
```bash
# generate 100GB data under the `data_100` directory
./bin/gen_data/gen-tpch.sh 100 data_100
# generate 1TB data under the `data_1T` directory
./bin/gen_data/gen-tpch.sh 1000 data_1T
```
You can change `100` to `1` to generate 1GB of data quickly for testing.
For example: `./bin/gen_data/gen-tpch.sh 1 data_1G`
You can use either an absolute or a relative directory path to store the generated data.
For example: `./bin/gen_data/gen-tpch.sh 1 data/data_1G-2`
> This *gen-tpch.sh* script just wraps the tpch-dbgen tool for convenience.
>
> You can run `make` under the `thirdparty/tpch-dbgen` directory to build the `dbgen` binary; the dbgen source is version 3.0.0, downloaded from [tpc.org](http://tpc.org/tpc_documents_current_versions/current_specifications5.asp).
>
> You can also download the latest version of the **tpch-dbgen** tool directly from [tpc.org](http://www.tpc.org), or find more information on other web pages, such as [Data generation tool](https://docs.deistercloud.com/content/Databases.30/TPCH%20Benchmark.90/Data%20generation%20tool.30.xml).
4. Load data using stream load
```bash
# load 100GB data into StarRocks
./bin/stream_load.sh data_100
```
`data_100` is the path of the directory containing the data you generated.
You can specify either an absolute path or a relative path.
5. Test the performance
```bash
./bin/benchmark.sh -v -p -d tpch
```
See more information with `./bin/benchmark.sh -h`
6. Check the result
```bash
./bin/benchmark.sh -v -c -d tpch
```
Currently, you can only check the result in the logs.
(The expected results have not been put in the `result` directory yet.)
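
The TPC-H steps 2 to 6 above can be chained from a small driver. The following is a sketch under stated assumptions, not something shipped with the tool: the function names `tpch_steps` and `run_steps` are hypothetical, the script paths and flags are exactly those shown in the steps above, and the driver is assumed to run from the repository root with `conf/starrocks.conf` already configured.

```python
import subprocess


def tpch_steps(scale_gb, data_dir, ddl_dir="ddl_100"):
    """Build the command list for the TPC-H steps, in the order given above."""
    return [
        ["./bin/create_db_table.sh", ddl_dir],                    # create tables
        ["./bin/gen_data/gen-tpch.sh", str(scale_gb), data_dir],  # generate data
        ["./bin/stream_load.sh", data_dir],                       # load via stream load
        ["./bin/benchmark.sh", "-v", "-p", "-d", "tpch"],         # test the performance
        ["./bin/benchmark.sh", "-v", "-c", "-d", "tpch"],         # check the result
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


# Dry run for a 1GB data set: only prints the commands.
run_steps(tpch_steps(1, "data_1G"))
```

Pass `dry_run=False` to actually execute the scripts; the dry run is the default so that the commands can be reviewed first.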
## Project directories in detail
This section is for developers and testers.
You can add more benchmarks, including the **data gen** tool, **SQL query** files, etc.
### SQL directory
All SQL files are under the `sql` directory.
There are several sub-directories for different benchmarks, one directory per benchmark,
such as `ssb`, `tpch`, `tpcds`, etc.
Under each benchmark directory (taking the `tpch` directory as an example), there are several kinds of directories:
* `ddl*`: There is usually a `***_create.sql` file to create all the tables.
Different directories are for different data sizes, with some different `create table` properties.
See details in [tpch-README](sql/tpch/README.md).
* `query`: There may be several sub-directories for different query purposes.
Taking the `ssb` benchmark as an example, there are `ssb`, `ssb-flat` and `ssb-low_cardinality` sub-directories,
where `ssb-flat` is for queries on the flattened table `lineorder_flat`,
and `ssb-low_cardinality` is for queries in the **low cardinality** situation.
* `insert`: We can insert data into a flattened **wide table** from other tables; currently this is mainly for the `ssb` benchmark.
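
When adding a benchmark, it can help to list which `ddl*` and `query` sub-directories already exist for a given benchmark. The sketch below assumes the layout described above; the helper name `list_benchmark_dirs` is hypothetical.

```python
from pathlib import Path


def list_benchmark_dirs(sql_root, benchmark):
    """Return the ddl* directories and query sub-directories of one benchmark."""
    base = Path(sql_root) / benchmark
    result = {"ddl": [], "query": []}
    if not base.is_dir():
        return result
    # Every directory whose name starts with "ddl", e.g. ddl_100.
    result["ddl"] = sorted(p.name for p in base.glob("ddl*") if p.is_dir())
    query_dir = base / "query"
    if query_dir.is_dir():
        # Sub-directories for different query purposes, e.g. ssb-flat.
        result["query"] = sorted(p.name for p in query_dir.iterdir() if p.is_dir())
    return result


# Run from the repository root.
print(list_benchmark_dirs("sql", "tpch"))
```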
### Thirdparty directory
This directory holds the tools that generate data for the different benchmarks,
with a plain copy of each tool.
> Add links here (TODO)