{"id":19427790,"url":"https://github.com/starrocks/benchmarktool","last_synced_at":"2025-04-24T17:31:50.381Z","repository":{"id":43871951,"uuid":"452267338","full_name":"StarRocks/BenchmarkTool","owner":"StarRocks","description":"Benchmark tool to test StarRocks using several benchmarks.","archived":false,"fork":false,"pushed_at":"2022-02-15T03:32:33.000Z","size":397,"stargazers_count":12,"open_issues_count":0,"forks_count":12,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-04-03T09:03:00.120Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StarRocks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-26T12:30:28.000Z","updated_at":"2024-10-23T14:18:12.000Z","dependencies_parsed_at":"2022-08-26T13:23:20.553Z","dependency_job_id":null,"html_url":"https://github.com/StarRocks/BenchmarkTool","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2FBenchmarkTool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2FBenchmarkTool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2FBenchmarkTool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2FBenchmarkTool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StarRocks","download_url":"https://codeload.github.com/StarRocks/BenchmarkTool/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.
com","kind":"github","repositories_count":250674383,"owners_count":21469214,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T14:12:53.739Z","updated_at":"2025-04-24T17:31:49.746Z","avatar_url":"https://github.com/StarRocks.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Benchmark tool\n\nBenchmark tool to test StarRocks using several benchmarks.\n\n## Tool description\n\n### Requirements\n\n* **python3**\n* python libraries: **pymysql**\n    \u003e Use command `pip3 install pymysql` to install.\n    \u003e\n    \u003e Use command `yum install python-pip` to install pip3 if the machine does not have **pip3**.\n* **mysqlslap**: This benchmark tool uses mysqlslap to test the StarRocks's performance\n    \u003e Use command `yum install mysql` to install mysqlslap.\n\n### Project directories\n\n* `bin`: directory for some scripts\n* `conf`: directory for conf files\n* `result`: directory to store query results\n* `sql`: directory for all SQL files, there will be some sub-directories for different benchmarks\n  * `tpch`: tpch benchmark SQL files including `create`, `load` and `query`\n  * `ssb`: ssb benchmark SQL files including `create`, `load` and `query`\n* `src`: directory for tool codes\n* `thirdparty`: directory to store third party modules, such as dbgen for tpch, ssb\n\n### Scripts\n\nAll the scripts under `bin` directory:\n\n* `gen_data`: tools to gen data like tpch, ssb, ...\n  * **gen-tpch.sh**: script to gen tpch data\n  * **gen-ssb.sh**: script to gen ssb data\n* **create_db_table.sh**: script to 
create tables\n* **stream_load.sh**: script to load data into StarRocks using `stream load`\n* **broker_load.sh**: script to load data into StarRocks using `broker load` (not finished yet)\n* **flat_insert.sh**: script to load data into StarRocks using `insert into` (not finished yet)\n* **benchmark.sh**: script to test the performance or check the result correctness\n\n### Test steps\n\n1. Make sure everything listed under `Requirements` is installed.\n2. Compile the dbgen tool you need under the `thirdparty` directory.\n    * The tpch `dbgen` binary is provided directly; a `Makefile` will be added later.\n3. Make sure a StarRocks cluster is ready,\n   and you know the configuration that will be used in the `conf/starrocks.conf` file.\n4. Choose the benchmark you want and follow its steps below.\n\n## SSB (Star Schema benchmark)\n\n\u003e not finished yet\n\n## TPC-H benchmark\n\n1. Configure the StarRocks cluster info in the file `conf/starrocks.conf`\n\n    Check and modify the IP, port, and database info if needed.\n\n    You can change other parameters if you know them well.\n\n2. Create tables\n\n    ```bash\n    # create tables for 100GB data\n    ./bin/create_db_table.sh ddl_100\n    ```\n\n    You can specify another directory name (under the `sql/tpch` directory)\n    that contains `create table` SQL files.\n    The SQL files for the same table differ subtly between directories,\n    for example in bucket size and column order; these differences are for performance only.\n    You can directly use the `create table` SQL files under `ddl_100` for smaller data, such as 1GB.\n\n3. 
Generate data\n\n    ```bash\n    # generate 100GB data under the `data_100` directory\n    ./bin/gen_data/gen-tpch.sh 100 data_100\n\n    # generate 1TB data under the `data_1T` directory\n    ./bin/gen_data/gen-tpch.sh 1000 data_1T\n    ```\n\n    You can change `100` to `1` to quickly generate 1GB of data for testing,\n    for example: `./bin/gen_data/gen-tpch.sh 1 data_1G`\n\n    You can use either an absolute or a relative directory path to store the generated data,\n    for example: `./bin/gen_data/gen-tpch.sh 1 data/data_1G-2`\n\n    \u003e The *gen-tpch.sh* script just wraps the tpch-dbgen tool for convenience.\n    \u003e\n    \u003e You can run `make` under the `thirdparty/tpch-dbgen` directory to build the `dbgen` binary; the dbgen source is version 3.0.0, downloaded from [tpc.org](http://tpc.org/tpc_documents_current_versions/current_specifications5.asp).\n    \u003e\n    \u003e You can also download the latest version of the **tpch-dbgen** tool directly from [tpc.org](http://www.tpc.org), or find more information on other web pages, such as [Data generation tool](https://docs.deistercloud.com/content/Databases.30/TPCH%20Benchmark.90/Data%20generation%20tool.30.xml).\n\n4. Load data using stream load\n\n    ```bash\n    # load 100GB data into StarRocks\n    ./bin/stream_load.sh data_100\n    ```\n\n    `data_100` is the path of the directory containing the data you generated.\n    You can specify either an absolute path or a relative path.\n\n5. Test the performance\n\n    ```bash\n    ./bin/benchmark.sh -v -p -d tpch\n    ```\n\n    See more information with `./bin/benchmark.sh -h`\n\n6. 
Check the result\n\n    ```bash\n    ./bin/benchmark.sh -v -c -d tpch\n    ```\n\n    Currently, you can check the result in the logs.\n    (The expected result hasn't been put in the `result` directory yet)\n\n## Project directories in detail\n\nThis section is for developers and testers.\nYou can add more benchmarks, including the **data gen** tool, **SQL query** files, etc.\n\n### SQL directory\n\nAll SQL files are under the `sql` directory.\nThere are several sub-directories for different benchmarks, one directory per benchmark,\nsuch as `ssb`, `tpch`, `tpcds`, etc.\n\nUnder each benchmark directory (taking the `tpch` directory as an example), there are several kinds of directories:\n\n* `ddl*`: There is usually a `***_create.sql` file to create all the tables.\n    Different directories are for different data sizes, with some different `create table` properties.\n\n    See details in [tpch-README](sql/tpch/README.md)\n\n* `query`: There may be several sub-directories for different query purposes.\n\n    Take the `ssb` benchmark as an example: there are `ssb`, `ssb-flat`, and `ssb-low_cardinality` sub-directories,\n    where `ssb-flat` is for queries on the flattened table `lineorder_flat`,\n    and `ssb-low_cardinality` is for queries in the **low cardinality** situation.\n\n* `insert`: We can insert data into a flattened **wide table** from other tables; currently this is mainly for the `ssb` benchmark.\n\n### Thirdparty directory\n\nTools to generate data for different benchmarks.\nA simple copy of each.\n\n\u003e Add links here (TODO)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstarrocks%2Fbenchmarktool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstarrocks%2Fbenchmarktool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstarrocks%2Fbenchmarktool/lists"}