{"id":14973870,"url":"https://github.com/swarm64/s64da-benchmark-toolkit","last_synced_at":"2025-08-20T16:31:31.593Z","repository":{"id":40967018,"uuid":"249651364","full_name":"swarm64/s64da-benchmark-toolkit","owner":"swarm64","description":"Swarm64 DA Benchmark Toolkit","archived":false,"fork":false,"pushed_at":"2024-11-25T15:20:05.000Z","size":12757,"stargazers_count":30,"open_issues_count":11,"forks_count":13,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-12-06T02:11:40.669Z","etag":null,"topics":["postgresql","psql"],"latest_commit_sha":null,"homepage":"https://swarm64.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/swarm64.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-24T08:23:01.000Z","updated_at":"2024-08-07T17:04:55.000Z","dependencies_parsed_at":"2024-06-22T02:24:58.739Z","dependency_job_id":"7019bb8e-3c66-4af2-aab9-2ca341171fbf","html_url":"https://github.com/swarm64/s64da-benchmark-toolkit","commit_stats":null,"previous_names":[],"tags_count":46,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/swarm64%2Fs64da-benchmark-toolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/swarm64%2Fs64da-benchmark-toolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/swarm64%2Fs64da-benchmark-toolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/swarm64%2Fs64da-benchmark-toolkit/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/swarm64","download_url":"https://codeload.github.com/swarm64/s64da-benchmark-toolkit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230438185,"owners_count":18225870,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["postgresql","psql"],"created_at":"2024-09-24T13:49:36.704Z","updated_at":"2024-12-19T13:07:23.762Z","avatar_url":"https://github.com/swarm64.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Summary\n\nThis toolkit provides methods to execute the TPC-H, TPC-DS, SSB, and HTAP benchmarks on:\n- PostgreSQL\n- EDB Postgres Advanced Server (EPAS)\n- PostgreSQL with Swarm64 DA\n- EPAS with Swarm64 DA\n\nImportant note: to guarantee compatibility between S64 DA and\ns64da-benchmark-toolkit, check out the Git tag that corresponds to your\nversion of S64 DA. For example, if your version of S64 DA is 5.1.0, clone this\nrepository and run `git checkout v5.1.0` in the repository’s root folder\nbefore proceeding. For S64 DA versions 4.0.0 and below, check out `v4.0.0_and_below`.\n\n# Prerequisites\n\n- Python 3.6 or later and pip3\n- For TPC-DS only: the Linux package `recode`\n- The additional Python packages, installed for example for Python 3.6 with:\n  `/usr/bin/python3.6 -m pip install -r requirements.txt`\n- The `psql` PostgreSQL client\n- For loading the data, the database must be accessible as the user\n  `postgres` or `enterprisedb` *without a password*\n\n\n# Creating a Database and Loading Data\n\nLoad a database with a dataset. 
If the database does not exist, it will be\ncreated. If it already exists, it will be dropped and recreated.\n\n    ./prepare_benchmark \\\n        --dsn postgresql://postgres@localhost/\u003ctarget-db\u003e \\\n        --benchmark \u003ctpch|tpcds|ssb|htap\u003e \\\n        --schema=\u003cschema-to-deploy\u003e \\\n        --scale-factor=\u003cscale-factor-to-use\u003e\n\nFor example, to load the TPC-H dataset into PostgreSQL with Swarm64 DA using the\nperformance schema:\n\n    ./prepare_benchmark \\\n        --dsn=postgresql://postgres@localhost:5432/example-database \\\n        --benchmark=tpch \\\n        --schema=s64da_performance \\\n        --scale-factor=1000\n\n## Required Parameters\n\nParameter      | Description\n-------------- | -----------\n`dsn`          | The full DSN of the DB to connect to. DSN layout: \u003cpre\u003epostgresql://\u0026lt;user\u0026gt;@\u0026lt;host\u0026gt;:\u0026lt;target-port\u0026gt;/\u0026lt;target-db\u0026gt;\u003c/pre\u003e The port is optional and the default is 5432. Example with port 5444 and use of EPAS: \u003cpre\u003e--dsn postgresql://enterprisedb@localhost:5444/example-database\u003c/pre\u003e\n`benchmark`    | The benchmark to use: `tpch`, `tpcds`, `ssb`, or `htap`\n`schema`       | The schema to deploy. Schemas are directories in the benchmarks/\\\u003cbenchmark\\\u003e/schemas directory. 
See the table below for the supported schemas.\n`scale-factor` | The scale factor to use, such as `10`, `100`, or `1000`.\n\n### Schema Parameter Values\n\nValue                         | Description\n----------------------------- | -----------\n`psql_native`                 | The standard PostgreSQL schema\n`s64da_native`                | As above, but with the S64 DA extension and its default feature set enabled\n`s64da_native_enhanced`       | As above, but with some S64 DA opt-in features enabled, such as the `columnstore` index\n`s64da_performance`           | The schema that provides the best performance for S64 DA (removes B-tree indexes and keys, and uses floating point)\n`*_partitioned_id_hashed`     | Like one of the first four schemas, but hash-partitions some tables on their main ID column\n`*_partitioned_date_week`     | Like one of the first four schemas, but partitions tables with dates by week\n\n## Optional Parameters\n\nParameter                      | Description\n------------------------------ | -----------------------------------------------\n`chunks`                       | Chunk large tables into smaller pieces during ingestion. Default: `10`\n`max-jobs`                     | Limit the overall loading parallelism to this number of jobs. Default: `8`\n`check-diskspace-of-directory` | If present, check the disk space of the given storage directory before ingestion\n`data-dir`                     | The directory holding the data files to ingest from. Default: none\n`num-partitions`               | The number of partitions for partitioned schemas. Default: none\n`start-date`                   | The data start date for the HTAP benchmark\n\nDepending on the scale factor you choose, the script may take several hours to\nfinish. After the script creates the database, it loads the data and\ncreates primary keys, foreign keys, and indices. 
Afterwards, it runs VACUUM\nand ANALYZE.\n\n\n# Running a Benchmark\n\nStart a benchmark:\n\n    ./run_benchmark \\\n        --dsn postgresql://postgres@localhost/\u003ctarget-db\u003e \\\n        [--benchmark] \u003ctpch|tpcds|ssb|htap\u003e \\\n        \u003coptional benchmark-specific arguments\u003e\n\nThis runs the benchmark with the default runtime limit per query.\nSome benchmarks support a `--timeout` parameter to adjust this limit.\n\nNote: The `--benchmark` parameter is deprecated and ignored. The name of the benchmark\nshould directly follow the `--dsn` argument.\n\n## Required Parameters\n\nParameter   | Description\n----------- | -----------------------------------------------\n`dsn`       | The full DSN of the DB to connect to. DSN layout: \u003cpre\u003epostgresql://\u0026lt;user\u0026gt;@\u0026lt;host\u0026gt;:\u0026lt;target-port\u0026gt;/\u0026lt;target-db\u0026gt;\u003c/pre\u003e The port is optional and the default is 5432. Example with port 5444 and use of EPAS: \u003cpre\u003e--dsn postgresql://enterprisedb@localhost:5444/example-database\u003c/pre\u003e\n\u0026nbsp;      | Name of the benchmark to use: `tpch`, `tpcds`, `ssb`, or `htap`\n\nNote: if you enable correctness checks with the `--check-correctness` flag, the\nparameter `--scale-factor` is required.\n\n## Optional Parameters\n\nParameter                 | Description\n--------------------------|-------------------------------------\n`use-server-side-cursors` | Use server-side cursors to execute the queries.\n\nThe remaining optional parameters differ by benchmark.\nThe ones for TPC-H, TPC-DS, and SSB are described in this section.\nThe parameters supported by HTAP are described in a separate section below.\n\n\nParameter             | Description\n--------------------- | -----------\n`config`              | Path to an additional YAML configuration file\n`timeout`             | The maximum time a query may run, such as `30min`\n`streams`             | The number of 
parallel query streams; useful for throughput tests.\n`stream-offset`       | The stream to start with when running multiple streams. Default: `1`\n`netdata-output-file` | File to write Netdata stats to. Requires a `netdata` key in the configuration. Default: none\n`output`              | How the results should be formatted. Multiple options are possible. Default: none\n`csv-file`            | Path to the CSV output file if `csv` output is selected. Default: `results.csv` in the current directory.\n`check-correctness`   | Compares each query result with pre-recorded results and stores them in the `query_results` directory. Requires `scale-factor` to be set.\n`scale-factor`        | Scale factor for the correctness comparison. Default: none\n`explain-analyze`     | Whether to run EXPLAIN ANALYZE. Query plans are saved into the `plans` directory.\n\n# Test Parameterization with Additional YAML Configuration\n\nYou can modify the existing configuration files located under the configs\ndirectory. By default, the toolkit loads the respective `default.yaml`\nconfiguration file for each benchmark.\nAlternatively, you can create an additional configuration file to control\ntest execution more granularly. An example YAML file for the TPC-H benchmark\nmight look as follows:\n\n    timeout: 30min\n    ignore:\n      - 18\n      - 20\n      - 21\n\n    dbconfig:\n      max_parallel_workers: 96\n      max_parallel_workers_per_gather: 32\n\nTo use this file, pass the `--config=\u003cpath-to-file\u003e` argument to the test\nexecutor. In this example, the query timeout is set to `30min` and queries 18,\n20, and 21 will not be run. Additionally, the database parameters\n`max_parallel_workers` and `max_parallel_workers_per_gather` will be set to\n`96` and `32`, respectively.\n\nTo change the database configuration, the user needs\nsuperuser privileges. 
Any change to the database configuration is applied\nto the whole database system before the benchmark starts. If any change was\napplied manually, the whole database configuration will be reset to the values in the\nPostgreSQL configuration file after the benchmark completes.\n\nSome options can be passed both on the command line and in a config file.\nIn that case, the value passed on the command line overrides the value set in the\nconfig file.\n\nNote: This feature is not supported by the HTAP benchmark.\n\n\n# HTAP Benchmark\n\nA mixed-workload benchmark implementation using a hybrid TPC-C/TPC-H schema is available in `benchmarks/htap`.\nIt draws inspiration from [sysbench-tpcc](https://github.com/Percona-Lab/sysbench-tpcc), [CHbenCHmark](https://db.in.tum.de/research/projects/CHbenCHmark/?lang=en), and [HTAPBench](https://github.com/faclc4/HTAPBench).\n\nData preparation is identical to the other benchmarks (see \"Creating a Database and Loading Data\"\nabove).\n\nThe HTAP benchmark requires command-line arguments that differ from the ones described above.\nThe `--dsn` argument is shared with the other benchmarks and must be provided.\nThe `--benchmark` argument is not used; instead, the name `htap` must be provided directly after the `--dsn` argument.\nTo run an HTAP benchmark with 4 OLTP workers and 2 OLAP workers for 30 minutes, run the following:\n\n    ./run_benchmark \\\n        --dsn postgresql://postgres@localhost/htap \\\n        [--benchmark] htap \\\n        --oltp-workers 4 \\\n        --olap-workers 2 \\\n        --duration 1800\n\n## Required Parameters\n\nParameter      | Description\n-------------- | -----------------------------------------------\n`dsn`          | The full DSN of the DB to connect to. DSN layout: \u003cpre\u003epostgresql://\u0026lt;user\u0026gt;@\u0026lt;host\u0026gt;:\u0026lt;target-port\u0026gt;/\u0026lt;target-db\u0026gt;\u003c/pre\u003e The port is optional and the default is 5432. 
Example with port 5444 and use of EPAS: \u003cpre\u003e--dsn postgresql://enterprisedb@localhost:5444/example-database\u003c/pre\u003e\n`htap`         | Enables parsing of the command-line arguments below. Do not prefix it with `--`.\n\n## Optional Parameters\n\nParameter             | Description\n--------------------- | -----------------------------------------------\n`oltp-workers`        | The number of OLTP workers executing TPC-C transactions (i.e. simulated clients). Default: `1`\n`olap-workers`        | The number of OLAP workers running modified TPC-H queries. Default: `1`\n`duration`            | The number of seconds the benchmark should run for. Default: `60`\n`olap-timeout`        | Timeout for OLAP queries in seconds. Default: `900`\n`dry-run`             | Only generate transactions and queries, but don't send them to the DB. Useful for measuring script throughput.\n`monitoring-interval` | Number of seconds to wait between updates of the monitoring display. Default: `1`\n`stats-dsn`           | The DSN of the database to collect statistics into. If omitted, statistics collection is disabled.\n\n## Monitoring\n\nDuring a run, the HTAP benchmark displays the following monitoring screen.\nThis requires a VT100-compatible terminal emulator.\n\n    Detected scale factor: 1                                 \u003c- scale factor, detected by counting the number of warehouses\n    Database statistics collection is disabled.              
\u003c- this will be shown if you didn't provide a `stats-dsn`\n    OK  -\u003e Total TX:         87 | Current rate:   58.0 tps   \u003c- the current transaction rate (transactions per second)\n    ERR -\u003e Total TX:          1 | Current rate:    0.0 tps   \u003c- the current error rate (failed transactions per second)\n\n    Stream   |    1      |    2      |                       \u003c- one column per OLAP stream\n    ----------------------------------\n    Query  1 |           |           |                       \u003c- The state of each query that was\n    Query  2 |      0.43 |           |                          recently run or is running currently.\n    Query  3 |           |      0.72 |                          Also shows when a query timed out or\n    Query  4 |           |           |                          caused an error in the database.\n    Query  5 |           |           |                          For finished queries, the runtime is\n    Query  6 |      0.07 |           |                          displayed.\n    Query  7 |           |           |\n    Query  8 |           |           |\n    Query  9 |      0.63 |           |\n    Query 10 |           |           |\n    Query 11 |           |           |\n    Query 12 |           |           |\n    Query 13 |           |           |\n    Query 14 |      0.25 |           |\n    Query 15 |           |           |\n    Query 16 |           |           |\n    Query 17 |  Running  |           |\n    Query 18 |           |  Running  |\n    Query 19 |           |           |\n    Query 20 |      0.45 |           |\n    Query 21 |           |      0.74 |\n    Query 22 |           |           |\n\n    Elapsed: 2 seconds\n\n\n# Testing\n\nFor testing, install the test requirements,\n\n    /usr/bin/python3.6 -m pip install -r requirements-test.txt\n\nand run `python -m pytest tests`. Some benchmark modules provide their own tests. 
For example, to run the tests for the HTAP benchmark,\nexecute `python -m pytest benchmarks/htap/tests`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fswarm64%2Fs64da-benchmark-toolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fswarm64%2Fs64da-benchmark-toolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fswarm64%2Fs64da-benchmark-toolkit/lists"}