{"id":25949827,"url":"https://github.com/johansolbakken/benchmark","last_synced_at":"2026-04-30T03:31:58.697Z","repository":{"id":274442609,"uuid":"922916171","full_name":"johansolbakken/benchmark","owner":"johansolbakken","description":"A repository for benchmarking Optimistic Order-Preserving Hash Join (OOHJ) in MySQL using the Join Order Benchmark (JOB). Part of a Master’s thesis project, it includes scripts for setting up the database, running SQL queries, and evaluating performance.","archived":false,"fork":false,"pushed_at":"2025-06-11T11:48:37.000Z","size":136617,"stargazers_count":1,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-03T22:03:29.106Z","etag":null,"topics":["benchmarking","join-order-benchmark","mysql","query-optimization","ruby","sql"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/johansolbakken.png","metadata":{"files":{"readme":"README.org","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-27T10:19:20.000Z","updated_at":"2025-06-11T11:48:40.000Z","dependencies_parsed_at":"2025-02-26T16:31:27.214Z","dependency_job_id":"3476b23f-7a11-4133-9fce-400de80dd8f5","html_url":"https://github.com/johansolbakken/benchmark","commit_stats":null,"previous_names":["johansolbakken/benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/johansolbakken/benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johansolbakken%2Fbenchmark","tags_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/johansolbakken%2Fbenchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johansolbakken%2Fbenchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johansolbakken%2Fbenchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/johansolbakken","download_url":"https://codeload.github.com/johansolbakken/benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johansolbakken%2Fbenchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32453746,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"online","status_checked_at":"2026-04-30T02:00:05.929Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","join-order-benchmark","mysql","query-optimization","ruby","sql"],"created_at":"2025-03-04T12:29:28.437Z","updated_at":"2026-04-30T03:31:57.591Z","avatar_url":"https://github.com/johansolbakken.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"#+title: Benchmarking for Optimistic Order-Preserving Hash Join (OOHJ)\n\n#+BEGIN_QUOTE\n[!WARNING]\nThis software is /not/ intended for production use and is intended for our own private research purposes. 
Although others may use the code, please be aware that it is prone to change without notice.\n#+END_QUOTE\n\nThis repository is part of our Master's thesis, focused on benchmarking modifications made to MySQL. Our fork of MySQL, which implements the Optimistic Order-Preserving Hash Join (OOHJ) feature, is available at: https://github.com/johansolbakken/mysql-server/tree/oohj/oohj-iterator.\n\nA link to the Master's thesis will be added later.\n\n* Requirements\n\n#+begin_src\nbrew install ruby\ngem install pq_query\ngem install terminal-table\nbrew install hyperfine\n#+END_SRC\n\n* JOB from a fresh start\n\nIf MySQL has lost the JOB data, do the following.\n\nFirst, make sure MySQL is running. (Tip: use =mtr --start-dirty= or =make run-dirty= to avoid resetting MySQL.)\n\n#+begin_src\nmake job-dataset            # Download job dataset\nmake job-order-queries      # Convert job queries to ORDER BY\n\n# Ensure MySQL is running\nmake job-setup              # Create databases\nmake inline-local           # Allow inserting data from CSV\nmake job-feed               # Feed data to database\nmake prepare                # Enable hypergraph optimizer and disable stats_auto_recalc and set size-values for join-buffer and innodb-pool\nmake job-analyze            # Analyze job tables\n#+end_src\n\n* Testing the Optimistic Hash Join\n\n1. Start the MySQL server\n2. Fill out config.yaml\n\n#+begin_src shell\nmake test\n#+end_src\n\nThis compares the .sql files in the homemade_dataset folder against the .expected files in the same folder. It also runs the other types of tests defined in the tests.yaml file.\n\nThese tests pass under our current assumptions. As we improve the optimistic order-preserving hash join, these tests will fail and we will need to take new snapshots of the expected state.\n\n* Commands\n\n** Download JOB dataset\n\n#+begin_src shell\nmake job-dataset\n#+end_src\n\n** Start MySQL\n\n1. Download and build the MySQL source code.\n2. 
Start MySQL using MySQL Test Run (MTR), MySQL's testing script:\n\nFirst-time setup:\n\n#+begin_src shell\ncd build # Navigate to the MySQL build folder\n./mysql-test/mtr --start\n#+end_src\n\nFor subsequent runs, use:\n\n#+begin_src shell\n./mysql-test/mtr --start-dirty\n#+end_src\n\n3. Configure the path to the MySQL binary in `config.yaml` (e.g., `build/bin/mysql`).\n\n** config.yaml\n\nCopy the `config.yaml.example` file to `config.yaml` and update it with the path to your MySQL binary.\n\nIt is important that /path/ points to a MySQL client executable.\n\n** Set up the Database\n\nInitialize the database by creating tables and indexes:\n\n#+begin_src shell\nmake job-setup\n#+end_src\n\n** Load the Data\n\nFeed the downloaded dataset into the database:\n\n#+begin_src shell\nmake job-feed\n#+end_src\n\n** Prepare MySQL environment\n\nThis command configures MySQL so that the environment is the same for every test.\n\n#+begin_src shell\nruby bin/benchmark.rb --prepare-mysql\n#+end_src\n\n** Run SQL Queries\n\nTo run SQL queries, use the following commands:\n\n- Execute a query:\n#+begin_src shell\nruby bin/benchmark.rb --run ./job/1a.sql\n#+end_src\n\n- Execute a query with EXPLAIN ANALYZE to analyze execution:\n#+begin_src shell\nruby bin/benchmark.rb --run ./job/1a.sql --analyze\n#+end_src\n\n- Execute a query with EXPLAIN FORMAT=TREE to inspect the plan:\n#+begin_src shell\nruby bin/benchmark.rb --run ./job/1a.sql --tree\n#+end_src\n\n* tests.yaml\n\nThe YAML configuration is structured under a top-level =tests= key that divides tests into two categories: *diff_test* and *contain_test*. Each category may include a global =setup= section to prepare the environment before running tests, followed by a list of test cases under the =tests= key. In *diff_test*, each test is defined with a =name=, an SQL file specified by the =sql= key, and an =expected= file for output comparison; tests can also have individual =setup= commands. 
In *contain_test*, tests may include individual =setup= commands and verify outputs by checking for specific substrings using a =contains= list.\n\nTo add a new test, choose the appropriate category based on whether you want a full output comparison or substring validation. Then, include any necessary setup commands and define the test with a unique =name=, the path to the SQL file, and either an =expected= file (for *diff_test*) or a =contains= list (for *contain_test*). Note that tests run /sequentially/, so the environment setup for one test may affect subsequent tests.\n\n#+begin_src yaml\ntests:\n  diff_test:\n    setup:\n      - \"ruby ./bin/generate_sort_hashjoin_dataset.rb 10000 10000\"\n      - \"ruby ./bin/benchmark.rb --prepare-mysql\"\n    tests:\n      - name: \"Basic test\"\n        sql: \"./homemade_dataset/homemade.sql\"\n        expected: \"./homemade_dataset/homemade.expected\"\n      - name: \"Disable optimistic hash join\"\n        sql: \"./homemade_dataset/homemade_disabled.sql\"\n        expected: \"./homemade_dataset/homemade_disabled.expected\"\n\n  contain_test:\n    # Global setup is optional here.\n    tests:\n      - name: \"went_on_disk=false, n=100 m=100\"\n        setup:\n          - \"ruby ./bin/generate_sort_hashjoin_dataset.rb 100 100\"\n          - \"ruby ./bin/benchmark.rb --prepare-mysql\"\n        sql: \"./homemade_dataset/went_on_disk.sql\"\n        contains:\n          - \"(optimistic hash join!)\"\n          - \"(went_on_disk=false)\"\n#+end_src\n\n* C++ Debugging Tools\n\n** Header-only Logging File\n\n=debug/logger.h= provides a class that can be used for fast logging to a file.\n\nUsage:\n\n#+begin_src c++\n#include \"/absolute_path_to_benchmark/debug/logger.h\"\n\nstatic Logger* s_logger = nullptr;\n\nClassToTest::ClassToTest() {\n    s_logger = new Logger(\"~/path_to_output/log.txt\");\n}\n\nvoid ClassToTest::functionToTest() {\n    // Let's write CSV information to the logger.\n    auto\u0026 logger = *s_logger;\n\n    while 
(someCondition) {\n        logger \u003c\u003c logger.timestamp() \u003c\u003c \",\" \u003c\u003c this-\u003egetSomeValue() \u003c\u003c \",\";\n        logger \u003c\u003c this-\u003egetState() \u003c\u003c \"\\n\";\n    }\n}\n\n#+end_src\n\nThe class deletes the log file on construction.\n\nThere is a =timestamp()= function for getting timestamps easily.\n\nIt currently uses streams.\n\n* Generate TPC-H for macOS\n\n#+begin_src shell\npodman run --rm -it \\\n  -v $(pwd):/src \\\n  -w /src \\\n  ubuntu:22.04 \\\n  bash\n# now in podman ubuntu\nsudo apt update \u0026\u0026 sudo apt install -y gcc make ruby bison flex\nruby bin/build-tpc-h.rb\n#+end_src\n\nThis generates the following folders:\n- =tpc-h-queries=\n- =tpc-h-ddl=\n- =tpc-h-dataset=\n\n* Generate TPC-DS for macOS\n\n#+begin_src shell\n# copy the Makefile.suite and add -fcommon to CFLAGS\nCFLAGS = $(BASE_CFLAGS) -D$(OS) $($(OS)_CFLAGS) -fcommon\n\n# Start podman\npodman run --rm -it \\\n  -v $(pwd):/src \\\n  -w /src \\\n  ubuntu:22.04 \\\n  bash\n\n# now in podman ubuntu\nsudo apt update \u0026\u0026 sudo apt install -y gcc make ruby bison flex\nruby bin/build-tpc-ds.rb\n#+end_src\n\n* Join Order Benchmark Commands\n\n** Setup database and indexes in MySQL\n\nRequires MySQL to be running.\n\n#+begin_src shell\nmake job-setup\n#+end_src\n\nTo wipe the database and recreate it:\n\n#+begin_src shell\nruby bin/job-setup.rb --force\n#+end_src\n\n** Download job dataset\n\nCreates the job-dataset folder.\n\n#+begin_src shell\nmake job-dataset\n#+end_src\n\nThe job-dataset folder contains all the data as CSV files.\n\nDo this before feeding.\n\n** Feed job data\n\nFeed the data in job-dataset into the MySQL database imdbload.\n\n#+begin_src shell\nmake job-feed\n#+end_src\n\n** Convert queries: remove MIN(...)\n\nThe job files we were provided are altered such that each column is in a MIN aggregate.\n\nWe have therefore created scripts for removing MIN and, additionally, adding ORDER BY clauses.\n\nTo generate the queries without MIN or ORDER 
BY:\n\n#+begin_src shell\nmake job-queries\n#+end_src\n\nTo make ordered queries:\n\n#+begin_src shell\nmake job-order-queries\n#+end_src\n\n** Run JOB queries\n\n#+begin_src shell\nmake run-file DATABASE=imdbload FILE=./job-queries/10a.sql\n#+end_src\n\n** Delete JOB artifacts\n\n#+begin_src shell\nmake job-clean\n#+end_src\n\n** Warmup MySQL\n\nWarming up essentially means loading all data into memory.\n\n#+begin_src shell\nmake job-warmup\n#+end_src\n\n* Analyze\n\nTo analyze, run the script:\n\n#+begin_src shell\nruby bin/analyze.rb --job\n#+end_src\n\n* Check if any query fails for a database\n\n#+begin_src shell\nruby ./bin/test-sql-files.rb --folder ./job-queries --database imdbload\n#+end_src\n\n* Run any file\n\n#+begin_src shell\nmake run-file DATABASE=imdbload FILE=./job-queries/10b.sql\n#+end_src\n\n* Homemade Dataset\n\n** Setup database and indexes in MySQL\n\nRequires MySQL to be running.\n\n#+begin_src shell\nmake homemade-setup\n#+end_src\n\nTo wipe the database and recreate it:\n\n#+begin_src shell\nruby bin/homemade-setup.rb --force\n#+end_src\n\n** Generate dataset\n\nFor instance, with both the table a size and the table b size set to 10000 (the default for both is 10000):\n\n#+begin_src shell\nmake homemade-dataset TABLE_A_SIZE=10000 TABLE_B_SIZE=10000\n#+end_src\n\n** Feed homemade data\n\n#+begin_src shell\nmake homemade-feed\n#+end_src\n\n** Analyze tables\n\n#+begin_src shell\nmake run-file DATABASE=homemade FILE=./sql/analyze_homemade.sql\n#+end_src\n\n** Count Optimistic Hash Join\n\n`bin/count-ohj.rb` counts occurrences of \"optimistic hash join\" in SQL execution plans. 
Usage:\n\n#+begin_src sh\nruby bin/count-ohj.rb [--join-buffer-size SIZE]\n#+end_src\n\nExample (setting the join buffer size to 16 MiB, i.e. 16777216 bytes):\n\n#+begin_src sh\nruby bin/count-ohj.rb --join-buffer-size 16777216\n#+end_src\n\n** Export query plan as DOT\n\nGenerates a graphical representation of a query plan from an input SQL file.\n\n*Run with an SQL file:*\n#+begin_src shell\nruby bin/benchmark.rb ./job/1a.sql\n#+end_src\n\n*Display the JSON output:*\n#+begin_src shell\nruby bin/benchmark.rb --show-json ./job/1a.sql\n#+end_src\n\n*Specify a custom output PNG file:*\n#+begin_src shell\nruby bin/benchmark.rb -o custom_plan.png ./job/1a.sql\n#+end_src\n\n*Keep the DOT file:*\n#+begin_src shell\nruby bin/benchmark.rb -c ./job/1a.sql\n#+end_src\n\n*With hints:*\n#+begin_src shell\nruby bin/print-plan-as-graphwiz.rb ./job-order-queries/1a.sql -o./1a.png --hint \"/*+ SET_OPTIMISM_FUNC(SIGMOID) */\"\n#+end_src\n\n** Enable and disable indexes for JOB\n\n#+begin_src bash\nmake job-index-enable\nmake job-index-disable\n#+end_src\n\n* TPC-H setup\n\n- A folder called =tpc-h-tool=, containing =dbgen= and others, must exist in the root of this repo.\n\n#+begin_src shell\n# Ensure you have data\nls tpc-h-tool/dbgen/*\n\n# Generate tpc-h data\ncargo run -p tpc-h-datagen\n\nmake experimental-setup      # activate hypergraph optimizer\nmake tpc-h-setup             # sets up both s1 and s10\nmake tpc-h-feed              # feeds both\nmake tpc-h-analyze           # analyzes both s1 and s10\nruby bin/tpc-h-warmup.rb s1\nruby bin/tpc-h-warmup.rb s10\n#+end_src\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohansolbakken%2Fbenchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohansolbakken%2Fbenchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohansolbakken%2Fbenchmark/lists"}