{"id":26947160,"url":"https://github.com/lsds/imputationboss","last_synced_at":"2025-10-07T09:20:08.181Z","repository":{"id":248996730,"uuid":"829528166","full_name":"lsds/ImputationBOSS","owner":"lsds","description":null,"archived":false,"fork":false,"pushed_at":"2024-07-18T08:33:00.000Z","size":10477,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T20:18:00.934Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lsds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-16T15:59:26.000Z","updated_at":"2024-10-11T12:08:34.000Z","dependencies_parsed_at":"2024-07-18T10:59:09.934Z","dependency_job_id":null,"html_url":"https://github.com/lsds/ImputationBOSS","commit_stats":null,"previous_names":["lsds/imputationboss"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lsds/ImputationBOSS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FImputationBOSS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FImputationBOSS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FImputationBOSS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FImputationBOSS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lsds","download_url":"https://codeload.github.com/lsds/ImputationBOSS/tar.gz/refs/head
s/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsds%2FImputationBOSS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278748498,"owners_count":26038926,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-02T20:18:09.877Z","updated_at":"2025-10-07T09:20:08.166Z","avatar_url":"https://github.com/lsds.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# BOSS Benchmarks\n\nSee [here](./Documentation/CodeStructure.md) for the code structure.\n\nSee [here](./Documentation/Specification.md) for the formal specification of the operators.\n\n\n## Requirements\n\nfor compiling BOSS, MonetDB, DuckDB and benchmark code:\n\n```\ncmake \u003e= 3.10\nclang \u003e= 9.0\nlibstdc++-dev \u003e= 8.0\ngit\nunzip\n```\n\nfor generating missing data:\n```\npython3\npip\n```\n\nfor Mathematica baseline:\n```\ninstall Wolfram Engine\nauthenticate to Wolfram\n```\n\nfor Racket baseline:\n```\nRacket BC (CS racket has a different C API). 
Racket CS is the default from 8.0 onward, so you need to compile it with the right flags\n```\n\ncompatible (and tested) with:\n* Linux Ubuntu 18.04 LTS (Bionic)\n* Linux Ubuntu 20.04 LTS (Focal)\n* Linux Debian 11 (Bullseye)\n\n\\+ compatible with most setups on macOS (Clang) and Windows 10/11 (MSVC or WSL Clang) with custom adjustments to the instructions below.\n\n## Instructions (for Linux Ubuntu/Debian)\n\n### 1) Installing dependencies (if required)\n\n```\n\u003e sudo apt update\n\u003e sudo apt install cmake git unzip clang-9 libstdc++-8-dev\n\u003e sudo apt install python3 python3-pip\n```\n\nNote #1:  \nDebian Bullseye provides only `libstdc++-9-dev` or `libstdc++-10-dev`, which can be installed with an alternative command such as `apt install libstdc++-10-dev`.\n\nNote #2:  \nInstalling the default `clang-10` on Ubuntu Focal or `clang-11` on Debian Bullseye with `apt install clang` is a working alternative, but the cmake command below needs to be adjusted accordingly.\n\n### 2) Configuring and compiling the project\n\n```\n\u003e mkdir build\n\u003e cd build\n\u003e cmake -DCMAKE_INSTALL_PREFIX:PATH=.. -DCMAKE_C_COMPILER=clang-9   -DCMAKE_CXX_COMPILER=clang++-9 -DCMAKE_BUILD_TYPE=Release -B. 
..\n\u003e cd ..\n\u003e cmake --build build --target install\n```\n\nto compile with Mathematica baseline support,\ninit the Mathematica CMake submodule with:\n```\ngit submodule init\ngit submodule update\n```\nand add this flag to the cmake setup command:\n```\n-DBUILD_WOLFRAM_ENGINE=ON\n```\n\n### 3) Generating the TPC-H dataset\n\n(A) for all scale factors (0.001, 0.01, 0.1, 1, 2, 5, 10, 100):\n```\n\u003e ./generate_tpch_data.sh\n```\n\n(B) for up to SF 1 only:\n```\n\u003e ./generate_tpch_data.sh 0 4\n```\n\n### 4) Generating missing data for TPC-H\n\ninstall Python dependencies:\n```\n\u003e pip install numpy pandas\n```\n\n(A) for all scale factors (0.001, 0.01, 0.1, 1, 2, 5, 10, 100):\n```\n\u003e ./generate_missing_data.sh\n```\n\n(B) for up to SF 1 only:\n```\n\u003e ./generate_missing_data.sh 0 4\n```\n\n### 5) Running the TPC-H benchmarks (without imputation)\n\nwith BOSS, MonetDB and DuckDB  \nQueries Q1, Q3, Q6, Q9, Q18  \nScale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter=\"TPCH_Q\"\n```\n\n### 6) Running the TPC-H benchmarks (with imputation)\n\nwith BOSS  \nQueries Q1, Q3, Q6, Q9, Q18  \nScale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter=\"TPCH_I\"\n```\n\n### 7) Running the CDC/FCC/ACS imputation benchmarks\n\nwith BOSS  \nCDC dataset (queries Q1 to Q5):\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter=\"CDC_I\"\n```\n\nwith BOSS  \nFCC dataset (queries Q6 to Q9):\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter=\"FCC_I\"\n```\n\nwith BOSS  \nACS dataset (column average):\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter=\"ACS_I\"\n```\n\n### 8) 
Running the TPC-H benchmarks with MonetDB baseline\n\nQueries Q1, Q3, Q6, Q9, Q18  \nScale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --disable-indexes --benchmark_filter=\"TPCH_Q[0-9]+/MonetDB\"\n```\n\n### 9) Running the TPC-H benchmarks with DuckDB baseline\n\nQueries Q1, Q3, Q6, Q9, Q18  \nScale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --disable-indexes --benchmark_filter=\"TPCH_Q[0-9]+/DuckDB\"\n```\n\n### 10) Running the TPC-H benchmarks with Mathematica baseline\n\nQueries Q1, Q3, Q6, Q9, Q18  \nScale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:\n\n```\n\u003e cd bin\n\u003e LD_LIBRARY_PATH=../lib ./Benchmarks --library libWolframEngine.so --benchmark_filter=\"TPCH_Q\"\n```\n\n### 11) Running the TPC-H benchmarks with Racket baseline\n\nQueries Q1, Q3, Q6, Q9, Q18  \nScale factor 1:\n\n```\nracket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q1.rkt\nracket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q3.rkt\nracket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q6.rkt\nracket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q9.rkt\nracket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q18.rkt\n```\n\n### 12) Running the order preservation indexes benchmark\n\nmethods: PartitionIndexes, PartitionIndexesUnrolled, PartitionIndexesUnrolledAndCompressed, TwoPartitionIndexesUnrolled, GlobalIndex, CompressedGlobalIndex\n\nPartition size: 1M\n\nNumber of partitions: 4, 16, 64\n\nZipf skew factors: 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2, 3, 4, 5, 6, 7, 8, 16\n\n```\n\u003e cd bin\n\u003e 
./MicroBenchmarks\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsds%2Fimputationboss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flsds%2Fimputationboss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsds%2Fimputationboss/lists"}