https://github.com/lsds/imputationboss

Last synced: 9 months ago
JSON representation

Host: GitHub
URL: https://github.com/lsds/imputationboss
Owner: lsds
License: mit
Created: 2024-07-16T15:59:26.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-07-18T08:33:00.000Z (almost 2 years ago)
Last Synced: 2025-04-02T20:18:00.934Z (over 1 year ago)
Language: C++
Size: 9.99 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# BOSS Benchmarks

See [here](./Documentation/CodeStructure.md) for the code structure.

See [here](./Documentation/Specification.md) for the formal specification of the operators.

## Requirements

for compiling BOSS, MonetDB, DuckDB and benchmark code:

```
cmake >= 3.10
clang >= 9.0
libstdc++-dev >= 8.0
git
unzip
```

for generating missing data:
```
python3
pip
```

for Mathematica baseline:
```
install Wolfram Engine
authenticate to Wolfram
```

for Racket baseline:
```
Racket BC (CS racket has a different C API). Racket CS is the default from 8.0 onward so you need to compile it with the right flags
```

compatible (and tested) with:
* Linux Ubuntu 18.04 LTS (Bionic)
* Linux Ubuntu 20.04 LTS (Focal)
* Linux Debian 11 (Bullseye)

\+ compatible with most setup on MacOS (Clang) and Windows 10/11 (MSVC or WSL Clang) with custom adjustments to the instructions below.

## Instructions (for Linux Ubuntu/Debian)

### 1) installing dependencies (if required)

```
> sudo apt update
> sudo apt install cmake git unzip clang-9 libstdc++-8-dev
> sudo apt install python3 python3-pip
```

Note #1:
Debian Bullseye provides only `libstdc++-9-dev` or `libstdc++-10-dev` which can be installed with an alternative command such as `apt install libstdc++-10-dev`.

Note #2:
Installing the default `clang-10` on Ubuntu Focal or `clang-11` on Debian Bullseye with `apt install clang` is a working alternative, but the cmake command below need to be adjusted accordingly.

### 2) configuring and compiling project

```
> mkdir build
> cd build
> cmake -DCMAKE_INSTALL_PREFIX:PATH=.. -DCMAKE_C_COMPILER=clang-9 -DCMAKE_CXX_COMPILER=clang++-9 -DCMAKE_BUILD_TYPE=Release -B. ..
> cd ..
> cmake --build build --target install
```

to compile with Mathematica baseline support,
init the Mathematics CMake submodule with:
```
git submodule init
git submodule update
```
and add this flag to the cmake setup command:
```
-DBUILD_WOLFRAM_ENGINE=ON
```

### 3) Generating TPC-H dataset

(A) for all scale factors (0.001, 0.01, 0.1, 1, 2, 5, 10, 100):
```
> ./generate_tpch_data.sh
```

(B) for up to SF 1 only:
```
> ./generate_tpch_data.sh 0 4
```

### 4) Generating missing data for TPC-H

install python dependencies:
```
> pip install numpy pandas
```

(A) for all scale factors (0.001, 0.01, 0.1, 1, 2, 5, 10, 100):
```
> ./generate_missing_data.sh
```

(B) for up to SF 1 only:
```
> ./generate_missing_data.sh 0 4
```

### 5) Running the TPC-H benchmarks (without imputation)

with BOSS, MonetDB and DuckDB
Queries Q1, Q3, Q6, Q9, Q18
Scale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter="TPCH_Q"
```

### 6) Running the TPC-H benchmarks (with imputation)

with BOSS
Queries Q1, Q3, Q6, Q9, Q18
Scale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter="TPCH_I"
```

### 7) Running the CDC/FCC/ACS imputation benchmarks

with BOSS
CDC dataset (queries Q1 to Q5):

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter="CDC_I"
```

with BOSS
FCC dataset (queries Q6 to Q9):

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter="FCC_I"
```

with BOSS
ACS dataset (column average):

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --library libBulkEngine.so --benchmark_filter="ACS_I"
```

### 8) Running the TPC-H benchmarks with MonetDB baseline

Queries Q1, Q3, Q6, Q9, Q18
Scale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --disable-indexes --benchmark_filter="TPCH_Q[0-9]+/MonetDB"
```

### 9) Running the TPC-H benchmarks with DuckDB baseline

Queries Q1, Q3, Q6, Q9, Q18
Scale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --disable-indexes --benchmark_filter="TPCH_Q[0-9]+/DuckDB"
```

### 10) Running the TPC-H benchmarks with Mathematica baseline

Queries Q1, Q3, Q6, Q9, Q18
Scale factors 0.001, 0.01, 0.1, 1, 2, 5, 10, 100:

```
> cd bin
> LD_LIBRARY_PATH=../lib ./Benchmarks --library libWolframEngine.so --benchmark_filter="TPCH_Q"
```

### 11) Running the TPC-H benchmarks with Racket baseline

Queries Q1, Q3, Q6, Q9, Q18
Scale factor 1:

```
racket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q1.rkt
racket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q3.rkt
racket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q6.rkt
racket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q9.rkt
racket RacketBaseline/Engine.rkt data/tpch_1000MB RacketBaseline/Q18.rkt
```

### 12) Running the order preservation indexes benchmark

methods: PartitionIndexes, PartitionIndexesUnrolled, PartitionIndexesUnrolledAndCompressed, TwoPartitionIndexesUnrolled, GlobalIndex, CompressedGlobalIndex

Partition size: 1M

Number of partitions: 4, 16, 64

Zipf skew factors: 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2, 3, 4, 5, 6, 7, 8, 16

```
> cd bin
> ./MicroBenchmarks"
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/lsds/imputationboss

Awesome Lists containing this project

README