https://github.com/bobluppes/spark-baseline

Host: GitHub
URL: https://github.com/bobluppes/spark-baseline
Owner: bobluppes
Created: 2020-12-08T20:29:57.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2020-12-08T20:47:23.000Z (over 4 years ago)
Last Synced: 2025-02-10T07:30:35.686Z (3 months ago)
Language: Scala
Size: 1000 Bytes
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: Readme.md

README

# Spark Baseline
Measures the execution time of reading a Parquet file and running a SQL query that combines a filter with an aggregate sum.
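
As a rough illustration of this kind of measurement, here is a minimal sketch, not the repository's actual code. The object name `BaselineSketch`, the helper `timed`, the view name `data`, the column `value`, and the local master setting are all assumptions made for the example; the Parquet path is taken from the first command-line argument.

```scala
import org.apache.spark.sql.SparkSession

object BaselineSketch {
  // Runs `block` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the Parquet file path is passed as the first argument.
    val path = args(0)

    val spark = SparkSession.builder()
      .appName("spark-baseline-sketch")
      .master("local[*]") // assumption: run locally on all cores
      .getOrCreate()

    val (_, elapsedMs) = timed {
      val df = spark.read.parquet(path)
      df.createOrReplaceTempView("data")
      // Assumption: the file has a numeric column named `value`.
      // collect() forces execution; without an action Spark would only
      // build a lazy query plan and nothing would be measured.
      spark.sql("SELECT SUM(value) AS total FROM data WHERE value > 0").collect()
    }

    println(f"read + query took $elapsedMs%.1f ms")
    spark.stop()
  }
}
```

The timed region covers both the read and the query because Spark evaluates lazily: the Parquet data is only scanned once an action such as `collect()` runs.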

## Setup
Four Parquet files with 10e6, 50e6, 100e6, and 150e6 rows should be present in the data folder specified by `baseFile`. Their filenames should end with the suffixes `10M`, `50M`, `100M`, and `150M`, respectively.
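
One way to derive the input paths from this naming scheme is sketched below. It assumes that `baseFile` is a path prefix to which each suffix is appended directly, with no file extension; the object name `BaselineFiles` and the example path are hypothetical.

```scala
object BaselineFiles {
  // The four size suffixes named in the setup instructions.
  val suffixes: Seq[String] = Seq("10M", "50M", "100M", "150M")

  // Assumption: each input path is `baseFile` followed directly by a suffix.
  def inputPaths(baseFile: String): Seq[String] =
    suffixes.map(suffix => baseFile + suffix)

  def main(args: Array[String]): Unit = {
    // Hypothetical: take the base path from the first argument,
    // falling back to an illustrative default.
    val baseFile = args.headOption.getOrElse("/data/baseline_")
    inputPaths(baseFile).foreach(println)
  }
}
```

With the hypothetical prefix `/data/baseline_`, `inputPaths` returns `/data/baseline_10M`, `/data/baseline_50M`, `/data/baseline_100M`, and `/data/baseline_150M`.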

TODO: Make the base file path an input parameter and determine the file sizes dynamically.

## Running
To execute the baseline, run:

    sbt compile
    sbt run
