https://github.com/astrolabsoftware/sparkioref

Benchmark the IO performance of Apache Spark in the context of Astro data

Host: GitHub
URL: https://github.com/astrolabsoftware/sparkioref
Owner: astrolabsoftware
License: apache-2.0
Created: 2018-09-03T07:23:13.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2018-10-01T18:06:53.000Z (over 7 years ago)
Last Synced: 2025-02-28T07:49:01.250Z (over 1 year ago)
Topics: apache-spark, benchmark, fitsio, python, scala
Language: Jupyter Notebook
Homepage:
Size: 634 KB
Stars: 1
Watchers: 5
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# sparkioref

Benchmark the IO performance of Apache Spark (Scala/Python).
Currently supported formats: CSV/JSON, Parquet, and [FITS](https://github.com/astrolabsoftware/spark-fits) (read with the spark-fits connector).
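
For reference, the formats above can all be read through Spark's generic `DataFrameReader`, as in the minimal PySpark sketch below. `read_dataset` is a hypothetical helper, not code from this repository. The `fits` source name and the `hdu` option follow the spark-fits documentation for recent releases; this is an assumption, so check the source name for the spark-fits version you deploy.

```python
from typing import Any

# Formats from the list above; each name doubles as the Spark data source name.
# Assumption: "fits" is the short name registered by the spark-fits version in use.
SUPPORTED_FORMATS = ("csv", "json", "parquet", "fits")


def read_dataset(spark: Any, fmt: str, path: str, hdu: int = 1) -> Any:
    """Hypothetical helper: load `path` as a DataFrame using the data source for `fmt`.

    `spark` is an active SparkSession; the return value is a lazy DataFrame.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    reader = spark.read.format(fmt)
    if fmt == "fits":
        # spark-fits reads a single HDU; HDU 1 is the first extension after the primary.
        reader = reader.option("hdu", hdu)
    return reader.load(path)
```

Loading is lazy apart from schema discovery, so a benchmark built on something like `read_dataset(spark, "fits", path)` still has to trigger an action (for example `count()`) before any real IO is measured.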

## Run the benchmark

Set your data and cluster configuration in `run_benchmark.sh`, then launch it:

```bash
./run_benchmark.sh
```

## Example

Configuration:
- Spark 2.3.1
- HDFS 2.8.4
- Input dataset: 1,100,000,000 objects (x, y, z)
- 153 cores (9 executors), 300 GB RAM total
- No caching: 100 iterations, each distributing and reading the data
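
The "no cache" protocol can be pictured as a simple timing loop: each iteration builds a fresh DataFrame, never calls `.cache()` or `.persist()`, and triggers a Spark job with a `count()` action. The sketch below assumes PySpark; `benchmark`, `spark_read_count` and `summarize` are hypothetical names, not the repository's actual code.

```python
import statistics
import time
from typing import Any, Callable, Dict, List


def benchmark(run_once: Callable[[], Any], n_iter: int = 100) -> List[float]:
    """Call `run_once` n_iter times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(n_iter):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return durations


def spark_read_count(spark: Any, fmt: str, path: str) -> Callable[[], int]:
    """Hypothetical helper: return a closure that re-reads `path` and counts its rows.

    A new DataFrame is built on every call and is never cached or persisted,
    so Spark goes back to storage on each iteration.
    """
    def run_once() -> int:
        df = spark.read.format(fmt).load(path)
        return df.count()  # the action that triggers the Spark job
    return run_once


def summarize(durations: List[float], n_objects: int) -> Dict[str, float]:
    """Mean/stdev of the timings and the implied read throughput (objects per second)."""
    mean_s = statistics.mean(durations)
    stdev_s = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return {"mean_s": mean_s, "stdev_s": stdev_s, "objects_per_s": n_objects / mean_s}
```

With an active `SparkSession` named `spark`, `summarize(benchmark(spark_read_count(spark, "parquet", path)), n_objects=1_100_000_000)` would report the mean time per pass and the implied throughput for the dataset above (`path` being a placeholder for your own HDFS location). For columnar formats such as Parquet, `count()` may be answered without decoding every column, so a heavier action may be needed to force a full scan of (x, y, z).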
