https://github.com/astrolabsoftware/sparkioref
Benchmark the IO performance of Apache Spark in the context of Astro data
apache-spark benchmark fitsio python scala
- Host: GitHub
- URL: https://github.com/astrolabsoftware/sparkioref
- Owner: astrolabsoftware
- License: apache-2.0
- Created: 2018-09-03T07:23:13.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-10-01T18:06:53.000Z (over 7 years ago)
- Last Synced: 2025-02-28T07:49:01.250Z (over 1 year ago)
- Topics: apache-spark, benchmark, fitsio, python, scala
- Language: Jupyter Notebook
- Size: 634 KB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# sparkioref
Benchmark the IO performance of Apache Spark (Scala/Python).
Currently supported formats: CSV/JSON, Parquet, and [FITS](https://github.com/astrolabsoftware/spark-fits).
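As a rough illustration of what reading each supported format can look like from PySpark, the sketch below maps a format label to a `DataFrameReader` configuration. The `READER_CONFIG` table, the `load_dataset` helper, and the option values are hypothetical and not taken from this repository. The `fits` source name and its `hdu` option assume that the spark-fits package is available to the Spark session.

```python
# Hypothetical helper, not part of sparkioref: pick a Spark reader per format.
READER_CONFIG = {
    "csv": ("csv", {"header": "true", "inferSchema": "true"}),
    "json": ("json", {}),
    "parquet": ("parquet", {}),
    # Assumes spark-fits is available; "hdu" selects the HDU index to read.
    "fits": ("fits", {"hdu": "1"}),
}


def load_dataset(spark, fmt, path):
    """Return a DataFrame for `path` using the reader registered for `fmt`."""
    if fmt not in READER_CONFIG:
        raise ValueError(f"unsupported format: {fmt}")
    source, options = READER_CONFIG[fmt]
    # DataFrameReader.format/options/load are standard PySpark calls.
    return spark.read.format(source).options(**options).load(path)
```

With an existing `SparkSession` named `spark`, a call such as `load_dataset(spark, "parquet", path)` would return a DataFrame for the dataset at `path`.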
## Run the benchmark
Edit the `run_benchmark.sh` file to match your data and cluster configuration, then launch it:
```bash
./run_benchmark.sh
```
## Example
Configuration:
- Spark 2.3.1
- HDFS 2.8.4
- Input dataset: 1,100,000,000 objects (x, y, z)
- 153 cores (9 executors), 300 GB RAM total
- No cache: 100 iterations (data distributed and read)
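The repeated, uncached runs described above can be pictured with a small timing harness. This is a minimal sketch, not the repository's implementation: `benchmark` and `summarize` are hypothetical names, and the workload is passed in as a plain callable so that the harness itself does not depend on Spark.

```python
import statistics
import time


def benchmark(action, iterations=100):
    """Call `action()` `iterations` times; return per-run durations in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, n_objects):
    """Summarize timings and derive a mean throughput in objects per second."""
    mean = statistics.mean(durations)
    return {
        "mean_s": mean,
        "median_s": statistics.median(durations),
        # Sample standard deviation needs at least two measurements.
        "stdev_s": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "objects_per_s": n_objects / mean if mean > 0 else float("inf"),
    }
```

In a Spark session, the callable might be something like `lambda: spark.read.parquet(path).count()`, with `path` pointing at the input dataset and no `cache()` call, so that every iteration reads the data again.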