https://github.com/guanhuawang/benefault
A way to perform task preemption in big data analytics platforms
- Host: GitHub
- URL: https://github.com/guanhuawang/benefault
- Owner: GuanhuaWang
- Created: 2016-11-28T21:25:23.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-28T01:12:18.000Z (about 7 years ago)
- Last Synced: 2025-02-08T09:11:34.232Z (3 months ago)
- Language: XSLT
- Size: 2.01 MB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Benefault
A way to perform task preemption in big data analytics platforms.
## What we have done
* A simple shell script for monitoring each node's metadata (e.g., disk access and network Tx/Rx) in a cluster
* Read and write support for checkpoint data (note: `checkpointRead` is private in Spark, so our function has to be packaged under `org.apache.spark`)

### We have already simulated the JCT (job completion time) gain we can get using Benefault
The simulated performance gain is 15-30%.

### We tested latency in various scenarios
* Measuring checkpoint latency using Spark:
  * Word Count with checkpointing
  * Sorting with checkpointing
  * GroupByKey with checkpointing
  * DecisionTree with periodic checkpointing
* We are now designing schemes to evaluate the best gain we can get using Benefault, including:
  * Finding the sweet spot for deciding whether to kill or preempt a task
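One way to frame the kill-or-preempt sweet spot is a simple cost model. The sketch below is a hypothetical model, not Benefault's actual scheme: it assumes a single slot, a high-priority task that runs to completion before the low-priority task resumes, and fixed checkpoint write/read times c_w and c_r. Killing throws away p·W seconds of finished work; preempting delays both tasks by c_w and adds c_r on resume. Summing the two completion times, preempting lowers the mean JCT when 2c_w + c_r < pW, i.e. when progress exceeds p* = (2c_w + c_r)/W. All class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PreemptionScenario:
    """Hypothetical single-slot scenario: a high-priority (HP) task arrives
    while a low-priority (LP) task holds the slot. All times are in seconds."""
    lp_total_work: float  # W: LP run time if never interrupted
    lp_progress: float    # p: fraction of W already completed, in [0, 1]
    hp_run_time: float    # H: HP run time
    ckpt_write: float     # c_w: time to write the LP checkpoint
    ckpt_read: float      # c_r: time to read the checkpoint back on resume


def completion_times(s: PreemptionScenario, preempt: bool) -> tuple[float, float]:
    """Return (HP completion, LP completion), measured from HP arrival.
    In this model the LP task resumes only after the HP task finishes."""
    if preempt:
        hp_done = s.ckpt_write + s.hp_run_time  # HP waits for the checkpoint write
        lp_left = s.ckpt_read + (1.0 - s.lp_progress) * s.lp_total_work
    else:
        hp_done = s.hp_run_time                 # kill: HP starts immediately
        lp_left = s.lp_total_work               # kill: LP restarts from scratch
    return hp_done, hp_done + lp_left


def should_preempt(s: PreemptionScenario) -> bool:
    """Preempt iff it lowers the sum (equivalently, the mean) of the two JCTs."""
    return sum(completion_times(s, True)) < sum(completion_times(s, False))


def sweet_spot(s: PreemptionScenario) -> float:
    """Progress fraction above which preempting beats killing in this model.
    Derived from 2*c_w + c_r < p*W; c_w counts twice because it delays both tasks."""
    return (2 * s.ckpt_write + s.ckpt_read) / s.lp_total_work


def mean_jct_reduction(s: PreemptionScenario) -> float:
    """Relative reduction in mean JCT from preempting instead of killing."""
    kill = sum(completion_times(s, False))
    return 1.0 - sum(completion_times(s, True)) / kill


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from Benefault.
    s = PreemptionScenario(lp_total_work=100, lp_progress=0.5,
                           hp_run_time=20, ckpt_write=5, ckpt_read=5)
    print(sweet_spot(s))          # 0.15
    print(should_preempt(s))      # True
    print(mean_jct_reduction(s))  # 0.25
```

In practice c_w and c_r would come from latency measurements such as the checkpointing experiments listed above, and they likely grow with the size of the task's state, which this fixed-cost sketch ignores.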