https://github.com/guanhuawang/benefault
A way for task preemption in Big data analytics platform
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/guanhuawang/benefault
- Owner: GuanhuaWang
- Created: 2016-11-28T21:25:23.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-28T01:12:18.000Z (almost 7 years ago)
- Last Synced: 2024-10-28T06:11:35.058Z (2 months ago)
- Language: XSLT
- Size: 2.01 MB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Benefault
[![License](https://img.shields.io/badge/license-BSD-blue.svg)](LICENSE)
A way to preempt tasks in big data analytics platforms.
## What we have done
* A simple shell script that monitors each node's metadata in a cluster (e.g., disk access and network Tx/Rx)
* Reading and writing of checkpoint data (note: `checkpointRead` is private in Spark, so our functions must be packaged under `org.apache.spark`)

### We have already simulated the job completion time (JCT) gain achievable with Benefault
In simulation, Benefault improves JCT by 15-30%.

### We tested checkpoint latency in several scenarios
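
The measurements below were taken in Spark. As a rough illustration of what "checkpoint latency" means, here is a minimal Python sketch, assuming task state is a picklable object written to local disk. It is not Benefault's code and does not use Spark (where an RDD is checkpointed with `rdd.checkpoint()` after `SparkContext.setCheckpointDir`). The helper names `write_checkpoint`, `read_checkpoint` and `measure_checkpoint_latency` are hypothetical. The sketch times a durable write (with `fsync`) and the read-back separately, and takes the median over several trials.

```python
import os
import pickle
import statistics
import tempfile
import time


def write_checkpoint(state, path):
    """Hypothetical helper: serialize task state and fsync so the write is durable."""
    with open(path, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())


def read_checkpoint(path):
    """Hypothetical helper: load previously checkpointed task state."""
    with open(path, "rb") as f:
        return pickle.load(f)


def measure_checkpoint_latency(state, trials=5):
    """Return (median write seconds, median read seconds) over `trials` runs."""
    writes, reads = [], []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ckpt.pkl")
        for _ in range(trials):
            t0 = time.perf_counter()
            write_checkpoint(state, path)
            t1 = time.perf_counter()
            restored = read_checkpoint(path)
            t2 = time.perf_counter()
            if restored != state:  # the round trip must be lossless
                raise RuntimeError("checkpoint round trip changed the state")
            writes.append(t1 - t0)
            reads.append(t2 - t1)
    return statistics.median(writes), statistics.median(reads)


if __name__ == "__main__":
    # Toy state shaped like a Word Count partition: word -> count.
    state = {f"word{i}": i for i in range(100_000)}
    w, r = measure_checkpoint_latency(state)
    print(f"write: {w * 1000:.2f} ms, read: {r * 1000:.2f} ms")
```

Timing write and read separately matters because a preempted task pays the write when it yields its slot and the read only when it resumes.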
* Measured checkpoint latency in Spark for:
  * Word Count with checkpointing
  * Sorting with checkpointing
  * GroupByKey with checkpointing
  * DecisionTree with periodic checkpointing
* We are now designing schemes to evaluate the best gain achievable with Benefault
* Find the sweet spot for deciding whether to kill or preempt a task
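
To make the kill-versus-preempt trade-off concrete, here is a minimal single-task cost model. It is a sketch under simplifying assumptions, not Benefault's simulator, and it does not reproduce the 15-30% figure above. Assume a task needs `total_work` seconds, has completed `elapsed` seconds when its slot is reclaimed, and waits `wait` seconds before it runs again. Killing it discards its progress, so it reruns from scratch. Preempting it costs a checkpoint write (`ckpt_write`) before yielding and a read (`ckpt_read`) on resume, after which only the remaining work is done. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    total_work: float  # seconds of work the task needs in total
    elapsed: float     # seconds already completed when the slot is reclaimed


def jct_if_killed(task, wait):
    # Progress is lost: work done so far, the wait, then a full rerun.
    return task.elapsed + wait + task.total_work


def jct_if_preempted(task, wait, ckpt_write, ckpt_read):
    # Progress is kept: checkpoint, wait, restore, then finish the remainder.
    return (task.elapsed + ckpt_write + wait + ckpt_read
            + (task.total_work - task.elapsed))


def should_preempt(task, ckpt_write, ckpt_read):
    # The wait cancels out of the comparison, so preempting wins exactly
    # when the saved progress exceeds the checkpoint round-trip cost.
    return task.elapsed > ckpt_write + ckpt_read


def jct_gain(task, wait, ckpt_write, ckpt_read):
    # Relative JCT reduction of preempting versus killing.
    killed = jct_if_killed(task, wait)
    preempted = jct_if_preempted(task, wait, ckpt_write, ckpt_read)
    return (killed - preempted) / killed


if __name__ == "__main__":
    task = Task(total_work=100.0, elapsed=40.0)
    print(should_preempt(task, ckpt_write=3.0, ckpt_read=2.0))       # True
    print(f"{jct_gain(task, wait=20.0, ckpt_write=3.0, ckpt_read=2.0):.1%}")  # 21.9%
```

In this model the sweet spot is the point where `elapsed` equals `ckpt_write + ckpt_read`. Tasks interrupted earlier are cheaper to kill, and tasks interrupted later are cheaper to preempt. A real scheduler would also have to account for contention while writing checkpoints and for uncertainty in task durations.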