https://github.com/guanhuawang/benefault

A way for task preemption in Big data analytics platform

Host: GitHub
URL: https://github.com/guanhuawang/benefault
Owner: GuanhuaWang
Created: 2016-11-28T21:25:23.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2018-03-28T01:12:18.000Z (about 7 years ago)
Last Synced: 2025-02-08T09:11:34.232Z (3 months ago)
Language: XSLT
Size: 2.01 MB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Benefault

[![License](https://img.shields.io/badge/license-BSD-blue.svg)](LICENSE)

An approach to task preemption in big data analytics platforms

## What we have done

* A simple shell script for monitoring each node's metadata (e.g., disk access, network Tx/Rx) in a cluster

* Reading and writing of checkpoint data (note: `checkpointRead` is private in Spark, so we need to package our function into `org.apache.spark`)
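
The network Tx/Rx part of the monitoring step can be illustrated with a minimal Python sketch. This is not the repository's shell script: it assumes a Linux node that exposes interface counters in `/proc/net/dev`, and the function names `parse_net_dev` and `throughput` are hypothetical names chosen for this example.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip the line
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, interval_s):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            rates[iface] = ((rx1 - rx0) / interval_s, (tx1 - tx0) / interval_s)
    return rates
```

To use it on a node, read `/proc/net/dev` twice, a fixed interval apart, and pass both parsed samples to `throughput`. Disk access could be sampled the same way from a different counter source.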

### We have already simulated the job completion time (JCT) gain achievable with Benefault

The simulated performance gain is 15-30%.

### We test latency in various scenarios

* Measure checkpoint latency using Spark

* Word Count with checkpointing

* Sorting with checkpointing

* GroupByKey with checkpointing

* DecisionTree with periodic checkpointing

* We are now designing schemes to evaluate the best gain we can get using Benefault

* Find the sweet spot for deciding whether to kill or preempt a task
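
One way to think about the kill-or-preempt sweet spot is a simple break-even rule, sketched below in Python. The cost model is an assumption made for illustration and is not taken from the repository. Killing a task is assumed to waste the work done so far. Preempting is assumed to cost the time to write the checkpoint plus the time to read it back. The names `TaskState`, `checkpoint_cost_s`, and `decide` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    elapsed_s: float    # work already done, lost if the task is killed
    state_bytes: float  # size of the state a checkpoint would persist


def checkpoint_cost_s(state_bytes, write_bps, read_bps):
    """Assumed preemption cost: write the checkpoint now, read it on resume."""
    return state_bytes / write_bps + state_bytes / read_bps


def decide(task, write_bps, read_bps):
    """Return "preempt" when checkpointing is cheaper than redoing the work."""
    cost = checkpoint_cost_s(task.state_bytes, write_bps, read_bps)
    # Break-even (the "sweet spot" under this model): cost == elapsed_s.
    return "preempt" if cost < task.elapsed_s else "kill"
```

Under this model, a task with 1 GB of state on storage that writes at 100 MB/s and reads at 200 MB/s has a 15-second checkpoint cost. It is worth preempting only once it has run for longer than that. Measured checkpoint latencies, such as those from the scenarios listed above, could replace the bandwidth-based estimate.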
